This job is about joining a dynamic Site Reliability Engineering (SRE) team that focuses on managing infrastructure systems, including Storage, Computing, and Databases. The team is dedicated to ensuring reliability, efficiency, and compliance while fostering a culture of diversity and intellectual curiosity. Collaboration and mentorship are key components of the work environment, allowing engineers to thrive and grow in their careers.
You'll be responsible for
Ensuring reliability
Ensuring the reliability and efficiency of our core infrastructure, with a focus on system capacity and stability; setting up reliability standards and recovery SOPs.
Troubleshooting technical issues
Troubleshooting and locating technical issues; analyzing bottlenecks; managing high-availability architecture transformations and upgrades.
Building automated solutions
Building automated operation solutions for large-scale systems; partnering with system development teams for system iteration.
Skills you'll need
Knowledge of computer software
Solid foundational knowledge of computer software is essential for understanding and managing infrastructure systems.
Understanding of Linux
A solid understanding of the Linux operating system, storage, network I/O, and related principles is crucial for success in this job.
Familiarity with programming languages
Familiarity with one or more programming languages, such as Python, Go, or Java, is important for developing and maintaining systems.