This job is about joining a dynamic Site Reliability Engineering (SRE) team that focuses on managing infrastructure systems, including Storage, Computing, and Databases. The team is dedicated to ensuring reliability, efficiency, and compliance while fostering a culture of diversity and intellectual curiosity. Collaboration and mentorship are key components of the work environment, allowing engineers to thrive and grow in their careers.
You'll be responsible for
Ensuring reliability
Ensuring the reliability and efficiency of our core infrastructure, with a focus on system capacity and stability; defining reliability standards and recovery standard operating procedures (SOPs).
Troubleshooting technical issues
Troubleshooting and locating technical issues, analyzing bottlenecks, and managing high-availability architecture transformations and upgrades.
Building automated solutions
Building automated operations solutions for large-scale systems; partnering with system development teams on system iteration.