Site Reliability Engineering (SRE) at TikTok is an exciting opportunity to blend software and systems engineering while building and running large-scale, massively distributed, and fault-tolerant systems. This job plays a crucial role in managing complex challenges of scale, leveraging expertise in coding, algorithms, and system design. The team thrives on a culture of diversity, intellectual curiosity, and collaboration, all while promoting self-direction and problem-solving.
You'll be responsible for
โ๏ธ
Developing and maintaining automation procedures
Maximizing system efficiency and minimizing human intervention through automation.๐ค
Working closely with software engineering teams
Designing, deploying, and operating elements to ensure that systems are functionally robust.๐
Implementing monitoring tools
Setting up metrics to keep track of system health and performance.Skills you'll need
๐ป
High-level programming languages
Proficient knowledge of languages such as Python, Go, Java, and Shell script.โ๏ธ
Network architecture and cloud systems
Experience in network architecture, database modeling, and large-scale distributed systems.๐ง
Linux operating systems
Strong understanding of Linux operating systems and open-source technologies.View more