Senior Site Reliability Engineer
UK | Full-Time | Senior
Salary not disclosed
Job Details
- Required Skills: Docker, Python, Kubernetes, C++, Go, Terraform, Ansible, Helm
Requirements
- Strong professional experience as a Site Reliability Engineer, DevOps Engineer, or Infrastructure Engineer in cloud-native environments.
- Solid programming skills in Go, Python, C++, or a similar language.
- Strong understanding of algorithms, data structures, operating systems, and distributed computing principles.
- Deep hands-on expertise in Unix/Linux systems administration and networking.
- Proven experience with containerization and orchestration tools including Docker, Kubernetes, and Helm.
- Experience with configuration management and infrastructure automation tools such as Terraform, Ansible, or Salt.
- Familiarity with CI/CD processes, automation frameworks, and scalable cloud infrastructure operations.
- Strong troubleshooting, analytical, and problem-solving capabilities in high-performance production environments.
- Excellent collaboration and communication skills within distributed engineering teams.
Responsibilities
- Ensure high availability, scalability, fault tolerance, and uninterrupted operation of critical cloud infrastructure and services.
- Design, implement, and improve CI/CD pipelines and automation workflows to enhance deployment efficiency and system reliability.
- Manage and optimize containerized environments and orchestration systems using Kubernetes, Docker, Helm, and related technologies.
- Build and maintain infrastructure-as-code solutions using tools such as Terraform, Ansible, or Salt.
- Monitor system health, troubleshoot production incidents, and proactively improve performance, observability, and resilience.
- Collaborate with cross-functional engineering teams to solve complex infrastructure and backend reliability challenges.
- Contribute to the design and operation of high-load distributed systems supporting AI and machine learning workloads.
- Continuously evaluate and implement modern cloud technologies and operational best practices.