Senior Site Reliability Engineer

UK | Full-Time | Senior
Salary not disclosed

Job Details

Required Skills
Docker, Python, Kubernetes, C++, Go, Terraform, Ansible, Helm

Requirements

  • Strong professional experience as a Site Reliability Engineer, DevOps Engineer, or Infrastructure Engineer in cloud-native environments.
  • Solid programming skills in languages such as Go, Python, C++, or similar.
  • Strong understanding of algorithms, data structures, operating systems, and distributed computing principles.
  • Deep hands-on expertise with Unix/Linux systems administration and network technologies.
  • Proven experience with containerization and orchestration tools including Docker, Kubernetes, and Helm.
  • Experience with configuration management and infrastructure automation tools such as Terraform, Ansible, or Salt.
  • Familiarity with CI/CD processes, automation frameworks, and scalable cloud infrastructure operations.
  • Strong troubleshooting, analytical, and problem-solving skills in high-performance production environments.
  • Excellent collaboration and communication skills within distributed engineering teams.

Responsibilities

  • Ensure high availability, scalability, fault tolerance, and uninterrupted operation of critical cloud infrastructure and services.
  • Design, implement, and improve CI/CD pipelines and automation workflows to enhance deployment efficiency and system reliability.
  • Manage and optimize containerized environments and orchestration systems using Kubernetes, Docker, Helm, and related technologies.
  • Build and maintain infrastructure-as-code solutions using tools such as Terraform, Ansible, or Salt.
  • Monitor system health, troubleshoot production incidents, and proactively improve performance, observability, and resilience.
  • Collaborate with cross-functional engineering teams to solve complex infrastructure and backend reliability challenges.
  • Contribute to the design and operation of high-load distributed systems supporting AI and machine learning workloads.
  • Continuously evaluate and implement modern cloud technologies and operational best practices.