Senior Site Reliability Engineer

India, EST working hours
Full-Time
Senior
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
AWS, Docker, Python, Bash, Kubernetes, CI/CD, Terraform, Datadog

Requirements

  • 5+ years of experience in Site Reliability Engineering, DevOps, or cloud infrastructure roles.
  • Strong hands-on experience with AWS and container orchestration platforms such as Kubernetes (EKS/GKE).
  • Solid expertise in Infrastructure as Code using Terraform.
  • Proficiency in CI/CD pipeline management using tools like GitHub Actions, CircleCI, or similar.
  • Strong programming or scripting skills in at least one language, such as Python or Bash.
  • Experience with Docker and containerized application deployments.
  • Knowledge of monitoring and observability tools such as Datadog, Sentry, or OpenSearch.
  • Hands-on experience with authentication and authorization systems in distributed environments.
  • Familiarity with incident management, on-call operations, and production support best practices.
  • Strong problem-solving skills, ownership mindset, and ability to work independently with minimal supervision.
  • Experience working in globally distributed teams and supporting EST working hours.

Responsibilities

  • Design, build, and maintain scalable and reliable cloud infrastructure supporting production applications and internal platforms.
  • Manage deployment pipelines and CI/CD workflows using tools such as GitHub Actions, CircleCI, or Argo Workflows.
  • Implement infrastructure as code practices using Terraform to ensure consistency, scalability, and automation.
  • Operate and optimize containerized environments using Docker and Kubernetes (EKS/GKE or similar).
  • Develop and maintain internal DevOps tools that improve deployment speed, reliability, and operational efficiency.
  • Establish and enhance monitoring, logging, and alerting systems using tools like Datadog, OpenSearch, or Sentry.
  • Participate in on-call rotations, incident response, post-mortems, and root cause analysis to ensure system reliability.
  • Collaborate with development teams to improve system design, deployment strategies, and infrastructure architecture.
  • Manage authentication, authorization, and secure gateway solutions across platforms.
  • Continuously optimize cloud environments, including automation, performance tuning, and cost efficiency.