Senior Site Reliability Engineer
India, EST working hours | Full-Time | Senior
Salary not disclosed
Job Details
- Experience: 5+ years
- Required Skills: AWS, Docker, Python, Bash, Kubernetes, CI/CD, Terraform, Datadog
Requirements
- 5+ years of experience in Site Reliability Engineering, DevOps, or cloud infrastructure roles.
- Strong hands-on experience with AWS and container orchestration platforms such as Kubernetes (EKS/GKE).
- Solid expertise in Infrastructure as Code using Terraform.
- Proficiency in CI/CD pipeline management using tools like GitHub Actions, CircleCI, or similar.
- Strong programming/scripting skills in at least one language such as Python or Bash.
- Experience with Docker and containerized application deployments.
- Knowledge of monitoring and observability tools such as Datadog, Sentry, or OpenSearch.
- Hands-on experience with authentication and authorization systems in distributed environments.
- Familiarity with incident management, on-call operations, and production support best practices.
- Strong problem-solving skills, ownership mindset, and ability to work independently with minimal supervision.
- Experience working in globally distributed teams and supporting EST working hours.
Responsibilities
- Design, build, and maintain scalable and reliable cloud infrastructure supporting production applications and internal platforms.
- Manage deployment pipelines and CI/CD workflows using tools such as GitHub Actions, CircleCI, or Argo Workflows.
- Implement infrastructure as code practices using Terraform to ensure consistency, scalability, and automation.
- Operate and optimize containerized environments using Docker and Kubernetes (EKS/GKE or similar).
- Develop and maintain internal DevOps tools that improve deployment speed, reliability, and operational efficiency.
- Establish and enhance monitoring, logging, and alerting systems using tools like Datadog, OpenSearch, or Sentry.
- Participate in on-call rotations, incident response, post-mortems, and root cause analysis to ensure system reliability.
- Collaborate with development teams to improve system design, deployment strategies, and infrastructure architecture.
- Manage authentication, authorization, and secure gateway solutions across platforms.
- Continuously optimize cloud environments, including automation, performance tuning, and cost efficiency.