Senior Site Reliability Engineer

Remote-first work environment within the United States
Full-Time
Senior
Salary
113,300 - 205,520 USD per year

Job Details

Experience
5+ years
Required Skills
AWS, Python, Java, Go, Grafana, Prometheus, CI/CD

Requirements

  • 5+ years of experience in SRE, software engineering, or production operations
  • Strong hands-on expertise in troubleshooting distributed systems using observability tools
  • Experience operating production workloads on AWS (EC2, S3, EKS, RDS/Aurora, CloudFront)
  • Proficiency in infrastructure-as-code using Python, Go, or Java
  • Experience with observability platforms such as Grafana or Prometheus
  • Understanding of CI/CD pipelines and software delivery practices
  • Experience using AI tools and agentic development environments
  • Ability to design, document, and communicate complex technical systems
  • Strong analytical and problem-solving skills

Responsibilities

  • Define and manage service-level objectives (SLOs), error budgets, and reliability metrics
  • Investigate and resolve complex production incidents across application, infrastructure, data, and network layers
  • Design and implement automation, tooling, and AI-enabled workflows to eliminate operational toil
  • Build and maintain infrastructure-as-code and CI/CD pipelines
  • Develop technical documentation including runbooks and postmortems
  • Collaborate cross-functionally to improve system reliability and technical roadmaps
  • Support the safe use of AI agents in production environments