Senior Site Reliability Engineer

Remote-first work environment within the United States
Full-Time
Senior

Salary: 113,300 - 205,520 USD per year

Job Details

Qualifications

5+ years of experience in SRE, software engineering, or production operations
Strong hands-on expertise in troubleshooting distributed systems using observability tools
Experience operating production workloads on AWS (EC2, S3, EKS, RDS/Aurora, CloudFront)
Proficiency in infrastructure-as-code using Python, Go, or Java
Experience with observability platforms such as Grafana or Prometheus
Understanding of CI/CD pipelines and software delivery practices
Experience using AI tools and agentic development environments
Ability to design, document, and communicate complex technical systems
Strong analytical and problem-solving skills

Responsibilities

Define and manage service-level objectives (SLOs), error budgets, and reliability metrics
Investigate and resolve complex production incidents across application, infrastructure, data, and network layers
Design and implement automation, tooling, and AI-enabled workflows to eliminate operational toil
Build and maintain infrastructure-as-code and CI/CD pipelines
Develop technical documentation including runbooks and postmortems
Collaborate cross-functionally to improve system reliability and technical roadmaps
Support the safe use of AI agents in production environments
