Senior Site Reliability Engineer
Remote-first work environment within the United States, Full-Time, Senior
Salary: 113,300 - 205,520 USD per year
Job Details
- Experience: 5+ years
- Required Skills: AWS, Python, Java, Go, Grafana, Prometheus, CI/CD
Requirements
- 5+ years of experience in SRE, software engineering, or production operations
- Strong hands-on expertise in troubleshooting distributed systems using observability tools
- Experience operating production workloads on AWS (EC2, S3, EKS, RDS/Aurora, CloudFront)
- Proficiency in infrastructure-as-code using Python, Go, or Java
- Experience with observability platforms such as Grafana or Prometheus
- Understanding of CI/CD pipelines and software delivery practices
- Experience using AI tools and agentic development environments
- Ability to design, document, and communicate complex technical systems
- Strong analytical and problem-solving skills
Responsibilities
- Define and manage service-level objectives (SLOs), error budgets, and reliability metrics
- Investigate and resolve complex production incidents across application, infrastructure, data, and network layers
- Design and implement automation, tooling, and AI-enabled workflows to eliminate operational toil
- Build and maintain infrastructure-as-code and CI/CD pipelines
- Develop technical documentation including runbooks and postmortems
- Collaborate cross-functionally to improve system reliability and technical roadmaps
- Support the safe use of AI agents in production environments