Senior Site Reliability Developer
United States, Full-Time, Senior
Salary not disclosed
Job Details
- Required Skills: Kubernetes, CI/CD, Distributed Systems
Requirements
- Proven experience operating large-scale distributed systems
- Strong background in SRE practices and production operations
- Hands-on experience with Kubernetes-based services
- Experience building and maintaining cloud-native infrastructure
- Proficiency with automation, CI/CD pipelines, and infrastructure as code
- Experience implementing and managing observability tools (metrics, logging, tracing, alerting)
- Strong problem-solving skills for incident response and root-cause analysis
Responsibilities
- Operate and improve large-scale distributed systems powering Clinical AI Assistant services
- Build automation that improves reliability, scalability, and operational efficiency
- Improve observability across metrics, logging, tracing, and alerting
- Participate in production operations, incident response, and root-cause analysis
- Help build self-healing infrastructure and operational tooling
- Support Kubernetes-based services and cloud-native infrastructure
- Partner with software engineers to improve reliability before systems reach production
- Contribute to CI/CD pipelines, infrastructure as code, and platform engineering standards
- Learn and apply modern SRE practices for AI-powered healthcare systems