Senior Site Reliability Developer

United States, Full-Time, Senior
Salary not disclosed

Job Details

Required Skills
Kubernetes, CI/CD, Distributed Systems

Requirements

  • Proven experience operating large-scale distributed systems
  • Strong background in SRE practices and production operations
  • Hands-on experience with Kubernetes-based services
  • Experience building and maintaining cloud-native infrastructure
  • Proficiency with automation, CI/CD pipelines, and infrastructure as code
  • Experience implementing and managing observability tools (metrics, logging, tracing, alerting)
  • Strong problem-solving skills for incident response and root-cause analysis

Responsibilities

  • Operate and improve large-scale distributed systems powering Clinical AI Assistant services
  • Build automation that improves reliability, scalability, and operational efficiency
  • Improve observability across metrics, logging, tracing, and alerting
  • Participate in production operations, incident response, and root-cause analysis
  • Help build self-healing infrastructure and operational tooling
  • Support Kubernetes-based services and cloud-native infrastructure
  • Partner with software engineers to improve reliability before systems reach production
  • Contribute to CI/CD pipelines, infrastructure as code, and platform engineering standards
  • Learn and apply modern SRE practices for AI-powered healthcare systems