Staff Site Reliability Engineer

Domino Data Lab
Data Science AI
Remote Argentina
Full-Time
Staff
Salary not disclosed

Job Details

Required Skills
Python, Kubernetes, Go, Linux

Requirements

  • Deep experience in SRE, platform engineering, or a software engineering role with hands-on operational ownership
  • Fluency with Kubernetes, Linux, cloud platforms, and observability tooling
  • Strong software engineering skills in Python or Go
  • Track record of building internal tools or services
  • Comfort leading technically ambiguous work and influencing direction across teams
  • Experience improving reliability through engineering and automation
  • Strong communication skills
  • Experience mentoring engineers or shaping technical decision-making
  • Sound judgment about AI/LLM tooling in operational workflows

Responsibilities

  • Lead the development of Domino's internal AI-assisted reliability tooling
  • Improve the observability coverage and signal quality for critical systems
  • Own incident response end-to-end, from detection to remediation
  • Guide the development of customer- and user-facing observability tools
  • Define and mature SLO/SLI frameworks for priority services
  • Scale cloud operations practices for the single-tenant SaaS offering
  • Mentor other engineers and shape SRE practices at Domino