Staff Site Reliability Engineer
Domino Data Lab
Data Science, AI
Remote (Argentina), Full-Time, Staff
Salary not disclosed
Job Details
Required Skills
- Python
- Kubernetes
- Go
- Linux
Requirements
- Deep experience in SRE, platform engineering, or a software engineering role with hands-on operational ownership
- Fluency with Kubernetes, Linux, cloud platforms, and observability tooling
- Strong software engineering skills in Python or Go
- Track record of building internal tools or services
- Comfort leading technically ambiguous work and influencing direction across teams
- Experience improving reliability through engineering and automation
- Strong communication skills
- Experience mentoring engineers or shaping technical decision-making
- Sound judgment about AI/LLM tooling in operational workflows
Responsibilities
- Lead the development of Domino's internal AI-assisted reliability tooling
- Improve observability coverage and signal quality for critical systems
- Own incident response end-to-end, from detection to remediation
- Guide the development of customer and user-facing observability tools
- Define and mature SLO/SLI frameworks for priority services
- Scale cloud operations practices for the single-tenant SaaS offering
- Mentor other engineers and shape SRE practices at Domino