Site Reliability Engineer II

Location: Remote (Argentina, Colombia, Costa Rica, Mexico)
Employment type: Full-Time
Level: Mid-level

Salary not disclosed

Job Details

Experience: 2–4 years
Required Skills: Docker, Python, Bash, Kubernetes, Go, Grafana, Prometheus, Linux, Terraform, Ansible

Qualifications:
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
2–4 years of experience in site reliability, systems engineering, or operations.
Solid Linux systems administration and troubleshooting skills.
Proficiency in at least one scripting language (Python, Bash, or Go).
Familiarity with containerization and orchestration technologies such as Docker and Kubernetes.
Understanding of microservices architecture concepts.
Experience with monitoring, alerting, and incident response frameworks.
Exposure to large-scale, production-grade systems.

Responsibilities:
Support the availability and durability of critical services across production environments.
Monitor service health using SLIs, SLOs, and error budgets, and escalate issues when SLOs are at risk.
Participate in on-call rotations, incident response, and post-incident reviews.
Develop automation to reduce manual intervention and operational toil.
Contribute to monitoring, logging, and alerting tooling such as Prometheus and Grafana.
Partner with engineering and operations teams to support resilient system design.
Assist in capacity planning and disaster recovery exercises.