Site Reliability Engineer II

Remote - Argentina; Remote - Colombia; Remote - Costa Rica; Remote - Mexico
Full-Time
Middle
Salary not disclosed

Job Details

Experience
2–4 years
Required Skills
Docker, Python, Bash, Kubernetes, Go, Grafana, Prometheus, Linux, Terraform, Ansible

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • 2–4 years of experience in site reliability, systems engineering, or operations.
  • Solid Linux systems administration and troubleshooting skills.
  • Proficiency in at least one scripting language (Python, Bash, or Go).
  • Familiarity with container technologies like Kubernetes and Docker.
  • Understanding of microservices concepts.
  • Experience with monitoring, alerting, and incident response frameworks.
  • Exposure to large-scale, production-grade systems.

Responsibilities

  • Support the availability and durability of critical services across production environments.
  • Monitor service health using SLIs, SLOs, and error budgets, and escalate issues when objectives are at risk.
  • Participate in on-call rotations, incident response, and post-incident reviews.
  • Develop automation to reduce manual intervention and operational toil.
  • Contribute to monitoring, logging, and alerting frameworks like Prometheus and Grafana.
  • Partner with engineering and operations teams to support resilient system design.
  • Assist in capacity planning and disaster recovery exercises.