Site Reliability Engineer II
Backblaze
Cloud Storage
Remote - Argentina; Remote - Colombia; Remote - Costa Rica; Remote - Mexico
Full-Time
Middle
Salary not disclosed
Job Details
- Experience: 2–4 years
- Required Skills: Docker, Python, Bash, Kubernetes, Go, Grafana, Prometheus, Linux, Terraform, Ansible
Requirements
- Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent experience.
- 2–4 years of experience in site reliability, systems engineering, or operations.
- Solid Linux systems administration and troubleshooting skills.
- Proficiency in at least one scripting language (Python, Bash, or Go).
- Familiarity with container technologies like Kubernetes and Docker.
- Understanding of microservices concepts.
- Experience with monitoring, alerting, and incident response frameworks.
- Exposure to large-scale, production-grade systems.
Responsibilities
- Support the availability and durability of critical services across production environments.
- Monitor service health using SLIs, SLOs, and error budgets, and escalate issues when targets are at risk.
- Participate in on-call rotations, incident response, and post-incident reviews.
- Develop automation to reduce manual intervention and operational toil.
- Contribute to monitoring, logging, and alerting frameworks like Prometheus and Grafana.
- Partner with engineering and operations teams to support resilient system design.
- Assist in capacity planning and disaster recovery exercises.