Site Reliability Engineer

Brazil (Remote) / Argentina (Remote) / Colombia (Remote) / Ecuador (Remote) / Mexico (Remote) / Paraguay (Remote) / Peru (Remote)
Contract
Middle

Salary not disclosed

Job Details

Languages: English (C1 or C2)
Required Skills: Python, Bash, Kubernetes, Azure, Grafana, Prometheus, Terraform, Datadog

Requirements

Must be based in Latin America
English level - C1 or C2
Proven experience as a Site Reliability Engineer or in a similar role
Proficiency in logging, metrics, and tracing frameworks (Datadog, Loki, Prometheus, OpenTelemetry)
Experience with cloud platforms (Azure preferred) and infrastructure-as-code tools (e.g., Terraform)
Strong programming and scripting skills (Python, Bash)
Proficiency in containerization technologies and orchestration tools (Docker, Kubernetes)
Understanding of Linux-based systems, networking, and security principles related to containerized applications
Strong problem-solving and troubleshooting skills
Excellent communication and collaboration abilities

Responsibilities

Design, implement, and maintain monitoring and observability solutions using tools like Prometheus, the Grafana stack (Loki, Grafana, Tempo, Alertmanager), Datadog, and OpenTelemetry.
Define and implement SLOs, SLIs, and error budgets to measure system reliability.
Develop and optimize dashboards, alerts, and reports for system performance and business metrics.
Design actionable alerting strategies to minimize noise and improve MTTR.
Integrate alerting systems with Jira.
Establish and refine runbooks for on-call teams to handle alerts efficiently.
Analyze system performance metrics and implement optimizations for scalability.
Develop tools to streamline operational processes such as fail-over and configuration management.
