Platform Engineer (Site Reliability Engineering)


Brazil | Full-Time | Middle

Salary not disclosed


Job Details

Languages: English
Required Skills: Python, Java, Kubernetes, CI/CD, DevOps

Requirements

Proven experience in Site Reliability Engineering, Platform Engineering, DevOps, or similar infrastructure-focused roles.
Hands-on experience with Kubernetes, including deployment, debugging, and production troubleshooting.
Strong understanding of CI/CD pipelines and modern DevOps practices.
Software development experience in any modern language (Python or Java strongly preferred).
Strong automation mindset with a focus on reducing repetitive operational work through tooling.
Experience with observability tools, monitoring systems, and alerting frameworks.
Familiarity with AI/LLM-based workflows or agentic automation is highly desirable.
Ability to manage high-severity incidents and communicate clearly with technical and non-technical stakeholders.
Strong written and verbal communication skills in English.
Self-driven, proactive mindset with the ability to operate independently in ambiguous situations.

Responsibilities

Own and drive end-to-end incident management processes, ensuring rapid response, clear communication, and effective resolution during production incidents.
Lead on-call operations, including incident triage, escalation, coordination, and stakeholder communication across severity levels.
Design and implement automation to improve postmortem workflows, including tracking action items, ownership, and remediation follow-ups.
Build tooling and AI-assisted workflows to reduce operational toil and accelerate incident detection, response, and resolution.
Improve observability across distributed systems, including dashboards, alerting strategies, and monitoring.
Conduct post-incident analysis to identify root causes and implement long-term reliability improvements.
Collaborate with engineering teams to define preventive measures, improve runbooks, and reduce recurring incidents.
Support change and deployment processes with a focus on risk mitigation and system stability.

