Site Reliability Engineer II
Brazil | Full-Time | Middle
Salary not disclosed
Job Details
- Experience: 2–4 years
- Required Skills: AWS, Docker, Python, Bash, Jenkins, Kubernetes, Go, Linux, Terraform, Ansible
Requirements
- 2–4 years of experience in Site Reliability Engineering, systems engineering, DevOps, or production operations.
- Strong Linux systems administration and troubleshooting skills in production environments.
- Solid understanding of reliability engineering principles, including monitoring, alerting, incident response, and root cause analysis.
- Proficiency in at least one scripting or programming language such as Python, Bash, or Go.
- Experience with containers and orchestration technologies such as Docker and Kubernetes.
- Familiarity with CI/CD tools and infrastructure automation (e.g., Terraform, Ansible, Jenkins).
- Understanding of distributed systems and microservices architectures.
- Experience working with cloud platforms such as AWS, GCP, or Azure.
- Strong problem-solving skills and ownership mindset.
- Excellent collaboration and communication skills.
Responsibilities
- Ensure the availability, reliability, and durability of critical production services across distributed environments.
- Monitor system health using SLIs, SLOs, and error budgets, proactively identifying risks to service performance.
- Participate in on-call rotations, incident response, root cause analysis, and post-incident reviews.
- Build automation to reduce operational toil and improve efficiency of recurring infrastructure and support tasks.
- Develop and maintain observability systems, including monitoring, logging, and alerting frameworks.
- Work with CI/CD pipelines, infrastructure-as-code tools, and configuration management systems.
- Write scripts and tooling to improve system operations and reliability.
- Collaborate with engineering and operations teams to design and maintain resilient systems.
- Contribute to capacity planning, disaster recovery planning, and vendor/SLA management.
- Document systems, create runbooks, and foster a reliability-first engineering culture.