Site Reliability Engineer II

Brazil | Full-Time | Middle
Salary not disclosed

Job Details

Experience
2–4 years
Required Skills
AWS, Docker, Python, Bash, Jenkins, Kubernetes, Go, Linux, Terraform, Ansible

Requirements

  • 2–4 years of experience in Site Reliability Engineering, systems engineering, DevOps, or production operations.
  • Strong Linux systems administration and troubleshooting skills in production environments.
  • Solid understanding of reliability engineering principles, including monitoring, alerting, incident response, and root cause analysis.
  • Proficiency in at least one scripting or programming language such as Python, Bash, or Go.
  • Experience with containers and orchestration technologies such as Docker and Kubernetes.
  • Familiarity with CI/CD tools and infrastructure automation (e.g., Terraform, Ansible, Jenkins).
  • Understanding of distributed systems and microservices architectures.
  • Experience working with cloud platforms such as AWS, GCP, or Azure.
  • Strong problem-solving skills and an ownership mindset.
  • Excellent collaboration and communication skills.

Responsibilities

  • Ensure the availability, reliability, and durability of critical production services across distributed environments.
  • Monitor system health using SLIs, SLOs, and error budgets, proactively identifying risks to service performance.
  • Participate in on-call rotations, incident response, root cause analysis, and post-incident reviews.
  • Build automation to reduce operational toil and improve the efficiency of recurring infrastructure and support tasks.
  • Develop and maintain observability systems, including monitoring, logging, and alerting frameworks.
  • Work with CI/CD pipelines, infrastructure-as-code tools, and configuration management systems.
  • Write scripts and tooling to improve system operations and reliability.
  • Collaborate with engineering and operations teams to design and maintain resilient systems.
  • Contribute to capacity planning, disaster recovery planning, and vendor/SLA management.
  • Document systems, create runbooks, and foster a reliability-first engineering culture.