Sr. Site Reliability Engineer - SRE
Spain, Full-Time, Senior
Salary not disclosed
Job Details
Required Skills
- AWS, Python, Bash, Kubernetes, Go, CI/CD, Terraform, Datadog
Requirements
- Demonstrated experience operating and improving production systems at scale.
- Strong troubleshooting skills with a methodical approach to incident response.
- Experience defining and using SLIs, SLOs, and error budgets.
- Proficiency in AWS cloud infrastructure and services.
- Experience with Kubernetes platforms, specifically Amazon EKS.
- Knowledge of identity and access management systems such as Auth0 and AWS IAM.
- Familiarity with networking fundamentals like DNS, load balancing, and TLS.
- Experience with GitOps workflows and infrastructure automation using Terraform and Flux.
- Demonstrated ability to build automation and tooling using Python, Go, or Bash.
- Excellent written and verbal communication skills.
Responsibilities
- Design, implement, and maintain highly available, scalable, and resilient systems.
- Define and enforce best practices for monitoring, alerting, and logging within Datadog.
- Develop robust software and tooling to automate operational tasks and reduce toil.
- Participate in on-call rotations and lead blameless post-mortems after incidents.
- Collaborate with engineering teams to define and track SLIs, SLOs, and error budgets.
- Contribute to infrastructure as code efforts using Terraform and GitHub Actions.
- Provide SRE expertise in system design reviews and architecture.