Senior Site Reliability Engineer
Remote | HR Technology
Remote-EMEA; prioritising Europe, Async-first | Full-Time | Senior
Salary: $53,300 - $119,850 USD
Job Details
- Languages: English
- Required Skills: AWS, Docker, Bash, Kubernetes, Go, Grafana, Prometheus, CI/CD, Terraform
Requirements
- Solid professional experience in SRE, DevOps, or Platform Engineering.
- Strong hands-on experience with Docker and with operating and scaling production Kubernetes clusters.
- Experience building and managing cloud infrastructure on AWS or a similar cloud provider.
- Strong infrastructure-as-code practice with Terraform.
- Experience with reliability frameworks: SLOs, SLIs, error budgets, and alerting strategies.
- Solid observability background using tools like OpenTelemetry, Grafana, or Prometheus.
- Proficiency with CI/CD tools such as GitLab CI or GitHub Actions.
- Comfortable coding in Go and scripting in Bash.
- Practical, embedded use of AI, including agentic workflows, in infrastructure or operations work.
- Clear communication skills in an async-first, global setting.
- Ability to take ownership of challenges and collaborate across cultures.
Responsibilities
- Lead solution discovery and delivery for reliability and infrastructure problems with real ambiguity, complexity, or scope.
- Contribute to the platform's architecture, tooling, and roadmap, influencing team priorities.
- Define and operate reliability practices such as SLOs/SLIs, error budgets, and alerting strategies.
- Resolve cross-team requests and turn recurring issues into reusable fixes and runbooks.
- Work AI-natively by building reusable prompts, tooling, and agentic workflows to improve team efficiency.
- Mentor less-senior engineers and participate in hiring, onboarding, and RFC discussions.
- Collaborate with security on platform hardening and infrastructure cost-efficiency.
- Participate in incident response and on-call rotations.