Site Reliability Engineer

Australia, Canada, Germany, India, United Kingdom, United States
Full-Time
Senior
Salary: 164,382 - 215,050 USD per year

Job Details

Experience
15+ years
Required Skills
Python, Bash, Kubernetes, Grafana, Prometheus, CI/CD, Linux, Distributed Systems

Requirements

  • 15+ years in system reliability, DevSecOps, cloud operations, or infrastructure engineering
  • Bachelor's degree, or 4 years of work experience in lieu of a degree
  • Strong scripting and automation skills (Python, Bash, PowerShell, etc.)
  • Hands‑on experience with monitoring tools (Prometheus, Grafana, Splunk, ELK, Datadog, etc.)
  • Familiarity with Kubernetes, container orchestration, and modern CI/CD pipelines
  • Understanding of networking, Linux system internals, and distributed systems
  • Ability to troubleshoot complex technical issues across the stack

Responsibilities

  • Design, build, and maintain highly available, scalable systems across cloud and on‑prem environments.
  • Develop automation solutions that improve observability, speed recovery, and eliminate manual operational work.
  • Implement monitoring, alerting, and performance tuning strategies that ensure system health.
  • Collaborate with development and infrastructure teams to design reliable architectures and CI/CD pipelines.
  • Conduct root cause analysis and drive systemic improvements to prevent future incidents.
  • Champion SRE best practices such as SLIs/SLOs, error budgets, and automated incident response.
  • Provide input to proposal operations in your area of subject matter expertise.