Site Reliability Engineer

Location: Australia, Canada, Germany, India, United Kingdom, United States
Employment Type: Full-Time
Level: Senior

Salary: 164,382 - 215,050 USD per year

Job Details

Experience: 15+ years
Required Skills: Python, Bash, Kubernetes, Grafana, Prometheus, CI/CD, Linux, Distributed Systems

15+ years in system reliability, DevSecOps, cloud operations, or infrastructure engineering
Bachelor's degree or 4 years of work experience in lieu of degree
Strong scripting and automation skills (Python, Bash, PowerShell, etc.)
Hands‑on experience with monitoring tools (Prometheus, Grafana, Splunk, ELK, Datadog, etc.)
Familiarity with Kubernetes, container orchestration, and modern CI/CD pipelines
Understanding of networking, Linux system internals, and distributed systems
Ability to troubleshoot complex technical issues across the stack

Design, build, and maintain highly available, scalable systems across cloud and on‑prem environments.
Develop automation solutions that improve observability, speed recovery, and eliminate manual operational work.
Implement monitoring, alerting, and performance tuning strategies that ensure system health.
Collaborate with development and infrastructure teams to design reliable architectures and CI/CD pipelines.
Conduct root cause analysis and drive systemic improvements to prevent future incidents.
Champion SRE best practices such as SLIs/SLOs, error budgets, and automated incident response.
Provide input to proposal operations in areas of subject matter expertise.
