Senior Site Reliability Engineer
Devsu
Software Engineering
Peru, Argentina, Brazil, Colombia, Guatemala, Honduras (PST)
Full-Time
Senior
Salary not disclosed
Job Details
- Languages: English
- Required Skills: Python, Bash, GCP, Kubernetes, Grafana, Prometheus, Linux, ServiceNow
Requirements
- Strong experience as a Site Reliability Engineer or Reliability Engineer
- Deep hands-on expertise with Grafana
- Solid experience with monitoring and observability systems
- Production experience operating Kubernetes environments
- Experience supporting systems in GCP and on-prem environments
- Strong Linux systems and troubleshooting skills
- Fluent English (written and spoken)
- Ability to work in the PST time zone
- Ability to participate in an on-call rotation including one weekend day
Responsibilities
- Own and operate the monitoring and observability stack across on-prem and GCP environments
- Design, build, and maintain Grafana dashboards for infrastructure, Kubernetes, and applications
- Define, tune, and maintain alerts to ensure a high signal-to-noise ratio
- Apply SRE principles to improve availability, performance, and resilience
- Participate in on-call rotations and SEV incident response
- Support and monitor Kubernetes environments (GKE and on-prem clusters)
- Provide L2/L3 application support coverage during resource shortages or major incidents