Site Reliability Engineering (SRE) Lead

Mexico | Monday to Friday, 09:00 – 18:00 | Full-Time | Lead

Salary not disclosed

Job Details

8-10+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities.
Hands-on experience with OpenTelemetry.
Expertise with APM tools (New Relic, Datadog, AppDynamics, or Dynatrace).
Proficiency in Terraform.
Understanding of cloud platforms (AWS, GCP, or Azure).
Experience with automation/configuration management (Ansible, Chef, or Puppet).
Knowledge of CI/CD tools (GitHub Actions, Jenkins, or Azure DevOps).
Experience managing Kubernetes and containerized environments (Docker, Helm).
Familiarity with log aggregation platforms (ELK Stack or Splunk).
Advanced English skills.

Lead the strategic development and management of observability and reliability frameworks across the organization.
Design and implement monitoring and observability solutions in collaboration with engineering teams.
Manage Infrastructure as Code (IaC) initiatives using Terraform.
Drive automation strategies for monitoring, alerting, and logging pipelines.
Develop and maintain comprehensive observability roadmaps.
Collaborate with product management, sales, and pre-sales teams.
Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability.
Engage with vendors and strategic partners to evaluate and integrate solutions.
Mentor and develop junior engineers and analysts.