Site Reliability Engineering (SRE) Lead
Mexico, Monday to Friday, 09:00 – 18:00
Full-Time
Lead
Salary not disclosed
Job Details
- Languages: English
- Experience: 8-10+ years
- Required Skills: Cloud Computing, Kubernetes, CI/CD, DevOps, Terraform
Requirements
- 8-10+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities.
- Hands-on experience with OpenTelemetry.
- Expertise with APM tools (New Relic, Datadog, AppDynamics, or Dynatrace).
- Proficiency in Terraform.
- Understanding of cloud platforms (AWS, GCP, or Azure).
- Experience with automation/configuration management (Ansible, Chef, or Puppet).
- Knowledge of CI/CD tools (GitHub Actions, Jenkins, or Azure DevOps).
- Experience managing Kubernetes and containerized environments (Docker, Helm).
- Familiarity with log aggregation platforms (ELK Stack or Splunk).
- Advanced English skills.
Responsibilities
- Lead the strategic development and management of observability and reliability frameworks across the organization.
- Design and implement monitoring and observability solutions in collaboration with engineering teams.
- Manage Infrastructure as Code (IaC) initiatives using Terraform.
- Drive automation strategies for monitoring, alerting, and logging pipelines.
- Develop and maintain comprehensive observability roadmaps.
- Collaborate with product management, sales, and pre-sales teams.
- Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability.
- Engage with vendors and strategic partners to evaluate and integrate solutions.
- Mentor and develop junior engineers and analysts.