Senior DevOps Engineer

Based in the United States | Full-Time | Senior
Salary not disclosed

Job Details

Experience
8+ years of experience in DevOps, SRE, or platform engineering roles
Required Skills
AWS, Python, Bash, Kubernetes, Prometheus, Helm

Requirements

  • 8+ years of experience in DevOps, SRE, or platform engineering roles
  • Strong hands-on experience with Kubernetes and related ecosystem tools (Helm, Docker, ingress controllers)
  • Solid experience with CI/CD systems, preferably GitLab CI
  • Strong scripting ability in Bash or Python (Go is a plus)
  • Practical experience with AWS services such as IAM, EC2/EKS, S3, CloudWatch, and Secrets Manager
  • Deep understanding of observability concepts including metrics, logs, tracing, and alerting systems
  • Experience with Prometheus, Alertmanager, Thanos, and OpenTelemetry
  • Comfortable working in ticket-driven environments (Jira, ServiceNow)
  • Strong communication skills
  • Bonus: Terraform experience for infrastructure as code
  • Bonus: API integration experience
  • Bonus: Strong Linux and container runtime debugging knowledge

Responsibilities

  • Operate and improve platform tooling to support reliable software delivery, including ticket triage, issue resolution, and service request handling
  • Maintain and evolve self-service workflows, including documentation, templates, and deployment guardrails
  • Manage Kubernetes environments, including Helm deployments, namespace management, rollout troubleshooting, and incident response support
  • Support and enhance CI/CD pipelines (primarily GitLab CI), including job configuration, deployment strategies, and quality gates
  • Monitor and improve observability systems using tools such as Prometheus, Alertmanager, Thanos, and OpenTelemetry
  • Maintain dashboards, alerts, and SLO/SLA indicators while reducing alert noise
  • Support service instrumentation across metrics, logs, and traces using OpenTelemetry
  • Participate in on-call rotations, incident response, and post-incident documentation
  • Drive automation and cost optimization efforts, including resource right-sizing