DevOps/SRE Team Lead

Telestream
Digital Media
Location: United States (Remote)
Full-Time
Lead
Salary not disclosed

Job Details

Experience
5-8+ years of experience in DevOps/SRE, with 2-3+ years in a leadership role.
Required Skills
Python, Bash, Jenkins, Kubernetes, Go, Grafana, Prometheus, CI/CD, Terraform, Troubleshooting, GitHub Actions, Datadog, CloudFormation, Helm

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or equivalent
  • 5-8+ years of experience in DevOps/SRE, with 2-3+ years in a leadership role.
  • Hands-on experience building and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or equivalent) with direct integration into Kubernetes deployment workflows
  • Production-level experience with infrastructure as code (Terraform required; CloudFormation or Pulumi a plus), including managing cloud-hosted Kubernetes clusters (EKS, GKE, or AKS)
  • Experience with monitoring, logging, and observability tooling in Kubernetes environments (Prometheus, Grafana, Datadog, ELK/EFK stack, or equivalent); ability to build dashboards and alerts from scratch, not just consume existing ones
  • Demonstrated, hands-on Kubernetes experience in production environments: cluster administration, Helm chart authoring and management, RBAC configuration, persistent storage, horizontal/vertical pod autoscaling, and diagnosing and resolving real production failures (CrashLoopBackOff, OOMKilled, networking issues, etc.)
  • Strong troubleshooting skills with the ability to diagnose infrastructure and application issues live, under pressure, without reference materials
  • Proficiency in scripting languages (Python, Go, Bash, or PowerShell); ability to write and own automation scripts, not just modify existing ones
  • Strong communication, collaboration, and conflict-resolution skills, and the ability to influence without authority

Responsibilities

  • Design, deploy, and administer production Kubernetes clusters, including workload scheduling, namespace management, RBAC, network policies, and cluster upgrades
  • Design and maintain continuous integration/deployment pipelines to automate testing and deployment, including Kubernetes-native delivery workflows using Helm and ArgoCD or equivalent
  • Track software performance, fix errors, troubleshoot systems, and implement preventative measures to ensure smooth workflows
  • Implement and manage infrastructure, using Terraform or CloudFormation for infrastructure as code (IaC)
  • Optimize cloud resources by implementing cost-effective solutions
  • Collaborate with various teams to ensure smooth deployment
  • Monitor performance and create new processes based on the analysis
  • Implement security best practices, including automated compliance checks and secure code deployment
  • Manage the technical roadmap and architecture while mentoring SRE and DevOps engineers.
  • Hire, coach, and manage a team of DevOps engineers and Site Reliability Engineers.
  • Define DevOps/Platform roadmap aligned with business goals (e.g., cloud cost optimization, automation maturity).