DevOps/SRE Team Lead
Telestream, Digital Media
Location: United States (Remote) | Full-Time | Lead
Salary not disclosed
Job Details
- Experience: 5-8+ years of experience in DevOps/SRE, with 2-3+ years in a leadership role.
- Required Skills: Python, Bash, Jenkins, Kubernetes, Go, Grafana, Prometheus, CI/CD, Terraform, Troubleshooting, GitHub Actions, Datadog, CloudFormation, Helm
Requirements
- Bachelor’s degree in Computer Science, Engineering, or equivalent
- 5-8+ years of experience in DevOps/SRE, with 2-3+ years in a leadership role.
- Hands-on experience building and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or equivalent) with direct integration into Kubernetes deployment workflows
- Production-level experience with infrastructure as code (Terraform required; CloudFormation or Pulumi a plus), including managing cloud-hosted Kubernetes clusters (EKS, GKE, or AKS)
- Experience with monitoring, logging, and observability tooling in Kubernetes environments (Prometheus, Grafana, Datadog, ELK/EFK stack, or equivalent); ability to build dashboards and alerts from scratch, not just consume existing ones
- Demonstrated, hands-on Kubernetes experience in production environments: cluster administration, Helm chart authoring and management, RBAC configuration, persistent storage, horizontal/vertical pod autoscaling, and diagnosing and resolving real production failures (CrashLoopBackOff, OOMKilled, networking issues, etc.)
- Strong troubleshooting skills with the ability to diagnose infrastructure and application issues live, under pressure, without reference materials
- Proficiency in scripting languages (Python, Go, Bash, or PowerShell); ability to write and own automation scripts, not just modify existing ones
- Excellent communication and collaboration skills, including conflict resolution and the ability to influence without authority
Responsibilities
- Design, deploy, and administer production Kubernetes clusters, including workload scheduling, namespace management, RBAC, network policies, and cluster upgrades
- Design and maintain continuous integration/deployment pipelines to automate testing and deployment, including Kubernetes-native delivery workflows using Helm and ArgoCD or equivalent
- Track software performance, fix errors, troubleshoot systems, and implement preventative measures to keep workflows running smoothly
- Implement and manage infrastructure, using Terraform or CloudFormation for infrastructure as code (IaC)
- Optimize cloud resources by implementing cost-effective solutions
- Collaborate with various teams to ensure smooth deployment
- Monitor performance and create new processes based on the analysis
- Implement security best practices, including automated compliance checks and secure code deployment
- Manage the technical roadmap and architecture while mentoring SRE and DevOps engineers.
- Hire, coach, and manage a team of DevOps engineers and Site Reliability Engineers.
- Define the DevOps/Platform roadmap aligned with business goals (e.g., cloud cost optimization, automation maturity).