Staff Software Engineer, Infrastructure
Based in the United States | Full-Time | Staff
Salary not disclosed
Job Details
- Experience: 8+ years
- Required Skills: Kubernetes, Go, Grafana, CI/CD, Linux, Terraform
Requirements
- 8+ years of professional software engineering experience in backend, infrastructure, or platform engineering roles.
- Strong hands-on expertise in Go or similar backend languages.
- Proven experience building, scaling, and operating production infrastructure or cloud-based platforms.
- Deep knowledge of at least one of the following: Kubernetes, cloud infrastructure, networking, reliability engineering, or developer platforms.
- Strong understanding of Linux systems, networking fundamentals, and production operations at scale.
- Experience driving cross-team alignment and influencing technical direction through design documents, RFCs, and architecture reviews.
- Familiarity with modern DevOps practices and tooling, such as Terraform, CI/CD pipelines, GitOps, and observability tooling (Prometheus, OpenTelemetry, Grafana).
- Strong communication skills for a distributed, remote-first environment.
Responsibilities
- Define and lead the evolution of internal infrastructure platforms by turning ambiguous technical challenges into scalable architectural proposals and driving them through RFCs and cross-team alignment.
- Design and build self-service platform capabilities and APIs (primarily in Go) for provisioning, onboarding, deployment, observability, and operational workflows.
- Establish and improve delivery standards using Terraform, GitOps (Argo CD), CI/CD pipelines, and progressive deployment strategies.
- Architect and evolve multi-region, multi-account infrastructure on Kubernetes (EKS), including networking, ingress, traffic routing, and cross-region connectivity.
- Improve platform reliability and operational maturity through enhanced SLOs, monitoring, alerting, and incident management practices using observability tools.
- Drive adoption of platform capabilities across engineering teams by ensuring solutions are usable and reduce operational friction.
- Participate in on-call rotations while also improving operational health through better alerting, runbooks, and long-term reliability work.