Senior Infrastructure Engineer
Based in the United States | Full-Time | Senior
Salary: USD $165,750–$195,000
Job Details
- Experience: 5+ years
- Required Skills: PostgreSQL, GCP, Kubernetes, CI/CD, Terraform
Requirements
- 5+ years of experience in infrastructure, platform engineering, or site reliability engineering roles.
- Strong hands-on experience with Google Cloud Platform services, including GKE, Cloud SQL/AlloyDB, Pub/Sub, GCS, IAM, and Secret Manager.
- Deep expertise in Kubernetes, including cluster management, autoscaling, workload identity, and production operations.
- Strong proficiency in Terraform, including module design, state management, and infrastructure-as-code best practices.
- Experience operating high-traffic, production-grade systems with strict uptime and performance requirements.
- Strong database operations experience, particularly with PostgreSQL and scaling strategies such as read replicas and connection pooling (e.g., PgBouncer).
- Experience designing CI/CD pipelines with zero-downtime deployments and fast rollback strategies.
- Strong understanding of observability systems, including metrics, logging, tracing, and alerting strategies.
- Ability to diagnose complex system issues under pressure and drive long-term reliability improvements.
- Strong collaboration skills with engineering, product, and data teams in fast-paced environments.
- Strong ownership mindset, with the ability to independently drive infrastructure initiatives from design to production.
Responsibilities
- Operate, scale, and optimize cloud infrastructure supporting high-traffic consumer applications, ensuring reliability during extreme seasonal demand spikes.
- Manage and evolve Kubernetes-based environments, including autoscaling strategies, workload orchestration, and cluster performance tuning.
- Design and maintain database infrastructure at scale, including PostgreSQL/AlloyDB systems, read replicas, connection pooling, and performance optimization.
- Build and improve CI/CD pipelines using GitOps principles to enable fast, safe, and zero-downtime deployments.
- Implement and maintain infrastructure-as-code practices using Terraform, ensuring consistency, reliability, and reproducibility across environments.
- Develop and maintain observability systems, including monitoring, alerting, distributed tracing, and SLO/SLI frameworks.
- Secure infrastructure using modern cloud security practices such as IAM controls, secret management, WAF configurations, and network security tooling.
- Partner with engineering and AI teams to provision and support infrastructure for experimentation and production ML workloads.