Cloud Infrastructure Engineer
Based in the United States | Full-Time | Middle
Salary: 85,000 - 100,000 USD per year
Job Details
- Experience: 2–5+ years
- Required Skills: Python, Kubernetes, Grafana, Prometheus, CI/CD, Linux, Terraform
Requirements
- 2–5+ years of experience in Cloud Infrastructure Engineering, DevOps, or Site Reliability Engineering roles.
- Strong hands-on experience operating Kubernetes in production environments.
- Proven experience building CI/CD pipelines and working with GitOps methodologies.
- Solid experience with Infrastructure as Code tools such as Terraform or equivalent solutions.
- Strong Linux administration and troubleshooting skills in production environments.
- Proficiency in Python or another scripting language for automation and tooling.
- Good understanding of networking concepts, Kubernetes security, and deployment strategies.
- Experience with observability tools and performance monitoring solutions.
- Familiarity with load testing, system tuning, and reliability engineering practices.
- Strong collaboration and communication skills, with the ability to work across engineering teams.
Responsibilities
- Design, deploy, and manage production-grade Kubernetes clusters, including networking policies, RBAC, workload scheduling, and cluster security configurations.
- Build and maintain CI/CD pipelines using Infrastructure as Code and GitOps practices to ensure reliable and repeatable deployments.
- Provision and automate cloud infrastructure using tools such as Terraform or similar IaC frameworks.
- Develop and manage containerization workflows, including secure image building, versioning, and promotion across environments.
- Implement and maintain observability stacks using tools such as Prometheus, Grafana, and OpenTelemetry to ensure system health and performance visibility.
- Support performance optimization efforts, including load testing, capacity planning, and system resilience validation.
- Participate in incident response, root cause analysis, and ongoing reliability engineering improvements.
- Manage and support stateful services such as databases, caching systems, and messaging platforms in production environments.
- Maintain clear and comprehensive technical documentation covering architecture, operations, and recovery procedures.