Sr. Platform Engineer
United States | Full-Time | Senior
Salary not disclosed
Job Details
Required Skills
- AWS, Kubernetes, CI/CD, Terraform, GitHub Actions, Datadog, CloudFormation
Requirements
- Strong experience in infrastructure engineering, platform engineering, DevOps, or site reliability engineering roles
- Hands-on expertise with AWS production environments, including infrastructure design and operational management
- Advanced proficiency with Infrastructure as Code tools, particularly Terraform, including hands-on use in production
- Solid experience managing Kubernetes clusters in production, including deployment, configuration, and ongoing maintenance
- Demonstrated ability to design and operate CI/CD pipelines, especially using GitHub Actions
- Experience implementing observability and monitoring solutions such as Datadog, including metrics, logging, and alerting frameworks
- Strong understanding of containerization workflows, including image optimization and efficient build strategies
- Ability to operate effectively in evolving environments where priorities shift and ambiguity is common
- Strong collaboration and communication skills, with a pragmatic, iterative approach to problem-solving
- Experience in startup or high-growth environments and exposure to platform engineering practices are highly valued
Responsibilities
- Design, build, and maintain scalable, secure, and reliable cloud infrastructure in AWS, ensuring strong operational performance and automation across systems
- Develop and manage Infrastructure as Code solutions using tools such as Terraform and CloudFormation to support repeatable and version-controlled deployments
- Deploy, operate, and optimize Kubernetes clusters in production environments, ensuring high availability and efficient workload orchestration
- Build and maintain CI/CD pipelines using tools such as GitHub Actions, with potential exposure to Jenkins or ArgoCD for deployment automation
- Implement and improve observability systems, including monitoring, logging, alerting, and incident response practices (e.g., Datadog or similar tools)
- Support containerized application workflows, including image build pipelines, optimization, and deployment strategies
- Collaborate with engineering teams to troubleshoot infrastructure issues, perform root-cause analysis, and drive long-term system improvements
- Participate in architecture discussions, technical planning, and ongoing platform evolution initiatives to improve reliability and developer experience