Software Development Engineer III - Infrastructure
India | Full-Time | Middle
Salary not disclosed
Job Details
- Experience: 4+ years
- Required Skills: GCP, Kubernetes, Troubleshooting
Requirements
- 4+ years of experience operating large-scale systems
- Experience with GCP or other public cloud platforms
- Experience with Kubernetes (GKE) in production
- Ability to identify systemic issues and propose long-term fixes
- Experience leading incident response or reliability initiatives
- Strong understanding of reliability, security, and operational best practices
- Comfortable working in on-call and incident response environments
- Strong troubleshooting and communication skills
- Experience supporting or operating production systems
- Comfortable mentoring junior engineers and influencing peers
Responsibilities
- Participate in 24/7 on-call rotations for core infrastructure systems
- Execute incident response during production events, including triage, mitigation, and recovery
- Maintain and improve runbooks, operational procedures, and escalation paths
- Help reduce MTTR and prevent repeat incidents through engineering solutions
- Improve reliability of core infrastructure components, including Kubernetes (GKE) clusters, cloud networking and load balancing, and edge services (Cloudflare)
- Identify systemic reliability issues and drive corrective actions
- Support capacity planning, scaling, and resilience testing
- Execute security remediations across cloud and Kubernetes environments
- Support enforcement of IAM least-privilege access, network security controls, and runtime security policies
- Partner with Platform Security on vulnerability management and remediation
- Support security incident response and post-incident reviews