Software Development Engineer III - Infrastructure

India · Full-Time · Mid-level
Salary not disclosed

Job Details

Experience
4+ years of experience
Required Skills
GCP, Kubernetes, Troubleshooting

Requirements

  • 4+ years of experience operating large-scale systems
  • Experience with GCP or other public cloud platforms
  • Experience with Kubernetes (GKE) in production
  • Ability to identify systemic issues and propose long-term fixes
  • Experience leading incident response or reliability initiatives
  • Strong understanding of reliability, security, and operational best practices
  • Comfortable working in on-call and incident response environments
  • Strong troubleshooting and communication skills
  • Experience supporting or operating production systems
  • Comfortable mentoring junior engineers and influencing peers

Responsibilities

  • Participate in 24/7 on-call rotations for core infrastructure systems
  • Execute incident response during production events, including triage, mitigation, and recovery
  • Maintain and improve runbooks, operational procedures, and escalation paths
  • Help reduce MTTR and prevent repeat incidents through engineering solutions
  • Improve the reliability of core infrastructure components, including Kubernetes (GKE) clusters, cloud networking and load balancing, and edge services (Cloudflare)
  • Identify systemic reliability issues and drive corrective actions
  • Support capacity planning, scaling, and resilience testing
  • Execute security remediations across cloud and Kubernetes environments
  • Support enforcement of IAM least-privilege access, network security controls, and runtime security policies
  • Partner with Platform Security on vulnerability management and remediation
  • Support security incident response and post-incident reviews