Cloud Reliability & Recovery Engineer

Remote - India | Full-Time | Senior
Salary not disclosed

Job Details

Experience
5+ years in cloud infrastructure, SRE, or IT disaster recovery engineering roles; 3+ years of hands-on AWS experience in production environments at scale
Required Skills
AWS, Python, Bash, Kubernetes, CI/CD, Terraform, GitHub Actions, CloudFormation, HIPAA

Requirements

  • 5+ years in cloud infrastructure, SRE, or IT disaster recovery engineering roles
  • 3+ years of hands-on AWS experience in production environments at scale
  • Proven delivery of multi-region DR architectures with defined and tested RTO/RPO targets
  • Expert-level proficiency with core AWS resilience services
  • Strong scripting skills: Python, Bash, or PowerShell for automation and orchestration
  • Experience with Infrastructure as Code: Terraform and/or AWS CloudFormation
  • Solid understanding of networking fundamentals: VPC, Transit Gateway (TGW), Direct Connect, VPN, DNS failover
  • Excellent written and verbal communication; able to produce executive-level DR reports
  • AWS Certified Solutions Architect – Professional or AWS Certified DevOps Engineer – Professional (Preferred)
  • AWS Certified Advanced Networking – Specialty certification (Preferred)
  • Experience with AWS Resilience Hub for automated resilience assessments and policy enforcement (Preferred)
  • Familiarity with CloudEndure / AWS Elastic Disaster Recovery (DRS) for workload replication (Preferred)
  • Knowledge of Kubernetes-based DR (EKS multi-region, Velero backups, ArgoCD GitOps failover) (Preferred)
  • Hands-on experience with serverless DR patterns (Lambda, API Gateway, DynamoDB) (Preferred)

Responsibilities

  • Design and implement multi-region, multi-AZ AWS architectures that meet RTO/RPO targets
  • Engineer active-active and active-passive failover patterns using Route 53, Global Accelerator, and CloudFront
  • Build automated DR runbooks and playbooks using AWS Systems Manager Automation and Step Functions
  • Administer AWS Backup across all services (EC2, EBS, RDS, EFS, FSx, DynamoDB, Aurora) with policy-based automation
  • Author and maintain Terraform/CloudFormation templates for all BCP/DR infrastructure components
  • Automate DR testing pipelines through CI/CD (CodePipeline, CodeBuild, GitHub Actions)
  • Build CloudWatch dashboards, alarms, and composite alarms for availability and DR-readiness indicators
  • Participate in on-call rotations and lead DR incident response; conduct post-incident reviews (PIRs)
  • Conduct regular BCP/DR tabletop exercises and full failover simulations to validate recovery procedures and improve organizational readiness; document results and action items
  • Ensure DR controls meet SOC 2, ISO 22301, NIST 800-53, and HIPAA/PCI requirements as applicable