Minimum of 5 years of experience managing AWS cloud infrastructure at scale.
Strong understanding of core AWS services (EC2, S3, RDS, Lambda, VPC, etc.) and expertise in designing and managing multi-region, scalable cloud architectures.
Hands-on experience with Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
Proven track record of managing and optimizing cloud costs, using tools like AWS Cost Explorer, Trusted Advisor, or other cost-management platforms.
Experience scaling large data systems (including databases, data lakes, and big data platforms) across distributed cloud environments.
Expertise in disaster recovery planning, implementation, and management for cloud infrastructure.
Solid understanding of cloud security, including IAM policies, encryption, network security, and proactive threat and vulnerability mitigation strategies.
Experience with monitoring and logging tools (e.g., CloudWatch, ELK stack, Prometheus) to ensure infrastructure health and performance.
Ability to communicate complex technical concepts to a variety of stakeholders, including non-technical team members.
Responsibilities:
Design, implement, and manage highly scalable, secure, and cost-optimized AWS cloud infrastructure.
Lead infrastructure automation through Infrastructure as Code (IaC), using tools like Terraform, CloudFormation, or similar technologies.
Ensure high availability and reliability of systems, implementing disaster recovery and failover strategies.
Collaborate with software development and data teams to optimize cloud architecture for large-scale data systems.
Implement and maintain security best practices, including monitoring, threat detection, and vulnerability mitigation.
Optimize AWS costs while ensuring the infrastructure meets performance and scalability requirements.
Stay current with the latest cloud technologies and continuously improve the cloud environment with new tools and services.
Provide technical leadership and mentorship to other engineers, promoting best practices in cloud operations and architecture.
Monitor and respond to infrastructure incidents, ensuring timely resolutions and minimal downtime.