7+ years of software engineering experience, with at least 5 years focused on reliability, infrastructure, or platform engineering
3+ years of experience with AWS and a proven ability to build effective monitoring, alerting, and observability solutions
Track record of implementing, maintaining, and improving SLOs and uptime KPIs for critical services
Expert knowledge of Linux, Docker, and distributed systems principles and their real-world application
Solid programming skills in both application and infrastructure languages (e.g., Python, Go)
Strong grasp of security best practices and a data-driven approach to improving stability and availability
Excellent communication skills, with the ability to collaborate effectively across teams and explain complex technical concepts clearly