Senior Manager, Software Engineering (Infrastructure)

Canada (British Columbia, Ontario), London, India (Gujarat, Maharashtra, Bengaluru) | Full-Time | Senior

Salary not disclosed

Job Details

Experience: 8+ years of experience in infrastructure, SRE, or cloud engineering roles, with 3+ years leading specialized engineering teams
Required Skills: AWS, Kubernetes, Machine Learning, DevOps, Terraform, LLM, MLOps

8+ years of experience in infrastructure, SRE, or cloud engineering roles
3+ years leading specialized engineering teams
Extensive experience with AWS
Extensive experience with modern infrastructure-as-code (Terraform)
Proven track record of leading teams through production incidents and complex architectural migrations
Understanding of the infrastructure needs unique to machine learning
Proven expertise in managing large-scale containerized environments
Proven expertise in leveraging observability stacks to ensure platform health
Ability to align technical roadmaps with business objectives and advocate for infrastructure investment
Experience with FinOps or managing significant cloud budgets is a plus
Background in supporting AI agentic workflows or autonomous orchestration systems is a plus

Lead and grow multiple teams across SRE, Cloud Infrastructure, and MLOps.
Coach and develop engineering managers and senior individual contributors, fostering a culture of ownership and high craft.
Build a "Platform-as-a-Product" mindset, ensuring that infrastructure and ML tooling serve as enablers for the rest of the engineering organization.
Own the operational health of production systems, including availability, latency, and durability.
Define and evolve SLIs, SLOs, and error budgets, moving the organization toward data-driven reliability decisions.
Lead incident response, driving blameless postmortems and systemic improvements.
Evolve Loopio’s cloud architecture, overseeing capacity planning, disaster recovery, and business continuity.
Drive the MLOps roadmap, establishing standards for model deployment, monitoring, and scaling.
Lead Cloud FinOps, ensuring our infrastructure and AI compute costs are visible, intentional, and optimized.
Partner with Security to ensure "secure-by-default" infrastructure and robust backup/recovery strategies.
