Senior Engineering Manager, Site Reliability

Posted about 1 month ago
55000 - 105000 USD per year
Canada, LATAM | Full-Time | Software Development
Company: Next League, LLC
Location: Canada, LATAM, EST
Languages: English
Seniority level: Senior
Experience: Minimum of 5 years of experience as a Site Reliability Engineer (SRE). At least 2 years of experience managing an SRE team.
Skills: AWS, Leadership, Kubernetes, People Management, CI/CD, Linux, DevOps, Terraform, Mentoring
Requirements:
Minimum of 5 years of experience as a Site Reliability Engineer (SRE).
At least 2 years of experience managing an SRE team.
Proven success in Site Reliability Engineering (SRE), DevOps, or a related discipline, with deep expertise in large-scale system architecture, including cloud services and enterprise deployments.
Experience with AWS, CloudWatch, and Datadog is required.
Proven experience in managing technology platforms, particularly during periods of high traffic.
Proven experience in people management, including scheduling, on-call rotations, and fostering team members' professional development.
Advanced hands-on knowledge of automation scripting, infrastructure as code, and contemporary cloud orchestration tools.
Demonstrated ability to contribute to strategic planning and initiatives in a technology-focused environment.
Exceptional problem-solving, organizational, and leadership skills.
Responsibilities:
Lead and mentor a team of 5 site reliability engineers as a "player-coach".
Guide, mentor, and foster the professional growth of the SRE team.
Champion innovation in automation.
Implement advanced monitoring to proactively forecast and mitigate system risks.
Align SRE goals with senior leadership's business objectives and client needs.
Drive a culture of continuous improvement.
Oversee the development and implementation of training programs.
Oversee and negotiate with technology vendors.
Work with clients to define SLAs and procedures for escalation to third-party vendors.
Engage in SRE planning and execution, including the on-call schedule for LiveOps support.
Develop and execute a comprehensive site reliability strategy.
Partner with Solution Architecture to design, implement, and test production systems.
Evolve incident management to include risk assessment and develop long-term mitigation strategies.
Direct and oversee root cause analyses (RCAs).
Maintain service availability and performance, set and monitor SLAs, and reduce downtime.
Drive adoption of best practices in CI/CD, cloud architecture, and system resilience.
Hands-on execution, with the expectation of being 70%+ billable on client work.