Senior Engineering Manager, Site Reliability

Posted about 1 month ago

55,000 - 105,000 USD per year

Canada, LATAM | Full-Time | Software Development

Company: Next League, LLC

Location: Canada, LATAM, EST

Languages: English

Seniority level: Senior

Experience: Minimum of 5 years of experience as a Site Reliability Engineer (SRE). At least 2 years of experience managing an SRE team.

Skills:

AWS, Leadership, Kubernetes, People Management, CI/CD, Linux, DevOps, Terraform, Mentoring

Requirements:

- Minimum of 5 years of experience as a Site Reliability Engineer (SRE).
- At least 2 years of experience managing an SRE team.
- Proven success in Site Reliability Engineering (SRE), DevOps, or a related discipline, with deep expertise in large-scale system architecture, including cloud services and enterprise deployments.
- Experience with AWS, CloudWatch, and Datadog is required.
- Proven experience managing technology platforms, particularly during periods of high traffic.
- Proven experience in people management, including scheduling, on-call rotations, and fostering team members' professional development.
- Advanced hands-on knowledge of automation scripting, infrastructure as code, and contemporary cloud orchestration tools.
- Demonstrated ability to contribute to strategic planning and initiatives in a technology-focused environment.
- Exceptional problem-solving, organizational, and leadership skills.

Responsibilities:

- Lead and mentor a team of 5 site reliability engineers as a 'player-coach'.
- Guide, mentor, and foster the professional growth of the SRE team.
- Champion innovation in automation.
- Implement advanced monitoring to proactively forecast and mitigate system risks.
- Align SRE goals with senior leadership's business objectives and client needs.
- Drive a culture of continuous improvement.
- Oversee the development and implementation of training programs.
- Oversee relationships and negotiations with technology vendors.
- Work with clients to define SLAs and procedures for escalation to third-party vendors.
- Engage in SRE planning and execution, including the on-call schedule for LiveOps support.
- Develop and execute a comprehensive site reliability strategy.
- Partner with Solution Architecture to design, implement, and test production systems.
- Evolve incident management to include risk assessment, and develop long-term mitigation strategies.
- Direct and oversee root cause analyses (RCAs).
- Maintain service availability and performance, set and monitor SLAs, and reduce downtime.
- Drive adoption of best practices in CI/CD, cloud architecture, and system resilience.
- Work hands-on, with an expectation of being at least 70% billable on client work.

Similar Jobs:

Posted 19 days ago

United States, Canada | Full-Time | Software Development

Manager, Site Reliability Engineering

Company: Jellyvision

Posted 6 months ago

Australia, Austria, Bangladesh, Belgium, Brazil, Canada, Colombia, Costa Rica, Croatia, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, India, Indonesia, Ireland, Israel, Italy, Kenya, Mexico, Netherlands, Nigeria, Peru, Poland, Singapore, South Africa, Spain, Sweden, Switzerland, Uganda, United Kingdom, United States of America, Uruguay | Full-Time | Nonprofit, Free Knowledge, Software

Site Reliability Engineering Manager

Company: Wikimedia Foundation

Posted 19 days ago

Brazil | Full-Time | Digital Engineering

DevOps - SRE (Site Reliability Engineering)

Company: Encora

Posted 2 days ago

United States, Canada, Argentina, Brazil | Full-Time | Software Development

Senior Site Reliability Engineer

Company: Laravel (11-50 employees, Developer Tools, Web Development, Enterprise Software)

AWS, Docker, PHP