Director, Site Reliability Engineering

Posted 7 days ago

💎 Seniority level: Director, 5+ years

📍 Location: United States, Canada

💸 Salary: $190,000 - $300,000 USD per year

🔍 Industry: Software Development

🏢 Company: Invoca 👥 201-500 💰 $83,000,000 Series F almost 3 years ago. Digital Marketing, Artificial Intelligence (AI), Advertising, Analytics, Telecommunications

🗣️ Languages: English

⏳ Experience: 5+ years

🪄 Skills: AWS, Docker, Leadership, Cloud Computing, GCP, Kafka, Kubernetes, MySQL, People Management, Grafana, Postgres, Prometheus, CI/CD, RESTful APIs, Linux, DevOps, Terraform, Microservices, Ansible, Scripting, Software Engineering, SaaS

Requirements:

5+ years of hands-on experience in an SRE, DevOps, sysadmin, or infrastructure engineering role
Strong opinions, held with an open mind, on infrastructure design, architecture, and automation, informed by organizational context, experience, and industry practices
Ability to draw on an understanding of both established systems and general industry direction to guide strategic decisions
Cloud computing fundamentals, particularly in AWS & GCP
Containerization, specifically Docker and Kubernetes via kops
Linux, especially Debian
Configuration management tooling, particularly Chef
Observability tooling; we use Prometheus, Grafana, Thanos, Karma, and ELK
Telephony with SIP, FreeSWITCH, and Kamailio
Other ownership areas include Kafka, Consul, MySQL
3+ years of experience directly managing SRE, DevOps, sysadmin, or other infrastructure teams

Responsibilities:

Directly manage an SRE Tech Lead and 8-10 direct reports across two teams
Build capabilities in your engineers to meet the requirements and competencies of their role
Organize the team around solving challenging problems presented by the team and the business
Draft, evolve, and communicate process, strategy, vision, and goals
Assist or own vendor management for infrastructure and platform tools
Apply a build/borrow/buy framework to technology decisions
Assist with compliance auditing activities for PCI, SOC, and ISO
Set standards and policies for infrastructure usage across the engineering org
Solicit feedback from internal customers on infrastructure challenges and opportunities
Organize and facilitate work in 2-week sprints, initiatives, epics, and stories
Own the team's post-incident process so that we improve after incidents in our service area
Handle administrative work and facilitation for the team
Participate in an incident commander on-call rotation approximately two days per month
