Staff Site Reliability Engineer

Posted 2 months ago
United States, Full-Time, Sports Media
Company: FloSports, Inc.
Location: United States
Languages: English
Seniority level: Staff, 8-10+ years
Experience: 8-10+ years
Skills:
Leadership, Node.js, AWS EKS, Kubernetes, Go, CI/CD, DevOps, Terraform, Software Engineering
Requirements:
8-10+ years in SRE, DevOps, or Software Engineering, with a proven track record of operating at a Staff level.
History of mentoring other senior engineers, influencing technical direction across multiple teams, and leading large-scale projects to completion.
Deep expertise in languages like Node.js or Go and a history of building and maintaining critical automation and services.
Expert-level, architectural understanding of Kubernetes (EKS preferred), including networking, custom controllers, and control plane optimization.
Terraform expert who has designed and implemented large-scale, reusable, and secure IaC frameworks.
Designed and implemented observability strategies from the ground up, leveraging platforms like Datadog to create actionable SLOs and provide deep system insight.
Designed, built, and scaled complex CI/CD systems (ideally with GitHub Actions and self-hosted runners) that are used by an entire engineering organization.
Can decompose highly ambiguous, complex, cross-functional problems into solvable parts and lead the technical solution from concept to production.
Responsibilities:
Lead the technical architecture and execution of our landmark migration from a legacy GCP environment to a modern, scalable infrastructure on AWS EKS.
Architect, design, and drive our core infrastructure, defining the patterns for Terraform and GitOps that the rest of the organization will follow.
Champion and drive our SLO-driven culture, setting the strategy for how we define, measure, and implement SLOs for critical user journeys.
Lead the design and development of critical tooling and automation in Node.js and Go to solve entire classes of problems for our developers.
Lead the architectural evolution of our in-house, K6-based load testing platform, ensuring it can scale to meet future product demands.
Act as a primary subject matter expert for our Istio service mesh, driving its architecture, adoption, and optimization.
Spearhead and own high-priority initiatives, including the development of agentic workflows and intelligent automation for SRE domains like proactive scaling and automated remediation.
Act as a technical leader by participating in our blameless on-call rotation, mentoring other engineers through complex incidents and ensuring all post-mortems lead to systemic, long-term improvements.