Staff Backend Engineer - Application Core Services, Stacks
USA, EST, CST | Full-Time | Staff
Salary: 174,986 - 209,983 USD per year
Job Details
- Required Skills: AWS, GCP, Kubernetes, Azure, Go, Terraform, Helm
Requirements
- At least 1 year of fully remote work experience
- Experience working on a large SaaS platform and handling common distributed systems problems (e.g., scalability, multi-tenancy, data isolation, high availability)
- Professional experience with Golang and a willingness to work across both backend service and application code
- Care deeply about developer and user experience and the quality of the products
- Experience delivering projects end to end in a self-driven way, from gathering requirements and brainstorming ideas to shipping a product into customers' hands
- Write clean, robust, well-tested software that other engineers can understand, operate, and maintain
- Experience mentoring junior engineers in a collaborative but asynchronous environment
- Can take on complex challenges and break them down to achieve tight learning loops
- Willing to work across teams and align work with the needs of other squads and external stakeholders
- Strong Kubernetes experience in AWS, GCP, or Azure
- Familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet)
- Experience participating in blameless incident response and writing high-quality post-incident reviews
Responsibilities
- Design, build, and operate reconciliation systems, including the SSS backend, that track desired stack state and detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration
- Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient
- Improve operational efficiency by reducing deployment complexity (e.g., aiming for single-PR regional SSS deployment) and contributing to the Stack Config Reconciliation project
- Manage rollout mechanisms for provisioned plugins, dashboards, data sources, Grafana versions, release channels, and stack-level configuration
- Support new region and cluster rollouts, including the operational paths required to bring stacks online safely in new Grafana Cloud regions
- Improve incident response and recovery paths for stack misalignment, reconciliation failures, plugin rollout issues, and Hosted Grafana integration failures
- Partner with Product, Hosted Grafana, Infrastructure, Support, and adjacent AppCore squads on customer-impacting stack lifecycle work
- Contribute to roadmap planning, technical design, OnCall improvements, and long-term simplification of stack operations
- Own the production behavior of the systems you build, including improving runbooks, dashboards, alerts, reconciliation safety, rollout controls, and recovery procedures
- Participate in our follow-the-sun OnCall rotation
- Participate in team decisions, such as roadmap planning and prioritization