Staff Backend Engineer - Adaptive Telemetry

USA, United States time zones | Full-Time | Staff

Salary: 174,986 - 209,983 USD per year

Job Details

Required Skills: Python, Kafka, Kubernetes, C++, Go, Grafana, Prometheus, Rust, Microservices, Distributed Systems

Proven delivery of large distributed systems. Experience shipping and operating complex systems that span multiple teams, with clear evidence of technical leadership and impact.
Strong systems-design instincts. Deep understanding of tradeoffs around latency, consistency, availability, scaling and cost.
Hands-on cloud and platform experience. Solid experience with cloud-native architectures (microservices, containers/Kubernetes, IaC) and the operational practices that keep them healthy.
Reliability and performance ownership. Comfortable defining SLOs/SLIs, doing capacity planning, tuning performance, and driving reliability work end-to-end.
Excellent coding and design skills. You write clear, maintainable, well-tested code and can lead technical designs. We use Go, but experience with Python, C, C++, Rust or similar languages translates well.
Comfort with AI-assisted development. We embrace AI and agentic development, so we expect you to be curious about and comfortable with AI-powered developer tools, and ideally to have practical experience folding them into a team’s workflow.
Experience with messaging and telemetry. Familiarity with streaming/messaging systems (e.g., Kafka) and observability tooling (Prometheus/Grafana or equivalents).
Influence without authority. Ability to align cross-functional stakeholders, set priorities and drive outcomes in a remote-first environment.
Strong communicator. Clear written and verbal communication that works across engineers and non-technical stakeholders.

Drive technical strategy and roadmap. Proactively define the architectural vision, prioritize work that unlocks major product or platform improvements, and influence product and engineering decisions.
Lead end-to-end delivery of large, cross-functional projects. Own planning, design, execution, rollout and long-term operation of large initiatives.
Own architecture, reliability, performance and cost for critical systems. Make pragmatic architecture choices that balance scalability, availability, latency and cost while ensuring systems remain maintainable and evolvable.
Define SLOs/SLIs and lead incident response. Establish measurable reliability targets, run high-severity incident response, lead blameless post-mortems, and drive systemic fixes and automation to prevent recurrence.
Improve observability, automation and operational readiness. Champion telemetry, alerting, runbooks, capacity planning and automation efforts that reduce toil, speed debugging and lower MTTR.
Align stakeholders and remove blockers. Coordinate across Product, Design and other teams to align priorities, negotiate tradeoffs, and unblock delivery for large initiatives.
Mentor and grow engineering talent. Coach senior and mid-level engineers, lead design reviews, raise engineering standards, and help teammates make sound technical tradeoffs.
Represent engineering internally and externally. Communicate technical strategy clearly to non-engineering stakeholders and represent the team in cross-team planning.