Lead Site Reliability Engineer, Observability (Remote, North America)

Posted 3 months ago
North America | Full-Time | AI Sales
Company: Vivun Inc.
Location: North America, EST, PST
Languages: English
Seniority level: Lead, 6+ years
Experience: 6+ years
Skills:
Leadership, Node.js, Python, Software Development, Cloud Computing, Grafana, Prometheus, CI/CD, DevOps
Requirements:
6+ years of experience in SRE, DevOps, or Observability Engineering roles
At least 2 years leading or designing observability initiatives
Deep knowledge of observability tooling (e.g., OpenTelemetry, Prometheus, Grafana, Datadog, Honeycomb, Observe)
Experience with distributed tracing practices
Experience with agentic / LLM-based systems (e.g., LangChain, Celery, OpenAI APIs)
Strong understanding of instrumenting, tracing, and correlating AI/LLM workflows
Proven ability to define cross-team standards, influence engineering culture, and establish scalable monitoring patterns
Strong collaboration and communication skills
Responsibilities:
Own the end-to-end observability strategy
Design and implement correlation models
Unify observability tooling
Collaborate with engineering and QA on best practices
Establish enablement frameworks
Partner with teammates on reliability and incident response
Contribute to performance and reliability strategy