Senior Site Reliability Engineer, Observability

Webflow, Digital Experience Platform
Argentina Remote, Full-Time, Senior
Salary not disclosed

Job Details

Languages
English
Experience
5+ years
Required Skills
Node.js, Kubernetes, TypeScript, Go, Terraform, Datadog, Distributed Systems

Requirements

  • 5+ years of experience building, maintaining, and debugging distributed systems.
  • Hands-on experience with observability platforms (Datadog, Grafana, Prometheus, Elasticsearch).
  • Experience with OpenTelemetry or similar instrumentation frameworks.
  • Experience defining and operationalizing SLOs/SLIs at scale.
  • Experience navigating and scaling cloud environments (AWS or GCP).
  • Experience with container-centric architectures (Docker, Kubernetes, ECS).
  • Experience with infrastructure-as-code tools (Terraform or Pulumi).
  • Experience with full-stack applications (React, Node.js, MongoDB, PostgreSQL).
  • Business-level fluency in English.
  • BS / BA college degree or relevant experience.

Responsibilities

  • Own and evolve Webflow's observability stack, including OpenTelemetry and Datadog.
  • Drive adoption of SLOs, distributed tracing, and structured logging throughout engineering.
  • Build and maintain AI-powered agents and automation to accelerate incident resolution.
  • Guide and empower engineers on other teams to instrument services effectively.
  • Participate in and continuously improve on-call and incident response processes.
  • Reduce toil by automating common observability workflows.
  • Partner with engineering teams to improve observability practices.
  • Debug production behavior in TypeScript, Node.js, or Go.