Senior Site Reliability Engineer, Observability

Webflow, Digital Experience Platform

Argentina Remote, Full-Time, Senior

Salary not disclosed

Job Details

Languages: English
Experience: 5+ years
Required Skills: Node.js, Kubernetes, TypeScript, Go, Terraform, Datadog, Distributed Systems

Requirements

5+ years of experience building, maintaining, and debugging distributed systems.
Hands-on experience with observability platforms (Datadog, Grafana, Prometheus, Elasticsearch).
Experience with OpenTelemetry or similar instrumentation frameworks.
Experience defining and operationalizing SLOs/SLIs at scale.
Experience navigating and scaling cloud environments (AWS or GCP).
Experience with container-centric architectures (Docker, Kubernetes, ECS).
Experience with infrastructure-as-code tools (Terraform or Pulumi).
Experience with full-stack applications (React, Node.js, MongoDB, PostgreSQL).
Business-level fluency in English.
BS / BA college degree or relevant experience.

Responsibilities

Own and evolve Webflow's observability stack, including OpenTelemetry and Datadog.
Drive adoption of SLOs, distributed tracing, and structured logging throughout engineering.
Build and maintain AI-powered agents and automation to accelerate incident resolution.
Guide and empower engineers on other teams to instrument services effectively.
Participate in and continuously improve on-call and incident response processes.
Reduce toil by automating common observability workflows.
Partner with engineering teams to improve observability practices.
Debug production behavior in TypeScript, Node.js, or Go.
