Senior Site Reliability Engineer, Observability
Webflow, Digital Experience Platform
Argentina Remote, Full-Time, Senior
Salary not disclosed
Job Details
- Languages: English
- Experience: 5+ years
- Required Skills: Node.js, Kubernetes, TypeScript, Go, Terraform, Datadog, Distributed Systems
Requirements
- 5+ years of experience building, maintaining, and debugging distributed systems.
- Hands-on experience with observability platforms (Datadog, Grafana, Prometheus, Elasticsearch).
- Experience with OpenTelemetry or similar instrumentation frameworks.
- Experience defining and operationalizing SLOs/SLIs at scale.
- Experience navigating and scaling cloud environments (AWS or GCP).
- Experience with container-centric architectures (Docker, Kubernetes, ECS).
- Experience with infrastructure-as-code tools (Terraform or Pulumi).
- Experience with full-stack applications (React, Node.js, MongoDB, PostgreSQL).
- Business-level fluency in English.
- BS / BA college degree or relevant experience.
Responsibilities
- Own and evolve Webflow's observability stack, including OpenTelemetry and Datadog.
- Drive adoption of SLOs, distributed tracing, and structured logging throughout engineering.
- Build and maintain AI-powered agents and automation to accelerate incident resolution.
- Guide and empower engineers on other teams to instrument services effectively.
- Participate in and continuously improve on-call and incident response processes.
- Reduce toil by automating common observability workflows.
- Partner with engineering teams to improve observability practices.
- Debug production behavior in TypeScript, Node, or Go.