4+ years of experience in Site Reliability, Platform, or Infrastructure Engineering
Fluent in written and spoken English
Strong experience with observability and monitoring tools (Datadog, Grafana, OpenTelemetry)
Understand distributed systems: queues, workers, caching, databases, networking
Write automation and tooling in Node.js
Comfortable designing systems with safety and fail-safe operations in mind
Comfortable working across the stack
Care deeply about incident management, postmortems, and continuous improvement