Site Reliability Engineer, Infra (Americas)

Posted about 2 months ago

Americas, Full-Time, Email Platform

Company: Resend

Location: Americas (EST, PST)

Languages: English

Seniority level: Senior, 4+ years

Experience: 4+ years

Skills:

AWS, Node.js, Grafana, Postgres, CI/CD, Linux, DevOps, Terraform, Microservices

Requirements:

- 4+ years of experience in Site Reliability, Platform, or Infrastructure Engineering
- Fluent in written and spoken English
- Strong experience with observability and monitoring tools (Datadog, Grafana, OpenTelemetry)
- Understanding of distributed systems: queues, workers, caching, databases, and networking
- Ability to write automation and tooling in Node.js
- Comfortable designing systems with safety and fail-safe operation in mind
- Comfortable working across the stack
- Deep care for incident management, postmortems, and continuous improvement

Responsibilities:

- Evolve and shape on-call processes
- Build automation for recovery, scaling, and self-healing systems
- Improve observability across the stack
- Define and track SLOs for core systems
- Collaborate with engineering teams to design for reliability
- Codify playbooks, postmortems, and reliability standards
- Work with infrastructure spanning AWS, queues, databases, and workers