Site Reliability Engineer, Infra (Europe)

Posted about 2 months ago

Europe, Full-Time, Email Platform

Company: Resend

Location: Europe

Languages: English

Seniority level: Middle, 4+ years

Experience: 4+ years

Skills:

AWS (Amazon Web Services), Node.js, Kubernetes, Grafana, CI/CD, Linux, DevOps, Problem Solving, Software Engineering, Troubleshooting

Requirements:

- 4+ years of experience in Site Reliability, Platform, or Infrastructure Engineering
- Strong experience with observability and monitoring tools (Datadog, Grafana, OpenTelemetry)
- Understand distributed systems: queues, workers, caching, databases, networking
- Write automation and tooling in Node.js
- Know how to design systems with safety and fail-safe operations in mind
- Comfortable working across the stack
- Care deeply about incident management, postmortems, and continuous improvement

Responsibilities:

- Evolve and shape on-call processes
- Build automation for recovery, scaling, and self-healing systems
- Improve observability across the stack
- Define and track SLOs for core systems
- Collaborate closely with engineering teams to design for reliability
- Codify playbooks, postmortems, and reliability standards
- Work with infrastructure spanning AWS, queues, databases, and workers