Senior Site Reliability Engineer

Posted 7 months ago | Inactive

190,000 - 220,000 USD per year

United States, Canada | Full-Time | Software Development

Company:

Location: United States, Canada

Languages: English

Seniority level: Senior, 5+ years

Experience: 5+ years

Skills:

AWS, Docker, Node.js, PostgreSQL, Python, Bash, Kafka, MongoDB, React Native, TypeScript, Vue.js, Nest.js, React, CI/CD, Linux, DevOps, Terraform, JSON

Requirements:

- 5+ years running production workloads on AWS (or GCP/Azure) with infrastructure-as-code (Terraform/CDK/CloudFormation)
- Hands-on experience operating container orchestration (ECS, EKS, Kubernetes, Nomad, etc.) and designing blue/green or canary rollouts
- Depth in at least two of our core datastores (Postgres, MongoDB, Kafka), including backup/restore, upgrades, and performance tuning
- Fluency with CI/CD pipelines (we use Buildkite + GitHub Actions) and a knack for automating everything with shell, Python, or TypeScript
- Proven track record setting up monitoring/alerting in Datadog, Prometheus, or similar, with clear SLO/SLA ownership
- Strong grasp of Linux networking, load balancing (Cloudflare/ELB), and CDN/edge-security concepts
- Excellent incident-management and root-cause analysis skills; able to write crisp RCAs and follow through on action items
- Passion for customer-centric thinking, rapid iteration, and continuous learning

Responsibilities:

- Set SLOs/SLIs, build self-healing architectures, and drive incident-prevention projects that keep our APIs and real-time ordering flows under 100 ms at p95.
- Level up dashboards, alerts, and distributed tracing so teams can detect issues before customers do.
- Evolve our Buildkite pipelines and Terraform modules to give engineers one-click rollouts that finish in under 10 minutes (and clean rollbacks).
- Harden infra with least-privilege IAM, threat-model topology changes, and guide SOC 2 / PCI efforts.
- Tune Postgres for multi-TB workloads, maintain Mongo sharding, and shepherd Kafka topic management as event volume climbs.
- Rotate with the on-call SREs, run blameless post-mortems, and convert findings into durable fixes.
- Pair with product engineers on capacity reviews, guide junior devs on Docker best practices, and evangelize “you build it, you run it.”
