Senior Site Reliability Engineer

Trigger.dev | Developer Tools
Remote or Hybrid UK | Full-Time | Senior
Salary not disclosed

Job Details

Required Skills
AWS, PostgreSQL, Kubernetes, Go, Prometheus, Redis, Linux, Terraform, Distributed Systems

Requirements

  • Strong observability background with OpenTelemetry or Prometheus.
  • Experience operating distributed systems with non-trivial failure modes.
  • Cloud-native proficiency with Kubernetes, Argo, Crossplane, or eBPF.
  • Experience managing self-hosted Kubernetes in production.
  • Proven performance debugging and scaling instincts.
  • Proficiency in Terraform for infrastructure as code.
  • Knowledge of security principles for multi-tenant environments.
  • Expertise with Postgres and Redis at scale.
  • Experience with Go and Linux environments.
  • Experience with AWS cloud infrastructure.

Responsibilities

  • Own observability across the platform and extend OpenTelemetry instrumentation.
  • Design and operate distributed systems primitives under production load.
  • Architect and tune auto-scaling infrastructure.
  • Identify and resolve bottlenecks from application to kernel levels.
  • Harden security posture including sandbox isolation and secrets management.
  • Manage infrastructure as code using Terraform.
  • Develop and improve runtime internals including cold-start optimization and distributed storage.
  • Design and execute on-call practices including SLOs and postmortems.