Senior Site Reliability Engineer
Trigger.dev, Developer Tools
Remote or Hybrid UK, Full-Time, Senior
Salary not disclosed
Job Details
- Required Skills: AWS, PostgreSQL, Kubernetes, Go, Prometheus, Redis, Linux, Terraform, Distributed Systems
Requirements
- Strong observability background with OpenTelemetry or Prometheus.
- Experience operating distributed systems with non-trivial failure modes.
- Cloud-native proficiency with Kubernetes, Argo, Crossplane, or eBPF.
- Experience managing self-hosted Kubernetes in production.
- Proven performance debugging and scaling instincts.
- Proficiency in Terraform for infrastructure as code.
- Knowledge of security principles for multi-tenant environments.
- Expertise with Postgres and Redis at scale.
- Experience with Go and Linux environments.
- Experience with AWS cloud infrastructure.
Responsibilities
- Own observability across the platform and extend OpenTelemetry instrumentation.
- Design and operate distributed systems primitives under production load.
- Architect and tune auto-scaling infrastructure.
- Identify and resolve bottlenecks from application to kernel levels.
- Harden security posture including sandbox isolation and secrets management.
- Manage infrastructure as code using Terraform.
- Develop and improve runtime internals including cold-start optimization and distributed storage.
- Design and execute on-call practices including SLOs and postmortems.