Senior SRE (Site Reliability Engineer 3) - Blockchain Networks Launch Team

P2P.org | Blockchain

Remote EU | Full-Time | Senior

Salary not disclosed

Job Details

Languages: English B2 minimum
Experience: 5+ years
Required Skills: AWS, Python, GCP, Kubernetes, Azure, Go, Grafana, Prometheus, Linux, Terraform

Requirements

5+ years of experience in SRE, DevOps, or infrastructure engineering
Strong experience operating production systems at scale
Hands-on experience with Kubernetes (deployment, troubleshooting, operations)
Hands-on experience with Terraform (infrastructure as code)
Hands-on experience with Linux systems and networking fundamentals
Experience with at least one cloud provider (GCP preferred; AWS, Azure, or OCI)
Experience with observability tooling (Prometheus, Grafana, Loki, or similar)
Familiarity with CI/CD systems and GitOps workflows (e.g., ArgoCD)
Solid scripting or programming skills (Go, Python, or similar)
Experience working in distributed systems or high-availability environments
Strong debugging and problem-solving skills under pressure
Good communication skills (English B2 minimum)

Responsibilities

Lead the end-to-end launch of new blockchain networks—from testnet to mainnet
Design and implement deployment architectures for validators, full nodes, RPCs, and supporting services
Ensure all new networks meet production readiness standards—monitoring, alerting, backups, failover, and security
Collaborate with protocol teams to understand network-specific requirements, risks, and failure modes
Create repeatable launch patterns and runbooks to reduce time-to-market for new networks
Build and operate infrastructure across cloud and bare-metal environments
Improve automation and standardisation of deployments using Terraform, Helm, and internal tooling
Contribute to the internal platform by aligning launches with existing Kubernetes, observability, and delivery standards
Implement high-availability and fault-tolerant setups for validator infrastructure
Continuously improve SLOs, SLIs, and alerting for newly launched networks
Ensure all services are fully observable—metrics, logs, and traces
Define and implement alerts that are actionable and low-noise
Participate in on-call rotations and incident response
Lead or contribute to post-incident reviews, focusing on systemic improvements
Proactively identify and fix reliability risks before they impact production
Apply security best practices to all deployments—secrets management, access control, and network isolation
Ensure compliance with internal standards and contribute to SOC 2-aligned practices
Support secure key management practices for validator infrastructure
Work closely with Infrastructure, Core Networks, and Security teams
Take ownership of deliverables - from design to production
Contribute to documentation, runbooks, and knowledge sharing
Support and mentor more junior engineers when needed
