Senior Network Site Reliability Engineer
Spain; flexible remote work options across Europe. Full-Time. Senior.
Salary not disclosed
Job Details
- Required Skills: Python, Go, CI/CD, Linux
Requirements
- Strong experience in Site Reliability Engineering, Network Engineering, or Infrastructure Engineering roles within large-scale production environments.
- Solid Linux systems administration expertise and proven ability to troubleshoot complex distributed systems.
- Strong understanding of networking fundamentals, including failure domains, latency, packet loss, control plane/data plane concepts, and high-availability architectures.
- Hands-on experience operating and improving reliable production systems through automation and engineering best practices.
- Proficiency in software development or scripting using Go, Python, or similar programming languages.
- Experience with infrastructure-as-code, CI/CD pipelines, containerized environments, and operational automation tools.
- Familiarity with observability, telemetry, monitoring systems, and incident management practices.
- Ability to work collaboratively across engineering teams, with strong ownership and communication skills.
Responsibilities
- Define and manage reliability objectives for critical network services, including SLIs, SLOs, availability targets, and operational performance standards.
- Lead initiatives to improve overall network reliability across infrastructure, inter-site connectivity, and operational workflows.
- Own incident response processes for networking environments, conduct root cause investigations, and implement long-term corrective solutions.
- Design and enhance observability systems through metrics, logging, tracing, alerting, and monitoring improvements to accelerate troubleshooting and recovery.
- Build and maintain automation, CI/CD pipelines, testing environments, rollback mechanisms, and safe deployment processes for network changes.
- Collaborate with platform engineering and infrastructure teams to improve operability, scalability, and reliability of networking systems.
- Develop tooling and automation solutions using modern programming languages and infrastructure management practices.
- Support operational readiness and scalability initiatives for high-availability and high-throughput networking environments.