Site Reliability Consultant
Canada, Full-Time, Senior
Salary: 90,000 - 100,000 CAD per year
Job Details
- Experience: 5+ years
- Required Skills: Docker, Python, GCP, Kubernetes, Go, Grafana, Prometheus, Linux, Terraform
Requirements
- 5+ years of experience in Site Reliability Engineering, DevOps, or infrastructure engineering roles.
- Strong hands-on experience with Google Cloud Platform and Infrastructure-as-Code tools such as Terraform.
- Deep understanding of Kubernetes, Docker, microservices architectures, and service mesh concepts.
- Strong Linux systems administration skills with experience in networking, PKI, and distributed system troubleshooting.
- Proficiency in scripting and automation using Python, Shell, and ideally Go.
- Experience building and maintaining observability and monitoring systems in production environments.
- Strong incident management experience, including root cause analysis and postmortem practices.
- Solid understanding of scalability, reliability engineering principles, and automation-first thinking.
- Strong communication and collaboration skills in cross-functional engineering environments.
Responsibilities
- Operate, optimize, and troubleshoot Kubernetes clusters, service mesh environments (Istio), and Linux-based systems.
- Design and implement automation using Go, Python, and Shell scripting to reduce manual operational workload.
- Build and maintain observability stacks using tools such as Prometheus, Grafana, and Loki for monitoring and alerting.
- Diagnose and resolve complex issues across networking, storage, compute, and application performance layers.
- Support AI/ML workloads by ensuring infrastructure readiness for training pipelines and data-intensive processing.
- Participate in on-call rotations, incident response, and postmortem analysis to improve system reliability.
- Collaborate with engineering teams to implement infrastructure-as-code practices using Terraform and cloud-native tools.