Senior Software Engineer, Cloud Development
Canada, Full-Time, Senior
Salary not disclosed
Job Details
- Experience: 4–6+ years
- Required Skills: Python, GCP, Kubernetes, Grafana, Terraform, Helm, Distributed Systems
Requirements
- Bachelor’s degree with 4–6+ years of relevant experience, or equivalent hands-on production systems experience.
- Strong Python development skills with experience building maintainable services, libraries, and CLIs.
- Proven experience running production workloads in cloud environments (GCP preferred) and managing infrastructure at scale.
- Deep knowledge of Kubernetes and Helm, including multi-environment deployments and progressive rollouts.
- Experience with infrastructure-as-code tools such as Terraform for provisioning and managing cloud resources.
- Strong understanding of distributed systems, API design, and production-grade service reliability.
- Familiarity with observability tools (e.g., Grafana) and debugging performance or reliability issues in complex systems.
- Excellent communication skills and experience collaborating across engineering, product, and infrastructure teams.
- On-call and incident response experience in production environments.
Responsibilities
- Design, build, and operate scalable platform services and APIs that support production AI and backend workloads.
- Own service reliability end-to-end, improving availability, latency, scalability, and cost efficiency across distributed systems.
- Develop and optimize Kubernetes-based infrastructure, including deployment pipelines, environment configuration, and resource management.
- Improve service lifecycle practices such as packaging, versioning, testing, validation, and automated deployments.
- Implement observability systems (metrics, logging, tracing, alerting) to strengthen operational visibility and incident response.
- Collaborate with cross-functional teams to deliver secure, scalable, and privacy-respecting platform capabilities.
- Participate in architectural discussions, operational processes, on-call rotations, and incident postmortems while mentoring peers.