Sr./Staff Infrastructure/Site Reliability Engineer

Oscilar | AI | Fintech
Remote, Remote - Canada | Full-Time | Senior
Salary not disclosed

Job Details

Required Skills
AWS, Python, Kafka, Kubernetes, ClickHouse, Go, CI/CD, Terraform, Microservices

Requirements

  • Proven track record as a senior SRE or Infrastructure Engineer in high-scale environments
  • Expert-level skills in AWS
  • Expert-level skills in Infrastructure as Code (Pulumi, Terraform)
  • Strong programming ability in Go or Python
  • Deep understanding of distributed systems (Kafka, ClickHouse) and microservices architecture
  • Mastery of container orchestration (Kubernetes) and production debugging
  • Strong sense of ownership and the judgment to balance velocity with reliability

Responsibilities

  • Architect and operate resilient cloud infrastructure (AWS, Pulumi, Kubernetes)
  • Lead initiatives to improve availability, latency, and performance at scale
  • Design and evolve our CI/CD pipelines to optimize for speed, safety, and repeatability
  • Define the metrics, alerts, and runbooks that form our observability backbone
  • Run chaos experiments and failure simulations to harden the platform
  • Mentor engineers and set best practices for SRE across the company