Senior Site Reliability Engineer (SRE)

Sleek
FinTech, SaaS
India, Vietnam
Full-Time
Senior
Salary not disclosed

Job Details

Experience
6+ years
Required Skills
AWS, Node.js, Python, GCP, Kubernetes, Azure, Prometheus, CI/CD, Terraform

Requirements

  • 6+ years of progressive experience in Site Reliability Engineering (SRE).
  • 6+ years of hands-on experience across multi-cloud environments (AWS, GCP, Azure).
  • 6+ years of deep expertise in containerization and orchestration (e.g., Kubernetes, EKS, ECS).
  • 6+ years of extensive experience with Infrastructure as Code (e.g., Terraform, Pulumi, CloudFormation).
  • Proven ability to design and operate highly reliable production systems with zero-downtime deployment patterns.
  • Expertise in GitOps practices (e.g., ArgoCD, Flux) and building self-service developer platforms.
  • Experience managing multi-cloud API Gateways and edge routing solutions (e.g., Kong, Traefik, Cloudflare).
  • Strong background in platform security, IAM, and runtime hardening (e.g., Falco, eBPF).
  • Practical experience with observability stacks (e.g., Prometheus, OpenTelemetry, ELK).
  • Familiarity with AI/ML infrastructure requirements including model inference and GPU workloads.
  • Familiarity with modern languages, runtimes, and frameworks such as Node.js, NestJS, and Python.

Responsibilities

  • Conduct a full review of infrastructure and propose a roadmap for reliability and scalability.
  • Lead upgrades or redesigns of core platform components including networking, containers, orchestration, or databases.
  • Build or refine pipelines for AI model hosting, embeddings, and vector search services.
  • Enhance CI/CD pipelines and infrastructure automation to increase engineering velocity.
  • Strengthen logging, monitoring, tracing, and alerting across all services.
  • Implement automated security scanning, secrets management, and access controls.