Senior Site Reliability Engineer (SRE)
Sleek | FinTech, SaaS
India, Vietnam | Full-Time | Senior
Salary not disclosed
Job Details
- Experience: 6+ years
- Required Skills: AWS, Node.js, Python, GCP, Kubernetes, Azure, Prometheus, CI/CD, Terraform
Requirements
- 6+ years of progressive experience in Site Reliability Engineering (SRE).
- 6+ years of hands-on experience across multi-cloud environments (AWS, GCP, Azure).
- 6+ years of deep expertise in containerization and orchestration (e.g., Kubernetes, EKS, ECS).
- 6+ years of extensive experience with Infrastructure as Code (e.g., Terraform, Pulumi, CloudFormation).
- Proven ability to design and operate highly reliable production systems with zero-downtime deployment patterns.
- Expertise in GitOps practices (e.g., ArgoCD, Flux) and building self-service developer platforms.
- Experience managing multi-cloud API Gateways and edge routing solutions (e.g., Kong, Traefik, Cloudflare).
- Strong background in platform security, IAM, and runtime hardening (e.g., Falco, eBPF).
- Practical experience with observability stacks (e.g., Prometheus, OpenTelemetry, ELK).
- Familiarity with AI/ML infrastructure requirements including model inference and GPU workloads.
- Familiarity with modern programming languages, runtimes, and frameworks such as Python, Node.js, and NestJS.
Responsibilities
- Conduct a full review of infrastructure and propose a roadmap for reliability and scalability.
- Lead upgrades or redesigns of core platform components including networking, containers, orchestration, or databases.
- Build or refine pipelines for AI model hosting, embeddings, and vector search services.
- Enhance CI/CD pipelines and infrastructure automation to increase engineering velocity.
- Strengthen logging, monitoring, tracing, and alerting across all services.
- Implement automated security scanning, secrets management, and access controls.