DevOps / Platform Engineer - AI/ML

Brazil · Contract · Senior
Salary not disclosed

Job Details

Experience
6+ years
Required Skills
AWS, Python, Bash, GCP, Jenkins, Kubernetes, Azure, Grafana, Prometheus, Terraform, GitHub Actions, Azure DevOps, Datadog

Requirements

  • 6+ years of experience in DevOps, SRE, or platform engineering roles
  • At least 2 years supporting AI/ML production workloads
  • Strong expertise in infrastructure-as-code tools such as Terraform
  • Knowledge of alternative IaC tools, such as Pulumi, and of cloud-native options such as AWS CloudFormation and Azure Bicep
  • Hands-on experience with Kubernetes (EKS, AKS, or GKE), including cluster operations, scaling, and troubleshooting
  • Solid experience designing and maintaining CI/CD pipelines using tools such as GitHub Actions, GitLab CI, Azure DevOps, or Jenkins
  • Strong cloud experience across at least two major providers (AWS, Azure, GCP), including networking, compute, storage, and IAM
  • Proficiency in scripting and automation using Python, Bash, or PowerShell
  • Familiarity with Go or TypeScript is a plus
  • Experience with observability stacks (Prometheus, Grafana, Datadog, ELK, OpenTelemetry) and incident management tools
  • Strong understanding of security best practices, including secrets management, IAM, container security, and compliance automation
  • Experience with GitOps workflows (ArgoCD, Flux) and modern platform engineering practices
  • Excellent communication skills and the ability to work effectively in global, multicultural teams

Responsibilities

  • Design, implement, and maintain end-to-end CI/CD pipelines supporting applications, infrastructure-as-code, data pipelines, and AI/ML model deployments.
  • Build and manage cloud-agnostic infrastructure across AWS, Azure, and/or GCP, ensuring scalability, portability, and reliability.
  • Develop and operate Kubernetes-based environments for containerized workloads, including microservices and AI model serving systems.
  • Implement infrastructure-as-code solutions using Terraform or similar tools to create reusable, modular, and secure infrastructure components.
  • Lead MLOps initiatives, including model training environments, deployment pipelines, model registries, and production monitoring systems.
  • Establish observability, reliability, and security practices, including monitoring, logging, alerting, and incident response frameworks.
  • Design and support internal developer platforms to improve self-service infrastructure provisioning and developer experience.
  • Drive cloud cost optimization, security automation, and compliance alignment across distributed environments.