DevOps / Platform Engineer - Cloud/AI Infrastructure

Brazil, Full-Time, Middle
Salary not disclosed

Job Details

Languages
English
Experience
6+ years
Required Skills
AWS, Python, Bash, GCP, Jenkins, Kubernetes, MLflow, TypeScript, Azure, Go, Grafana, Prometheus, Terraform, GitHub Actions, Azure DevOps, Datadog, CloudFormation, MLOps

Requirements

  • 6+ years of experience in DevOps, SRE, or Platform Engineering roles, including hands-on work with production systems.
  • Strong experience with Infrastructure as Code tools such as Terraform, Pulumi, or CloudFormation.
  • Proven experience managing Kubernetes environments (EKS, AKS, or GKE) including deployment, scaling, and troubleshooting.
  • Deep knowledge of CI/CD pipelines using tools such as GitHub Actions, GitLab CI, Jenkins, or Azure DevOps.
  • Hands-on experience with at least two major cloud providers (AWS, Azure, GCP).
  • Strong scripting and automation skills (Python, Bash, or similar), plus familiarity with Go or TypeScript.
  • Experience building observability stacks using tools like Prometheus, Grafana, Datadog, ELK, or OpenTelemetry.
  • Strong understanding of cloud security, IAM, networking, and compliance best practices.
  • Experience with GitOps workflows (ArgoCD, Flux, or equivalent).
  • Exposure to MLOps concepts and tooling, such as model serving, MLflow, or GPU-based infrastructure, is highly valued.
  • Strong communication skills, fluency in English, and experience working in international teams.
  • Ability to mentor engineers and contribute as a technical leader is a strong plus.

Responsibilities

  • Design, build, and maintain end-to-end CI/CD pipelines for applications, infrastructure, data workflows, and AI/ML models with automated testing and security gates.
  • Develop and operate cloud-agnostic infrastructure across AWS, Azure, and GCP using Infrastructure as Code tools such as Terraform or equivalent.
  • Manage Kubernetes-based environments for scalable workloads, including AI model serving and containerized applications.
  • Build and support MLOps infrastructure including model training environments, deployment pipelines, registries, and monitoring systems.
  • Implement robust observability solutions (metrics, logs, traces) and ensure full system visibility across services and AI workloads.
  • Design secure infrastructure with strong IAM, secrets management, network policies, and compliance with industry regulations (SOC 2, ISO 27001, GDPR).
  • Improve developer experience by building internal platforms, self-service tools, and automation workflows.
  • Support reliability engineering through SLOs, incident management, disaster recovery planning, and chaos engineering practices.
  • Collaborate with engineering and product teams to ensure scalable, efficient, and secure delivery pipelines.