DevOps / Platform Engineer - Cloud/AI Infrastructure

Brazil, Full-Time, Middle

Salary not disclosed

Job Details

Languages: English
Experience: 6+ years
Required Skills: AWS, Python, Bash, GCP, Jenkins, Kubernetes, MLflow, TypeScript, Azure, Go, Grafana, Prometheus, Terraform, GitHub Actions, Azure DevOps, Datadog, CloudFormation, MLOps

Requirements

6+ years of experience in DevOps, SRE, or Platform Engineering roles, including hands-on work with production systems.
Strong experience with Infrastructure as Code tools such as Terraform, Pulumi, or CloudFormation.
Proven experience managing Kubernetes environments (EKS, AKS, or GKE) including deployment, scaling, and troubleshooting.
Deep knowledge of CI/CD pipelines using tools such as GitHub Actions, GitLab CI, Jenkins, or Azure DevOps.
Hands-on experience with at least two major cloud providers (AWS, Azure, GCP).
Strong scripting and automation skills (Python, Bash, or similar), plus familiarity with Go or TypeScript.
Experience building observability stacks using tools like Prometheus, Grafana, Datadog, ELK, or OpenTelemetry.
Strong understanding of cloud security, IAM, networking, and compliance best practices.
Experience with GitOps workflows (ArgoCD, Flux, or equivalent).
Exposure to MLOps concepts such as model serving, MLflow, or GPU-based infrastructure is highly valued.
Strong communication skills, fluency in English, and experience working in international teams.
Ability to mentor engineers and contribute as a technical leader is a strong plus.

Responsibilities

Design, build, and maintain end-to-end CI/CD pipelines for applications, infrastructure, data workflows, and AI/ML models with automated testing and security gates.
Develop and operate cloud-agnostic infrastructure across AWS, Azure, and GCP using Infrastructure as Code tools such as Terraform or equivalent.
Manage Kubernetes-based environments for scalable workloads, including AI model serving and containerized applications.
Build and support MLOps infrastructure including model training environments, deployment pipelines, registries, and monitoring systems.
Implement observability solutions (metrics, logs, traces) that provide end-to-end visibility across services and AI workloads.
Design secure infrastructure with strong IAM, secrets management, network policies, and compliance with industry regulations (SOC 2, ISO 27001, GDPR).
Improve developer experience by building internal platforms, self-service tools, and automation workflows.
Support reliability engineering through SLOs, incident management, disaster recovery planning, and chaos engineering practices.
Collaborate with engineering and product teams to ensure scalable, efficient, and secure delivery pipelines.