Senior AI Platform Engineer
Brazil, Full-Time, Senior
Salary not disclosed
Job Details
Required Skills
- Docker, Python, Cloud Computing, Kubernetes, CI/CD, Terraform, Helm, MLOps
Requirements
- Strong hands-on experience with cloud platforms (AWS, Azure, or GCP)
- Solid experience with Kubernetes, containers (Docker), Helm, and cloud-managed services
- Proven experience with CI/CD tools such as GitHub Actions, GitLab CI, Azure DevOps, or Jenkins
- Strong knowledge of Infrastructure as Code tools such as Terraform, Pulumi, or CloudFormation
- Experience implementing observability solutions (logging, monitoring, tracing, dashboards, alerting)
- Solid understanding of networking, load balancing, authentication, authorization, and high availability architectures
- Experience operating production systems, handling incidents, and improving system reliability
- Ability to document technical standards, architecture decisions, and platform practices clearly
- Experience with MLOps/LLMOps, GPU workloads, or AI platforms is a strong plus
- Familiarity with tools such as vLLM, Triton, NVIDIA NIM, MLflow, Kubeflow, or Ray is a plus
Responsibilities
- Design, build, and evolve cloud-native and Kubernetes-based environments for AI workloads, APIs, services, and data pipelines
- Develop and maintain CI/CD pipelines with a strong focus on security, traceability, standardization, and deployment efficiency
- Implement Infrastructure as Code practices to automate provisioning and management of cloud resources
- Define and implement observability standards, including logging, metrics, tracing, alerting, and platform health monitoring
- Support AI workloads such as inference services, embeddings, agent orchestration, and model integration components
- Ensure robust security practices, including identity and access management, secrets handling, and environment segregation
- Drive platform reliability through performance tuning, scalability improvements, resiliency, and cost optimization (FinOps)
- Create reusable architectural patterns for deployment, monitoring, infrastructure, and operational workflows
- Collaborate with AI, Data, Product, and Engineering teams to translate needs into scalable platform capabilities