AI/MLOps Engineer

99x Brazil (formerly Nextly) | AI/ML infrastructure
São Paulo, State of São Paulo, Brazil | Full-Time | Middle
Salary not disclosed

Job Details

Required Skills
AWS, Python, Azure, Grafana, Prometheus, DevOps, Datadog, MLOps

Requirements

  • Experience with DevOps, SRE, MLOps, or AI infrastructure engineering
  • Strong understanding of monitoring and observability concepts
  • Hands-on experience with tools such as Datadog, CloudWatch, Grafana, Prometheus, or similar
  • Experience supporting AI/ML or LLM-based applications in production
  • Familiarity with prompt engineering, model evaluation, and experimentation workflows
  • Knowledge of cloud platforms such as AWS, Azure, or Google Cloud
  • Experience troubleshooting distributed systems and production pipelines
  • Proficiency in Python, scripting, or automation tooling
  • Strong analytical and problem-solving skills
  • Excellent communication and collaboration abilities

Responsibilities

  • Design and maintain monitoring and observability solutions for AI applications and ML pipelines
  • Track logs, metrics, and traces using tools such as CloudWatch, Datadog, or similar platforms
  • Develop evaluation and testing frameworks for prompts, models, and AI workflows
  • Perform regression testing and quality validation for LLM-based systems
  • Manage prompt experimentation, versioning, and A/B testing processes
  • Debug AI workflows, including model outputs, orchestration pipelines, and infrastructure failures
  • Support deployment, scaling, and maintenance of AI/ML infrastructure in production environments
  • Collaborate with engineering and product teams to improve system reliability and performance
  • Analyze production data and user feedback to drive continuous improvement of AI systems
  • Contribute to operational best practices, documentation, and incident response processes