AI/MLOps Engineer
99x Brazil (formerly Nextly) | AI/ML infrastructure
São Paulo, State of São Paulo, Brazil | Full-Time | Middle
Salary not disclosed
Job Details
- Required Skills: AWS, Python, Azure, Grafana, Prometheus, DevOps, Datadog, MLOps
Requirements
- Experience with DevOps, SRE, MLOps, or AI infrastructure engineering
- Strong understanding of monitoring and observability concepts
- Hands-on experience with tools such as Datadog, CloudWatch, Grafana, Prometheus, or similar
- Experience supporting AI/ML or LLM-based applications in production
- Familiarity with prompt engineering, model evaluation, and experimentation workflows
- Knowledge of cloud platforms such as AWS, Azure, or Google Cloud
- Experience troubleshooting distributed systems and production pipelines
- Proficiency in Python, scripting, or automation tooling
- Strong analytical and problem-solving skills
- Excellent communication and collaboration abilities
Responsibilities
- Design and maintain monitoring and observability solutions for AI applications and ML pipelines
- Track logs, metrics, and traces using tools such as CloudWatch, Datadog, or similar platforms
- Develop evaluation and testing frameworks for prompts, models, and AI workflows
- Perform regression testing and quality validation for LLM-based systems
- Manage prompt experimentation, versioning, and A/B testing processes
- Debug AI workflows, including model outputs, orchestration pipelines, and infrastructure failures
- Support deployment, scaling, and maintenance of AI/ML infrastructure in production environments
- Collaborate with engineering and product teams to improve system reliability and performance
- Analyze production data and user feedback to drive continuous improvement of AI systems
- Contribute to operational best practices, documentation, and incident response processes