- Build and operate the ML lifecycle platform, including experiment tracking, model registry, and artifact management.
- Own CI/CD and automated deployment pipelines for moving models from notebooks to production.
- Ensure model observability and reliability, including latency monitoring and drift detection.
- Manage containerized workloads on Kubernetes and codify infrastructure using Terraform.
- Implement infrastructure-level governance, including access controls and deployment policies.
- Provide technical guidance and mentorship to junior engineers and define repeatable MLOps practices.