Write high-quality, maintainable software, primarily in Python
Strong background in scalable infrastructure, including containerization and orchestration (e.g. Docker, Kubernetes)
Experience with infrastructure-as-code and deployment (e.g. Terraform, CI/CD pipelines)
Experience with monitoring and logging frameworks (e.g. Datadog, Prometheus, OpenTelemetry)
Understand and implement MLOps best practices, including model versioning and rollback strategies
Experience with automated evaluation and drift detection
Experience with scalable model and agent serving infrastructure (e.g. vLLM, Triton, BentoML)
Deploy and maintain LLM and agentic workflows in production
Experience with monitoring cost, latency, and performance
Experience with capturing traces for analysis and debugging
Experience with optimizing prompt/response flows with real-time data access
Strong ownership and pragmatism