- Architect and scale multi-agent systems using LangGraph, AutoGen, or Semantic Kernel.
- Implement persistent state management and deterministic fallback logic for autonomous agents.
- Design and manage a centralized AI Gateway (using Azure API Management) to handle request routing, rate limiting, and cost attribution across business units.
- Provision and manage Azure AI resources (AI Foundry, AI Search, Cosmos DB) using Terraform or Bicep to ensure reproducible environments.
- Implement end-to-end distributed tracing for LLM calls using tools like Langfuse, Arize Phoenix, or LangSmith integrated with Azure Monitor/Datadog.
- Build automated "Evaluation-as-a-Service" pipelines and use "LLM-as-a-Judge" patterns to score groundedness, relevance, and faithfulness.
- Manage the lifecycle of models (GPT-4o, Llama 3.x, Phi-4) including versioning, blue-green deployments, and A/B testing of system prompts.
- Enforce Zero Trust security for AI by implementing Private Link, Managed Identities, and virtual network isolation for all LLM traffic.
- Deploy and tune Azure AI Content Safety and custom jailbreak detection layers to prevent prompt injection and PII leakage.
- Monitor token usage and latency metrics to provide FinOps insights and prevent "runaway" agent costs.
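The "deterministic fallback logic" bullet above can be sketched as a small wrapper that retries a non-deterministic model call and, on repeated failure, returns a deterministic rule-based response so the agent never ends in an undefined state. A minimal sketch; `flaky_llm` and `deterministic_answer` are hypothetical stand-ins for a real model endpoint and a rule-based fallback.

```python
from typing import Callable

def call_with_fallback(primary: Callable[[str], str],
                       fallback: Callable[[str], str],
                       prompt: str,
                       max_retries: int = 2) -> str:
    """Try the primary (non-deterministic) model call up to max_retries
    times; if every attempt fails, return the deterministic fallback so
    the agent always produces a well-defined result."""
    for _ in range(max_retries):
        try:
            return primary(prompt)
        except Exception:
            continue  # transient failure: retry, then fall back
    return fallback(prompt)

# Hypothetical stand-ins for illustration only.
def flaky_llm(prompt: str) -> str:
    raise TimeoutError("model endpoint unavailable")

def deterministic_answer(prompt: str) -> str:
    return "FALLBACK: unable to reach model, please retry later"

print(call_with_fallback(flaky_llm, deterministic_answer, "summarize Q3 report"))
```

In production the fallback might route to a cached answer or a smaller self-hosted model; the key property is that the failure path is deterministic and testable.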
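The "LLM-as-a-Judge" pattern mentioned above is, at its core, a second model call that scores an answer against a rubric and returns a parseable verdict. A minimal sketch, assuming a 1-5 groundedness scale; `mock_judge` stands in for a real judge-model call, and the prompt wording is illustrative, not a prescribed template.

```python
# Illustrative judge prompt; wording and scale are assumptions.
JUDGE_PROMPT = (
    "You are an impartial evaluator.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer: {answer}\n"
    "Rate how well the answer is grounded in the context, "
    "from 1 (ungrounded) to 5 (fully grounded). Reply with only the number."
)

def score_groundedness(judge, question: str, context: str, answer: str) -> int:
    """Ask the judge model for a groundedness score and validate it."""
    raw = judge(JUDGE_PROMPT.format(question=question, context=context, answer=answer))
    score = int(raw.strip())
    if not 1 <= score <= 5:
        raise ValueError(f"judge returned out-of-range score: {score}")
    return score

# Hypothetical judge for illustration; a real pipeline would call an LLM here.
def mock_judge(prompt: str) -> str:
    return "4"

print(score_groundedness(mock_judge, "What is the refund window?",
                         "Refunds are accepted within 30 days.",
                         "You can get a refund within 30 days."))  # → 4
```

An evaluation pipeline would run this scorer over a batch of question/context/answer triples and aggregate the scores per model or prompt version.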
Tech stack: Docker, Microsoft Azure, FastAPI, +4 more