Lead Azure GenAIOps / LLMOps Engineer
Remote / Hybrid (India) · Full-Time · Lead
Salary not disclosed
Job Details
- Experience: 10–14+ years
- Required Skills: Docker, Microsoft Azure, FastAPI, Terraform, GitHub Actions, Datadog, Generative AI
Requirements
- B.Tech/M.Tech in Computer Science or related field (Ph.D. is a plus but not mandatory for this Ops-centric role).
- Expert-level Microsoft Azure (AI Foundry, Azure OpenAI, Azure ML).
- Deep experience with Azure Kubernetes Service (AKS), Docker, and KEDA for auto-scaling AI workloads.
- Mastery of LangGraph, LlamaIndex, and FastAPI for building high-concurrency AI backends.
- Hands-on experience with vector stores such as Azure AI Search, Pinecone, or Milvus.
- Proven experience with GitHub Actions or Azure Pipelines for ML/LLM CI/CD.
- Ability to explain latency-vs-accuracy trade-offs to non-technical business leaders.
- Experience leading a team of 4–6 engineers and setting the technical standard for code reviews and architectural blueprints.
- A track record of moving beyond simple RAG into advanced patterns such as GraphRAG and multi-modal pipelines.
- Azure Solutions Architect or Azure AI Engineer Associate certification preferred.
Responsibilities
- Architect and scale multi-agent systems using LangGraph, AutoGen, or Semantic Kernel.
- Implement persistent state management and deterministic fallback logic for autonomous agents.
- Design and manage a centralized AI Gateway (using Azure API Management) to handle request routing, rate limiting, and cost attribution across business units.
- Provision and manage Azure AI resources (Foundry, Search, CosmosDB) using Terraform or Bicep to ensure reproducible environments.
- Implement end-to-end distributed tracing for LLM calls using tools like Langfuse, Arize Phoenix, or LangSmith integrated with Azure Monitor/Datadog.
- Build automated "Evaluation-as-a-Service" pipelines and use "LLM-as-a-Judge" patterns to score groundedness, relevance, and faithfulness.
- Manage the lifecycle of models (GPT-4o, Llama 3.x, Phi-4) including versioning, blue-green deployments, and A/B testing of system prompts.
- Enforce Zero Trust security for AI by implementing Private Links, Managed Identities, and Virtual Network isolation for all LLM traffic.
- Deploy and tune Azure AI Content Safety and custom jailbreak detection layers to prevent prompt injection and PII leakage.
- Monitor token usage and latency metrics to provide FinOps insights and prevent "runaway" agent costs.