Lead Azure GenAIOps / LLMOps Engineer

Remote / Hybrid (India) · Full-Time · Lead
Salary not disclosed

Job Details

Experience
10 – 14+ Years
Required Skills
Docker, Microsoft Azure, FastAPI, Terraform, GitHub Actions, Datadog, Generative AI

Requirements

  • B.Tech/M.Tech in Computer Science or related field (Ph.D. is a plus but not mandatory for this Ops-centric role).
  • Expert-level Microsoft Azure (AI Foundry, Azure OpenAI, Azure ML).
  • Deep experience with Azure Kubernetes Service (AKS), Docker, and KEDA for auto-scaling AI workloads.
  • Mastery of LangGraph, LlamaIndex, and FastAPI for building high-concurrency AI backends.
  • Hands-on experience with vector stores such as Azure AI Search, Pinecone, or Milvus.
  • Proven experience with GitHub Actions or Azure Pipelines for ML/LLM CI/CD.
  • Ability to explain latency-versus-accuracy trade-offs to non-technical business leaders.
  • Experience leading a team of 4–6 engineers and setting the technical standard for code reviews and architectural blueprints.
  • A track record of moving beyond simple RAG into advanced patterns such as GraphRAG and multi-modal pipelines.
  • Azure Solutions Architect or Azure AI Engineer Associate certification preferred.

Responsibilities

  • Architect and scale multi-agent systems using LangGraph, AutoGen, or Semantic Kernel.
  • Implement persistent state management and deterministic fallback logic for autonomous agents.
  • Design and manage a centralized AI Gateway (using Azure APIM) to handle request routing, rate limiting, and cost attribution across business units.
  • Provision and manage Azure AI resources (Foundry, Search, CosmosDB) using Terraform or Bicep to ensure reproducible environments.
  • Implement end-to-end distributed tracing for LLM calls using tools like Langfuse, Arize Phoenix, or LangSmith integrated with Azure Monitor/Datadog.
  • Build automated "Evaluation-as-a-Service" pipelines and use "LLM-as-a-Judge" patterns to score groundedness, relevance, and faithfulness.
  • Manage the lifecycle of models (GPT-4o, Llama 3.x, Phi-4) including versioning, blue-green deployments, and A/B testing of system prompts.
  • Enforce Zero Trust security for AI—implementing Private Links, Managed Identities, and Virtual Network isolation for all LLM traffic.
  • Deploy and tune Azure AI Content Safety and custom jailbreak detection layers to prevent prompt injection and PII leakage.
  • Monitor token usage and latency metrics to provide FinOps insights and prevent "runaway" agent costs.