Forward Deployed Engineer, AI Inference

Remote eligibility: United States
Full-Time
Senior
Salary: 184,940 - 342,490 USD per year

Job Details

Experience
8+ Years
Required Skills
Python, Kubernetes, Go

Requirements

  • 8+ years of engineering experience in Backend Systems, SRE, or Infrastructure Engineering.
  • Deep expertise in Kubernetes, including CRDs, Operators, Controllers, and Gateway API.
  • Proficiency in Python and Go.
  • Experience with AI inference concepts such as KV caching, continuous batching, and model performance tuning.
  • Experience with infrastructure as code tools such as Helm or Terraform.
  • Familiarity with accelerator hardware (NVIDIA and AMD GPUs, TPUs) and cloud/bare-metal deployment environments.
  • Strong systems programming and troubleshooting skills.

Responsibilities

  • Deploy and configure llm-d and vLLM on Kubernetes clusters using advanced techniques such as disaggregated serving and KV-cache offloading.
  • Benchmark performance and tune vLLM parameters to meet latency and throughput SLOs.
  • Write production-quality code in Python and Go to integrate inference engines into customer Kubernetes ecosystems.
  • Debug complex interaction effects between model architectures, hardware accelerators, and network stacks such as Envoy and Istio.
  • Influence the product roadmap by channeling field feedback and technical requirements back to core engineering teams.