Forward Deployed Engineer, AI Inference

Remote eligibility: United States, Full-Time, Senior

Salary: 184,940 - 342,490 USD per year

Job Details

8+ years of engineering experience in Backend Systems, SRE, or Infrastructure Engineering.
Deep expertise in Kubernetes, including CRDs, Operators, Controllers, and Gateway API.
Proficiency in Python and Go.
Experience with AI inference concepts such as KV caching, continuous batching, and model performance tuning.
Experience with infrastructure as code tools such as Helm or Terraform.
Familiarity with accelerator hardware (NVIDIA and AMD GPUs, TPUs) and cloud/bare-metal deployment environments.
Strong systems programming and troubleshooting skills.

Deploy and configure llm-d and vLLM on Kubernetes clusters using advanced techniques like disaggregated serving and KV-cache offloading.
Run performance benchmarks and tune vLLM parameters to meet latency and throughput SLOs.
Write production-quality code in Python and Go to integrate inference engines into customer Kubernetes ecosystems.
Debug complex interaction effects between model architectures, hardware accelerators, and network stacks like Envoy and Istio.
Influence the product roadmap by channeling field feedback and technical requirements back to core engineering teams.
