Forward Deployed Engineer, AI Inference
Remote eligibility: United States | Full-Time | Senior
Salary: 184,940 - 342,490 USD per year
Job Details
- Experience: 8+ Years
- Required Skills: Python, Kubernetes, Go
Requirements
- 8+ years of engineering experience in Backend Systems, SRE, or Infrastructure Engineering.
- Deep expertise in Kubernetes, including CRDs, Operators, Controllers, and Gateway API.
- Proficiency in Python and Go.
- Experience with AI inference concepts such as KV caching, continuous batching, and model performance tuning.
- Experience with infrastructure as code tools such as Helm or Terraform.
- Familiarity with accelerator hardware (NVIDIA and AMD GPUs, TPUs) and cloud/bare-metal deployment environments.
- Strong systems programming and troubleshooting skills.
Responsibilities
- Deploy and configure llm-d and vLLM on Kubernetes clusters using advanced techniques such as disaggregated serving and KV-cache offloading.
- Benchmark performance and tune vLLM parameters to meet latency and throughput SLOs.
- Write production-quality code in Python and Go to integrate inference engines into customer Kubernetes ecosystems.
- Debug complex interaction effects between model architectures, hardware accelerators, and network stacks such as Envoy and Istio.
- Influence the product roadmap by channeling field feedback and technical requirements back to core engineering teams.