Inferact

Private Company

Open Positions1

Fully remote, worldwide.Full-TimeAI InfrastructurePosted

Optimize inference runtime performance for LLM and diffusion models across diverse hardware.
Develop low-level CUDA kernels or equivalent (Triton, TileLang, Pallas) to enhance inference speed.
Design and implement high-performance distributed systems for serving models at scale.
Build operational infrastructure for cluster management, deployment automation, and production monitoring.
Manage KV-cache memory and model serving within the vLLM stack.
Collaborate asynchronously with global team members while maintaining code ownership.

PythonKubernetesPyTorch+5 more