Inferact

Private Company
ShareTweet

Open Positions1

Fully remote, worldwide.Full-TimeAI InfrastructurePosted
  • Optimize inference runtime performance for LLM and diffusion models across diverse hardware.
  • Develop low-level CUDA kernels or equivalent (Triton, TileLang, Pallas) to enhance inference speed.
  • Design and implement high-performance distributed systems for serving models at scale.
  • Build operational infrastructure for cluster management, deployment automation, and production monitoring.
  • Manage KV-cache memory and model serving within the vLLM stack.
  • Collaborate asynchronously with global team members while maintaining code ownership.
PythonKubernetesPyTorch+5 more