Cast AI

Private Company

Open Positions: 12

Location: United Kingdom | Full-Time | Cloud Infrastructure
  • Push throughput via continuous batching, speculative decoding, and kernel-level tuning.
  • Cut latency by identifying and fixing compute, memory, or network bottlenecks.
  • Optimize KV cache via paged attention, prefix caching, and quantization.
  • Perform empirical quantization work across weights, activations, and the KV cache.
  • Shrink cold starts and memory footprint.
  • Scale across nodes using distributed inference topologies.
  • Set technical direction for benchmarking and architecture.
Python, Kubernetes, PyTorch, +1 more
Showing 1 of 12 positions

Similar Companies