Senior ML Engineer

Cast AI
Cloud Infrastructure
Location: United Kingdom
Full-Time
Senior
Salary not disclosed

Job Details

Experience
5+ years
Required Skills
Python, Kubernetes, PyTorch, Distributed Systems

Requirements

  • 5+ years building production ML systems.
  • Experience in inference or training infrastructure.
  • Strong Python skills for production services.
  • Experience with vLLM, SGLang, or TensorRT-LLM.
  • Fluency in quantization tradeoffs.
  • Experience with distributed systems (sharding, collective communication).
  • Bias toward measurement and performance profiling.

Responsibilities

  • Push throughput via continuous batching, speculative decoding, and kernel-level tuning.
  • Cut latency by identifying and fixing compute, memory, or network bottlenecks.
  • Optimize KV cache via paged attention, prefix caching, and quantization.
  • Perform empirical quantization work across weights, activations, and the KV cache.
  • Shrink cold starts and memory footprint.
  • Scale across nodes using distributed inference topologies.
  • Set technical direction for benchmarking and architecture.