Senior ML Engineer
Cast AI · Cloud Infrastructure
Location: United Kingdom · Full-Time · Senior
Salary not disclosed
Job Details
- Experience: 5+ years
- Required skills: Python, Kubernetes, PyTorch, Distributed Systems
Requirements
- 5+ years building production ML systems.
- Experience in inference or training infrastructure.
- Strong Python skills for production services.
- Experience with vLLM, SGLang, or TensorRT-LLM.
- Fluency in quantization tradeoffs.
- Experience with distributed systems (sharding, collective communication).
- Bias toward measurement and performance profiling.
Responsibilities
- Push throughput via continuous batching, speculative decoding, and kernel-level tuning.
- Cut latency by identifying and fixing compute, memory, or network bottlenecks.
- Optimize KV cache via paged attention, prefix caching, and quantization.
- Perform empirical quantization work across weights, activations, and the KV cache.
- Reduce cold-start times and memory footprint.
- Scale across nodes using distributed inference topologies.
- Set technical direction for benchmarking and architecture.