- 5+ years of experience writing high-performance, production-quality code
- Strong programming skills in C++ or Python (Rust/Go also welcome)
- Experience working with large language models and familiarity with the LLM inference ecosystem (e.g., vLLM, SGLang)
- Ability to diagnose and resolve performance bottlenecks across the model execution stack
- Experience with GPU programming, CUDA, or low-level systems optimization (preferred)
- Experience with transformer-based language modeling (MoE, speculative decoding, KV-cache optimizations) (preferred)
- Experience scaling performance-critical distributed systems (preferred)