AI Performance Optimization Engineer
100% Remote (Continental United States), Full-Time, Senior
Salary not disclosed
Job Details
- Experience: 6+ years
- Required Skills: Python, FPGA Architecture, C++, Deep Learning, LLM
Requirements
- Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field.
- Six or more years of experience in performance engineering, ML systems, or HPC.
- Strong proficiency in Python and C++.
- Hands-on experience optimizing deep learning workloads on modern GPUs.
- Deep understanding of distributed training and inference techniques.
- Experience with profiling tools across CPU, GPU, and distributed systems.
- Familiarity with model compression techniques and their accuracy implications.
- Strong grasp of memory hierarchies, communication primitives, and parallelism strategies.
- Excellent measurement, debugging, and analytical reasoning skills.
- Strong communication and collaboration skills.
Responsibilities
- Profile and optimize end-to-end AI training and inference pipelines for throughput, latency, and cost.
- Identify and eliminate bottlenecks across data loading, model compute, communication, and memory.
- Implement and tune quantization, sparsity, and pruning strategies.
- Optimize distributed training using tensor parallelism, pipeline parallelism, FSDP, and ZeRO-style sharding.
- Drive compiler-level optimizations using Triton, XLA, TorchInductor, or TVM.
- Build and maintain rigorous benchmark suites and regression frameworks.
- Drive cost-efficiency improvements through model architecture and hardware selection.