AI Performance Optimization Engineer

100% Remote (Continental United States) | Full-Time | Senior

Salary not disclosed

Job Details

Qualifications

Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field.
Six or more years of experience in performance engineering, ML systems, or HPC.
Strong proficiency in Python and C++.
Hands-on experience optimizing deep learning workloads on modern GPUs.
Deep understanding of distributed training and inference techniques.
Experience with profiling tools across CPU, GPU, and distributed systems.
Familiarity with model compression techniques and their accuracy implications.
Strong grasp of memory hierarchies, communication primitives, and parallelism strategies.
Excellent measurement, debugging, and analytical reasoning skills.
Strong communication and collaboration skills.

Responsibilities

Profile and optimize end-to-end AI training and inference pipelines for throughput, latency, and cost.
Identify and eliminate bottlenecks across data loading, model compute, communication, and memory.
Implement and tune quantization, sparsity, and pruning strategies.
Optimize distributed training using tensor parallelism, pipeline parallelism, FSDP, and ZeRO-style sharding.
Drive compiler-level optimizations using Triton, XLA, TorchInductor, or TVM.
Build and maintain rigorous benchmark suites and regression frameworks.
Drive cost-efficiency improvements through model architecture and hardware selection.
