AI Performance Optimization Engineer

100% Remote (Continental United States), Full-Time, Senior
Salary not disclosed

Job Details

Experience
6+ years
Required Skills
Python, FPGA Architecture, C++, Deep Learning, LLM

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field.
  • Six or more years of experience in performance engineering, ML systems, or HPC.
  • Strong proficiency in Python and C++.
  • Hands-on experience optimizing deep learning workloads on modern GPUs.
  • Deep understanding of distributed training and inference techniques.
  • Experience with profiling tools across CPU, GPU, and distributed systems.
  • Familiarity with model compression techniques and their accuracy implications.
  • Strong grasp of memory hierarchies, communication primitives, and parallelism strategies.
  • Excellent measurement, debugging, and analytical reasoning skills.
  • Strong communication and collaboration skills.

Responsibilities

  • Profile and optimize end-to-end AI training and inference pipelines for throughput, latency, and cost.
  • Identify and eliminate bottlenecks across data loading, model compute, communication, and memory.
  • Implement and tune quantization, sparsity, and pruning strategies.
  • Optimize distributed training using tensor parallelism, pipeline parallelism, FSDP, and ZeRO-style sharding.
  • Drive compiler-level optimizations using Triton, XLA, TorchInductor, or TVM.
  • Build and maintain rigorous benchmark suites and regression frameworks.
  • Drive cost-efficiency improvements through model architecture and hardware selection.