GPU Software Engineer (CUDA)

Continental United States | Full-Time | Senior
Salary: 100,000 - 150,000 USD per year

Job Details

Experience
6+ years
Required Skills
FPGA Architecture, PyTorch, C++

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field.
  • Six or more years of experience in GPU programming and performance engineering.
  • Deep expertise in CUDA C/C++ and GPU programming models.
  • Strong understanding of modern GPU architectures, memory hierarchies, and execution models.
  • Hands-on experience profiling and optimizing GPU workloads in production.
  • Familiarity with NCCL, MPI, and high-performance interconnect technologies.
  • Experience integrating custom kernels into ML frameworks.
  • Strong C++ skills and familiarity with modern systems programming practices.
  • Solid grounding in linear algebra and numerical methods.
  • Strong communication and collaboration skills with research and engineering teams.

Responsibilities

  • Design and implement high-performance CUDA kernels for compute-intensive workloads across AI and HPC use cases.
  • Profile and optimize GPU code using tools such as Nsight Systems, Nsight Compute, and CUDA profilers.
  • Tune memory access patterns, occupancy, register usage, and shared memory utilization for peak performance.
  • Develop highly optimized libraries for linear algebra, attention, and other ML primitives.
  • Optimize multi-GPU and multi-node training using NCCL, RDMA, and high-performance networking.
  • Implement custom operators and fused kernels in PyTorch, JAX, or Triton.
  • Collaborate with ML engineers to identify performance bottlenecks in training and inference pipelines.
  • Develop benchmarks and regression tests to safeguard performance over time.