Commercial experience in developing and debugging high-performance GPU and CPU applications, with a strong focus on latency and throughput
Hands-on experience with third-party libraries and designing custom CUDA kernels
Proficiency with profiling and performance analysis tools (Nsight Systems, Nsight Compute, nvprof)
Solid understanding of data structures, algorithms, and object-oriented programming in C++
Proven ability to work effectively in remote or hybrid teams with variable, project-based responsibilities
Curiosity about and proactive engagement with emerging trends in GPU/HPC/ML