Strong background in machine learning, with hands-on experience in developing and deploying inference models.
Proficiency in Python and machine learning frameworks such as TensorFlow or PyTorch.
Experience with optimization techniques including quantization, pruning, and model compression.
Strong problem-solving skills for troubleshooting complex technical issues.
Excellent communication and collaboration skills in a remote team environment.
Passion for learning and staying updated on advancements in ML inference technologies.
Experience with ML inference and acceleration libraries such as cuDNN/TensorRT, ROCm, OpenVINO, or OpenPPL; knowledge of ML communication libraries such as NCCL is an advantage.
Responsibilities:
Design and implement efficient algorithms and models for real-time inference in Conversational AI applications, with a focus on Nebula.
Collaborate with cross-functional teams to integrate machine learning models into production systems, ensuring scalability, reliability, and performance.
Optimize and fine-tune machine learning models for resource-constrained environments.
Develop monitoring and evaluation mechanisms to assess the performance of inference models in production.
Stay updated on advancements in machine learning inference techniques and incorporate new approaches as needed.
Contribute to the documentation of best practices for implementing and deploying ML inference solutions.