Research Engineer - Distributed Training

Posted 5 months ago
United States, Full-Time, AI/ML
Company: Prime Intellect
Location: United States
Languages: English
Seniority level: Senior, Extensive experience
Experience: Extensive experience
Skills:
Python, Software Development, Artificial Intelligence, Cloud Computing, Machine Learning, PyTorch, CI/CD
Requirements:
Extensive experience in designing and implementing end-to-end pipelines for training and deploying large-scale AI models
Deep expertise in distributed training techniques, frameworks (PyTorch Distributed, DeepSpeed, MosaicML’s LLM Foundry), and tools (Ray)
Experience in large-scale model training, including data, tensor, and pipeline parallelism
Solid understanding of MLOps best practices (model versioning, experiment tracking, CI/CD)
Passion for advancing decentralized AI model training and democratizing AI capabilities
Responsibilities:
Lead and participate in novel research on decentralized training orchestration
Optimize AI workload performance, cost, and resource utilization
Contribute to open-source libraries and frameworks for distributed model training
Publish research at top-tier AI conferences
Distill technical project outcomes into technical blog posts that a lay audience can follow
Stay up to date with AI/ML infrastructure, tools, and decentralized training research