AI/ML Research Engineer, LLM Post-Training & Evaluation

Innodata Inc. - Data Engineering AI

Remote - United States, Full-Time, Middle

Salary: 80,000 - 175,000 USD per year

Job Details

Qualifications

BS/MS/PhD in Computer Science, Machine Learning, AI, Applied Mathematics, or a related quantitative technical field (MS/PhD preferred)
2-3 years of relevant industry or research engineering experience in ML/AI systems
Hands-on experience with LLM training / fine-tuning / post-training, including at least one of: supervised fine-tuning (SFT), preference optimization (e.g., DPO), RLHF / RLAIF-style workflows, task- or domain-adaptation of foundation models
Strong programming skills in Python and experience building production-quality ML code
Experience with modern ML frameworks (e.g., PyTorch, JAX, TensorFlow) and model libraries/tooling (e.g., Hugging Face ecosystem, vLLM, distributed training stacks)
Experience designing and implementing evaluation pipelines for LLM/ML systems, including metrics computation, dataset handling, and experiment comparisons
Strong understanding of data pipelines and ML systems engineering, including reproducibility, observability, and debugging
Experience with large-scale distributed ML systems and performance optimization for training/evaluation workloads
Experience with large-scale data processing and workflow orchestration
Ability to collaborate directly with technical stakeholders including research scientists, ML engineers, data engineers, and customer technical leads

Responsibilities

Lead or co-lead technically complex ML engineering projects from initial customer discussions through implementation and delivery
Design, build, and improve LLM training and post-training pipelines, including data ingestion, preprocessing, fine-tuning, evaluation, and experiment tracking
Implement and optimize evaluation systems for LLMs and multimodal models, including offline benchmarks and task-specific test harnesses
Integrate human-in-the-loop and AI-augmented evaluation signals into model development workflows
Build robust infrastructure and tooling for reproducible experimentation, metrics logging, and regression monitoring
Diagnose model behavior and pipeline failures, including data issues, training instability, metric inconsistencies, and evaluation drift
Collaborate with Language Data Scientists and Applied Research Scientists to translate evaluation frameworks into executable systems
Work closely with customer technical stakeholders to understand goals, constraints, and success criteria; propose and implement technically sound solutions
