AI/ML Research Engineer, LLM Post-Training & Evaluation
Innodata Inc. | Data Engineering, AI
Remote - United States | Full-Time | Mid-level
Salary: 80,000 - 175,000 USD per year
Job Details
- Experience: 2-3 years
- Required Skills: Python, PyTorch, TensorFlow
Requirements
- BS/MS/PhD in Computer Science, Machine Learning, AI, Applied Mathematics, or a related quantitative technical field (MS/PhD preferred)
- 2-3 years of relevant industry or research engineering experience in ML/AI systems
- Hands-on experience with LLM training, fine-tuning, or post-training, including at least one of: supervised fine-tuning (SFT), preference optimization (e.g., DPO), RLHF/RLAIF-style workflows, or task- or domain-adaptation of foundation models
- Strong programming skills in Python and experience building production-quality ML code
- Experience with modern ML frameworks (e.g., PyTorch, JAX, TensorFlow) and model libraries/tooling (e.g., Hugging Face ecosystem, vLLM, distributed training stacks)
- Experience designing and implementing evaluation pipelines for LLM/ML systems, including metrics computation, dataset handling, and experiment comparisons
- Strong understanding of data pipelines and ML systems engineering, including reproducibility, observability, and debugging
- Experience with large-scale distributed ML systems and performance optimization for training/evaluation workloads
- Experience with large-scale data processing and workflow orchestration
- Ability to collaborate directly with technical stakeholders including research scientists, ML engineers, data engineers, and customer technical leads
Responsibilities
- Lead or co-lead technically complex ML engineering projects from initial customer discussions through implementation and delivery
- Design, build, and improve LLM training and post-training pipelines, including data ingestion, preprocessing, fine-tuning, evaluation, and experiment tracking
- Implement and optimize evaluation systems for LLMs and multimodal models, including offline benchmarks and task-specific test harnesses
- Integrate human-in-the-loop and AI-augmented evaluation signals into model development workflows
- Build robust infrastructure and tooling for reproducible experimentation, metrics logging, and regression monitoring
- Diagnose model behavior and pipeline failures, including data issues, training instability, metric inconsistencies, and evaluation drift
- Collaborate with Language Data Scientists and Applied Research Scientists to translate evaluation frameworks into executable systems
- Work closely with customer technical stakeholders to understand goals, constraints, and success criteria; propose and implement technically sound solutions