AI Data Infrastructure Engineer
Within the United States, Full-Time, Senior
Salary not disclosed
Job Details
- Experience: 6+ years
- Required Skills: Python, Java, Go, Spark, CI/CD, Scala, Data modeling, Distributed Systems
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
- 6+ years of experience in data engineering, preferably supporting machine learning or AI systems.
- Strong proficiency in Python and at least one systems or JVM-based language (e.g., Java, Scala, Go).
- Hands-on experience with distributed data processing frameworks such as Spark, Beam, or Ray.
- Experience operating large-scale or petabyte-level data infrastructure systems.
- Strong understanding of distributed systems, data modeling, storage formats, and pipeline architecture.
- Experience with dataset versioning, lineage tracking, and ML reproducibility workflows.
- Strong software engineering practices including testing, CI/CD, and system design.
- Excellent communication skills and ability to work cross-functionally with technical teams.
- Experience with multimodal datasets, privacy-aware systems, or AI training pipelines is a plus.
Responsibilities
- Design, build, and maintain large-scale data pipelines supporting AI training, evaluation, and continuous model improvement workflows.
- Develop ingestion and processing systems for multimodal datasets including text, image, audio, video, and structured data.
- Implement data cleaning, deduplication, validation, and quality assurance processes at petabyte scale.
- Build dataset versioning, lineage tracking, and reproducibility systems to ensure reliable AI training environments.
- Optimize high-throughput data delivery systems to maximize compute and GPU utilization.
- Collaborate with ML researchers and engineers to support dataset construction, evaluation pipelines, and AI model development needs.
- Design scalable storage architectures and implement observability tools for data quality, performance, and pipeline health.
- Ensure data governance, privacy compliance, and secure handling of sensitive datasets across systems.