AI Data Infrastructure Engineer

Within the United States, Full-Time, Senior
Salary not disclosed

Job Details

Experience
6+ years
Required Skills
Python, Java, Go, Spark, CI/CD, Scala, Data modeling, Distributed Systems

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
  • 6+ years of experience in data engineering, preferably supporting machine learning or AI systems.
  • Strong proficiency in Python and at least one systems or JVM-based language (e.g., Java, Scala, Go).
  • Hands-on experience with distributed data processing frameworks such as Spark, Beam, or Ray.
  • Experience operating large-scale or petabyte-level data infrastructure systems.
  • Strong understanding of distributed systems, data modeling, storage formats, and pipeline architecture.
  • Experience with dataset versioning, lineage tracking, and ML reproducibility workflows.
  • Strong software engineering practices including testing, CI/CD, and system design.
  • Excellent communication skills and the ability to work cross-functionally with technical teams.
  • Experience with multimodal datasets, privacy-aware systems, or AI training pipelines is a plus.

Responsibilities

  • Design, build, and maintain large-scale data pipelines supporting AI training, evaluation, and continuous model improvement workflows.
  • Develop ingestion and processing systems for multimodal datasets including text, image, audio, video, and structured data.
  • Implement data cleaning, deduplication, validation, and quality assurance processes at petabyte scale.
  • Build dataset versioning, lineage tracking, and reproducibility systems to ensure reliable AI training environments.
  • Optimize high-throughput data delivery systems to maximize compute and GPU utilization.
  • Collaborate with ML researchers and engineers to support dataset construction, evaluation pipelines, and AI model development needs.
  • Design scalable storage architectures and implement observability tools for data quality, performance, and pipeline health.
  • Ensure data governance, privacy compliance, and secure handling of sensitive datasets across systems.