Member of Technical Staff (Data Intelligence)
Reka | Artificial Intelligence
US, UK, Singapore, Remote | Full-Time | Middle
Salary not disclosed
Job Details
- Required Skills: Python, Machine Learning, PyTorch, Airflow, Data engineering, Spark, CI/CD, Deep Learning
Requirements
- Strong ML and deep learning fundamentals with experience building and operating large-scale data and/or compute systems
- Comfortable moving between research questions and production engineering
- Demonstrated research experience with data composition, data quality, and dataset releases
- Ability to design and execute experiments that produce convincing, unbiased results
- Practical experience with distributed processing and orchestration (Spark, Ray, Airflow, or equivalents)
- Solid Python skills
- Familiarity with the tooling around modern model training workflows (datasets, checkpoints, experiment tracking)
- Strong instincts around data quality: how to measure it, how to monitor it, and how to prevent regressions as things scale
- Able to work in a fast-moving environment, prioritize what matters, and communicate clearly with both researchers and engineers
Responsibilities
- Work with model researchers to define what “good data” means for our models, including quality metrics, validation checks, and acceptance thresholds
- Explore open-source datasets and create internal ones best suited to building fundamental World Models
- Build algorithms for automated data quality assessment, data domain mixtures, and domain adaptation from synthetic to real data
- Track datasets, metadata, provenance, and versions so experiments are reproducible and it’s clear what data went into which training and evaluation runs
- Own CI/CD and development tooling for the data stack (GitHub, Python, PyTorch), and automate repetitive workflows to reduce friction
- Track and optimize throughput, storage, and compute utilization across pipelines and related assets