Member of Technical Staff (Data Intelligence)

Reka (Artificial Intelligence)

US, UK, Singapore, Remote | Full-Time | Middle

Salary not disclosed

Job Details

Required Skills: Python, Machine Learning, PyTorch, Airflow, Data engineering, Spark, CI/CD, Deep Learning

Strong ML and deep learning fundamentals with experience building and operating large-scale data and/or compute systems
Comfortable moving between research questions and production engineering
Demonstrated research experience with data compositions, quality, and dataset releases
Ability to design and execute experiments with convincing, unbiased outcomes
Practical experience with distributed processing and orchestration (Spark, Ray, Airflow, or equivalents)
Solid Python skills
Familiarity with the tooling around modern model training workflows (datasets, checkpoints, experiment tracking)
Strong instincts around data quality: how to measure it, how to monitor it, and how to prevent regressions as things scale
Able to work in a fast-moving environment, prioritize what matters, and communicate clearly with both researchers and engineers

Work with model researchers to define what “good data” means for our models, including quality metrics, validation checks, and acceptance thresholds
Explore open-source datasets and create internal ones best suited to building fundamental World Models
Build algorithms for automated data quality assessment, data domain mixtures, and domain adaptation from synthetic to real data
Track datasets, metadata, provenance, and versions so experiments are reproducible and it’s clear what data went into which training and evaluation runs
Own CI/CD and development tooling for the data stack (GitHub, Python, PyTorch), and automate repetitive workflows to reduce friction
Track and optimize throughput, storage, and compute utilization across pipelines and related assets