Develop and maintain scalable infrastructure for large-scale image and video data acquisition. Manage and coordinate data transfers from multiple licensing partners. Implement and deploy state-of-the-art ML models for data cleaning, processing, and preparation. Build scalable, efficient tools to visualize, cluster, and analyze the data in depth. Optimize and parallelize data processing workflows to handle billion-scale datasets efficiently. Ensure data quality, diversity, and accurate annotation so that datasets are ready for training. Convert training data from alternative sources into a trainable format. Work closely within the model development loop, updating data as training requirements evolve.