Member of Engineering (Pre-training / Data Acquisition)

Poolside (Artificial Intelligence)
Remote (EMEA/East Coast), Secondary Locations: Remote (EMEA), London, UK, EMEA and US East Coast time zones
Full-Time
Middle
Salary not disclosed

Job Details

Required Skills
AWS, Docker, Python, Kubernetes, Distributed Systems

Requirements

  • Strong distributed systems background with proven experience building and operating large-scale infrastructure
  • Proficiency in Python
  • Hands-on experience with web crawling or large-scale data extraction
  • Understanding of HTTP protocols, distributed job queues, and data parsing at scale
  • Familiarity with cloud platforms (AWS) and container orchestration (Kubernetes, Docker)
  • Awareness of data privacy, robots.txt adherence, and responsible crawl practices

Responsibilities

  • Design, build, and operate a large-scale web crawler responsible for acquiring all openly accessible data on the internet
  • Develop specialized deep crawlers targeting high-value sources to improve recall and coverage
  • Own a long-term roadmap for data acquisition in collaboration with data researchers
  • Build observability, monitoring, and debugging tooling to ensure reliability and transparency across crawl infrastructure
  • Collaborate with pre-training, post-training, and evaluations teams to align data acquisition priorities with model training needs
  • Build high-throughput ingestion pipelines for rapidly onboarding partner data and evaluating it for quality