Research Crawling Engineer
Wynd Labs
AI Data Infrastructure
This is a fully remote team.
Full-time
Salary: Competitive salary, benefits, and equity package.
Job Details
Required Skills
- Python, Java, C++, Go, Rust, Distributed Systems
Requirements
- Strong programming experience in one or more of: Go, Rust, Python, Java, or C++
- Experience building web crawlers or large-scale data pipelines
- Solid understanding of HTTP, networking, and browser behavior
- Familiarity with distributed systems and parallel processing
- Experience working with large datasets (TB–PB scale preferred)
- Ability to debug systems running in unstable or adversarial environments
Responsibilities
- Build and maintain large-scale web crawlers across diverse domains
- Design high-throughput, fault-tolerant systems for data collection (millions to billions of URLs/day)
- Handle anti-bot systems, rate limits, and dynamic/JS-heavy sites
- Develop pipelines for cleaning, deduplication, filtering, and normalization
- Construct and maintain datasets for research and model training
- Monitor crawl performance, coverage, and data quality; iterate quickly
- Collaborate with research teams to align data collection with modeling needs
- Optimize infrastructure for cost, latency, and reliability