Familiarity with handling common web scraping challenges such as CAPTCHAs, rate limiting, and bot detection.
Experience with API interaction and extracting data from both public and private APIs.
Strong problem-solving skills, attention to detail, and the ability to handle large-scale scraping projects.
Familiarity with data cleaning and processing best practices.
Fluent English.
Responsibilities:
Develop and maintain a scalable, in-house scraping pipeline in Python.
Implement web scraping solutions using tools like Selenium, BeautifulSoup, or similar libraries.
Troubleshoot, optimize, and enhance existing scraping workflows and tools.
Work with data scientists and other colleagues to develop in-house data consolidation tools that clean and organize scraped data so that it is accurate, reliable, and ready for analysis.
Manage and use third-party proxy services to extract data effectively and bypass anti-scraping mechanisms.
Apply advanced client-spoofing techniques (e.g., user-agent rotation, CAPTCHA solving, IP masking) to avoid detection.
Collaborate with data engineers and other team members to integrate data into pipelines or systems.
Stay updated on the latest developments in web scraping, proxies, and anti-scraping techniques.