Web Scraping Specialist - EU Remote CET

Posted 2024-10-15

💎 Seniority level: Junior, 2-4 years

📍 Location: Hungary, CET

🔍 Industry: Fraud prevention and risk detection

🏢 Company: SEON Technologies

🗣️ Languages: English

⏳ Experience: 2-4 years

🪄 Skills: Python, Selenium

Requirements:
  • 2-4 years of experience in web scraping, with a strong focus on data extraction from complex, dynamic websites and unstructured resources.
  • Proficient in Python and libraries such as Selenium, BeautifulSoup, Scrapy, or equivalent frameworks.
  • Experience working with third-party proxy providers and rotating proxies to handle scraping challenges.
  • Knowledge of client-faking techniques (e.g., user-agent manipulation, cookie management, header spoofing).
  • Familiarity with handling common web scraping challenges like CAPTCHAs, rate limiting, and bot detection.
  • Experience with API interaction and extracting data from both public and private APIs.
  • Strong problem-solving skills, attention to detail, and the ability to handle large-scale scraping projects.
  • Familiarity with data cleaning and processing best practices.
  • Fluent English.
Responsibilities:
  • Develop and maintain a scalable, in-house scraping pipeline using Python.
  • Implement web scraping solutions using tools like Selenium, BeautifulSoup, or similar libraries.
  • Troubleshoot, optimize, and enhance existing scraping workflows and tools.
  • Work with data scientists and other colleagues to develop in-house data consolidation tools that clean and organize scraped data so that it is accurate, reliable, and ready for analysis.
  • Manage and utilize third-party proxy services to ensure effective data extraction, bypassing anti-scraping mechanisms.
  • Apply advanced client-faking techniques (e.g., user-agent rotation, CAPTCHA solving, IP masking) to avoid detection.
  • Collaborate with data engineers and other team members to integrate data into pipelines or systems.
  • Stay updated on the latest developments in web scraping, proxies, and anti-scraping techniques.