Senior Member of Technical Staff, Web Data

Cohere
Artificial Intelligence
There are no restrictions on where you can be located for this role (EST/EU).
Full-Time
Senior
Salary not disclosed

Job Details

Required Skills
Python, Machine Learning, Pandas, Spark, NLP

Requirements

  • Strong software engineering skills, with proficiency in Python and experience building data pipelines.
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools.
  • Experience working with large-scale web datasets.
  • Knowledge of data quality assessment techniques and experimentation with data mixtures.
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training.

Responsibilities

  • Maintain large-scale pipelines for processing web corpora.
  • Work on filtering and quality-scoring systems to identify high-value web documents.
  • Analyze web data composition across domains, languages and time periods.
  • Develop and maintain high-performance deduplication pipelines.
  • Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models.