Senior Member of Technical Staff, Web Data
Cohere, Artificial Intelligence
There are no restrictions on where you can be located for this role (EST/EU).
Full-Time
Senior
Salary not disclosed
Job Details
- Required Skills: Python, Machine Learning, Pandas, Spark, NLP
Requirements
- Strong software engineering skills, with proficiency in Python and experience building data pipelines.
- Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools.
- Experience working with large-scale web datasets.
- Knowledge of data quality assessment techniques and experimentation with data mixtures.
- A passion for bridging research and engineering to solve complex data-related challenges in AI model training.
Responsibilities
- Maintain large-scale pipelines for processing web corpora.
- Work on filtering and quality-scoring systems to identify high-value web documents.
- Analyze web data composition across domains, languages and time periods.
- Develop and maintain highly performant deduplication pipelines.
- Collaborate with cross-functional teams, including researchers and engineers, to ensure data pipelines meet the demands of cutting-edge language models.