Member of Technical Staff, Synthetic Data

Posted about 2 months ago

Canada, United States, Europe | Full-Time | AI, Software Development

Company: Cohere

Location: Canada, United States, Europe, EST, EU

Languages: English

Seniority level: Staff

Skills:

Python, Software Development, Artificial Intelligence, Data Analysis, Kubernetes, Machine Learning, Pandas, CI/CD

Requirements:

Strong software engineering skills, with proficiency in Python and experience building data pipelines.
Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar.
Experience working with LLMs through projects, contributions, or experimentation.
Familiarity with LLM inference frameworks such as vLLM and TensorRT.
Experience working with large-scale datasets, including web, code, and multilingual data.

Responsibilities:

Design and build scalable inference pipelines on large GPU clusters.
Conduct data ablations to assess data quality, and experiment with data mixtures to improve model performance.
Research and implement innovative synthetic data curation methods.
Collaborate with cross-functional teams to ensure data pipelines meet the demands of language model training.