Member of Technical Staff, Training Performance Engineer

Posted about 1 month ago

💎 Seniority level: Staff

📍 Location: London

🔍 Industry: Software Development

🏢 Company: Cohere · 👥 251-500 · 💰 $169,509,482 Grant 4 months ago · 🫂 Last layoff 8 months ago · Artificial Intelligence (AI), Machine Learning, Generative AI, Natural Language Processing

🗣️ Languages: English

🪄 Skills: Python, PyTorch

Requirements:
  • Extremely strong software engineering skills.
  • Proficiency in Python and related ML frameworks such as JAX, PyTorch, and XLA/MLIR.
  • Experience writing GPU kernels using CUDA, Triton, etc.
  • Experience using large-scale distributed training strategies.
  • Familiarity with autoregressive sequence models, such as Transformers.
Responsibilities:
  • Design and write high-performance, scalable software for training.
  • Understand architectural modifications and design choices and their effects on training throughput and quality.
  • Write low-level CUDA and Triton kernels to squeeze every last bit of performance from our accelerators.
  • Research, implement, and experiment with ideas on our supercompute and data infrastructure.
  • Learn from and work with the best researchers in the field.