Staff Machine Learning Systems Engineer

New
R
RedditMachine Learning
Remote - United StatesFull-TimeStaff
Salary230,000 - 322,000 USD per year
Apply NowOpens the employer's application page

Job Details

Experience
8+ years of experience in ML infrastructure
Required Skills
PythonGCPKubernetesMachine LearningPyTorchTensorflowTerraformMLOps

Requirements

  • 8+ years of experience in ML infrastructure, including model training and deployments.
  • Hands-on experience with ML optimization, including memory and GPU profiling.
  • Deep experience with cloud-based technologies (GCP BigQuery, Google Cloud Storage, Terraform).
  • Hands-on experience administering and integrating MLOps tools (MLflow or Wandb).
  • Proficiency with Python, PyTorch, and Tensorflow.
  • Deep experience working with distributed training frameworks, including Ray and Kubernetes.
  • Strong organizational & communication skills.
  • Experience with graph databases (Neo4j, JanusGraph, TigerGraph) is a plus.
  • Experience with graph neural networks (GNNs) and frameworks (PyTorch Geometric, Deep Graph Library) is a plus.

Responsibilities

  • Lead development of a platform for large scale ML models at Reddit.
  • Design end-to-end model lifecycle patterns (MLOps) to boost velocity of development for ML engineers.
  • Zero-to-one development and support of a graph ML codebase and platform.
  • Collaborate with ML engineers on performance tuning and training efficiency.
  • Optimize batch data processing within a data warehouse using Apache Beam, Spark, and Ray Data.
  • Architect pipelines to build and maintain massive graph data structures.
View Full Description & ApplyYou'll be redirected to the employer's site
230,000 - 322,000 USD per year
Apply Now