Staff Machine Learning Systems Engineer
New
R
RedditMachine Learning
Remote - United StatesFull-TimeStaff
Salary230,000 - 322,000 USD per year
Apply NowOpens the employer's application page
Job Details
- Experience
- 8+ years of experience in ML infrastructure
- Required Skills
- PythonGCPKubernetesMachine LearningPyTorchTensorflowTerraformMLOps
Requirements
- 8+ years of experience in ML infrastructure, including model training and deployments.
- Hands-on experience with ML optimization, including memory and GPU profiling.
- Deep experience with cloud-based technologies (GCP BigQuery, Google Cloud Storage, Terraform).
- Hands-on experience administering and integrating MLOps tools (MLflow or Wandb).
- Proficiency with Python, PyTorch, and Tensorflow.
- Deep experience working with distributed training frameworks, including Ray and Kubernetes.
- Strong organizational & communication skills.
- Experience with graph databases (Neo4j, JanusGraph, TigerGraph) is a plus.
- Experience with graph neural networks (GNNs) and frameworks (PyTorch Geometric, Deep Graph Library) is a plus.
Responsibilities
- Lead development of a platform for large scale ML models at Reddit.
- Design end-to-end model lifecycle patterns (MLOps) to boost velocity of development for ML engineers.
- Zero-to-one development and support of a graph ML codebase and platform.
- Collaborate with ML engineers on performance tuning and training efficiency.
- Optimize batch data processing within a data warehouse using Apache Beam, Spark, and Ray Data.
- Architect pipelines to build and maintain massive graph data structures.
View Full Description & ApplyYou'll be redirected to the employer's site