Apply

Staff Software Engineer, ML Training Platform

Posted 7 days agoViewed

View full description

💎 Seniority level: Staff, 8+ years

📍 Location: United States

💸 Salary: 230000.0 - 322000.0 USD per year

🔍 Industry: Machine Learning

🏢 Company: Reddit👥 1001-5000💰 $410,000,000 Series F over 3 years ago🫂 Last layoff over 1 year agoNewsContentSocial NetworkSocial Media

⏳ Experience: 8+ years

🪄 Skills: DockerPythonKubernetesMachine LearningMLFlowPyTorchTensorflow

Requirements:
  • 8+ years of work experience in production software development or data systems.
  • Degree in Machine Learning, Engineering, Computer Science or relevant discipline.
  • Experience designing and architecting large-scale ML systems.
  • Familiarity with ML frameworks like TensorFlow, PyTorch, or JAX.
  • Experience in training workflows, hyperparameter tuning, and optimization.
  • Knowledge of MLOps practices and tools such as Ray and MLFlow.
  • Hands-on experience with Kubernetes and Docker.
  • Proficient in building production-quality code with testing and monitoring, mainly using Python or Golang.
  • Comfortable with distributed systems and big data environments.
Responsibilities:
  • Lead the building, testing, and maintenance of ML infrastructure at Reddit.
  • Propose and implement ML platform solutions that enhance model deployment.
  • Design and optimize infrastructure for large-scale machine learning workflows.
  • Analyze and optimize performance and cost-efficiency in distributed systems.
  • Mentor team members on DevOps practices for ML platform components.
Apply