Staff ML Infrastructure Engineer

Posted 5 days ago

💸 Salary: 190,000 - 240,000 USD per year

🔍 Industry: Software Development

🏢 Company: Engine

Requirements:
  • Hands-on experience with TensorFlow Serving, TorchServe, or similar model-serving frameworks.
  • Experience building production-grade APIs and integrating model inference into application workflows.
  • Experience containerizing and orchestrating inference services at scale.
Responsibilities:
  • Deploy and operate machine learning models optimized for low-latency, high-throughput inference in production environments.
  • Build and maintain clean gRPC interfaces to expose model predictions to upstream services.
  • Own the production code paths that deliver features to the model—writing maintainable, testable application logic that integrates cleanly with the broader system.
Related Jobs

📍 Germany, USA

🔍 Generative image and video models

🏢 Company: Black Forest Labs

👥 20-100

💰 $30,202,193 Seed, 10 months ago

Artificial Intelligence (AI), Media and Entertainment, Generative AI, Software

Requirements:
  • Strong proficiency in cloud platforms (AWS, Azure, or GCP) with focus on ML/AI services.
  • Extensive experience with Kubernetes and Slurm cluster management.
  • Expertise in Infrastructure as Code tools (e.g., Terraform, Ansible).
  • Proven track record in managing and optimizing network-based cloud file systems and object storage.
  • Experience with CI/CD tools and practices (e.g., CircleCI, GitHub Actions, ArgoCD).
  • Strong understanding of security principles and best practices in cloud environments.
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Loki).
  • Familiarity with ML workflows and GPU infrastructure management.
  • Demonstrated ability to handle complex migrations and breaking changes in production environments.
Responsibilities:
  • Design, deploy, and maintain cloud-based ML training (Slurm) and inference (Kubernetes) clusters.
  • Implement and manage network-based cloud file systems and blob/S3 storage solutions.
  • Develop and maintain Infrastructure as Code (IaC) for resource provisioning.
  • Implement and optimize CI/CD pipelines for ML workflows.
  • Design and implement custom autoscaling solutions for ML workloads.
  • Ensure security best practices across the ML infrastructure.
  • Provide developer-friendly tools and practices for efficient ML operations.

AWS, GCP, Kubernetes, Azure, Grafana, Prometheus, CI/CD, Terraform

Posted 7 months ago