Senior Machine Learning Engineer (GenAI)

Posted 10 days ago

💎 Seniority level: Senior

📍 Location: Australia

🔍 Industry: AI

🏢 Company: Leonardo.Ai

🗣️ Languages: English

🪄 Skills: Python, Machine Learning, PyTorch, CI/CD

Requirements:
  • Proven experience deploying diffusion-based models (e.g. latent diffusion, LoRA, ControlNet) into production environments, ideally across dozens or hundreds of GPUs.
  • Proficiency in Python and PyTorch, with a focus on optimised inference, model tuning, and memory-efficient execution.
  • Familiarity with model deployment tools and practices (e.g. model registries, workflow orchestration, CI/CD for ML).
  • Comfort with performance trade-offs, debugging large-scale systems, and delivering improvements fast.
  • Experience working in fast-moving, cross-functional teams shipping real-world AI products.
  • Ability to pivot quickly between deep technical work, product needs, and cross-functional alignment.
Responsibilities:
  • Build and maintain robust production pipelines that deploy generative models across multiple services, each running on hundreds of GPUs.
  • Contribute to one of the world’s highest-throughput GenAI systems, generating millions of images and videos daily.
  • Utilise quantisation, compilation, caching, distillation, and multi-GPU parallelism to enhance throughput, latency, and stability.
  • Collaborate closely with researchers to productionise new capabilities, such as LoRAs, ControlNets, and custom architectures.
  • Tackle a wide range of problems, from orchestrating massive multi-GPU video pipelines to optimising end-to-end latency and hardening workloads to run at global scale.

Related Jobs

📍 Australia

🧭 Full-Time

🔍 AI

🏢 Company: Leonardo.Ai

Requirements:
  • Strong experience building and managing MLOps pipelines using frameworks like Kubeflow, MLflow, or similar.
  • Proficiency in Python, focusing on writing high-performance, maintainable code.
  • Hands-on experience with AWS services (e.g., S3, EC2, SageMaker), and infrastructure-as-code tools like Terraform.
  • Deep understanding of Docker and container orchestration tools like Kubernetes.
  • Experience designing scalable ETL pipelines and working with SQL and NoSQL databases.
Responsibilities:
  • Design, build, and maintain robust MLOps pipelines to support the end-to-end lifecycle of machine learning models, including data preparation, training, deployment, monitoring, and retraining.
  • Integrate ComfyUI nodes and other workflow tools into the MLOps ecosystem, optimising for performance and scalability.
  • Collaborate with DevOps teams to implement and manage cloud infrastructure, focusing on AWS (e.g., S3, EC2, SageMaker) using tools like Terraform and CloudFormation.
  • Implement CI/CD pipelines tailored for machine learning workflows, ensuring smooth transitions from research to production.
  • Design and maintain scalable data pipelines for collecting, processing, and storing large volumes of data.
  • Automate data acquisition and preprocessing workflows, optimising I/O bandwidth and implementing efficient storage solutions.
  • Manage data integrity and ensure compliance with privacy and security standards.
  • Deploy machine learning models to production, ensuring robustness, scalability, and low latency.
  • Implement monitoring solutions for deployed models to track performance metrics, detect drift, and trigger retraining pipelines.
  • Continuously optimise inference performance using techniques like model quantisation, distillation, or caching strategies.
  • Work closely with cross-functional teams, including AI researchers, data engineers, and software developers, to support ongoing projects and align MLOps efforts with organisational goals.
  • Proactively identify opportunities to streamline and automate workflows, driving innovation and efficiency.
  • Operate independently to manage deadlines, deliverables, and high-quality solutions in a dynamic environment.

🪄 Skills: AWS, Docker, Python, SQL, ETL, Kubeflow, Kubernetes, Machine Learning, MLflow, Data engineering, Grafana, Prometheus, REST API, CI/CD, DevOps, Terraform

Posted 16 days ago