Black Forest Labs

Jobs at this company:


📍 Germany, USA

🔍 Generative image and video models

  • Familiarity with effective techniques for optimizing inference and training workloads.
  • Knowledge of optimizing both memory-bound and compute-bound operations.
  • Understanding of the GPU memory hierarchy and compute capabilities.
  • Deep understanding of efficient attention algorithms.
  • Experience implementing forward and backward Triton kernels, with attention to correctness and floating-point error.
  • Ability to integrate custom kernels into PyTorch using tools such as pybind.

  • Finding ideal training strategies for various model sizes and compute loads.
  • Profiling, debugging, and optimizing single and multi-GPU operations using tools such as Nsight.
  • Reasoning about speed and quality trade-offs of quantization for model inference.
  • Developing and improving low-level kernel optimizations for state-of-the-art inference and training.
  • Developing new ideas to maximize GPU performance.

Python, Software Development, Artificial Intelligence, Git, Machine Learning, NumPy, PyTorch, Algorithms, Go, Linux

Posted 2024-11-07

📍 Germany, USA

🔍 Generative image and video models

  • Strong proficiency in Python and its ecosystem for machine learning, data analysis, and web development
  • Extensive experience with RESTful API development and deployment for ML tasks
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes)
  • Knowledge of cloud platforms (AWS, GCP, or Azure) for deploying and scaling ML services
  • Proven track record in rapid ML model prototyping using tools like Streamlit or Gradio
  • Experience with distributed task queues and scalable model serving architectures
  • Understanding of monitoring, logging, and observability best practices for ML systems

  • Develop and maintain robust APIs for serving machine learning models
  • Transform research models into production-ready demos and MVPs
  • Optimize model inference for improved performance and scalability
  • Implement and manage systems for collecting user preference data
  • Ensure high availability and reliability of model serving infrastructure
  • Collaborate with ML researchers to rapidly prototype and deploy new models

AWS, Docker, Python, Data Analysis, Frontend Development, GCP, Kubernetes, Machine Learning, Vue.js, Azure, Angular, React, CI/CD

Posted 2024-11-07

📍 Germany, USA

🔍 Generative image and video models

  • Strong proficiency in cloud platforms (AWS, Azure, or GCP) with focus on ML/AI services.
  • Extensive experience with Kubernetes and Slurm cluster management.
  • Expertise in Infrastructure as Code tools (e.g., Terraform, Ansible).
  • Proven track record in managing and optimizing network-based cloud file systems and object storage.
  • Experience with CI/CD tools and practices (e.g., CircleCI, GitHub Actions, ArgoCD).
  • Strong understanding of security principles and best practices in cloud environments.
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Loki).
  • Familiarity with ML workflows and GPU infrastructure management.
  • Demonstrated ability to handle complex migrations and breaking changes in production environments.

  • Design, deploy, and maintain cloud-based ML training (Slurm) and inference (Kubernetes) clusters.
  • Implement and manage network-based cloud file systems and blob/S3 storage solutions.
  • Develop and maintain Infrastructure as Code (IaC) for resource provisioning.
  • Implement and optimize CI/CD pipelines for ML workflows.
  • Design and implement custom autoscaling solutions for ML workloads.
  • Ensure security best practices across the ML infrastructure.
  • Provide developer-friendly tools and practices for efficient ML operations.

AWS, GCP, Kubernetes, Azure, Grafana, Prometheus, CI/CD, Terraform

Posted 2024-11-07

📍 Germany, USA

🔍 Generative image and video models

  • Experience training large-scale diffusion models for image and video data.
  • Experience fine-tuning diffusion models for applications such as upscalers and inpainting/outpainting models.
  • Deep understanding of how to evaluate image and video generative models.
  • Strong proficiency in PyTorch and a solid grasp of neural network architectures.
  • Understanding of training techniques such as FSDP, low-precision training, and model parallelism.

  • Training large-scale diffusion (transformer) models for image and video.
  • Rigorously ablating design choices and communicating results and decisions to the broader team.
  • Reasoning about the speed and quality trade-offs of neural network architectures.

Python, Software Development, Git, Java, JavaScript, Machine Learning, NumPy, PyTorch, Algorithms, Go

Posted 2024-11-07