fal

Private Company
Open Positions (3)

This role will need to be based in India, Australia, or New Zealand.
Full-Time, Generative AI
  • Own availability, latency, and throughput SLOs across a large fleet of generative media model APIs serving production traffic at scale.
  • Build the monitoring, alerting, and observability needed to catch ML-specific failures, output quality degradation, and model regressions.
  • Harden model deployment workflows with canary releases, shadow testing, automated rollbacks, and validation gates.
  • Drive the security posture of the model fleet, including abuse detection, rate limiting, and protection against adversarial usage.
  • Operationalize safety systems for generative media, including content moderation pipelines and guardrails.
  • Lead incident response for model API outages, conduct postmortems, and drive engineering improvements to prevent recurrence.
  • Improve capacity planning, autoscaling, and GPU fleet efficiency for inference workloads.
  • Partner with model and infrastructure teams to integrate reliability and safety requirements into model onboarding.
Python, Kubernetes, Machine Learning, +2 more