Senior ML Infrastructure / DevOps Engineer

Posted about 2 hours ago
EU, United States, Canada | Full-Time | AI Startup
Company: Pathway
Location: EU, United States, Canada
Languages: English
Seniority level: Senior, 5+ years
Experience: 5+ years
Skills:
AWS, Docker, Python, Bash, GCP, Jenkins, Kubeflow, Kubernetes, MLflow, PyTorch, Airflow, Azure, Grafana, Prometheus, TensorFlow, CI/CD, Linux, DevOps, Terraform
Requirements:
- Former or current Linux / systems / network administrator comfortable living in the shell
- 5+ years of experience in DevOps/SRE/Platform/Infrastructure roles running production systems
- Deep familiarity with Linux as a daily driver, including shell scripting
- Strong experience with workload management, containerization, and orchestration (Slurm, Docker, Kubernetes) in production environments
- Solid understanding of CI/CD tools and workflows
- Hands-on cloud infrastructure experience (AWS, GCP, Azure)
- Proficiency with infrastructure as code (Terraform, CloudFormation, or similar)
- Experience with monitoring and logging stacks (Grafana, Prometheus, Loki, CloudWatch, or equivalents)
- Familiarity with ML pipeline and experiment orchestration tools
- Solid programming skills in Python
- Ability to read and debug code that uses common ML libraries (PyTorch, TensorFlow)
Responsibilities:
- Design, operate, and scale GPU and CPU clusters for ML training and inference
- Automate infrastructure provisioning and configuration using infrastructure as code
- Build and maintain robust ML pipelines
- Implement and evolve ML-centric CI/CD
- Own monitoring, logging, and alerting across training and serving
- Work with terabyte-scale datasets and handle the challenges that come with them
- Partner closely with ML engineers and researchers to productionize their work
- Participate in the on-call rotation for critical ML infrastructure and lead incident response