AI DevOps (Remote)

Posted about 6 hours ago

📍 Location: Spain

🔍 Industry: Digital businesses, AI solutions

🏢 Company: leadtech

🪄 Skills: Docker, Python, Bash, Git, Kubeflow, Kubernetes, MLflow, Grafana, Prometheus, CI/CD, DevOps

Requirements:
  • Proven experience as a DevOps Engineer, preferably with exposure to AI/ML workflows and production environments.
  • Strong knowledge of CI/CD pipelines and automated deployment best practices in cloud environments.
  • Hands-on experience with containerization and orchestration tools, particularly Docker and Kubernetes.
  • Proficiency in cloud infrastructure management with an emphasis on optimizing resource usage and costs.
  • Expertise in monitoring tools such as Prometheus, Grafana, or the ELK stack.
  • Proficiency in scripting and automation using Python, Bash, or similar languages.
  • Deep understanding of version control systems (e.g., Git) and branching strategies.
  • Experience with MLOps frameworks like MLflow or Kubeflow for model lifecycle management.
  • Knowledge of security best practices in DevOps and cloud environments.
  • Strong communication skills for collaboration with cross-functional teams and stakeholder alignment.
  • Problem-solving mindset with a proactive approach to optimizing operations.

Responsibilities:
  • Design, implement, and manage scalable CI/CD pipelines for the deployment and lifecycle management of AI models.
  • Conduct technical investigations of AI models, focusing on evaluating performance, scalability, and cost-efficiency.
  • Automate the deployment, monitoring, and maintenance of AI applications in production environments.
  • Continuously improve workflows by implementing automation and monitoring best practices.
  • Optimize infrastructure usage and manage costs while ensuring effective cloud resource utilization.
  • Establish security best practices for AI workloads, including data encryption and access controls.
  • Collaborate with cross-functional teams to ensure seamless AI model integration into production workflows.
  • Take ownership of AI model demonstrations during internal sprints and prepare performance showcases.
  • Develop logging and monitoring frameworks to ensure reliability and uptime of AI services.
  • Participate in incident response processes and troubleshoot issues in AI pipelines.