AI DevOps (Remote)
Posted about 6 hours ago
📍 Location: Spain
🔍 Industry: Digital businesses, AI solutions
🏢 Company: leadtech
🪄 Skills: Docker, Python, Bash, Git, Kubeflow, Kubernetes, MLflow, Grafana, Prometheus, CI/CD, DevOps
Requirements:
- Proven experience as a DevOps Engineer, preferably with exposure to AI/ML workflows and production environments.
- Strong knowledge of CI/CD pipelines and automated deployment best practices in cloud environments.
- Hands-on experience with containerization and orchestration tools, particularly Docker and Kubernetes.
- Proficiency in cloud infrastructure management with an emphasis on optimizing resource usage and costs.
- Expertise in monitoring tools such as Prometheus, Grafana, or the ELK stack.
- Proficiency in scripting and automation using Python, Bash, or similar languages.
- Deep understanding of version control systems (e.g., Git) and branching strategies.
- Experience with MLOps frameworks like MLflow or Kubeflow for model lifecycle management.
- Knowledge of security best practices in DevOps and cloud environments.
- Strong communication skills for collaboration with cross-functional teams and stakeholder alignment.
- Problem-solving mindset with a proactive approach to optimizing operations.
Responsibilities:
- Design, implement, and manage scalable CI/CD pipelines for the deployment and lifecycle management of AI models.
- Conduct technical investigations of AI models, focusing on evaluating performance, scalability, and cost-efficiency.
- Automate the deployment, monitoring, and maintenance of AI applications in production environments.
- Continuously improve workflows by implementing automation and monitoring best practices.
- Optimize cloud infrastructure usage and costs while ensuring resources are used effectively.
- Establish security best practices for AI workloads, including data encryption and access controls.
- Collaborate with cross-functional teams to ensure seamless AI model integration into production workflows.
- Take ownership of AI model demonstrations during internal sprints and prepare performance showcases.
- Develop logging and monitoring frameworks to ensure reliability and uptime of AI services.
- Participate in incident response processes and troubleshoot issues in AI pipelines.