AI Tools Site Reliability Engineer - Graphic

Posted 2024-09-12

💎 Seniority level: Mid-level, 3+ years

📍 Location: US, China

🔍 Industry: AI tools

🏢 Company: MyShell · 👥 11-50 · 💰 $5.6m Seed on 2023-10-16

⏳ Experience: 3+ years

🪄 Skills: AWS, Python, GCP, Machine Learning, Azure, Grafana, Prometheus, Communication Skills, Collaboration, CI/CD, Linux

Requirements:
  • Bachelor's degree or higher in Computer Science, Software Engineering, or related fields.
  • 3+ years of experience in operations engineering or a related role; experience with AI tools is preferred.
  • Proficiency in Linux.
  • Experience with monitoring tools (e.g., Prometheus, Grafana, ELK).
  • Experience with automation tools (e.g., Ansible, Puppet, Chef).
  • Scripting skills (e.g., Python, Shell).
  • Familiarity with cloud platforms (e.g., AWS, Azure, GCP).
  • Experience with AI tools (e.g., ControlNet).
  • Strong communication, teamwork, analytical, and problem-solving skills.
  • Ability to work under pressure, a strong sense of responsibility, and a proactive commitment to continuous learning.

Responsibilities:
  • Oversee daily operations of AI tools, including server, database, and network maintenance.
  • Monitor system performance and address issues promptly.
  • Manage deployment and version releases of AI tools.
  • Implement CI/CD processes for automated deployments.
  • Troubleshoot and resolve issues in AI tools, analyzing logs and monitoring data to identify root causes and propose solutions.
  • Improve the deployment architecture for efficiency and stability, and implement performance-tuning strategies.
  • Ensure tool security: conduct regular security assessments, fix vulnerabilities, and implement data protection and backup strategies.
  • Work closely with development teams, contribute to system design and optimization, and maintain operations documentation.