Software Development Engineer in Test - AI Infrastructure

Delos Data Inc (AI data center clusters)
U.S. and Canada | Full-Time | Mid-level
Salary: 140,000 - 200,000 USD per year

Job Details

Required Skills
Docker, Python, Jenkins, Kubernetes, GitHub Actions

Requirements

  • Strong proficiency in Python (for automation and orchestration).
  • Proven experience building or extending test automation frameworks for complex back-end systems.
  • Proven ability to troubleshoot automated test failures in complex, large-scale distributed systems and identify their root causes.
  • Experience with containerization (Docker/Kubernetes).
  • Experience with modern CI/CD tools (GitHub Actions, GitLab CI, or Jenkins).
  • Bachelor's or Master's degree in Computer Engineering, Computer Science, or a related field.

Responsibilities

  • Design and build from the ground up, then maintain, a robust automated testing framework that supports distributed AI training and inference workloads.
  • Develop complex test plans that go beyond unit tests, focusing on end-to-end system integration, stress testing, and hardware-software boundary conditions.
  • Partner closely with System Engineers to debug deep-seated issues in distributed clusters, using telemetry and profiling tools to identify bottlenecks.