Software Development Engineer in Test - AI Infrastructure
Delos Data Inc
AI data center clusters
U.S. and Canada
Full-Time
Middle
Salary: 140,000 - 200,000 USD per year
Job Details
- Required Skills: Docker, Python, Jenkins, Kubernetes, GitHub Actions
Requirements
- Strong proficiency in Python (for automation and orchestration).
- Proven experience building or extending test automation frameworks for complex back-end systems.
- Proven ability to troubleshoot automated test failures in complex, large-scale distributed systems and to identify their root causes.
- Experience with containerization and container orchestration (Docker, Kubernetes).
- Experience with modern CI/CD tools (GitHub Actions, GitLab CI, or Jenkins).
- Bachelor's or Master's degree in Computer Engineering, Computer Science, or a related field.
Responsibilities
- Design, develop, and maintain a robust automated testing framework from the ground up that supports distributed AI training and inference workloads.
- Develop complex test plans that go beyond unit tests, focusing on end-to-end system integration, stress testing, and hardware-software boundary conditions.
- Partner closely with System Engineers to debug deep-seated issues in distributed clusters, using telemetry and profiling tools to identify bottlenecks.