Solutions Architect, AI Infrastructure
Canada, Full-Time, Senior
Salary not disclosed
Job Details
- Experience: 5+ years
- Required Skills: Docker, FGPA Architecture, Kubernetes, Linux
Requirements
- Bachelor’s, Master’s, or PhD in Engineering, Computer Science, Mathematics, or related field (or equivalent experience).
- 5+ years of experience in solution architecture, systems engineering, cloud engineering, or technical pre-sales roles.
- Strong understanding of GPU/CPU server architecture and system-level computing environments.
- Experience with data center networking (Ethernet, InfiniBand), storage, and infrastructure design.
- Knowledge of Linux systems, kernel-level concepts, and system software.
- Familiarity with DevOps/MLOps tools such as Docker, Kubernetes, and containerized environments.
- Experience troubleshooting distributed systems and performance optimization in compute clusters.
- Strong communication, presentation, and stakeholder management skills.
- Ability to manage multiple priorities in fast-paced, customer-driven environments.
- Hands-on experience with AI/HPC infrastructure deployment is highly valued.
- Familiarity with GPU ecosystems, networking technologies, and cluster management tools is a strong asset.
Responsibilities
- Lead end-to-end design and deployment of large-scale GPU-based AI and HPC infrastructure.
- Serve as the primary technical advisor for customers across architecture, deployment, and optimization phases.
- Collaborate with cloud partners to design and support data center deployments, including compute, storage, and networking systems.
- Guide customers through server, cluster, and network bring-up processes, including on-site support when required.
- Troubleshoot and optimize compute and networking performance across GPU clusters.
- Partner with engineering, product, and account teams to align on technical strategy and customer needs.
- Deliver technical presentations, workshops, and reference architectures to customers and stakeholders.
- Support evaluation and adoption of new technologies and contribute feedback to product roadmap discussions.
- Ensure infrastructure designs meet performance, scalability, and reliability requirements in production environments.