Compiler Architect

Posted 18 days ago

United States, Canada | Full-Time | Software Development

Company: d-Matrix

Location: United States, Canada, EST, PST

Languages: English

Seniority level: Lead, 12+ years

Experience: 12+ years

Skills:

Leadership, Python, Software Development, Artificial Intelligence, Cloud Computing, FPGA Architecture, Machine Learning, Software Architecture, C++, Algorithms, Data Structures, CI/CD, DevOps, Mentoring

Requirements:

BS (15+ years), MS (12+ years), or PhD (10+ years) in Computer Science or Electrical Engineering, with 12+ years of experience in front-end compiler and systems software development focused on ML inference.

Deep experience designing or leading compiler efforts using MLIR, LLVM, Torch-MLIR, or similar frameworks.

Strong understanding of model optimization for inference: quantization, fusion, tensor layout transformation, memory hierarchy utilization, and scheduling.

Expertise in deploying ML models to heterogeneous compute environments, with specific attention to latency, throughput, and resource scaling in cloud systems.

Proven track record working with AI frameworks (e.g., PyTorch, TensorFlow), ONNX, and hardware backends.

Experience with cloud infrastructure, including resource provisioning, distributed execution, and profiling tools.

Experience targeting inference accelerators (AI ASICs, FPGAs, GPUs) in cloud-scale deployments.

Knowledge of cloud deployment orchestration (e.g., Kubernetes, containerized AI workloads).

Strong leadership skills, with experience mentoring teams and collaborating with large-scale software and hardware organizations.

Excellent written and verbal communication; capable of presenting complex compiler architectures and trade-offs to both technical and executive stakeholders.

Responsibilities:

Architect the MLIR-based compiler for cloud inference workloads, focusing on efficient mapping of large-scale AI models onto distributed compute and memory hierarchies.

Lead the development of compiler passes for model partitioning, operator fusion, tensor layout optimization, memory tiling, and latency-aware scheduling.

Design support for hybrid offline/online compilation and deployment flows with runtime-aware mapping, allowing for adaptive resource utilization and load balancing in cloud scenarios.

Define compiler abstractions that interoperate efficiently with runtime systems, orchestration layers, and cloud deployment frameworks.

Drive scalability, reproducibility, and performance through well-designed IR transformations and distributed execution strategies.

Mentor and guide a team of compiler engineers to deliver high-performance inference-optimized software stacks.