Senior Engineering Manager - Accelerated Compute Memory Systems

Posted 3 months ago

United States, Full-Time, AI Infrastructure

Location: United States

Languages: English

Seniority level: Senior

Experience: 10+ years in software engineering, 5+ years in management

Skills:

AWS, Leadership, Python, Cloud Computing, GCP, Kafka, Kubernetes, RabbitMQ, Azure, DevOps, Mentoring, Compliance, Team Management, Software Engineering

Requirements:

10+ years in software engineering and 5+ years in management roles with large-scale AI/ML systems and infrastructure.
Expert-level proficiency in Python and Golang.
5+ years building production distributed systems.
Experience with orchestration frameworks (Kubernetes, Ray, Dask).
Proficiency with vector databases (Pinecone, Weaviate, Qdrant, or similar).
Experience with message queuing systems (Kafka, Pulsar, RabbitMQ).
In-depth knowledge of and hands-on experience building scalable distributed architectures and high-performance compute systems.
Proven experience with multimodal ingestion pipelines within RAG platforms.
Direct experience designing, fine-tuning, and optimizing LLMs for ingestion and retrieval workloads.
Previous success managing engineering teams that delivered production-grade, HPC-scale RAG systems.
Deep understanding of infrastructure domains: compute, storage, networking, observability, security, disaster recovery, and cost management.
Familiarity with HPC cluster management software such as Slurm.
Familiarity with cloud platforms (AWS, Azure, GCP) and/or on-prem datacenter operations.

Responsibilities:

Build and lead a team delivering ingestion, retrieval, and inference layers.
Architect and deliver horizontally scalable, fault-tolerant systems.
Guide implementation of multimodal ingestion pipelines.
Oversee design and optimization of LLM-driven data ingestion and retrieval workflows.
Own optimization and tuning of high-throughput, low-latency production environments.
Establish performance benchmarking, compliance frameworks, and automated testing.
Balance technical leadership with people leadership.
Collaborate cross-functionally with Product, Executive Leadership, and Customer Success.