Senior AI Infrastructure Engineer - AI Care Platform
Europe based - Remote, Full-Time, Senior
Salary not disclosed
Job Details
- Languages: English
- Experience: 5+ years
- Required Skills: AWS, Python, ElasticSearch, GCP, Kafka, Kibana, Kubernetes, MySQL, Azure, Go, Grafana, Prometheus, Redis, WebRTC, Linux, Terraform, LLM
Requirements
- 5+ years of experience in infrastructure engineering
- At least 2 years focused on AI/ML workloads in production environments
- Strong experience with Kubernetes for orchestrating GPU-accelerated workloads, including scheduling, resource management, and autoscaling for inference services
- Hands-on experience with model serving and inference optimization frameworks for both real-time computer vision and large language model workloads
- Solid understanding of LLM inference optimization techniques, including speculative decoding, batching strategies, quantization, and inference scaling patterns
- Experience provisioning and managing infrastructure for real-time AI systems, including WebRTC clusters and AI agent architectures
- Familiarity with real-time video/computer vision inference pipelines and the infrastructure challenges of processing continuous visual data streams at low latency
- Familiarity with speech-to-text and text-to-speech serving infrastructure and the challenges of running voice AI at low latency
- Experience with Infrastructure as Code (Terraform or similar) and GitOps methodologies for managing complex, GPU-enabled environments
- Working knowledge of GPU infrastructure, including the NVIDIA CUDA ecosystem, multi-GPU setups, and GPU monitoring/profiling
- Strong Linux systems fundamentals and networking knowledge, particularly for latency-sensitive, real-time workloads
- Fluent in English (written and oral)
Responsibilities
- Design, build, and maintain the inference infrastructure that powers Sword Health's AI products, ensuring models are served with high throughput, low latency, and cost efficiency.
- Own the end-to-end deployment pipeline for AI models - from real-time computer vision powering movement analysis to large language models driving conversational AI experiences.
- Architect and scale Kubernetes clusters for GPU-accelerated workloads, including autoscaling strategies, resource scheduling, and multi-model serving.
- Build and operate the infrastructure behind Sword Health's real-time AI agents, including WebRTC cluster provisioning and deploying speech-to-text and text-to-speech capabilities at low latency.
- Drive inference scaling strategies - evaluate and implement techniques such as speculative decoding, continuous batching, and model parallelism to meet growing demand without proportionally increasing costs.
- Develop and maintain Infrastructure as Code (Terraform) and GitOps workflows tailored to GPU-enabled, AI-specific environments.
- Instrument and monitor AI inference systems, building observability around GPU utilization, model latency, throughput, and error rates to ensure reliability and performance.
- Collaborate closely with ML Engineers, Data Scientists, and Product teams to translate model requirements into robust, production-ready infrastructure.
- Evaluate emerging AI infrastructure tools, frameworks, and hardware to keep Sword Health at the cutting edge of inference performance and efficiency.
- Mentor team members on AI infrastructure best practices, fostering knowledge sharing around GPU workloads, model serving patterns, and production ML systems.