Senior AI Infrastructure Engineer - AI Care Platform

Europe based - Remote, Full-Time, Senior

Salary not disclosed

Job Details

Languages: English
Experience: 5+ years
Required Skills: AWS, Python, ElasticSearch, GCP, Kafka, Kibana, Kubernetes, MySQL, Azure, Go, Grafana, Prometheus, Redis, WebRTC, Linux, Terraform, LLM

Requirements

5+ years of experience in infrastructure engineering
At least 2 years focused on AI/ML workloads in production environments
Strong experience with Kubernetes for orchestrating GPU-accelerated workloads, including scheduling, resource management, and autoscaling for inference services
Hands-on experience with model serving and inference optimization frameworks for both real-time computer vision and large language model workloads
Solid understanding of LLM inference optimization techniques, including speculative decoding, batching strategies, quantization, and inference scaling patterns
Experience provisioning and managing infrastructure for real-time AI systems, including WebRTC clusters and AI agent architectures
Familiarity with real-time video/computer vision inference pipelines and the infrastructure challenges of processing continuous visual data streams at low latency
Familiarity with speech-to-text and text-to-speech serving infrastructure and the challenges of running voice AI at low latency
Experience with Infrastructure as Code (Terraform or similar) and GitOps methodologies for managing complex, GPU-enabled environments
Working knowledge of GPU infrastructure - NVIDIA CUDA ecosystem, multi-GPU setups, and GPU monitoring/profiling
Strong Linux systems fundamentals and networking knowledge, particularly for latency-sensitive, real-time workloads
Fluent in English (written and oral)

Responsibilities

Design, build, and maintain the inference infrastructure that powers Sword Health's AI products, ensuring models are served with high throughput, low latency, and cost efficiency.
Own the end-to-end deployment pipeline for AI models - from real-time computer vision powering movement analysis to large language models driving conversational AI experiences.
Architect and scale Kubernetes clusters for GPU-accelerated workloads, including autoscaling strategies, resource scheduling, and multi-model serving.
Build and operate the infrastructure behind Sword Health's real-time AI agents, including provisioning WebRTC clusters and deploying speech-to-text and text-to-speech capabilities at low latency.
Drive inference scaling strategies - evaluate and implement techniques such as speculative decoding, continuous batching, and model parallelism to meet growing demand without proportionally increasing costs.
Develop and maintain Infrastructure as Code (Terraform) and GitOps workflows tailored to GPU-enabled, AI-specific environments.
Instrument and monitor AI inference systems, building observability around GPU utilization, model latency, throughput, and error rates to ensure reliability and performance.
Collaborate closely with ML Engineers, Data Scientists, and Product teams to translate model requirements into robust, production-ready infrastructure.
Evaluate emerging AI infrastructure tools, frameworks, and hardware to keep Sword Health at the cutting edge of inference performance and efficiency.
Mentor team members on AI infrastructure best practices, fostering knowledge sharing around GPU workloads, model serving patterns, and production ML systems.
