Senior AI Infrastructure Engineer - AI Care Platform

Europe based - Remote, Full-Time, Senior
Salary not disclosed

Job Details

Languages
English
Experience
5+ years
Required Skills
AWS, Python, ElasticSearch, GCP, Kafka, Kibana, Kubernetes, MySQL, Azure, Go, Grafana, Prometheus, Redis, WebRTC, Linux, Terraform, LLM

Requirements

  • 5+ years of experience in infrastructure engineering
  • At least 2 years focused on AI/ML workloads in production environments
  • Strong experience with Kubernetes for orchestrating GPU-accelerated workloads, including scheduling, resource management, and autoscaling for inference services
  • Hands-on experience with model serving and inference optimization frameworks for both real-time computer vision and large language model workloads
  • Solid understanding of LLM inference optimization techniques, including speculative decoding, batching strategies, quantization, and inference scaling patterns
  • Experience provisioning and managing infrastructure for real-time AI systems, including WebRTC clusters and AI agent architectures
  • Familiarity with real-time video/computer vision inference pipelines and the infrastructure challenges of processing continuous visual data streams at low latency
  • Familiarity with speech-to-text and text-to-speech serving infrastructure and the challenges of running voice AI at low latency
  • Experience with Infrastructure as Code (Terraform or similar) and GitOps methodologies for managing complex, GPU-enabled environments
  • Working knowledge of GPU infrastructure, including the NVIDIA CUDA ecosystem, multi-GPU setups, and GPU monitoring/profiling
  • Strong Linux systems fundamentals and networking knowledge, particularly for latency-sensitive, real-time workloads
  • Fluency in English (written and spoken)

Responsibilities

  • Design, build, and maintain the inference infrastructure that powers Sword Health's AI products, ensuring models are served with high throughput, low latency, and cost efficiency.
  • Own the end-to-end deployment pipeline for AI models - from real-time computer vision powering movement analysis to large language models driving conversational AI experiences.
  • Architect and scale Kubernetes clusters for GPU-accelerated workloads, including autoscaling strategies, resource scheduling, and multi-model serving.
  • Build and operate the infrastructure behind Sword Health's real-time AI agents, including WebRTC cluster provisioning and deploying speech-to-text and text-to-speech capabilities at low latency.
  • Drive inference scaling strategies - evaluate and implement techniques such as speculative decoding, continuous batching, and model parallelism to meet growing demand without proportionally increasing costs.
  • Develop and maintain Infrastructure as Code (Terraform) and GitOps workflows tailored to GPU-enabled, AI-specific environments.
  • Instrument and monitor AI inference systems, building observability around GPU utilization, model latency, throughput, and error rates to ensure reliability and performance.
  • Collaborate closely with ML Engineers, Data Scientists, and Product teams to translate model requirements into robust, production-ready infrastructure.
  • Evaluate emerging AI infrastructure tools, frameworks, and hardware to keep Sword Health at the cutting edge of inference performance and efficiency.
  • Mentor team members on AI infrastructure best practices, fostering knowledge sharing around GPU workloads, model serving patterns, and production ML systems.