Senior Platform Engineer - AI Agent Infrastructure

United Kingdom, Full-Time, Senior
Salary not disclosed

Job Details

Experience
4+ years
Required Skills
Docker, PostgreSQL, Kafka, MongoDB, RabbitMQ, Go, Redis, Terraform, Datadog

Requirements

  • 4+ years of experience in platform engineering, infrastructure engineering, SRE, or backend systems roles
  • Strong expertise in event-driven architecture and messaging systems such as Kafka, RabbitMQ, NATS, or similar
  • Deep AWS experience including EC2, VPC, IAM, S3, RDS, and internal networking concepts
  • Solid experience with SQL databases such as PostgreSQL and NoSQL systems such as MongoDB or Redis
  • Strong Docker knowledge including container lifecycle management, health checks, resource limits, and image optimization
  • Proven experience debugging distributed systems, asynchronous flows, and cascading production failures
  • Hands-on experience with Infrastructure as Code tools such as Terraform or Pulumi
  • Strong observability skills using Datadog or equivalent tools for APM, logging, monitoring, and tracing
  • Experience with Go or similar backend programming languages
  • Strong communication skills and ability to lead technical decisions in remote teams

Responsibilities

  • Own and evolve the cloud infrastructure supporting AI agents running at scale in production environments
  • Design and implement event-driven architectures using durable asynchronous messaging systems
  • Improve inter-service communication by replacing synchronous dependencies with scalable messaging patterns
  • Build and maintain infrastructure-as-code frameworks for provisioning, deployment, and environment consistency
  • Ensure platform reliability, scalability, and performance across distributed workloads
  • Develop advanced observability capabilities including dashboards, alerts, tracing, logging, and health monitoring
  • Lead incident response analysis and proactively improve system resilience based on production learnings
  • Evaluate emerging technologies and drive architectural decisions as the platform matures
  • Optimize databases, storage systems, and caching layers for speed, availability, and cost efficiency
  • Collaborate with engineering teams to support secure and efficient deployment of AI workloads