Senior Platform Engineer - AI Agent Infrastructure
United Kingdom | Full-Time | Senior
Salary not disclosed
Job Details
- Experience: 4+ years
- Required Skills: Docker, PostgreSQL, Kafka, MongoDB, RabbitMQ, Go, Redis, Terraform, Datadog
Requirements
- 4+ years of experience in platform engineering, infrastructure engineering, SRE, or backend systems roles
- Strong expertise in event-driven architecture and messaging systems such as Kafka, RabbitMQ, NATS, or similar
- Deep AWS experience including EC2, VPC, IAM, S3, RDS, and internal networking concepts
- Solid experience with SQL databases such as PostgreSQL and NoSQL systems such as MongoDB or Redis
- Strong Docker knowledge including container lifecycle management, health checks, resource limits, and image optimization
- Proven experience debugging distributed systems, asynchronous flows, and cascading production failures
- Hands-on experience with Infrastructure as Code tools such as Terraform or Pulumi
- Strong observability skills using Datadog or equivalent tools for APM, logging, monitoring, and tracing
- Experience with Go or similar backend programming languages
- Strong communication skills and ability to lead technical decisions in remote teams
Responsibilities
- Own and evolve the cloud infrastructure supporting AI agents running at scale in production environments
- Design and implement event-driven architectures using durable asynchronous messaging systems
- Improve inter-service communication by replacing synchronous dependencies with scalable messaging patterns
- Build and maintain Infrastructure as Code frameworks for provisioning, deployment, and environment consistency
- Ensure platform reliability, scalability, and performance across distributed workloads
- Develop advanced observability capabilities including dashboards, alerts, tracing, logging, and health monitoring
- Lead incident response analysis and proactively improve system resilience based on production learnings
- Evaluate emerging technologies and drive architectural decisions as the platform matures
- Optimize databases, storage systems, and caching layers for speed, availability, and cost efficiency
- Collaborate with engineering teams to support secure and efficient deployment of AI workloads