Database Engineering Lead Engineer

Based in Mexico (Mexico time zone alignment) | Full-Time | Lead
Salary not disclosed

Job Details

Experience
8+ years of experience in platform engineering, infrastructure, or database engineering roles
Required Skills
AWS, PostgreSQL, Kubernetes, Redis, CI/CD, Terraform, LLM

Requirements

  • 8+ years of experience in platform engineering, infrastructure, or database engineering roles.
  • At least 3 years of hands-on experience in AI/ML infrastructure or production ML systems.
  • Strong experience with model-serving frameworks and platforms such as SageMaker, Bedrock, vLLM, or TGI.
  • Deep knowledge of vector databases and search systems, including OpenSearch k-NN indexing and embedding optimization.
  • Strong Kubernetes (EKS) experience, including GPU workloads, autoscaling, and distributed system operations.
  • Experience designing and operating cloud-native databases such as PostgreSQL, Redis, and document stores at scale.
  • Strong understanding of LLM application patterns, including retrieval systems, memory management, and agent frameworks such as LangChain and LlamaIndex.
  • Experience building infrastructure as code using Terraform, CloudFormation, or similar tools.
  • Strong expertise in monitoring, observability, and incident response in production environments.
  • Solid understanding of database security, encryption, and access control best practices.
  • Ability to operate in a high-responsibility, on-call production environment with global coverage requirements.

Responsibilities

  • Architect, build, and operate production-grade AI/ML and database infrastructure supporting large-scale AI applications.
  • Own the full lifecycle of database systems, including OpenSearch, DocumentDB, Aurora PostgreSQL, and Redis, covering performance, scaling, and disaster recovery.
  • Design and implement infrastructure as code using Terraform, Crossplane, and CloudFormation for cloud-native environments.
  • Develop and maintain CI/CD pipelines for ML systems, including automated testing and model validation workflows.
  • Implement monitoring, logging, and alerting systems using CloudWatch, Grafana, and related observability tools.
  • Optimize vector search and embedding systems for retrieval-augmented generation (RAG) use cases.
  • Support Kubernetes-based ML workloads including GPU scaling, service mesh, and performance tuning.
  • Ensure database security through encryption, IAM policies, TLS configurations, and fine-grained access controls.
  • Participate in rotating on-call support for production systems operating in a 24x7 environment.