Database Engineering Lead Engineer
Based in Mexico, Mexico timezone alignment | Full-Time | Lead
Salary not disclosed
Job Details
- Experience: 8+ years of experience in platform engineering, infrastructure, or database engineering roles
- Required Skills: AWS, PostgreSQL, Kubernetes, Redis, CI/CD, Terraform, LLM
Requirements
- 8+ years of experience in platform engineering, infrastructure, or database engineering roles.
- At least 3 years of hands-on experience in AI/ML infrastructure or production ML systems.
- Strong experience with model serving frameworks such as SageMaker, Bedrock, vLLM, or TGI.
- Deep knowledge of vector databases and search systems, including OpenSearch k-NN indexing and embedding optimization.
- Strong Kubernetes (EKS) experience, including GPU workloads, autoscaling, and distributed system operations.
- Experience designing and operating cloud-native databases such as PostgreSQL, Redis, and document stores at scale.
- Strong understanding of LLM application patterns including retrieval systems, memory management, and agent frameworks (LangChain, LlamaIndex).
- Experience building infrastructure as code using Terraform, CloudFormation, or similar tools.
- Strong expertise in monitoring, observability, and incident response in production environments.
- Solid understanding of database security, encryption, and access control best practices.
- Ability to operate in a high-responsibility, on-call production environment with global coverage requirements.
Responsibilities
- Architect, build, and operate production-grade AI/ML and database infrastructure supporting large-scale AI applications.
- Own the full lifecycle of database systems including OpenSearch, DocumentDB, Aurora PostgreSQL, and Redis across performance, scaling, and disaster recovery.
- Design and implement infrastructure as code using Terraform, Crossplane, and CloudFormation for cloud-native environments.
- Develop and maintain CI/CD pipelines for ML systems, including automated testing and model validation workflows.
- Implement monitoring, logging, and alerting systems using CloudWatch, Grafana, and related observability tools.
- Optimize vector search and embedding systems for retrieval-augmented generation (RAG) use cases.
- Support Kubernetes-based ML workloads including GPU scaling, service mesh, and performance tuning.
- Ensure database security through encryption, IAM policies, TLS configurations, and fine-grained access controls.
- Participate in rotating on-call support for production systems operating in a 24x7 environment.