Principal Data Engineer - AI (REMOTE)

Posted 3 months ago

United States, Full-Time, Software Development

Company: Upbound

Location: United States

Languages: English

Seniority level: Principal, 10+ years

Experience: 10+ years

Skills:

Leadership, Python, Apache Airflow, Artificial Intelligence, Cloud Computing, Elasticsearch, Kubernetes, Machine Learning, Software Architecture, Data Engineering, Spark

Requirements:

- 10+ years of software/data engineering experience
- At least 4 years in technical leadership roles
- Proven track record building data platforms for production systems at scale
- Deep expertise in traditional data engineering (Spark, Airflow, data lakes)
- Deep expertise in ML-specific infrastructure (feature stores, model serving)
- Experience with vector databases (Pinecone, Weaviate, Qdrant, Milvus, pgvector, OpenSearch, Elasticsearch)
- Demonstrated experience with LLM applications (RAG architectures, semantic search)
- Understanding of Kubernetes, cloud-native architectures, and infrastructure-as-code principles
- Strong understanding of data requirements for AI/ML systems
- Hands-on experience building knowledge bases and semantic search systems
- Experience with embedding models for code and technical documentation
- Knowledge of time-series data processing
- Understanding of graph databases

Responsibilities:

- Define and drive the technical vision for data platforms supporting AI-powered features.
- Lead the design of data pipelines for ML model training datasets.
- Architect vector search and RAG systems that use control planes as a knowledge store.
- Build data infrastructure for semantic search over resources, extensions, and compositions.
- Establish frameworks for collecting, processing, and analyzing infrastructure configuration data.
- Design data pipelines for Crossplane-specific data.
- Create infrastructure for indexing and searching Upbound Marketplace content and documentation.
- Develop metrics and monitoring for AI features integrated with Upbound's control plane.
- Design data systems that support AI agents for infrastructure provisioning and operations.
- Create feature engineering platforms for control plane operations data.
- Implement data infrastructure for training models on infrastructure failures and resource allocation.
- Drive the development of knowledge graph representations of infrastructure dependencies.