Data Scientist, AI Data Foundations
United States, Full-Time, Middle
Salary not disclosed
Job Details
- Experience: 4–7 years
- Required Skills: Python, SQL, Machine Learning, Databricks, LangChain, PySpark
Requirements
- 4–7 years of experience in data science, ML engineering, or applied data roles
- Strong experience building vector stores for RAG or semantic search
- Experience designing or operating feature stores
- Hands-on experience with graph databases such as Neo4j, TigerGraph, or Azure Cosmos DB Gremlin
- Strong programming skills in Python (pandas, NumPy, scikit-learn, PySpark) and SQL
- Experience in Databricks environments
- Familiarity with LLM and embedding tooling such as Hugging Face, OpenAI/Azure OpenAI APIs, and LangChain
- Strong analytical mindset with proven ability to explore complex datasets
- Solid understanding of core machine learning concepts including evaluation metrics
- Excellent communication skills
Responsibilities
- Design, build, and maintain vector stores supporting retrieval-augmented generation systems, including embedding pipelines, chunking strategies, indexing approaches, and retrieval evaluation frameworks
- Develop and operate feature store architectures ensuring consistency between offline training and online inference, with strong attention to lineage, freshness, and reuse
- Create and manage graph data models representing relationships across customers, applications, financial products, and outcomes for both AI and analytical use cases
- Conduct advanced data discovery and exploratory analysis on lending, deposit, and behavioral datasets to identify trends, anomalies, and model-driving features
- Build and maintain AI-ready curated datasets with strong governance, documentation, and quality controls to support downstream ML and application teams
- Define and execute evaluation methodologies for vector retrieval quality, embedding performance, feature drift, and graph completeness
- Collaborate closely with ML engineers and applied scientists to ensure data infrastructure aligns with modeling and product needs
- Ensure responsible data usage by partnering with governance and compliance teams to enforce data privacy, security, and regulatory standards
- Communicate insights from data discovery through dashboards, notebooks, and structured narratives for technical and non-technical stakeholders