Data Scientist, AI Data Foundations

United States, Full-Time, Middle
Salary not disclosed

Job Details

Experience
4–7 years
Required Skills
Python, SQL, Machine Learning, Databricks, LangChain, PySpark

Requirements

  • 4–7 years of experience in data science, ML engineering, or applied data roles
  • Strong experience building vector stores for RAG or semantic search
  • Experience designing or operating feature stores
  • Hands-on experience with graph databases such as Neo4j, TigerGraph, or Azure Cosmos DB Gremlin
  • Strong programming skills in Python (pandas, NumPy, scikit-learn, PySpark) and SQL
  • Experience in Databricks environments
  • Familiarity with LLM and embedding tooling such as Hugging Face, OpenAI/Azure OpenAI APIs, and LangChain
  • Strong analytical mindset with proven ability to explore complex datasets
  • Solid understanding of core machine learning concepts including evaluation metrics
  • Excellent communication skills

Responsibilities

  • Design, build, and maintain vector stores supporting retrieval-augmented generation systems, including embedding pipelines, chunking strategies, indexing approaches, and retrieval evaluation frameworks
  • Develop and operate feature store architectures ensuring consistency between offline training and online inference, with strong attention to lineage, freshness, and reuse
  • Create and manage graph data models representing relationships across customers, applications, financial products, and outcomes for both AI and analytical use cases
  • Conduct advanced data discovery and exploratory analysis on lending, deposit, and behavioral datasets to identify trends, anomalies, and model-driving features
  • Build and maintain AI-ready curated datasets with strong governance, documentation, and quality controls to support downstream ML and application teams
  • Define and execute evaluation methodologies for vector retrieval quality, embedding performance, feature drift, and graph completeness
  • Collaborate closely with ML engineers and applied scientists to ensure data infrastructure aligns with modeling and product needs
  • Ensure responsible data usage by partnering with governance and compliance teams to enforce data privacy, security, and regulatory standards
  • Communicate insights from data discovery through dashboards, notebooks, and structured narratives for technical and non-technical stakeholders