Senior Data Pipeline / ML Engineer

Remote: Europe-friendly hours, 3–4 hours of daily overlap with Denver (MT, UTC-7)
Full-Time
Senior
Salary
6500 - 8000 EUR per month

Job Details

Languages
English
Experience
5+ years
Required Skills
Python, SQL, GCP, Airflow, Databricks, NLP, Prompt Engineering, LLM

Requirements

  • 5+ years in data engineering
  • Strong Python skills (Pydantic a bonus)
  • Strong SQL skills
  • Experience with cloud data stacks (GCP experience preferred)
  • Experience with orchestration frameworks (Airflow, Dagster, Prefect) or data platforms (Databricks)
  • Production experience designing or integrating AI/LLM agents for data enrichment
  • Experience with structured AI → JSON → database pipelines with error recovery and monitoring
  • Working knowledge of prompt engineering
  • Working knowledge of MCP servers
  • Working knowledge of function calling
  • Working knowledge of embedding-based retrieval
  • Comfort with unstructured data (web pages, PDFs, filings)
  • Experience with NLP-driven structuring pipelines
  • Excellent written communication

Responsibilities

  • Own the data pipeline, entity resolution layer, AI agent orchestration, and quality systems.
  • Manage multi-source data architecture, handling external providers, LLM hygiene agents, and customer-claimed edits with versioning, lineage, and observability.
  • Implement data quality and operations including data contracts, pipeline unit tests, integration testing, confidence scoring, human-in-the-loop validation, anomaly detection, monitoring, alerting, and runbooks.
  • Optimize cost and performance across cloud resources.
  • Develop machine learning and matching systems, including embeddings infrastructure, vector generation, retrieval optimization, semantic search pipelines, reranking, and evaluation frameworks.
  • Design and implement entity resolution and master data management, using deterministic blocking and LLM-based evaluation for match decisions, and handling entity lifecycle complexity.
  • Model entity relationships and graphs, evaluating and implementing graph query capabilities (Apache AGE, Neo4j, or optimized Postgres patterns).
  • Integrate AI agents, focusing on robust prompting architectures, JSON schema validation, structured AI → JSON → database pipelines with error recovery, and feedback loops.