Senior Data Pipeline / ML Engineer
Remote (Europe-friendly hours, 3–4 hours of daily overlap with Denver, MT, UTC-7)
Full-Time
Senior
Salary: 6500–8000 EUR per month
Job Details
- Languages: English
- Experience: 5+ years
- Required Skills: Python, SQL, GCP, Airflow, Databricks, NLP, Prompt Engineering, LLM
Requirements
- 5+ years in data engineering
- Strong Python skills (Pydantic a bonus)
- Strong SQL skills
- Experience with cloud data stacks (GCP experience preferred)
- Experience with orchestration frameworks (Airflow, Dagster, Prefect) or data platforms (Databricks)
- Production experience designing or integrating AI/LLM agents for data enrichment
- Experience with structured AI → JSON → database pipelines with error recovery and monitoring
- Working knowledge of prompt engineering
- Working knowledge of MCP servers
- Working knowledge of function calling
- Working knowledge of embedding-based retrieval
- Comfort with unstructured data (web pages, PDFs, filings)
- Experience with NLP-driven structuring pipelines
- Excellent written communication
Responsibilities
- Own the data pipeline, entity resolution layer, AI agent orchestration, and quality systems.
- Manage multi-source data architecture, handling external providers, LLM hygiene agents, and customer-claimed edits with versioning, lineage, and observability.
- Implement data quality and operations including data contracts, pipeline unit tests, integration testing, confidence scoring, human-in-the-loop validation, anomaly detection, monitoring, alerting, and runbooks.
- Optimize cost and performance across cloud resources.
- Develop machine learning and matching systems, including embeddings infrastructure, vector generation, retrieval optimization, semantic search pipelines, reranking, and evaluation frameworks.
- Design and implement entity resolution and master data management, using deterministic blocking and LLM-based evaluation for match decisions, and handling lifecycle complexity.
- Model entity relationships and graphs, evaluating and implementing graph query capabilities (Apache AGE, Neo4j, or optimized Postgres patterns).
- Integrate AI agents, focusing on robust prompting architectures, JSON schema validation, structured AI → JSON → database pipelines with error recovery, and feedback loops.