Data Engineer

Spain, Full-Time, Middle

Salary not disclosed

Job Details

Requirements

4+ years of professional experience in data engineering with strong exposure to large-scale AWS and Spark environments
Advanced proficiency in SQL and Python for data processing and transformation at scale
Strong experience with AWS data services, including S3, Glue, Athena, Redshift, and EMR, as well as orchestration tools
Proven experience building and maintaining data models using dbt or similar frameworks
Hands-on experience with data quality, validation, and testing frameworks such as Great Expectations
Strong understanding of data governance, lineage, and reproducibility in production environments
Experience with entity resolution, deduplication, or record linkage across multiple data sources
Familiarity with anonymization and pseudonymization techniques in regulated environments
Experience working in regulated industries such as BFSI, healthcare, or government is highly valued
Ability to work independently or as a lead engineer within a small, fast-moving delivery team
Strong written and verbal communication skills in English, with the ability to document and explain complex systems clearly

Responsibilities

Rebuild and validate data pipelines to ensure full reproducibility of reporting and descriptive statistics across all datasets
Profile, reconcile, and harmonize heterogeneous source schemas across multiple business entities into a unified data model
Design and implement dbt-based data models (staging, intermediate, and marts) with strong testing and validation layers
Develop and maintain data quality frameworks using tools such as Great Expectations and dbt tests to enforce reliability
Design and implement entity resolution and record linkage logic across fragmented customer and account datasets
Ensure robust anonymization and pseudonymization processes that meet regulatory and compliance requirements
Optimize large-scale Spark-based processing jobs, including partitioning strategies, file formats, and cost-efficient compute usage
Orchestrate production-grade pipelines using tools such as Airflow or AWS Step Functions
Deliver clean, documented, and feature-ready datasets for downstream data science and risk modelling teams
Create clear technical documentation and runbooks to support operational handover and long-term maintainability
