10+ years of data engineering experience with enterprise-scale systems
Expertise in Apache Spark and Delta Lake, including ACID transactions, time travel, Z-ordering, and compaction (a short PySpark sketch of these features follows this list)
Deep knowledge of Databricks (Jobs, Clusters, Workspaces, Delta Live Tables, Unity Catalog)
Experience building scalable ETL/ELT pipelines using tools such as Airflow, AWS Glue, GCP Dataflow, or Azure Data Factory (ADF)
Advanced SQL for data modeling and transformation
Strong programming skills in Python (or Scala)
Hands-on experience with data formats such as Parquet, Avro, and JSON
Familiarity with schema evolution, versioning, and backfilling strategies
Working knowledge of at least one major cloud platform: AWS (S3, Athena, Redshift, Glue Catalog, Step Functions), GCP (BigQuery, Cloud Storage, Dataflow, Pub/Sub), or Azure (Synapse, Data Factory, Azure Databricks)
Experience designing data architectures with real-time or streaming data (Kafka, Kinesis)
Consulting or client-facing experience with strong communication and leadership skills
Experience with data mesh architectures and domain-driven data design
Knowledge of metadata management, data cataloging, and lineage tracking tools
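To make the Spark and Delta Lake expectations above concrete, here is a minimal PySpark sketch, assuming a Spark session configured with the delta-spark package; the storage paths, table, and column names are illustrative placeholders, not part of any specific engagement. It shows an ACID append, a time-travel read of an earlier table version, and compaction with Z-ordering.

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Assumes the delta-spark package is installed; these two configs enable Delta support.
spark = (
    SparkSession.builder.appName("delta-basics")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "s3://example-bucket/bronze/events"  # hypothetical table location

# ACID append: readers see either all of this commit or none of it.
raw = spark.read.json("s3://example-bucket/raw/events/")  # hypothetical source
raw.write.format("delta").mode("append").save(path)

# Time travel: query an earlier version of the table for audits or backfills.
first_version = spark.read.format("delta").option("versionAsOf", 0).load(path)

# Compaction and Z-ordering: rewrite small files and co-locate rows by a common filter column.
DeltaTable.forPath(spark, path).optimize().executeZOrderBy("user_id")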
Responsibilities:
Shape large-scale data architecture vision and roadmap across client engagements
Establish governance, security frameworks, and regulatory compliance standards
Lead strategy around platform selection, integration, and scaling
Guide organizations in adopting data lakehouse and federated data models
Lead technical discovery sessions to understand client needs
Communicate complex architectures to stakeholders in terms of clear, actionable business value
Build trusted advisor relationships and guide strategic decisions
Align architecture recommendations with business growth and goals
Design and implement modern data lakehouse architectures with Delta Lake and Databricks
Build and manage ETL/ELT pipelines at scale using Spark (PySpark preferred)
Leverage Delta Live Tables, Unity Catalog, and schema evolution features
Optimize storage and queries on cloud object storage (e.g., AWS S3, Azure Data Lake Storage)
Integrate with cloud-native services like AWS Glue, GCP Dataflow, and Azure Synapse Analytics
Implement data quality monitoring, lineage tracking, and schema versioning (an illustrative Delta Live Tables sketch with quality expectations appears after this list)
Build scalable pipelines with tools like Apache Airflow, AWS Step Functions, and Cloud Composer (see the Airflow sketch at the end of this list)
Develop cost-optimized, scalable, and compliant data solutions
Design proofs of concept (POCs) and pilots to validate technical approaches
Translate business requirements into production-ready data systems
Define and track success metrics for platform and pipeline initiatives
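The Delta Live Tables and data quality responsibilities above are illustrated by the short sketch below. It assumes it runs inside a Databricks Delta Live Tables pipeline (where the dlt module and the spark session are provided); the source path, table names, and expectation rules are assumptions for illustration only.

import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders ingested incrementally from cloud object storage")
def orders_bronze():
    # Auto Loader ("cloudFiles") incrementally picks up new files from the landing path.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://example-bucket/raw/orders/")  # hypothetical landing zone
    )

@dlt.table(comment="Validated orders with basic quality expectations")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # violating rows are dropped
@dlt.expect("positive_amount", "amount > 0")                   # violations are tracked, rows kept
def orders_silver():
    return dlt.read_stream("orders_bronze").withColumn("amount", col("amount").cast("double"))

Expectation results surface in the pipeline's event log, which is one common starting point for the data quality monitoring mentioned above.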
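For the orchestration responsibilities (Airflow, Step Functions, Cloud Composer), a minimal Apache Airflow sketch follows; the DAG id, task names, and the spark-submit command are placeholders, not a prescribed setup.

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_events_elt",          # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    land_raw = BashOperator(
        task_id="land_raw_files",
        bash_command="echo 'copy raw files to object storage'",  # placeholder step
    )
    transform = BashOperator(
        task_id="spark_transform",
        # {{ ds }} is Airflow's execution-date template variable.
        bash_command="spark-submit jobs/transform_events.py --date {{ ds }}",
    )
    land_raw >> transform

The same DAG runs unchanged on Cloud Composer, since Composer is managed Airflow.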