Senior Data Engineer, Data Lakehouse Infrastructure
TRM Labs · Blockchain Analytics
North America (EST/PST) · Full-Time · Senior
Salary: 190,000 - 220,000 USD per year
Job Details
- Experience: 5+ years
- Required Skills: Python, SQL, Apache Airflow, GCP, Kafka, Snowflake, Spark, BigQuery
Requirements
- 5+ years of experience in data or software engineering, with a focus on distributed data systems and cloud-native architectures
- Proven experience building and scaling data platforms on GCP, including storage, compute, orchestration, and monitoring
- Strong command of one or more query engines such as Trino, Presto, Spark, or Snowflake
- Experience with modern table formats like Apache Hudi, Iceberg, or Delta Lake
- Exceptional programming skills in Python
- Proficiency in SQL and Spark SQL
- Hands-on experience orchestrating workflows with Airflow
- Experience building streaming and batch pipelines using GCP-native services
Responsibilities
- Design, implement, and scale core components of our lakehouse architecture
- Own data modeling, ingestion, query performance optimization, and metadata management
- Architect and scale a high-performance data lakehouse on GCP, leveraging technologies like StarRocks, Apache Iceberg, GCS, BigQuery, Dataproc, and Kafka
- Deploy, tune, and optimize distributed query engines such as Trino, Spark, or Snowflake to support complex analytical workloads
- Implement metadata management in open table formats such as Iceberg, along with data discovery frameworks for governance and observability, using Iceberg-compatible catalogs
- Develop and orchestrate robust ETL/ELT pipelines using Apache Airflow, Spark, and GCP-native tools (e.g., Dataflow, Composer)
- Collaborate across departments, partnering with data scientists, backend engineers, and product managers on design and implementation