- Architect and scale a high-performance data lakehouse on GCP, leveraging technologies such as StarRocks, Apache Iceberg, GCS, BigQuery, Dataproc, and Kafka.
- Deploy, tune, and optimize distributed query engines such as Trino and Spark, or managed platforms such as Snowflake, to support complex analytical workloads.
- Implement metadata management for open table formats like Iceberg, and build data discovery frameworks for governance and observability using Iceberg-compatible catalogs (a minimal table sketch follows this list).
- Develop and orchestrate robust ETL/ELT pipelines using Apache Airflow, Spark, and GCP-native tools (e.g., Dataflow, Composer); see the DAG sketch below.
- Collaborate across departments, partnering with data scientists, backend engineers, and product managers to design and implement data solutions.
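
For illustration, the Iceberg-on-GCS work described above might look like the following in practice. This is a minimal PySpark sketch, assuming the Iceberg Spark runtime is on the cluster classpath and the service account can write to the bucket; the catalog name `lakehouse`, the warehouse path, and the table names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("iceberg-gcs-sketch")
    # Register an Iceberg catalog backed by a Hadoop-style warehouse on GCS.
    # "lakehouse" and the gs:// path are placeholders.
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "hadoop")
    .config("spark.sql.catalog.lakehouse.warehouse", "gs://example-bucket/warehouse")
    .getOrCreate()
)

# Create a partitioned Iceberg table. Iceberg owns the schema, snapshot, and
# partition metadata, so other engines (Trino, StarRocks) can query it too.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.analytics.events (
        event_id   BIGINT,
        user_id    BIGINT,
        event_type STRING,
        event_ts   TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(event_ts))
""")

# Append a batch; each write produces a new snapshot, enabling time travel.
df = spark.createDataFrame(
    [(1, 42, "click", "2024-01-01 00:00:00")],
    "event_id LONG, user_id LONG, event_type STRING, event_ts STRING",
).withColumn("event_ts", F.to_timestamp("event_ts"))

df.writeTo("lakehouse.analytics.events").append()
```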
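
Likewise, the orchestration side could be sketched as an Airflow DAG (runnable on Cloud Composer with `apache-airflow-providers-google`, Airflow 2.4+) that submits a PySpark job to Dataproc. The project, region, cluster name, and `gs://` script path are placeholders, not real resources.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitJobOperator,
)

# Job spec pointing Dataproc at a PySpark script stored in GCS (placeholder paths).
PYSPARK_JOB = {
    "reference": {"project_id": "example-project"},
    "placement": {"cluster_name": "example-cluster"},
    "pyspark_job": {"main_python_file_uri": "gs://example-bucket/jobs/ingest_events.py"},
}

with DAG(
    dag_id="iceberg_ingest_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # one run per logical day
    catchup=False,
) as dag:
    ingest = DataprocSubmitJobOperator(
        task_id="ingest_events",
        job=PYSPARK_JOB,
        region="us-central1",
        project_id="example-project",
    )
```

In a fuller pipeline, this task would typically be chained with upstream ingestion (e.g., a Kafka-to-GCS landing step) and downstream quality checks or BigQuery loads.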