Staff Software Engineer - Data Platform (Python)

Duetto Research · Hospitality
United States · Full-Time · Staff
Salary not disclosed

Job Details

Experience
7+ years
Required Skills
Docker, Python, Airflow, Databricks, GitHub Actions, Datadog, AWS Lambda, PySpark

Requirements

  • 7+ years building production data systems in Python
  • Deep expertise in PySpark and distributed data processing (Glue, EMR, or Databricks)
  • Strong experience with lakehouse architectures (Iceberg, Delta Lake, or Hudi on S3)
  • Production experience with Airflow or a comparable workflow orchestrator
  • Solid AWS production experience across S3, Glue, Athena, Lambda, and SQS
  • Track record of improving data quality, governance, and pipeline reliability at scale
  • Working knowledge of Java for reading upstream systems (plus)
  • Experience with Trino or Presto for interactive SQL analytics at scale (plus)
  • Experience with dbt for data transformation and modelling (plus)
  • Familiarity with Great Expectations or similar data quality frameworks (plus)
  • Genuine interest in AI-assisted development and LLM-based tooling (plus)
  • Familiarity with hospitality data — reservations, rates, inventory, demand signals (plus)
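To give a flavour of the data-quality work referenced above, here is a minimal, self-contained sketch of declarative row-level checks in the spirit of Great Expectations. The function names, the reservation schema, and the thresholds are all hypothetical and illustrative; this is not the Great Expectations API.

```python
# Hypothetical sketch of declarative data-quality checks, in the spirit
# of Great Expectations. Names and schema are illustrative only.

def expect_not_null(column):
    """Check that a column is present and non-null."""
    return lambda row: row.get(column) is not None

def expect_between(column, lo, hi):
    """Check that a numeric column falls within [lo, hi]."""
    return lambda row: row.get(column) is not None and lo <= row[column] <= hi

def validate(rows, expectations):
    """Return the rows that fail any expectation in the suite."""
    return [row for row in rows if not all(check(row) for check in expectations)]

reservations = [
    {"reservation_id": "r1", "rate": 189.0},
    {"reservation_id": "r2", "rate": -5.0},   # invalid: negative rate
    {"reservation_id": None, "rate": 120.0},  # invalid: missing key
]

suite = [expect_not_null("reservation_id"), expect_between("rate", 0, 10_000)]
failures = validate(reservations, suite)
# failures holds the two invalid rows
```

In a real pipeline these checks would run as an expectation suite inside the orchestrated job, with failures routed to quarantine rather than returned in memory.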

Responsibilities

  • Own the design, performance, and reliability of Duetto's data lakehouse.
  • Evolve the Python/PySpark pipeline framework across a bronze → silver → gold architecture on AWS, including Glue jobs, Iceberg MERGE operations, schema evolution, and partitioning strategies.
  • Architect the shift from batch to near-real-time streaming, building SQS-driven stream pipelines with Iceberg sinks and expanding ingestion, normalisation, and analytics layers across the full lakehouse.
  • Drive data quality and governance at scale, extending the Great Expectations framework and leading adoption of data contracts.
  • Own the Athena SQL layer that analysts and product teams depend on.
  • Strengthen observability and reliability through Datadog, Sentry, and Sumo Logic, while optimising Glue job performance.
  • Build and maintain shared internal Python libraries published to JFrog, and drive improvements to GitHub Actions, Docker-based testing, and CI/CD deployment workflows.
  • Work AI-first every day using Claude Code and MCP tools, contributing to AI-assisted pipeline generation, schema inference, and automated data quality.
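The MERGE work described above can be illustrated with a minimal, engine-agnostic sketch of upsert semantics: plain Python stands in for PySpark and Iceberg, and the reservation schema is hypothetical.

```python
# Minimal sketch of MERGE (upsert) semantics, as used when landing a
# bronze-layer change batch into a silver table. Plain Python stands in
# for PySpark/Iceberg; the schema is hypothetical.

def merge_upsert(target, updates, key):
    """MERGE: update rows matching on key, insert rows that don't match."""
    merged = {row[key]: row for row in target}
    for row in updates:
        # Matched rows keep unchanged columns; unmatched rows are inserted.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return sorted(merged.values(), key=lambda r: r[key])

silver = [
    {"reservation_id": "r1", "status": "booked", "rate": 189.0},
    {"reservation_id": "r2", "status": "booked", "rate": 210.0},
]
bronze_batch = [
    {"reservation_id": "r2", "status": "cancelled"},             # update
    {"reservation_id": "r3", "status": "booked", "rate": 95.0},  # insert
]

silver = merge_upsert(silver, bronze_batch, "reservation_id")
# r2 is now cancelled (its rate is preserved); r3 is inserted
```

In the actual lakehouse this corresponds to a Spark `MERGE INTO target USING updates ON ...` statement executed against an Iceberg table, where the engine also handles schema evolution and partition pruning.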