Staff Software Engineer - Data Platform (Python)

Duetto Research · Hospitality
United States · Full-Time · Staff
Salary not disclosed

Job Details

Experience
7+ years
Required Skills
Docker, Python, Airflow, Databricks, GitHub Actions, Datadog, AWS Lambda, PySpark

Requirements

  • 7+ years building production data systems in Python
  • Deep expertise in PySpark and distributed data processing (Glue, EMR, or Databricks)
  • Strong experience with lakehouse architectures (Iceberg, Delta Lake, or Hudi on S3)
  • Production experience with Airflow or a comparable workflow orchestrator
  • Solid AWS production experience across S3, Glue, Athena, Lambda, and SQS
  • Track record of improving data quality, governance, and pipeline reliability at scale
  • Working knowledge of Java for reading upstream systems (plus)
  • Experience with Trino or Presto for interactive SQL analytics at scale (plus)
  • Experience with dbt for data transformation and modelling (plus)
  • Familiarity with Great Expectations or similar data quality frameworks (plus)
  • Genuine interest in AI-assisted development and LLM-based tooling (plus)
  • Familiarity with hospitality data — reservations, rates, inventory, demand signals (plus)
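To give a flavour of the data-quality work referenced above, here is a minimal, self-contained sketch of declarative row-level checks in the spirit of Great Expectations. The function names, the reservation schema, and the thresholds are all hypothetical and illustrative; this is not the Great Expectations API.

```python
# Hypothetical sketch of declarative data-quality checks, in the spirit
# of Great Expectations. Names and schema are illustrative only.

def expect_not_null(column):
    """Check that a column is present and non-null."""
    return lambda row: row.get(column) is not None

def expect_between(column, lo, hi):
    """Check that a numeric column falls within [lo, hi]."""
    return lambda row: row.get(column) is not None and lo <= row[column] <= hi

def validate(rows, expectations):
    """Return the rows that fail any expectation in the suite."""
    return [row for row in rows if not all(check(row) for check in expectations)]

reservations = [
    {"reservation_id": "r1", "rate": 189.0},
    {"reservation_id": "r2", "rate": -5.0},   # invalid: negative rate
    {"reservation_id": None, "rate": 120.0},  # invalid: missing key
]

suite = [expect_not_null("reservation_id"), expect_between("rate", 0, 10_000)]
failures = validate(reservations, suite)
# failures holds the two invalid rows
```

In a real pipeline these checks would run as an expectation suite inside the orchestrated job, with failures routed to quarantine rather than returned in memory.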

Responsibilities

  • Own the design, performance, and reliability of Duetto's data lakehouse.
  • Evolve the Python/PySpark pipeline framework across a bronze → silver → gold architecture on AWS, including Glue jobs, Iceberg MERGE operations, schema evolution, and partitioning strategies.
  • Architect the shift from batch to near-real-time streaming, building SQS-driven stream pipelines with Iceberg sinks and expanding ingestion, normalisation, and analytics layers across the full lakehouse.
  • Drive data quality and governance at scale, extending the Great Expectations framework and leading adoption of data contracts.
  • Own the Athena SQL layer that analysts and product teams depend on.
  • Strengthen observability and reliability through Datadog, Sentry, and Sumo Logic, while optimising Glue job performance.
  • Build and maintain shared internal Python libraries published to JFrog, and drive improvements to GitHub Actions, Docker-based testing, and CI/CD deployment workflows.
  • Work AI-first every day using Claude Code and MCP tools, contributing to AI-assisted pipeline generation, schema inference, and automated data quality.
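The MERGE work described above can be illustrated with a minimal, engine-agnostic sketch of upsert semantics: plain Python stands in for PySpark and Iceberg, and the reservation schema is hypothetical.

```python
# Minimal sketch of MERGE (upsert) semantics, as used when landing a
# bronze-layer change batch into a silver table. Plain Python stands in
# for PySpark/Iceberg; the schema is hypothetical.

def merge_upsert(target, updates, key):
    """MERGE: update rows matching on key, insert rows that don't match."""
    merged = {row[key]: row for row in target}
    for row in updates:
        # Matched rows keep unchanged columns; unmatched rows are inserted.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return sorted(merged.values(), key=lambda r: r[key])

silver = [
    {"reservation_id": "r1", "status": "booked", "rate": 189.0},
    {"reservation_id": "r2", "status": "booked", "rate": 210.0},
]
bronze_batch = [
    {"reservation_id": "r2", "status": "cancelled"},             # update
    {"reservation_id": "r3", "status": "booked", "rate": 95.0},  # insert
]

silver = merge_upsert(silver, bronze_batch, "reservation_id")
# r2 is now cancelled (its rate is preserved); r3 is inserted
```

In the actual lakehouse this corresponds to a Spark `MERGE INTO target USING updates ON ...` statement executed against an Iceberg table, where the engine also handles schema evolution and partition pruning.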