Staff Software Engineer - Data Platform (Python)
Remote (US) · Full-Time · Staff
Salary not disclosed
Job Details
- Experience: 7+ years
- Required Skills: Docker, Python, Java, Airflow, Spark, dbt, Databricks, GitHub Actions, Datadog, AWS Lambda, PySpark
Requirements
- 7+ years building production data systems in Python
- Deep expertise in PySpark and distributed data processing (Glue, EMR, or Databricks)
- Strong experience with lakehouse architectures: Iceberg, Delta Lake, or Hudi on S3
- Production experience with Airflow or a comparable workflow orchestrator
- Solid AWS production experience across S3, Glue, Athena, Lambda, and SQS
- A track record of improving data quality, governance, and pipeline reliability at scale
Strong candidates will also have:
- Working knowledge of Java for reading upstream systems
- Experience with Trino or Presto for interactive SQL analytics at scale
- Experience with dbt for data transformation and modelling
- Familiarity with Great Expectations or similar data quality frameworks
- Genuine interest in AI-assisted development and LLM-based tooling
- Familiarity with hospitality data — reservations, rates, inventory, demand signals
Responsibilities
- Own the design, performance, and reliability of Duetto's data lakehouse
- Evolve the Python/PySpark pipeline framework across a bronze → silver → gold architecture on AWS, including Glue jobs, Iceberg MERGE operations, schema evolution, and partitioning strategies (see the first sketch after this list)
- Architect the shift from batch to near-real-time streaming, building SQS-driven stream pipelines with Iceberg sinks and expanding ingestion, normalisation, and analytics layers across the full lakehouse (see the streaming sketch after this list)
- Drive data quality and governance at scale — extending the Great Expectations framework, leading adoption of data contracts to formalise schemas between producers and consumers, and owning the Athena SQL layer (see the data-contract sketch after this list)
- Strengthen observability and reliability through Datadog, Sentry, and Sumo Logic, while optimising Glue job performance — worker sizing, DPU allocation, Spark tuning, and cost management
- Build and maintain shared internal Python libraries published to JFrog
- Drive improvements to GitHub Actions, Docker-based testing, and CI/CD deployment workflows
- Work AI-first every day — using Claude Code and MCP tools in your regular workflow, and contributing to AI-assisted pipeline generation, schema inference, and automated data quality
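To give candidates a feel for the pipeline-framework work above, here is a minimal sketch of a bronze → silver upsert using Iceberg's MERGE INTO from PySpark. The catalog name, warehouse path, table names, and merge key are illustrative assumptions, not Duetto's actual schema.

```python
# Minimal sketch: bronze -> silver upsert via Iceberg MERGE INTO on Spark.
# Catalog name, warehouse path, table names, and merge key are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Iceberg's SQL extensions are required for MERGE INTO support.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register an Iceberg catalog backed by the AWS Glue Data Catalog.
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-lakehouse/warehouse")
    .getOrCreate()
)

# Latest raw batch from the bronze layer.
spark.read.table("lake.bronze.reservations_raw").createOrReplaceTempView("updates")

# Upsert into silver: update matched rows, insert new ones.
spark.sql("""
    MERGE INTO lake.silver.reservations AS t
    USING updates AS s
    ON t.reservation_id = s.reservation_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```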
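The streaming responsibility could take many shapes; one common pattern is a long-polling SQS consumer that lands micro-batches in an Iceberg sink. The queue URL, message format, and table name below are assumptions for illustration.

```python
# Hypothetical SQS-driven micro-batch ingestion with an Iceberg sink.
# Queue URL, message schema, and table name are illustrative assumptions.
import json

import boto3
from pyspark.sql import Row, SparkSession

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/reservation-events"

spark = SparkSession.builder.getOrCreate()  # Iceberg catalog config as in the sketch above
sqs = boto3.client("sqs")


def poll_once(max_messages: int = 10) -> None:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=max_messages,
        WaitTimeSeconds=20,  # long polling
    )
    messages = resp.get("Messages", [])
    if not messages:
        return

    # Parse event payloads into rows; a real pipeline would validate them first.
    rows = [Row(**json.loads(m["Body"])) for m in messages]
    df = spark.createDataFrame(rows)

    # Append the micro-batch to the Iceberg sink table.
    df.writeTo("lake.bronze.reservation_events").append()

    # Delete messages only after a successful write (at-least-once delivery).
    for m in messages:
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=m["ReceiptHandle"])
```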
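On data contracts: the sketch below shows the core idea as a plain PySpark schema assertion, a simplified stand-in rather than the Great Expectations API, whose interface varies significantly across versions. The contract fields are hypothetical.

```python
# Simplified stand-in for a data contract check: fail fast when a producer's
# DataFrame drifts from the schema agreed with consumers. Fields are hypothetical.
from pyspark.sql import DataFrame
from pyspark.sql.types import DateType, DoubleType, StringType, StructField, StructType

RESERVATIONS_CONTRACT = StructType([
    StructField("reservation_id", StringType(), nullable=False),
    StructField("hotel_id", StringType(), nullable=False),
    StructField("stay_date", DateType(), nullable=False),
    StructField("rate", DoubleType(), nullable=True),
])


def enforce_contract(df: DataFrame, contract: StructType) -> None:
    """Raise if df is missing a contracted column or has a wrong type."""
    actual = {f.name: f.dataType for f in df.schema.fields}
    for field in contract.fields:
        if field.name not in actual:
            raise ValueError(f"contract violation: missing column {field.name!r}")
        if actual[field.name] != field.dataType:
            raise ValueError(
                f"contract violation: {field.name!r} is {actual[field.name]}, "
                f"expected {field.dataType}"
            )
```

In practice a check like this would run inside the pipeline framework before a batch is promoted from one layer to the next.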