Staff Software Engineer - Data Platform (Python)
Duetto Research (Hospitality)
United States · Full-Time · Staff
Salary not disclosed
Job Details
- Experience: 7+ years
- Required Skills: Docker, Python, Airflow, Databricks, GitHub Actions, Datadog, AWS Lambda, PySpark
Requirements
- 7+ years building production data systems in Python
- Deep expertise in PySpark and distributed data processing (Glue, EMR, or Databricks)
- Strong experience with lakehouse architectures (Iceberg, Delta Lake, or Hudi on S3)
- Production experience with Airflow or a comparable workflow orchestrator
- Solid AWS production experience across S3, Glue, Athena, Lambda, and SQS
- Track record of improving data quality, governance, and pipeline reliability at scale
- Working knowledge of Java for reading upstream systems (plus)
- Experience with Trino or Presto for interactive SQL analytics at scale (plus)
- Experience with dbt for data transformation and modelling (plus)
- Familiarity with Great Expectations or similar data quality frameworks (plus)
- Genuine interest in AI-assisted development and LLM-based tooling (plus)
- Familiarity with hospitality data — reservations, rates, inventory, demand signals (plus)
Responsibilities
- Own the design, performance, and reliability of Duetto's data lakehouse.
- Evolve the Python/PySpark pipeline framework across a bronze → silver → gold architecture on AWS, including Glue jobs, Iceberg MERGE operations, schema evolution, and partitioning strategies.
- Architect the shift from batch to near-real-time streaming, building SQS-driven stream pipelines with Iceberg sinks and expanding ingestion, normalisation, and analytics layers across the full lakehouse.
- Drive data quality and governance at scale, extending the Great Expectations framework and leading adoption of data contracts.
- Own the Athena SQL layer that analysts and product teams depend on.
- Strengthen observability and reliability through Datadog, Sentry, and Sumo Logic, while optimising Glue job performance.
- Build and maintain shared internal Python libraries published to JFrog, and drive improvements to GitHub Actions, Docker-based testing, and CI/CD deployment workflows.
- Work AI-first every day using Claude Code and MCP tools, contributing to AI-assisted pipeline generation, schema inference, and automated data quality.
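The bronze → silver promotion described above is typically an idempotent upsert expressed as an Iceberg MERGE. A minimal sketch of what such a statement looks like, rendered as SQL from Python — the table and column names here are hypothetical illustrations, not Duetto's actual schema; in a Glue job the result would be passed to `spark.sql(...)`:

```python
def build_merge_sql(
    target: str, source_view: str, key_cols: list[str], update_cols: list[str]
) -> str:
    """Render an Iceberg MERGE INTO statement for an idempotent upsert."""
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in update_cols)
    all_cols = key_cols + update_cols
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"s.{c}" for c in all_cols)
    return (
        f"MERGE INTO {target} t\n"
        f"USING {source_view} s\n"
        f"ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )

# Hypothetical bronze -> silver promotion of reservation records:
sql = build_merge_sql(
    target="silver.reservations",
    source_view="bronze_reservations_batch",
    key_cols=["reservation_id"],
    update_cols=["status", "room_rate", "updated_at"],
)
print(sql)
```

Because the merge is keyed on a stable identifier, re-running the same batch updates rows in place rather than duplicating them, which is what makes the pipeline safe to retry.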
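The batch-to-streaming shift above comes down to a poll/process/ack loop. A sketch of that control flow with the queue client abstracted behind callables, so the same loop could be wired to boto3's SQS `receive_message`/`delete_message` in production — every name here is a hypothetical stand-in:

```python
import json
from typing import Callable

def run_stream_loop(
    receive: Callable[[], list],           # e.g. wraps sqs.receive_message(...)
    write_to_sink: Callable[[list], None],  # e.g. appends a micro-batch to an Iceberg table
    ack: Callable[[dict], None],            # e.g. wraps sqs.delete_message(...)
    max_batches: int,
) -> int:
    """Poll the queue, write each micro-batch to the sink, then ack.
    Acking only after a successful write gives at-least-once delivery."""
    processed = 0
    for _ in range(max_batches):
        messages = receive()
        if not messages:
            break
        records = [json.loads(m["Body"]) for m in messages]
        write_to_sink(records)  # commit to the Iceberg sink first...
        for m in messages:
            ack(m)              # ...then delete from the queue
        processed += len(messages)
    return processed

# Exercising the loop with an in-memory stand-in for SQS:
queue = [[{"Body": '{"reservation_id": "r1"}'}], []]
sink: list = []
n = run_stream_loop(lambda: queue.pop(0), sink.extend, lambda m: None, max_batches=5)
```

Deleting a message only after the sink write succeeds is the usual at-least-once pattern; a crash between write and ack means the message is redelivered, which the idempotent MERGE-based sink absorbs.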
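The data-quality responsibility above centers on declarative checks of the kind Great Expectations formalizes. A toy, dependency-free sketch of the underlying idea — expectations as named predicates evaluated against each batch, with hypothetical hospitality-flavoured column names:

```python
from typing import Any, Callable

# Each expectation is a named predicate applied to every row of a batch.
Expectation = tuple[str, Callable[[dict], bool]]

EXPECTATIONS: list[Expectation] = [
    ("reservation_id is not null", lambda r: r.get("reservation_id") is not None),
    ("room_rate is non-negative", lambda r: r.get("room_rate", 0) >= 0),
    ("status in allowed set", lambda r: r.get("status") in {"booked", "cancelled", "checked_in"}),
]

def validate_batch(rows: list) -> dict:
    """Return a failure count per expectation; a real framework would also
    quarantine failing rows and emit metrics to an observability backend."""
    failures = {name: 0 for name, _ in EXPECTATIONS}
    for row in rows:
        for name, check in EXPECTATIONS:
            if not check(row):
                failures[name] += 1
    return failures

batch = [
    {"reservation_id": "r1", "room_rate": 189.0, "status": "booked"},
    {"reservation_id": None, "room_rate": -5.0, "status": "booked"},
]
report = validate_batch(batch)
print(report)
```

A data contract is essentially such a suite agreed with the upstream producer and enforced at the pipeline boundary, so schema or semantic drift fails loudly instead of silently corrupting the silver layer.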