Staff Software Engineer - Data Platform (Python)
Remote (US) · Full-Time · Staff
Salary not disclosed
Job Details
- Experience: 7+ years
- Required Skills: Docker, Python, Java, Airflow, Spark, dbt, Databricks, GitHub Actions, Datadog, AWS Lambda, PySpark
Requirements
- 7+ years building production data systems in Python
- Deep expertise in PySpark and distributed data processing (Glue, EMR, or Databricks)
- Strong experience with lakehouse architectures: Iceberg, Delta Lake, or Hudi on S3
- Production experience with Airflow or a comparable workflow orchestrator
- Solid AWS production experience across S3, Glue, Athena, Lambda, and SQS
- A track record of improving data quality, governance, and pipeline reliability at scale
Strong candidates will also have:
- Working knowledge of Java for reading upstream systems
- Experience with Trino or Presto for interactive SQL analytics at scale
- Experience with dbt for data transformation and modelling
- Familiarity with Great Expectations or similar data quality frameworks
- Genuine interest in AI-assisted development and LLM-based tooling
- Familiarity with hospitality data — reservations, rates, inventory, demand signals
Responsibilities
- Own the design, performance, and reliability of Duetto's data lakehouse
- Evolve the Python/PySpark pipeline framework across a bronze → silver → gold architecture on AWS, including Glue jobs, Iceberg MERGE operations, schema evolution, and partitioning strategies (see the first sketch after this list)
- Architect the shift from batch to near-real-time streaming, building SQS-driven stream pipelines with Iceberg sinks and expanding ingestion, normalisation, and analytics layers across the full lakehouse (see the streaming sketch after this list)
- Drive data quality and governance at scale — extending the Great Expectations framework, leading adoption of data contracts to formalise schemas between producers and consumers, and owning the Athena SQL layer (see the data-contract sketch after this list)
- Strengthen observability and reliability through Datadog, Sentry, and Sumo Logic, while optimising Glue job performance — worker sizing, DPU allocation, Spark tuning, and cost management
- Build and maintain shared internal Python libraries published to JFrog
- Drive improvements to GitHub Actions, Docker-based testing, and CI/CD deployment workflows
- Work AI-first every day — using Claude Code and MCP tools in your regular workflow, and contributing to AI-assisted pipeline generation, schema inference, and automated data quality
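To give candidates a feel for the pipeline-framework work above, here is a minimal sketch of a bronze → silver upsert using Iceberg's MERGE INTO from PySpark. The catalog name, warehouse path, table names, and merge key are illustrative assumptions, not Duetto's actual schema.

```python
# Minimal sketch: bronze -> silver upsert via Iceberg MERGE INTO on Spark.
# Catalog name, warehouse path, table names, and merge key are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Iceberg's SQL extensions are required for MERGE INTO support.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register an Iceberg catalog backed by the AWS Glue Data Catalog.
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-lakehouse/warehouse")
    .getOrCreate()
)

# Latest raw batch from the bronze layer.
spark.read.table("lake.bronze.reservations_raw").createOrReplaceTempView("updates")

# Upsert into silver: update matched rows, insert new ones.
spark.sql("""
    MERGE INTO lake.silver.reservations AS t
    USING updates AS s
    ON t.reservation_id = s.reservation_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```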
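The streaming responsibility could take many shapes; one common pattern is a long-polling SQS consumer that lands micro-batches in an Iceberg sink. The queue URL, message format, and table name below are assumptions for illustration.

```python
# Hypothetical SQS-driven micro-batch ingestion with an Iceberg sink.
# Queue URL, message schema, and table name are illustrative assumptions.
import json

import boto3
from pyspark.sql import Row, SparkSession

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/reservation-events"

spark = SparkSession.builder.getOrCreate()  # Iceberg catalog config as in the sketch above
sqs = boto3.client("sqs")


def poll_once(max_messages: int = 10) -> None:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=max_messages,
        WaitTimeSeconds=20,  # long polling
    )
    messages = resp.get("Messages", [])
    if not messages:
        return

    # Parse event payloads into rows; a real pipeline would validate them first.
    rows = [Row(**json.loads(m["Body"])) for m in messages]
    df = spark.createDataFrame(rows)

    # Append the micro-batch to the Iceberg sink table.
    df.writeTo("lake.bronze.reservation_events").append()

    # Delete messages only after a successful write (at-least-once delivery).
    for m in messages:
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=m["ReceiptHandle"])
```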
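On data contracts: the sketch below shows the core idea as a plain PySpark schema assertion, a simplified stand-in rather than the Great Expectations API, whose interface varies significantly across versions. The contract fields are hypothetical.

```python
# Simplified stand-in for a data contract check: fail fast when a producer's
# DataFrame drifts from the schema agreed with consumers. Fields are hypothetical.
from pyspark.sql import DataFrame
from pyspark.sql.types import DateType, DoubleType, StringType, StructField, StructType

RESERVATIONS_CONTRACT = StructType([
    StructField("reservation_id", StringType(), nullable=False),
    StructField("hotel_id", StringType(), nullable=False),
    StructField("stay_date", DateType(), nullable=False),
    StructField("rate", DoubleType(), nullable=True),
])


def enforce_contract(df: DataFrame, contract: StructType) -> None:
    """Raise if df is missing a contracted column or has a wrong type."""
    actual = {f.name: f.dataType for f in df.schema.fields}
    for field in contract.fields:
        if field.name not in actual:
            raise ValueError(f"contract violation: missing column {field.name!r}")
        if actual[field.name] != field.dataType:
            raise ValueError(
                f"contract violation: {field.name!r} is {actual[field.name]}, "
                f"expected {field.dataType}"
            )
```

In practice a check like this would run inside the pipeline framework before a batch is promoted from one layer to the next.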