Staff Software Engineer - Data Platform (Python)

Remote (US) · Full-Time · Staff
Salary not disclosed

Job Details

Experience
7+ years
Required Skills
Docker, Python, Java, Airflow, Spark, dbt, Databricks, GitHub Actions, Datadog, AWS Lambda, PySpark

Requirements

  • 7+ years building production data systems in Python
  • Deep expertise in PySpark and distributed data processing (Glue, EMR, or Databricks)
  • Strong experience with lakehouse architectures: Iceberg, Delta Lake, or Hudi on S3
  • Production experience with Airflow or a comparable workflow orchestrator
  • Solid AWS production experience across S3, Glue, Athena, Lambda, and SQS
  • A track record of improving data quality, governance, and pipeline reliability at scale
  • Working knowledge of Java sufficient to read code in upstream systems (strong candidates)
  • Experience with Trino or Presto for interactive SQL analytics at scale (strong candidates)
  • Experience with dbt for data transformation and modelling (strong candidates)
  • Familiarity with Great Expectations or similar data quality frameworks (strong candidates)
  • Genuine interest in AI-assisted development and LLM-based tooling (strong candidates)
  • Familiarity with hospitality data — reservations, rates, inventory, demand signals (strong candidates)

Responsibilities

  • Own the design, performance, and reliability of Duetto's data lakehouse
  • Evolve the Python/PySpark pipeline framework across a bronze → silver → gold architecture on AWS, including Glue jobs, Iceberg MERGE operations, schema evolution, and partitioning strategies
  • Architect the shift from batch to near-real-time streaming, building SQS-driven stream pipelines with Iceberg sinks and expanding ingestion, normalisation, and analytics layers across the full lakehouse
  • Drive data quality and governance at scale — extending the Great Expectations framework, leading adoption of data contracts to formalise schemas between producers and consumers, and owning the Athena SQL layer
  • Strengthen observability and reliability through Datadog, Sentry, and Sumo Logic, while optimising Glue job performance — worker sizing, DPU allocation, Spark tuning, and cost management
  • Build and maintain shared internal Python libraries published to JFrog
  • Drive improvements to GitHub Actions, Docker-based testing, and CI/CD deployment workflows
  • Work AI-first every day — using Claude Code and MCP tools in your regular workflow, and contributing to AI-assisted pipeline generation, schema inference, and automated data quality
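As one illustration of the data-contract work described above, a schema check between producers and consumers can be sketched in plain Python. The contract and field names here are hypothetical, and a production version would use the Great Expectations framework named in the posting rather than hand-rolled checks:

```python
# Hypothetical contract for a reservations record: field name -> expected type.
# In practice a contract like this would live in a shared registry, not
# inline in pipeline code.
RESERVATION_CONTRACT = {
    "reservation_id": str,
    "room_type": str,
    "nightly_rate": float,
    "num_nights": int,
}

def validate_record(record: dict, contract: dict) -> list[str]:
    """Return a list of contract violations for one record (empty = valid)."""
    errors = []
    for field, expected_type in contract.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return errors
```

Formalising checks like this at the producer boundary is what lets schema evolution in the Iceberg tables downstream proceed without silently breaking consumers.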