Senior Data Engineer
Ceresti Health
Health Tech
US-based candidates only, Full-Time, Senior
Salary not disclosed
Job Details
- Experience: 8+ years
- Required Skills: AWS, PostgreSQL, Python, SQL, Airflow, dbt, HIPAA
Requirements
- BS/BA degree or higher in Computer Science, Engineering, or a related technical field
- 8+ years of professional data engineering experience
- Mastery of PostgreSQL
- Experience with file-based and API-based ingestion
- Hands-on experience with cloud platforms (AWS preferred)
- Experience with data warehouses and data lakes
- Strong experience with dbt or equivalent SQL-based transformation framework
- Experience with at least one orchestration framework (Dagster, Prefect, or Airflow)
- Strong Python skills
- Experience with data validation and quality frameworks
- Experience with HIPAA-regulated environments
- Comfortable with infrastructure-as-code and CI/CD
- Experience supporting ML workloads
- Experience using AI coding assistants
- Excellent written and verbal communication skills
- Experience working in Agile/Scrum teams
Responsibilities
- Design and own Ceresti’s end-to-end data architecture: a landing zone with secure cloud object storage for raw partner files and API payloads, validated ingestion pipelines into our transactional Postgres, and a curated analytics layer that decouples reporting and AI workloads from production
- Build ingestion pipelines for the data we receive today, including partner data files (CSV/JSON/XML/HL7/X12 as applicable) and REST/SFTP API integrations, with schema validation, quarantine of bad records, and full lineage from raw bytes to curated row
- Stand up and operate the curated layer (data warehouse / lakehouse-lite) so analytics and ML models can consume data without slowing down the transactional system
- Choose, integrate, and operate the smallest set of tools needed, including object storage, an orchestrator (Dagster, Prefect, Airflow, etc.), dbt or similar for transformations, and a single validation library (Great Expectations / Pandera / Soda)
- Design and enforce data governance for a HIPAA-regulated environment: PHI/PII classification, encryption in transit and at rest, role-based access, audit logging, retention and minimum-necessary policies, and de-identification where appropriate
- Partner with backend, ML, product, and clinical stakeholders to define data contracts with our health plan and ACO partners and hold the line on data quality
- Build and maintain reliable feature data for ML models, including embeddings (e.g., pgvector) and curated feature tables for risk stratification, engagement, and outcomes work
- Instrument the data platform for observability (pipeline SLAs, data freshness, schema drift, and quality metrics), and act on what the data tells you
- Participate fully in our Agile process: backlog grooming, sprint planning, demos, and retrospectives
- Mentor engineers across the team on SQL, schema design, and the craft of building data systems that are boring in the best possible way