- Design and build production-grade data pipelines using batch, streaming, incremental, and change data capture (CDC) patterns.
- Build ingestion workflows from operational systems such as MongoDB, PostgreSQL, RDS, APIs, and event streams.
- Design and operate data migration workflows, including full load, incremental sync, CDC replay, cutover, rollback, and reconciliation.
- Convert semi-structured or NoSQL data into reliable relational and analytical models.
- Build and optimize data processing jobs using Python, PySpark, Spark SQL, SQL, and Databricks.
- Orchestrate workflows using Apache Airflow and manage connectors using Airbyte or similar tools.
- Ensure data quality and production reliability across pipelines, including observability, alerting, and backfills.
- Work with AWS services such as S3, Lambda, IAM, EC2, RDS, DMS, SQS, and Kinesis, or their equivalents.
- Build modular, testable transformation layers using dbt where appropriate.
- Document data flows, source-to-target mapping, pipeline behavior, data contracts, and operational runbooks.