Requirements:
At least 5 years of experience in a data engineer, software engineer, or similar role, using data to drive business results.
At least 5 years of experience with Python, writing modular, testable, and production-ready code.
Solid understanding of SQL, including indexing best practices, and hands-on experience with large-scale data systems (e.g., Spark, Glue, Athena).
Practical experience with Airflow or similar orchestration frameworks, including designing, scheduling, maintaining, troubleshooting, and optimizing data workflows (DAGs); a minimal DAG sketch follows this list.
Familiarity with AWS cloud services, including S3, Lambda, Glue, RDS, and API Gateway.
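To give a concrete flavor of the orchestration work described above, here is a minimal sketch of an Airflow DAG, assuming Airflow 2.x; the dag_id, schedule, owner, and extraction callable are hypothetical placeholders, not a prescribed design.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical retry policy; real values depend on the workload's SLAs.
default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

def _extract(**context):
    # Placeholder for a real extraction step; 'ds' is the logical run date.
    print(f"extracting raw data for {context['ds']}")

with DAG(
    dag_id="daily_ingest",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=_extract)
```

Troubleshooting and optimization in this role mostly means tuning exactly these knobs: schedules, retries, catchup behavior, and task dependencies.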
Responsibilities:
Design and implement scalable machine learning pipelines with Airflow, enabling efficient parallel execution (see the fan-out sketch after this list).
Enhance our data infrastructure: refine database schemas, develop and improve APIs for internal systems, oversee schema migrations, manage data lifecycles, optimize query performance, and maintain large-scale data pipelines.
Implement monitoring and observability, using AWS Athena and QuickSight to track performance, model accuracy, and operational KPIs, and to drive alerts.
Build and maintain data validation pipelines to ensure incoming data quality and proactively detect anomalies or drift (a simple validation sketch also follows this list).
Collaborate closely with software architects, DevOps engineers, and product teams to deliver resilient, scalable, production-grade machine learning pipelines.
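As one illustration of the parallel execution mentioned above, here is a minimal fan-out/fan-in Airflow DAG sketch, again assuming Airflow 2.x; the region list, task names, and callables are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

REGIONS = ["us", "eu", "apac"]  # hypothetical partitions trained in parallel

def _train(region, **_):
    print(f"training model for {region}")  # placeholder for a real training step

def _publish(**_):
    print("publishing combined metrics")  # placeholder for the fan-in step

with DAG(
    dag_id="parallel_training",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    train_tasks = [
        PythonOperator(
            task_id=f"train_{region}",
            python_callable=_train,
            op_kwargs={"region": region},
        )
        for region in REGIONS
    ]
    publish = PythonOperator(task_id="publish", python_callable=_publish)
    # The train_* tasks share no upstream dependencies, so the scheduler can
    # run them concurrently; publish waits for every branch to finish.
    train_tasks >> publish
```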
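And as a sketch of the kind of check a data validation pipeline might run, the snippet below flags schema gaps, excess nulls, and crude mean drift; the column names and thresholds are illustrative assumptions, not a prescribed design.

```python
import pandas as pd

# Hypothetical schema and thresholds; real ones would come from config or history.
EXPECTED_COLUMNS = {"user_id", "event_ts", "amount"}
MAX_NULL_FRACTION = 0.01
MAX_RELATIVE_DRIFT = 0.25

def validate_batch(df: pd.DataFrame, baseline_mean: float) -> list[str]:
    """Return human-readable issues found in an incoming batch."""
    issues = []
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        # A schema failure makes the remaining checks meaningless.
        return [f"missing columns: {sorted(missing)}"]
    null_fraction = df["amount"].isna().mean()
    if null_fraction > MAX_NULL_FRACTION:
        issues.append(f"null fraction {null_fraction:.2%} exceeds threshold")
    # Crude drift check: flag batches whose mean moved far from the baseline.
    batch_mean = df["amount"].mean()
    if baseline_mean and abs(batch_mean - baseline_mean) / abs(baseline_mean) > MAX_RELATIVE_DRIFT:
        issues.append(f"mean drift: {batch_mean:.2f} vs baseline {baseline_mean:.2f}")
    return issues
```

In practice, a check like this would run as an early task in each ingestion DAG, with any returned issues routed to the alerting described under monitoring.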