- Design, develop, and optimize ETL/ELT pipelines using AWS EMR, Python, and PySpark (see the sketch after this list)
- Build and manage scalable data processing solutions on AWS, leveraging services such as S3, Lambda, Glue, and Redshift
- Work with large-scale datasets in distributed computing environments
- Develop scripts to automate data ingestion, transformation, and validation processes
- Work closely with clients, data scientists, and business analysts
- Provide strategic guidance on best practices in data engineering and analytics within the public sector
- Implement data validation, monitoring, and governance best practices
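
To give a concrete sense of the pipeline work described above, here is a minimal PySpark sketch of an ETL job of the kind that might run on EMR: it ingests raw CSV from S3, applies basic cleaning and validation rules, and writes partitioned Parquet back to S3. The bucket names, paths, schema, and column names are hypothetical placeholders, not part of any actual project.

```python
# Minimal PySpark ETL sketch: ingest raw CSV from S3, clean and validate,
# then write partitioned Parquet back to S3. All paths and the schema
# below are hypothetical examples.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, DateType

spark = SparkSession.builder.appName("example-etl").getOrCreate()

# An explicit schema avoids a costly inference pass over large datasets.
schema = StructType([
    StructField("record_id", StringType(), nullable=False),
    StructField("agency", StringType(), nullable=True),
    StructField("amount", DoubleType(), nullable=True),
    StructField("posted_date", DateType(), nullable=True),
])

raw = (
    spark.read
    .option("header", "true")
    .schema(schema)
    .csv("s3://example-raw-bucket/transactions/")  # hypothetical input path
)

# Basic transformation and validation: normalize text fields and drop rows
# that are missing a key or have a negative amount.
cleaned = (
    raw.withColumn("agency", F.upper(F.trim(F.col("agency"))))
       .filter(F.col("record_id").isNotNull() & (F.col("amount") >= 0))
)

# Partitioned Parquet output for downstream consumers (e.g. Glue catalog, Redshift Spectrum).
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("posted_date")
    .parquet("s3://example-curated-bucket/transactions/")  # hypothetical output path
)

spark.stop()
```

In practice a job like this would be parameterized (input/output paths, run date) and submitted as an EMR step or orchestrated via Glue or Lambda triggers, with validation metrics logged for monitoring.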