AI Data Engineer
C the Signs (Healthcare AI)
Locations: Boston, Massachusetts; New York, New York; New York; New Jersey; New Hampshire; Rhode Island (all United States)
Full-Time, Mid-level
Salary not disclosed
Job Details
Required Skills
- AWS, Python, ETL, GCP, Java, Azure, Spark, Scala, Data modeling
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven experience as a Data Engineer, with a focus on big data technologies.
- Strong proficiency in programming languages such as Python, Scala, or Java.
- Extensive experience with data warehousing, ETL processes, and data modeling.
- Experience with major cloud providers (e.g., AWS, GCP, Azure) and their data storage and processing services.
- Hands-on experience with big data frameworks like Apache Spark for distributed processing.
- Excellent problem-solving skills and the ability to work independently and as part of a team.
- Strong communication and interpersonal skills.
- Master's degree in a related field preferred.
- Experience with healthcare data and healthcare data standards (e.g., FHIR, HL7) preferred.
- Familiarity with machine learning concepts and LLM fine-tuning processes preferred.
- Experience with data orchestration tools (e.g., Apache Airflow) preferred.
Responsibilities
- Collaborate with data scientists and machine learning engineers to understand data requirements for LLM and machine learning model fine-tuning.
- Design, build, and maintain scalable data pipelines to ingest, process, and store massive and diverse healthcare datasets.
- Implement robust data cleaning, validation, transformation, and monitoring processes to ensure the quality, integrity, accuracy, and consistency of all data, including training datasets.
- Develop and optimize data structures and schemas for efficient access and utilization by LLMs and machine learning models.
- Work with the team to identify and acquire new data sources, ensuring compliance with relevant healthcare regulations (e.g., HIPAA).
- Monitor data pipeline performance, troubleshoot issues, and implement optimizations to improve efficiency and reliability.
- Document data engineering processes, data models, and data dictionaries.