- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven experience as a Data Engineer, with a focus on big data technologies.
- Strong proficiency in programming languages such as Python, Scala, or Java.
- Extensive experience with data warehousing, ETL processes, and data modeling.
- Experience with major cloud providers (e.g., AWS, GCP, Azure) and their data storage and processing services.
- Hands-on experience with big data frameworks such as Apache Spark for distributed processing.
- Excellent problem-solving skills and the ability to work both independently and as part of a team.
- Strong communication and interpersonal skills.
- Experience with healthcare data and a solid understanding of healthcare data standards (e.g., FHIR, HL7).
- Familiarity with machine learning concepts and LLM fine-tuning processes.
- Experience with data orchestration tools (e.g., Apache Airflow).