📍 United States
🔍 Artificial Intelligence and Data Engineering
🏢 Company: Halo Media
Requirements:
- Master's degree in Computer Science, Data Science, or a related field.
- 3-5 years of work experience in data engineering, preferably in AI/ML contexts.
- Proficiency in Python, JSON, HTTP, and related tools.
- Strong understanding of LLM architectures, training processes, and data requirements.
- Experience with RAG systems, knowledge base construction, and vector databases.
- Familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts.
- Hands-on experience with data cleaning, tagging, and annotation processes.
- Knowledge of data crawling techniques and associated ethical considerations.
- Familiarity with Snowflake and its integration in AI/ML pipelines.
- Experience with various vector store technologies and their applications in AI.
- Understanding of data lakehouse concepts and architectures.
- Excellent communication, collaboration, and problem-solving skills.
- Ability to translate business needs into technical solutions.
- Passion for innovation and commitment to ethical AI development.
- Experience building LLM pipelines using frameworks such as LangChain.
Responsibilities:
- Design, implement, and maintain an end-to-end, multi-stage data pipeline for LLMs, including data processes for Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF).
- Identify, evaluate, and integrate diverse data sources and domains to support the Generative AI platform.
- Develop and optimize data processing workflows for chunking, indexing, ingestion, and vectorization for both text and non-text data.
- Benchmark and implement various vector stores, embedding techniques, and retrieval methods.
- Create a flexible pipeline supporting multiple embedding algorithms, vector stores, and search types.
- Implement and maintain auto-tagging systems and data preparation processes for LLMs.
- Develop tools for text and image data crawling, cleaning, and refinement.
- Collaborate with cross-functional teams to ensure data quality and relevance for AI/ML models.
- Work with data lakehouse architectures to optimize data storage and processing.
- Integrate and optimize workflows using Snowflake and various vector store technologies.
AWS, Python, GCP, Snowflake, Algorithms, Azure, Data engineering, Data science, Spark, Collaboration
Posted 2024-11-07