Data Engineer (United States)

Posted 2024-09-20

💎 Seniority level: Junior, 1-3 years

📍 Location: United States

🔍 Industry: Data Technology

🏢 Company: Demyst

🗣️ Languages: English

⏳ Experience: 1-3 years

🪄 Skills: AWS, Python, SQL, ETL, Git, Snowflake, Software Architecture, Airflow, Pandas, Communication Skills

Requirements:
  • Bachelor's in Computer Science, Data Science, Engineering, or a similar technical discipline (or commensurate work experience); Master's degree preferred.
  • 1-3 years of Python programming (with Pandas experience).
  • Experience with CSV, JSON, parquet, and other common formats.
  • Data cleaning and structuring (ETL experience).
  • Knowledge of APIs (REST and SOAP), HTTP protocols, API security, and best practices.
  • Experience with SQL, Git, and Airflow.
  • Strong written and oral communication skills.
  • Excellent attention to detail.
  • Ability to learn and adapt quickly.
Responsibilities:
  • Collaborate with internal project managers, sales directors, account managers, and clients’ stakeholders to identify requirements and build external data-driven solutions.
  • Perform data appends, extracts, and analyses to deliver curated datasets and insights that help clients achieve their business objectives.
  • Understand and keep current with external data landscapes such as consumer, business, and property data.
  • Engage in projects involving entity detection, record linking, and data modelling.
  • Design scalable code blocks using Demyst’s APIs/SDKs that can be leveraged across production projects.
  • Govern releases, change management, and maintenance of production solutions in close coordination with clients' IT teams.
Related Jobs

📍 United States

🔍 Artificial Intelligence and Data Engineering

🏢 Company: Halo Media

  • Master's degree in Computer Science, Data Science, or a related field.
  • 3-5 years of work experience in data engineering, preferably in AI/ML contexts.
  • Proficiency in Python, JSON, HTTP, and related tools.
  • Strong understanding of LLM architectures, training processes, and data requirements.
  • Experience with RAG systems, knowledge base construction, and vector databases.
  • Familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts.
  • Hands-on experience with data cleaning, tagging, and annotation processes.
  • Knowledge of data crawling techniques and associated ethical considerations.
  • Familiarity with Snowflake and its integration in AI/ML pipelines.
  • Experience with various vector store technologies and their applications in AI.
  • Understanding of data lakehouse concepts and architectures.
  • Excellent communication, collaboration, and problem-solving skills.
  • Ability to translate business needs into technical solutions.
  • Passion for innovation and commitment to ethical AI development.
  • Experience building LLM pipelines using frameworks such as LangChain.

  • Design, implement, and maintain an end-to-end multi-stage data pipeline for LLMs, including Supervised Fine Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) data processes.
  • Identify, evaluate, and integrate diverse data sources and domains to support the Generative AI platform.
  • Develop and optimize data processing workflows for chunking, indexing, ingestion, and vectorization for both text and non-text data.
  • Benchmark and implement various vector stores, embedding techniques, and retrieval methods.
  • Create a flexible pipeline supporting multiple embedding algorithms, vector stores, and search types.
  • Implement and maintain auto-tagging systems and data preparation processes for LLMs.
  • Develop tools for text and image data crawling, cleaning, and refinement.
  • Collaborate with cross-functional teams to ensure data quality and relevance for AI/ML models.
  • Work with data lakehouse architectures to optimize data storage and processing.
  • Integrate and optimize workflows using Snowflake and various vector store technologies.

AWS, Python, GCP, Snowflake, Algorithms, Azure, Data Engineering, Data Science, Spark, Collaboration

Posted 2024-11-07