📍 South Africa
🔍 Charity / Non-profit
Requirements:
- 5+ years in Data Engineering roles with a strong background in Python (Pandas, NumPy, PyTorch).
- Proven track record working with large language models (e.g., Llama 2) and vector databases (e.g., ChromaDB).
- Familiarity with containerization (Docker) and CI/CD pipelines (e.g., Jenkins, GitHub Actions).
- Skilled in setting up AI/ML workflows in cloud environments (AWS, GCP, or Azure).
- Experience with distributed computing frameworks (Spark, Dask) and additional vector search systems (Milvus, Pinecone) is a plus.
- Comfortable integrating RESTful APIs, fine-tuning models, and optimizing performance at scale.
- Strong analytical and troubleshooting abilities with effective communication skills to collaborate across multidisciplinary teams.
Responsibilities:
- Design and implement vector-based search systems (e.g., ChromaDB) and optimize performance for large-scale datasets, supporting both real-time and batch queries (a minimal indexing and query sketch follows this list).
- Install, fine-tune, and deploy large language models such as Llama 2, and develop workflows for generating high-quality text summaries and embeddings (see the summarization sketch after this list).
- Train and adapt LLMs using domain-specific datasets, continuously evaluating and improving model accuracy, scalability, and efficiency.
- Develop and maintain robust ETL pipelines in Python, use Docker for containerization, and implement CI/CD pipelines to streamline integration and delivery (see the ETL sketch after this list).
- Thoroughly document workflows, codebases, and best practices to ensure long-term maintainability and scalability.
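
Purely as an illustration of the vector-search responsibility above, here is a minimal sketch of indexing documents in ChromaDB and running a similarity query. The embedding model (all-MiniLM-L6-v2 via sentence-transformers), the collection name, and the sample documents are placeholder assumptions, not part of the role description.

```python
# Minimal sketch: index precomputed embeddings in ChromaDB and query them.
# Model, collection name, paths, and sample texts are placeholder assumptions.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

documents = [
    "Quarterly donor report for the education programme.",
    "Volunteer onboarding guide for field offices.",
    "Impact assessment of the clean-water initiative.",
]
embeddings = model.encode(documents).tolist()

client = chromadb.PersistentClient(path="./chroma_store")  # local on-disk store
collection = client.get_or_create_collection(name="org_documents")

# Add documents together with their precomputed embeddings.
collection.add(
    ids=[f"doc-{i}" for i in range(len(documents))],
    documents=documents,
    embeddings=embeddings,
)

# Embed the search string with the same model and retrieve the closest documents.
query = "How do new volunteers get trained?"
results = collection.query(
    query_embeddings=model.encode([query]).tolist(),
    n_results=2,
)
print(results["documents"][0])
```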
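The Llama 2 summarization responsibility could look something like the sketch below, assuming access to the gated meta-llama/Llama-2-7b-chat-hf checkpoint on Hugging Face, a GPU, and the transformers and accelerate packages; the prompt and generation settings are illustrative only.

```python
# Minimal sketch: generate a summary with a Llama 2 chat model via transformers.
# The model choice, prompt format, and decoding settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated checkpoint; requires HF access
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

report_text = "..."  # domain document to summarise (placeholder)
prompt = (
    "Summarise the following report in two sentences:\n\n"
    f"{report_text}\n\nSummary:"
)

# Greedy decoding keeps the summary deterministic; tune max_new_tokens as needed.
output = generator(prompt, max_new_tokens=128, do_sample=False, return_full_text=False)
print(output[0]["generated_text"])
```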
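For the ETL responsibility, a minimal Pandas-based extract/transform/load step might look like the following sketch; the donations.csv source, the column names, and the SQLite target are hypothetical stand-ins for whatever sources and warehouse the role actually uses.

```python
# Minimal ETL sketch: CSV extract -> Pandas transform -> SQLite load.
# File names, columns, and the target database are hypothetical.
import sqlite3
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    # Extract: read the raw CSV export, parsing the timestamp column.
    return pd.read_csv(path, parse_dates=["donated_at"])

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Transform: drop incomplete rows and derive a reporting month column.
    df = df.dropna(subset=["donor_id", "amount"])
    df["amount"] = df["amount"].round(2)
    df["month"] = df["donated_at"].dt.to_period("M").astype(str)
    return df

def load(df: pd.DataFrame, db_path: str) -> None:
    # Load: write the cleaned table into SQLite for downstream reporting.
    with sqlite3.connect(db_path) as conn:
        df.to_sql("donations_clean", conn, if_exists="replace", index=False)

if __name__ == "__main__":
    load(transform(extract("donations.csv")), "warehouse.db")
```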
AWS, Docker, Python, ETL, GCP, NumPy, PyTorch, Azure, Pandas, Spark, CI/CD, RESTful APIs
Posted 2 months ago