Multi-Modal RAG: Chat with Text and Images in Documents
Prompt Engineering

Published on Jul 12, 2024

In this video, I'll show you how to build an end-to-end multi-modal RAG system using GPT-4 and LlamaIndex. We'll cover data collection, creating vector stores for text and images, and building a retrieval pipeline. It's a good fit for anyone interested in enhancing large language models with multi-modal data.

LINKS:
Colab: https://tinyurl.com/25sb2rtu
Architecture: https://tinyurl.com/4x9x9bsc
Multi-modal RAG (previous video): Multi-modal RAG: Chat with Docs conta...

💻 RAG Beyond Basics Course:
https://prompt-s-site.thinkific.com/c...

Let's Connect:
🦾 Discord: / discord
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: / promptengineering
💼 Consulting: https://calendly.com/engineerprompt/c...
📧 Business Contact: [email protected]
Become a member: http://tinyurl.com/y5h28s6h

💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).

Sign up for the newsletter, localGPT:
https://tally.so/r/3y9bb0

TIMESTAMPS:
00:00 Introduction to Multi-Modal RAG Systems
00:23 Overview of the Architecture
02:57 Setting Up the Environment
03:54 Data Collection and Preparation
04:28 Generating Image Descriptions with GPT-4
08:10 Creating Multi-Modal Vector Stores
09:41 Implementing the Retrieval Pipeline
11:05 Generating Final Responses
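The steps in the timestamps above (image descriptions, multi-modal vector stores, retrieval, final response) can be sketched in plain Python to show the data flow. This is a minimal illustration under stated assumptions, not the code from the Colab: it does not use LlamaIndex or GPT-4. The `embed` function is a hypothetical bag-of-words stand-in for a real embedding model, `VectorStore` and `build_context` are hypothetical names, and the image descriptions are assumed to have already been generated (for example by a vision model, as in the 04:28 step). Text and images are queried from separate stores and the hits are merged into one context block that would be passed to the LLM for the final answer (the LLM call itself is not shown).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class VectorStore:
    """Hypothetical in-memory vector store (not the LlamaIndex API)."""
    entries: list = field(default_factory=list)

    def add(self, doc_id: str, text: str) -> None:
        # Store the id, the raw text, and its embedding.
        self.entries.append((doc_id, text, embed(text)))

    def query(self, question: str, top_k: int = 2) -> list:
        # Score every entry against the question and keep the best non-zero hits.
        q = embed(question)
        scored = [(cosine(q, vec), doc_id, text) for doc_id, text, vec in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(doc_id, text) for score, doc_id, text in scored[:top_k] if score > 0]


def build_context(question: str, text_store: VectorStore, image_store: VectorStore, top_k: int = 2) -> str:
    """Retrieve from the text and image stores separately, then merge the hits."""
    lines = [f"[text:{d}] {t}" for d, t in text_store.query(question, top_k)]
    lines += [f"[image:{d}] {t}" for d, t in image_store.query(question, top_k)]
    return "\n".join(lines)


# Usage with made-up example data: text chunks plus pre-generated image descriptions.
text_store = VectorStore()
text_store.add("p1", "The transformer encoder uses self-attention layers.")
text_store.add("p2", "Training ran for three epochs on a single GPU.")

image_store = VectorStore()
image_store.add("fig1.png", "Diagram of the encoder with stacked self-attention blocks.")
image_store.add("fig2.png", "Bar chart comparing accuracy across datasets.")

context = build_context("How does the encoder use self-attention?", text_store, image_store)
print(context)
```

Keeping the two stores separate mirrors the idea of querying text and images independently, so a strong text match cannot crowd out a relevant figure; in a real pipeline the bag-of-words `embed` would be replaced by a proper text or image embedding model.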


All Interesting Videos:
Everything LangChain: LangChain

Everything LLM: Large Language Models

Everything Midjourney: MidJourney Tutorials

AI Image Generation: AI Image Generation Tutorials
