Llama 3 RAG: How to Create an AI App Using Ollama?
- Published: 8 Jun 2024
- 🚀 Join me as I dive into the world of AI with LLaMA 3! In this video, we'll explore how to create a powerful RAG (retrieval-augmented generation) app using LLaMA 3 to enhance your projects with intelligent data retrieval. 🧠💻
Advanced Chunking Strategies: • Chunking Strategies in...
🔍 What You Will Learn:
Downloading and Setting Up LLaMA 3: Get started by installing the necessary libraries and downloading the LLaMA 3 model.
Creating the RAG App: Step-by-step process of building the app, from loading data from a URL to saving it in a vector database.
Designing a User Interface: Implement a UI where users can interact by asking questions to retrieve contextually relevant responses.
Enhancing Performance with Nomic Embeddings: Upgrade your app by integrating specialised embedding models for improved accuracy.
🔗 Components:
Ollama to Download LLaMA 3
Vector Databases: Chroma DB
Gradio: An easy way to build custom UIs for your projects
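As a rough illustration of how these components fit together, here is a minimal, dependency-free sketch of the ingestion step: chunk the page text, then embed each chunk through Ollama's REST API. The `chunk_text` helper and the direct HTTP call are simplifications of my own, not the exact code from the video (which uses LangChain's loaders and splitters):

```python
# Hypothetical sketch of the RAG ingestion step: chunk a page's text,
# then embed each chunk via Ollama's /api/embeddings endpoint.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port

def chunk_text(text, size=500, overlap=50):
    """Naive fixed-size chunker with overlap (a real app would use a
    smarter splitter, e.g. recursive or semantic chunking)."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def embed(text, model="llama3"):
    """Embed one chunk with a locally running Ollama server."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]
```

On a machine with `ollama serve` running and the model pulled, `[embed(c) for c in chunk_text(page_text)]` yields one vector per chunk, ready to store in a vector database such as Chroma.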
👍 Why Watch This Video?
Gain hands-on experience in AI application development, from basic setups to advanced data handling techniques, all tailored to empower your software development and data science skills.
🔗 Resources:
Sponsor a Video: mer.vin/contact/
Do a Demo of Your Product: mer.vin/contact/
Patreon: / mervinpraison
Ko-fi: ko-fi.com/mervinpraison
Discord: / discord
Twitter / X : / mervinpraison
Code: mer.vin/2024/04/llama-3-rag-u...
📌 Timestamps:
0:00 - Introduction to LLaMA 3 and RAG App
0:35 - Setup and Downloads
1:10 - Building the RAG App Core Functionality
3:00 - Embedding Generation and Storage
4:05 - Creating and Integrating the User Interface
5:25 - Final Testing and Demonstration
Make sure to subscribe and hit the bell icon to get notified about our latest uploads! Smash the like button if you find this tutorial helpful and share it to help others in the tech community. 🌟
#LLaMA3 #RAG #OLLaMA #AIApp #LLaMA3App #LLaMA3AIApp #LLaMA3RAG #RetrievalAugmentedGeneration #RetrievalAugmentedGenerationLangchain #RetrievalAugmentedGenerationLLaMA3 #RetrievalAugmented #LLaMARag #LamaRag #OLLaMARag #LLaMA3OLLaMA #LLaMA3OLLaMARag #OLLaMALLaMA3Rag
This is my 1st ever comment, but I feel it is necessary. Excellent presentation without resorting to clickbait and time-wasting segments. Thanks 👍
Amazing video in only 7 minutes!!! Straight to the points. Great!
I like how you break every line down. Subbed. Looking forward to new videos.
Thanks very much 👍🏴
Gradio and Streamlit are both great UIs. I will try this.
Excellent content, superb efforts, kudos bro
Absolute king, thanks for the great tutorial and code
Great video again. I'd love to see how I can use Llama 3 with TensorRT as well; I believe it can be awesome!
excellent tutorial
So amazing! Now something like this setup but with custom tools to automate with agents 😂
You said you wanted to put a link for a video of yours about chunking into the description? I’m especially interested in advanced chunking strategies like semantic and agentic chunking!
very nice thanks
Great Video! Why didn't you use agentic chunking?
Great video!
How do you compare LLMs to evaluate which ones are going to be better for your use case?
Hi, Mervin. Thank you for your excellent presentation and tutorial. Could you please do the procedures in docker compose?
Thank you very much, it is working perfectly. A question concerning "Create Ollama embeddings and vector store": how do I save it, and how do I load it, to avoid repeating the embedding process (if I want a store for a specific URL)? It would save time. Thanks for your answer. BR.
What about uploading a document instead of a webpage for retrieval?
Great video! I am confused by the prompt format in Llama 3 8B Instruct (I followed the Meta document): it keeps repeating some words and symbols in the generation. Is there an example of prompt engineering? Thanks!
Vectorising the document(s) and generating the response based on the prompt takes time for me, whether using Chroma or FAISS, but yours went fast. Any workaround to ensure more efficiency and less runtime?
Can you share with us the Llama 3 resource details and which compute instance you were using?
Great and exciting! That is very useful as well. The database is always created new, isn’t it? Is there a way to anyhow save / collect the embeddings for reuse? Or are the embeddings exclusively to the current prompt?
You can. I used Chroma DB to save it and load it for next time.
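To expand on the reply above: with LangChain's Chroma wrapper you can pass a `persist_directory` when building the store and point at the same directory on the next run. The same idea can be sketched with the standard library alone, caching the vectors in a JSON file so a given source is only embedded once (the `load_or_embed` helper name is mine, not from the video):

```python
# Hypothetical sketch: cache computed embeddings on disk so the same
# source is only embedded once across runs.
import json
import os

def load_or_embed(chunks, cache_path, embed_fn):
    """Return one vector per chunk, reusing the JSON cache if present."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)          # cache hit: skip re-embedding
    vectors = [embed_fn(c) for c in chunks]
    with open(cache_path, "w") as f:
        json.dump(vectors, f)            # cache miss: compute and save
    return vectors
```

The first run computes and writes the vectors; later runs load them from disk without calling the embedding model at all.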
Please tell me how to stream the response, to have better interaction with the user?
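For the streaming question above: Ollama's `/api/generate` endpoint streams one JSON object per line when `stream` is true, each carrying a `response` text fragment. A minimal sketch (the helper names are mine, not from the video):

```python
# Hypothetical sketch: stream tokens from Ollama's /api/generate
# endpoint as they arrive, instead of waiting for the full answer.
import json
import urllib.request

def parse_stream_line(line):
    """Extract the text fragment from one JSON line of Ollama's stream."""
    return json.loads(line).get("response", "")

def stream_answer(prompt, model="llama3", url="http://localhost:11434"):
    """Yield response fragments from a locally running Ollama server."""
    req = urllib.request.Request(
        f"{url}/api/generate",
        data=json.dumps(
            {"model": model, "prompt": prompt, "stream": True}
        ).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:                  # one JSON object per line
            fragment = parse_stream_line(line)
            if fragment:
                yield fragment
```

Usage: `for piece in stream_answer("Why is the sky blue?"): print(piece, end="", flush=True)`. Gradio also accepts generator functions as handlers, so yielding the accumulated text should stream it into the UI.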
Thanks for the video. Can you make a guide on how to build a UI for Llama 3 + RAG with data from a local file? Maybe by using PrivateGPT.
this needs to be updated for the 70b :)
Hi, does this work for a fine-tuned local Llama 3 model?
Do you have something which includes Streamlit?
Great tut. I tried it, but it seems it doesn't work with JS/HTML.
Legend, does it support memory?
How to do it for local document instead of url?
ChromaDb won't install 😢
can you show tutorial on how to deploy these types of programs ?
Can I ask a question? I wanted to know how to interact with a local chat in Excel using the Anthropic API type. I hope the question is clear: I want to receive answers in Excel, through queries to the local Jan AI, interacting through Anthropic-style formulas in Excel. Maybe there are free solutions for such API integrations with Excel. I would be very grateful for your answer! More precisely, like in Claude for Sheets.
Same as Llama 1 and 2?
If my Llama 3 is on another server, how can I use it remotely?
We would like you to build an AI assistant that learns from past experiences using CrewAI.
Dumb question, but when you say "touch app.py", what code editor are you using? Because I'm not getting the same results. I'm very, very new at this.
touch is a Unix/Linux command, and ports are available for other OSs. It's run in a terminal/shell. It simply creates an empty file, or updates an existing file's timestamp (or does other things depending on the parameters passed to it).
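To illustrate the reply above, this is what happens in any terminal (macOS included, since it ships a Unix shell):

```shell
# touch creates an empty file if it doesn't exist,
# or just updates the timestamp if it already does.
touch app.py
ls -l app.py   # the file now exists, with size 0
```

You would then open `app.py` in whatever editor you like; `touch` itself is not an editor.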
@@chhil thank you.
@@chhil I'm using a Mac; how would I get it to work the same so I can follow the tutorials? Ty for your help
While running the code, it stopped at this line for a long time: "embeddings = OllamaEmbeddings(model="llama3")". Can you please help with this?
Try nomic-embed-text instead of llama3
ollama.com/library/nomic-embed-text
In this line:
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
I am getting this error, please help 👇
raise ValueError(f"Expected IDs to be a non-empty list, got {ids}")
ValueError: Expected IDs to be a non-empty list, got []
@@gouravpatil7412 did you solve this? I get the same error even with nomic-embed-text
Wow! great response guys, thank you all
@@UmutErhan Nope. I guess if we run it on Linux the error will go away.
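On the "Expected IDs to be a non-empty list, got []" error in this thread: it usually means the loader/splitter produced zero documents, so the vector store is handed an empty list; it is not Linux-specific. A small guard of my own (the helper name is not from the video) surfaces the real cause before the store is built:

```python
# Hypothetical guard: fail with a clear message when no chunks were
# produced, instead of letting the vector store raise
# "Expected IDs to be a non-empty list, got []".
def require_nonempty_splits(splits, source=""):
    """Raise a descriptive error if loading/splitting yielded nothing."""
    if not splits:
        raise ValueError(
            f"No chunks to index from {source!r}: check that the URL "
            "actually loaded and that the splitter settings are sane."
        )
    return splits
```

Calling `require_nonempty_splits(splits, url)` just before `Chroma.from_documents(...)` turns the opaque IDs error into a message pointing at the loading step.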
5:01 you can still see llama3
Mervin, you are full of energy telling us about Llama 3, and that's great, but could you spare your precious time and tell us how to properly package a trained model into an exe file with the ability to upload it remotely? I probably want too much, sorry, but maybe someone can tell us where to find the treasure chest. Thank you for your time.
You are talking about something that doesn't exist… you can't just "pack" a model into an .exe and upload it somewhere… it does not work that way.
2:30 yet again we already know the answer xD AI is so cool. It doesn't know anything other than what it is programmed to do, but sure, it knows mom jokes. Stupid things like that should be removed.
I'm curious why you didn't use Ollama / OllamaWebUI to just upload and embed the documents? Would that save time, is it necessary to have the code to load/split/embed, if you can just drag and drop? What are the advantages to using this code?
Using custom code means it's more extensible and easier to integrate with any application.
It’s more customisable
What's the point of this question? You can also use Copilot in Bing for AI needs... The point here is to learn the concept and apply it to your own code design.
@@UmutErhan the point is that I can ask questions. So not sure what the point is that you're making. I'm specifically exploring why you'd upload in code, when you can just drag and drop into the UI. Then you can focus on one key step... how to retrieve and provide the chunks in the prompt. Please don't try to edit people's point of view. I started with, "I'm curious".... Aren't we all?