Chat with Documents is Now Crazy Fast thanks to Groq API and Streamlit
- Published: 25 Jul 2024
- Learn how to build a RAG pipeline with the world's fastest LLM API via Groq API. We will build a RAG application that lets you chat with a website and wrap everything in a Streamlit app. A minimal code sketch of the pipeline is included below the links.
🦾 Discord: / discord
☕ Buy me a Coffee: ko-fi.com/promptengineering
|🔴 Patreon: / promptengineering
💼Consulting: calendly.com/engineerprompt/c...
📧 Business Contact: engineerprompt@gmail.com
Become Member: tinyurl.com/y5h28s6h
Signup for Advanced RAG:
tally.so/r/3y9bb0
LINKS:
Getting Started with Groq: • Getting Started with G...
How to chunk: • LangChain: How to Prop...
Code: github.com/PromtEngineer/Yout...
TIMESTAMPS:
[00:00] Introduction
[00:39] Setting Up: Installing Packages and Importing Libraries
[01:32] Designing the RAG Pipeline: From Data to Response
[02:07] Implementing the RAG Pipeline with Groq API
[06:00] RAG with Streamlit and Groq API
[10:07] Streamlit App in Action: Real-Time Responses
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu...
If you are interested in learning more about how to build robust RAG applications, check out this course: prompt-s-site.thinkific.com/courses/rag
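Below is a minimal sketch of the pipeline and Streamlit wrapper described above, assembled from standard LangChain components. It is not the exact code from the video (see the GitHub link for that), and the example URL, chunk sizes, and model names (llama2 embeddings via a locally running Ollama, mixtral-8x7b-32768 on Groq) are assumptions you can swap out. It assumes a GROQ_API_KEY environment variable is set.

import os
import streamlit as st
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

@st.cache_resource  # build the index once per URL, not on every Streamlit rerun
def build_chain(url: str):
    # Load the web page and split it into overlapping chunks.
    docs = WebBaseLoader(url).load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

    # Embed the chunks with a local Ollama model and index them in an in-memory FAISS store.
    vectorstore = FAISS.from_documents(chunks, OllamaEmbeddings(model="llama2"))

    # The Groq-hosted LLM answers questions grounded in the retrieved chunks.
    llm = ChatGroq(model_name="mixtral-8x7b-32768", groq_api_key=os.environ["GROQ_API_KEY"])
    prompt = ChatPromptTemplate.from_template(
        "Answer the question based only on the context below.\n\n{context}\n\nQuestion: {input}"
    )
    return create_retrieval_chain(vectorstore.as_retriever(), create_stuff_documents_chain(llm, prompt))

st.title("Chat with a Website (Groq + Streamlit)")
url = st.text_input("Website URL", "https://example.com/article")
question = st.text_input("Ask a question about the page")
if url and question:
    result = build_chain(url).invoke({"input": question})
    st.write(result["answer"])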
Excellent demo, thank you for choosing Groq.
Really fast :) Thanks for the video
Your content never disappoints, love it!
Glad you enjoy it!
Excellent demo.
If you are interested in learning more about advanced RAG techniques, sign up here: tally.so/r/3y9bb0
I already signed up! Can't wait to start learning advanced RAG techniques 😀
I thought you were going to show how crazy fast the RAG system would be set up (which happens at the startup of your Streamlit app)! But you're actually showing the response time from the LLM, which is obviously fast when you call the API.
How do you do that?
Thank you very much! How can I use a vector store other than the in-memory one, such as AWS OpenSearch or Pinecone? I have a lot of documents to search.
Can you please give an example on how to do reranking? Your style of teaching is just absolutely fantastic.
Thanks bro 🙏🙏
So, we can speed up the response from the local LLM by using Groq? Also, would creating embeddings for text chunks be faster too?
Thanks. How can I do RAG on my own documents, such as PDFs, instead of a website as in your example?
Is it possible for you to create a video which also has Deepgram so now it becomes a conversational AI?
Great work Sir 🌟🌟🌟 But I have a question: how can I add more than one link and also add PDF files...??!
Thank you very much for your efforts. Your videos have been incredibly helpful to me! I have a question: In my experience, RAG's performance in extracting information from tables or images in PDFs is quite poor. Is there any way to improve this?
Look into Unstructured.io for correctly parsing tables. LlamaIndex also released a new tool called LlamaParse for parsing tables. You might want to explore that as well.
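As a rough illustration of the Unstructured approach (not code from the video), here is a sketch that pulls tables out of a PDF as HTML so the row/column layout survives into the index. It assumes the unstructured package is installed with its PDF extras and that "report.pdf" is your own file:

from unstructured.partition.pdf import partition_pdf

# The "hi_res" strategy is needed for table structure inference.
elements = partition_pdf("report.pdf", strategy="hi_res", infer_table_structure=True)

# Keep tables as HTML so their structure can be chunked and embedded like any other text.
tables = [el.metadata.text_as_html for el in elements if el.category == "Table"]
for html in tables:
    print(html)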
What app did you use to create your flow diagram? Thank you so much for these videos, I learn a lot from them!
I have the same question!!
It's called Excalidraw
@@NightSpyderTech thanks 😊
Thanks!
Thank you 😊
This is really awesome, and it would be great if you could deploy it on Hugging Face or any other suggested platform. Eventually we need to deploy the app, not only run it on a local machine.
I agree with that, I would like to deploy this on Hugging Face Spaces. Is the free version enough for this, or is a paid version necessary?
@@scitechtalktv9742 I think a small LLM can fit on the free version.
Thanks for the very interesting video. I tried to run your example on Windows 11. Unfortunately I get an error when trying to use FAISS. How do I have to run the FAISS server? My error looks like this:
Error raised by inference endpoint: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url
I changed one line to embeddings = OllamaEmbeddings(model="llama2:7b")
Then calling vector = FAISS.from_documents(documents, embeddings) seems to work (took a couple of minutes!!)
So I think I had no problem with FAISS; instead, the default value for the model "llama:7b" was not correct.
But another question: what does mixtral.... have to do with llama2? Is it the same?
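In case it helps others hitting the same error: the port 11434 in that message belongs to the local Ollama server that provides the embeddings, not to FAISS itself, so Ollama has to be running (ollama serve) and the model pulled (ollama pull llama2:7b) before the index is built. A minimal sketch of the working setup the comment above describes, with the model tag and URL as assumptions:

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# Requires a running Ollama server (default http://localhost:11434)
# with the chosen model already pulled.
embeddings = OllamaEmbeddings(model="llama2:7b", base_url="http://localhost:11434")
vector = FAISS.from_documents(documents, embeddings)  # `documents` = the chunked pages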
Why do you use the Ollama llama2 embedding model instead of something else like nomic-embed?
there are way too many options. This is to just show what is possible :)
I want to make and deploy this type of application, but for that I have to run Ollama in the background. Is there any other way? Can anyone help me?
I would like to store / serialize the vector store / embeddings because on my PC it takes a very long time to generate those! I mean extremely long: more than 4 hours! How can I do that?
You can use an external API for doing the same. Hugging Face offers free embedding APIs.
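A rough sketch combining both ideas: remote embeddings through the Hugging Face Inference API, plus saving the FAISS index to disk so it is not rebuilt on every run. The model name, folder path, and token placeholder are assumptions:

from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_community.vectorstores import FAISS

# Remote embeddings via the Hugging Face Inference API instead of local Ollama.
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key="hf_...",  # your Hugging Face token
    model_name="sentence-transformers/all-MiniLM-L6-v2",
)

vectorstore = FAISS.from_documents(documents, embeddings)
vectorstore.save_local("faiss_index")  # persist to disk once

# Later runs: reload the saved index instead of re-embedding everything.
vectorstore = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)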
1. After running the program, it is required to install llama2. There are many other Ollama models on my computer, but it seems the program uses llama2 by default;
2. The initial run takes a long time (the URL in the example takes about 3 minutes, M1 Max 32G). After the vectors are built, the search is very fast;
3. With news links, the pages can be quickly parsed and searched to obtain answers, and RAG works well;
4. The RAG results on arXiv HTML papers are poor.
Thank you for sharing
You can use any other embedding model in it. This example was using the llama2 embedding in ollama.
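For example, to try the nomic embeddings asked about above while keeping the rest of the pipeline unchanged, a one-line swap would look like this (assuming the model has first been pulled into Ollama):

from langchain_community.embeddings import OllamaEmbeddings

# Requires: ollama pull nomic-embed-text
embeddings = OllamaEmbeddings(model="nomic-embed-text")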
Is it possible to read PDFs?
Yes, look at my langchain tutorials.
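A minimal sketch of swapping the web loader for a PDF loader; the file name is an assumption and the rest of the pipeline (chunk, embed, index) stays the same. PyPDFLoader needs the pypdf package installed:

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load pages from a local PDF instead of a web page, then chunk as before.
docs = PyPDFLoader("my_document.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)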
The notebook doesn't work. I get an error with ValueError: Error raised by inference endpoint: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/embeddings (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
Do you have ollama running?
@@engineerprompt what is this