Chat with Documents is Now Crazy Fast thanks to Groq API and Streamlit
- Published: 25 Jul 2024
- Learn how to build a RAG pipeline with the world's fastest LLM API via Groq API. We will build a RAG application that lets you chat with a website and wrap everything in a Streamlit app. A minimal code sketch of the pipeline is included below the links.
🦾 Discord: / discord
☕ Buy me a Coffee: ko-fi.com/promptengineering
|🔴 Patreon: / promptengineering
💼Consulting: calendly.com/engineerprompt/c...
📧 Business Contact: engineerprompt@gmail.com
Become Member: tinyurl.com/y5h28s6h
Signup for Advanced RAG:
tally.so/r/3y9bb0
LINKS:
Getting Started with Groq: • Getting Started with G...
How to chunk: • LangChain: How to Prop...
Code: github.com/PromtEngineer/Yout...
TIMESTAMPS:
[00:00] Introduction
[00:39] Setting Up: Installing Packages and Importing Libraries
[01:32] Designing the RAG Pipeline: From Data to Response
[02:07] Implementing the RAG Pipeline with Groq API
[06:00] RAG with Streamlit and Groq API
[10:07] Streamlit App in Action: Real-Time Responses
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu...
If you are interested in learning more about how to build robust RAG applications, check out this course: prompt-s-site.thinkific.com/courses/rag
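Below is a minimal sketch of the pipeline and Streamlit wrapper described above, assembled from standard LangChain components. It is not the exact code from the video (see the GitHub link for that), and the example URL, chunk sizes, and model names (llama2 embeddings via a locally running Ollama, mixtral-8x7b-32768 on Groq) are assumptions you can swap out. It assumes a GROQ_API_KEY environment variable is set.

import os
import streamlit as st
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

@st.cache_resource  # build the index once per URL, not on every Streamlit rerun
def build_chain(url: str):
    # Load the web page and split it into overlapping chunks.
    docs = WebBaseLoader(url).load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

    # Embed the chunks with a local Ollama model and index them in an in-memory FAISS store.
    vectorstore = FAISS.from_documents(chunks, OllamaEmbeddings(model="llama2"))

    # The Groq-hosted LLM answers questions grounded in the retrieved chunks.
    llm = ChatGroq(model_name="mixtral-8x7b-32768", groq_api_key=os.environ["GROQ_API_KEY"])
    prompt = ChatPromptTemplate.from_template(
        "Answer the question based only on the context below.\n\n{context}\n\nQuestion: {input}"
    )
    return create_retrieval_chain(vectorstore.as_retriever(), create_stuff_documents_chain(llm, prompt))

st.title("Chat with a Website (Groq + Streamlit)")
url = st.text_input("Website URL", "https://example.com/article")
question = st.text_input("Ask a question about the page")
if url and question:
    result = build_chain(url).invoke({"input": question})
    st.write(result["answer"])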
Excellent demo, thank you for choosing Groq.
Really fast :) Thanks for the video
Your content never disappoints, love it!
Glad you enjoy it!
Excellent demo.
If you are interested in learning more about advanced RAG techniques, sign up here: tally.so/r/3y9bb0
I already signed up! Can't wait to start learning advanced RAG techniques 😀
I thought you were going to show how crazy fast the RAG system would be set up (which happens at the startup of your Streamlit app)! But you're actually showing the response time from the LLM, which is obviously fast when you call the API.
How do you do that?
Thank you very much! How can I use a vector store other than the in-memory one, such as AWS OpenSearch or Pinecone? I have a lot of documents to search.
Can you please give an example on how to do reranking? Your style of teaching is just absolutely fantastic.
Thanks bro 🙏🙏
So, we can speed up the response from the local LLM by using Groq? Also, would creating embeddings for text chunks be faster too?
Thanks. How can I do RAG on my own documents, such as PDFs, instead of a website as in your example?
Is it possible for you to create a video which also has Deepgram so now it becomes a conversational AI?
Great work Sir 🌟🌟🌟 But I have a question: how can I add more than one link and also add PDF files...??!
Thank you very much for your efforts. Your videos have been incredibly helpful to me! I have a question: In my experience, RAG's performance in extracting information from tables or images in PDFs is quite poor. Is there any way to improve this?
Look into Unstructured.io for correctly parsing tables. LlamaIndex also released a new tool called LlamaParse for parsing tables. You might want to explore that as well.
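As a rough illustration of the Unstructured approach (not code from the video), here is a sketch that pulls tables out of a PDF as HTML so the row/column layout survives into the index. It assumes the unstructured package is installed with its PDF extras and that "report.pdf" is your own file:

from unstructured.partition.pdf import partition_pdf

# The "hi_res" strategy is needed for table structure inference.
elements = partition_pdf("report.pdf", strategy="hi_res", infer_table_structure=True)

# Keep tables as HTML so their structure can be chunked and embedded like any other text.
tables = [el.metadata.text_as_html for el in elements if el.category == "Table"]
for html in tables:
    print(html)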
What app did you use to create your flow diagram? Thank you so much for these videos, I learn a lot from them!
I have the same question!!
It's called Excalidraw
@@NightSpyderTech thanks 😊
Thanks!
Thank you 😊
This is really awesome, and it would be great if you could deploy it on Hugging Face or any other suggested platform. Eventually we need to deploy the app, not only run it on a local machine.
I agree with that, I would like to deploy this on Hugging Face Spaces. Is the free version enough for this, or is a paid version necessary?
@@scitechtalktv9742 I think a small LLM can fit on the free version.
Thanks for the very interesting video. I tried to run your example on Windows 11. Unfortunately I get an error when trying to use FAISS. How do I have to run the FAISS server? My error looks like this:
Error raised by inference endpoint: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url
I changed one line to embeddings = OllamaEmbeddings(model="llama2:7b")
Then calling vector = FAISS.from_documents(documents, embeddings) seems to work (took a couple of minutes!!)
So I think I had no problem with FAISS; instead, the default value for the model "llama:7b" was not correct.
But another question: what does mixtral.... have to do with llama2? Is it the same?
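In case it helps others hitting the same error: the port 11434 in that message belongs to the local Ollama server that provides the embeddings, not to FAISS itself, so Ollama has to be running (ollama serve) and the model pulled (ollama pull llama2:7b) before the index is built. A minimal sketch of the working setup the comment above describes, with the model tag and URL as assumptions:

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# Requires a running Ollama server (default http://localhost:11434)
# with the chosen model already pulled.
embeddings = OllamaEmbeddings(model="llama2:7b", base_url="http://localhost:11434")
vector = FAISS.from_documents(documents, embeddings)  # `documents` = the chunked pages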
Why do you use the Ollama llama2 embedding model instead of something else like nomic-embed?
there are way too many options. This is to just show what is possible :)
I want to make and deploy this type of application, but for that I have to run Ollama in the background. Is there any other way? Can anyone help me?
I would like to store / serialize the vector store / embeddings because on my PC it takes a very long time to generate those! I mean extremely long: more than 4 hours! How can I do that?
You can use an external API for doing the same. Hugging Face offers free embedding APIs.
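A rough sketch combining both ideas: remote embeddings through the Hugging Face Inference API, plus saving the FAISS index to disk so it is not rebuilt on every run. The model name, folder path, and token placeholder are assumptions:

from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_community.vectorstores import FAISS

# Remote embeddings via the Hugging Face Inference API instead of local Ollama.
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key="hf_...",  # your Hugging Face token
    model_name="sentence-transformers/all-MiniLM-L6-v2",
)

vectorstore = FAISS.from_documents(documents, embeddings)
vectorstore.save_local("faiss_index")  # persist to disk once

# Later runs: reload the saved index instead of re-embedding everything.
vectorstore = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)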
1. After running the program, it is required to install llama2. There are many other Ollama models on my computer, but it seems the program uses llama2 by default;
2. The initial run takes a long time (the URL in the example takes about 3 minutes, M1 Max 32G). After the vectors are built, the search is very fast;
3. With news links, the pages can be quickly parsed and searched to obtain answers, and RAG works well;
4. The RAG results on arXiv HTML papers are poor.
Thank you for sharing
You can use any other embedding model in it. This example was using the llama2 embedding in ollama.
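For example, to try the nomic embeddings asked about above while keeping the rest of the pipeline unchanged, a one-line swap would look like this (assuming the model has first been pulled into Ollama):

from langchain_community.embeddings import OllamaEmbeddings

# Requires: ollama pull nomic-embed-text
embeddings = OllamaEmbeddings(model="nomic-embed-text")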
Is it possible to read PDFs?
Yes, look at my langchain tutorials.
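A minimal sketch of swapping the web loader for a PDF loader; the file name is an assumption and the rest of the pipeline (chunk, embed, index) stays the same. PyPDFLoader needs the pypdf package installed:

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load pages from a local PDF instead of a web page, then chunk as before.
docs = PyPDFLoader("my_document.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)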
The notebook doesn't work. I get an error with ValueError: Error raised by inference endpoint: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/embeddings (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
Do you have ollama running?
@@engineerprompt what is this