This is the best crisp, clear RAG walkthrough I have seen so far! Congratulations, and thank you for putting effort into educational videos on GenAI use.
Thanks!
DUDE!! this is the best resource I have found so far. The part I truly appreciate is that you focused on the conceptual parts and NOT just here is the code
This will be a great resource! I'd love to see a dedicated video diving deeper into LangGraphs and their applications. Keep up the great work!
I’m only 15 minutes into this and was mostly looking for a RAG example, but I must say as a beginner to all this I really appreciate how much you’re breaking down the entire process. I haven’t used langchain for invoking models and you have me excited to do so now. The workflow seems much more logical this way.
That’s the goal!
For me, a good mentor is someone who can explain difficult topics with elegant simplicity, and you do it superbly! Looking forward to Cohort 12 :)
I’ll see you in class!
This is a great tutorial for someone who wants a hands-on approach to run basic gen AI projects
You have filled in most of the gaps in my understanding of RAG. This is your zero-shot prompt for my understanding of RAG basics. Thank you!!
Looking forward to upcoming videos! Also it would be interesting if you can cover the following topics:
1. Langchain - saving context of conversation and using in next questions
2. Dialogflow intents or if it can be built in LangChain.
3. the real-time data to be added based on conversation.
4. Streaming audio to LLM, and the way to implement conversations like chatGPT by holding button, with less latency.
I wanted to ask something related to 1:11:20.
If I have understood correctly, the cell being run at this point shows the records in the Pinecone database that most closely match the user prompt, and the answer is generated from them.
My database consists of a large number of code files (HTML, TypeScript, Python), and I want to build a chatbot over these files so the user can get answers about each file. What is the best approach to store the vector embeddings of many different files in Pinecone so that I can also provide an exact reference to the code file name?
Also, if the answer is generated from multiple files, it should be able to reference all of them.
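One common way to approach the question above is to attach the file name as metadata to every chunk before embedding it, then read that metadata back from whatever the retriever returns. The sketch below is a minimal, library-free illustration of that idea. `Record`, `chunk_file`, `cited_sources`, and the `source` metadata key are hypothetical names chosen for this example, not part of Pinecone or LangChain, and the fixed-size character chunking is only a placeholder for a real splitter.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One chunk, ready to be embedded and upserted (hypothetical shape)."""
    id: str
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_file(path: str, content: str, chunk_size: int = 200) -> list[Record]:
    """Split one file into fixed-size chunks, tagging each with its source path."""
    records = []
    for i, start in enumerate(range(0, len(content), chunk_size)):
        records.append(
            Record(
                id=f"{path}#chunk-{i}",
                text=content[start:start + chunk_size],
                metadata={"source": path, "chunk": i},
            )
        )
    return records


def cited_sources(matches: list[Record]) -> list[str]:
    """Return the distinct source files behind the retrieved chunks, in order."""
    seen: list[str] = []
    for match in matches:
        source = match.metadata["source"]
        if source not in seen:
            seen.append(source)
    return seen
```

If the vector store returns the stored metadata with each match, an answer built from several chunks can list every file it drew on by passing the matches through something like `cited_sources`.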
Thank you!
whose dad is this. i thank the lord for him because wow, i have watched so many rag videos but this is perfection 100/10 it is just so simple yet in depth thank you
Brother the most comprehensive resource for learning RAG. No doubt this is the ultimate tutorial video of RAG so far.
The most informative and helpful video I have found. Working through each of the concepts and building on to the previous lesson has been very well executed here, unlike other content I have seen that tends to skip over detail or push forward too quickly. In addition, the fact you demonstrate concepts like vector stores using a memory based examples first is so much more insightful than jumping straight into Pinecone or Chroma. Thank you!
Nice - looking forward to diving into this.
Could be nice to see a setup for Application with local LLM for private docs. I think a lot of people are looking for such a video 🤞🏻💪🏻 TY for your Nice work!
You are exactly what I've been searching for, for days. So amazing, such a great tutor and amazing person.
I just started my MSc in Data Science, and one of our projects is creating a RAG API similar to the one in this video. Your content is extremely helpful! Thank you very much!
You are a truly amazing instructor of Machine Learning and LLMs, Truly inspired!
Fantastic tutorial! Thanks a lot! I already saw other tutorials on LangChain and also I purchased a Udemy course. This tutorial here is the best of all. Everything is well explained and can easily be understood.
It's by far the best content available on YouTube about RAG. Keep it up!
Wow. Great teaching! Explaining the "why" in your lessons, in this case RAG, is what you don't find in other teaching material. Thank you, this has been very insightful! 👍
Brilliant explanation. I understand the what, how and why of RAG applications a lot more clearly now than before I watched this video. Thanks a lot!
Great tutorial! Thank you Santiago!
I actually tried it myself (first time using the OpenAI API). I was able to build a RAG application to get answers based on a PDF document. And it works!
However, I see that there is more to think about; for example, storing the embedded pages in the vector store might not be the best approach.
I am looking forward to new videos!
I have implemented your example in node js and it's working like a charm. Thanks!
But I really hate the langChain api. I'm trying to implement it as plain old functions.
Great content! Extremely helpful to get hands-on introduction into LLM and RAG. Thank you!
Thank you, this is top-notch content!
Many LLM and RAG concepts that were new to me are clearly explained and, most importantly, it shows how they are interconnected and reinforce each other.
Can't explain how much i have learned from this GREAT video. So far the BEST !!! thanks for such amazing explanation 👍👍👍
Dude, you made my life much easier after this video!! Keep doing this!! Thank you so much
Love it how you explain everything step by step!
man this is great, keep it up.
Also, it would be appreciated if you made a video on a full RAG system, including:
1. The most suitable way of adding memory.
2. How to embed and deploy these applications in production (web, mobile, SaaS, etc.).
You are by far one of the three best instructors I have ever seen. You remind me of Richard Phillips Feynman.
Genuinely found this to be hyper-valuable, after watching this, tons of concepts now actually make sense to me, Thank you!
"All you need is Attention" to the details this GREAT teacher is providing , he could have picked up a simple context within the limits and would have been done BUT no his zeal to share and show every single road block that we need to know is OUTSTANDING. God Bless You. Could you please do a Speech To Speech Application as well?
excellent tutorial explaining theory as well as code in very beginner friendly manner, subscribed! I found you via twitter feed and started this first video.
Thanks for the video. Very clear explanation of all concepts involved. Looking forward to your next videos.
If possible, please can you cover the topic of production deployments and operations for a LLM project.
Million stars to this video. The content is clear, precise and well explained. Thank you.
Thank you. It is the perfect video when you want to start doing RAG right after learning what RAG is for. Now it is easy to build a chatbot for technical documentation queries.
I already have experience with Langchain, but wanted to learn more about Pinecone. You are just exceptional at teaching. Thank you so much for this. :)
Thanks!
Best video on YouTube regarding RAG. A must-watch.
This tutorial was the perfect level of theory + code for a beginner like me to get enough knowledge to start tinkering myself. Subscribed and looking forward to more content!
You're amazing!! Thank you for covering both code and conceptual aspects of the concept.
👏Congrats and thanks Bro for this incredible video! What every student needs: knowledge and passion in one place. Very clear.💪
Awesome explanation and content. Your content is by far the best I found on RAG... hats off!👏
Santiago, thank you for the valuable RAG videos. Your insights are well-explained and informative. I would be interested in seeing more of your work on this topic.
Thanks for taking the effort to explain the concepts one by one.
I'm not even 15 minutes in and I'm sure this is one of the best videos on this topic.
This is an amazingly thorough introduction thank you
Hi, at 47:20 I think it grouped number five to the left because of the word "drives", not "audio".
What can I say, it's just awesome what a world of possibilities you open up for us. Now I have a clearer idea of how to build what I want with my ideas. Thank you, I hope to join your school soon. Thanks :)
So, langchains are pipes for LLMs ?
Do they provide any of the output splitting/redirection that you get from classic stdio ?
Hi Santiago, at 31:33 you mentioned that 1000 words is 750 Tokens. Isn't it the other way round? 1 word around 3/4 Tokens?
Ha! Yeah, more tokens than words. I’m always getting this wrong (notice you also made the same mistake in your comment.)
1 word is about 1.3 tokens
@@underfitted😅
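The rule of thumb from this thread (roughly 750 words per 1,000 tokens, so about 1.3 tokens per word) can be written down as a quick estimate. This is only a back-of-the-envelope sketch for English text; `TOKENS_PER_WORD` and `estimate_tokens` are names made up for this example, and real counts depend on the tokenizer and the language.

```python
# Rough rule of thumb: 1,000 tokens is about 750 English words,
# so one word is about 4/3 (roughly 1.3) tokens.
TOKENS_PER_WORD = 4 / 3


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a text from its whitespace-separated word count."""
    word_count = len(text.split())
    return round(word_count * TOKENS_PER_WORD)
```

For anything that has to fit a hard context limit, counting with the model's actual tokenizer is safer than this estimate.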
Excellent walk through capturing all the required details
You're the Legend, Santiago!
Love the way you explain :)
Such an amazing indepth video! Thank you so much brother
Awesome video with great explanation, step by step with examples, images and code!! Love it.
I would definitely super-like this video if I had the option. Amazing class. Thank you!
Loved watching it... You explain every thing so nicely!!!❤
Great video. The jupyter notebook was very helpful!
Interesting that, by default and given only the context "Patricia likes white cars," the model concluded that Patricia's car was white even though she might not actually own a white car. I added instructions to tell me when it was inferring an answer, but it makes me wonder what else it might be inferring without telling me why.
I don't know why but I get the response "I don't know" for that question even though my code and the prompts are identical and I'm also using the OpenAI API.
@freerider6300 Given that the prompt template says to respond with "I don't know", that seems appropriate. Are you by chance using gpt-4 rather than gpt-3.5-turbo? I changed my model to gpt-4 and I get "I don't know", whereas with gpt-3.5-turbo I get "White", just like in the video.
Best beginner level explanation of RAG with Langchain on Internet. Great one Santiago!
Thanks!
Amazing video. Thanks for the work you do in putting all this together.
Amazing video! Great work, really helped a lot in my understanding of RAG
This video got to me on the perfect day, you just solved a whole project for me with this. Thanks!
Keep uploading this kind of content
Love your method of teaching!!!
Excellent introduction to Pinecone use for RAG using Langchain. Looking forward to more...I had an error 2/3rds way through AttributeError: 'builtin_function_or_method' object has no attribute '__func__'. Eventually, I installed Python 3.11.8, re-cloned the repo, installed the same dependencies and it worked again, no problem!
Thanks for the video. The content and the explanation is top notch.
Amazing, You were just Amazing. Thank you so much for Explaining with this level of detail. Subscribed!
btw, I really think your project should be used to build a Rag system with 'your video' as the knowledge base
Thank you for the great video! This is very insightful and helpful. I have one question: what if there are multiple input documents, and the answer you want to retrieve is spread across different chunks? In that case, fetching only the most similar chunk will generate an incomplete answer. However, if we search for the top N most similar chunks, how can we make sure they connect meaningfully (the top 3 most similar chunks could be on pages 3, 80, and 100, with nothing connecting them)? Thanks again for the effort you put into this video!
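One possible way to handle the top-N concern raised above is to keep the N best-scoring chunks, put them back in document order, and label each one with its origin so the model sees them as separate excerpts rather than one continuous passage. The sketch below assumes each retrieved chunk carries a document name, page number, and similarity score; `Chunk` and `build_context` are hypothetical names for this illustration, and it does not guarantee the excerpts are related, it only makes their boundaries explicit.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    doc: str      # document the chunk came from
    page: int     # page (or position) within that document
    score: float  # similarity to the query, higher is closer
    text: str


def build_context(chunks: list[Chunk], top_n: int = 3) -> str:
    """Keep the top_n chunks by score, then present them in document order."""
    best = sorted(chunks, key=lambda c: c.score, reverse=True)[:top_n]
    ordered = sorted(best, key=lambda c: (c.doc, c.page))
    # Label every excerpt so unrelated pages are not read as one passage.
    return "\n\n".join(f"[{c.doc}, page {c.page}]\n{c.text}" for c in ordered)
```

The prompt can then tell the model that the excerpts are independent and that it should answer only from the ones that are relevant.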
Such an amazing Resource !! Thanks Santiago ❤
WOOWWW!! An amazing effort, thank you so much. Looking forward to more.
Thanks for the detailed, clearly explained tutorial. Nice
Thank you for this, your explanation on embeddings was superb!!
Amazing video. You are such a wealth of knowledge. Thankyou for these great video tutorials. Please keep them coming!!😃
Thank you a lot Santiago, best tutorial I have watched so far
Finally a video that i can follow through and understand what embedding and vector is. Thanks!
Beautiful video sir! Great work!
Wonderful video! I made my post-graduate final project exactly like this.
Superb explanation, Thank you for such great informative videos!!!
You did not use an open-source vector database or an open-source language model in this video. Please try to make a video on that.
Could you also do a video on how you can show the source documents that the context comes from?
best class period. you need to launch LLM course!
Great work here! Thanks a lot. 😊
You gain a new follower. Great job!
Such a great video, you always explain complex concepts very well!
Hello Santiago!
Thanks for this amazing video and for the explanation. Is it possible to provide an updated version of this RAG system that behaves as a conversational chatbot rather than only a Q&A system, so that it takes previous questions and answers into account as part of the context and can answer questions that refer to something mentioned earlier?
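A simple way to get the conversational behaviour asked about above is to keep the previous question/answer pairs and include the most recent ones in the prompt alongside the retrieved context. The sketch below shows only the prompt assembly; `build_prompt`, its parameters, and the prompt wording are assumptions made for this example, and the video's own code may structure this differently.

```python
def build_prompt(
    history: list[tuple[str, str]],
    context: str,
    question: str,
    max_turns: int = 3,
) -> str:
    """Assemble a prompt that carries recent conversation turns plus retrieved context."""
    # Only the last max_turns exchanges are kept so the prompt stays bounded.
    recent = history[-max_turns:] if max_turns > 0 else []
    transcript = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in recent)
    return (
        "Answer using the context and the conversation so far.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"User: {question}\nAssistant:"
    )
```

After each model call, appending `(question, answer)` to `history` lets a follow-up such as "what color is it?" be resolved against the earlier turns.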
If anybody had the same problem activating the virtual environment (venv) as I did: the source command didn't work for me. It simply differs by OS: source works on Linux/macOS, and .venv\Scripts\activate is the equivalent on Windows.
What value do you set for PINECONE_API_ENV? If I try to use text-embedding-3-small it won't work. For some reason it deploys ada. Do you know why? By the way, nice intro to RAG and its details.
We don't need that one any more, actually. I apologize. When I wrote the code, that variable was needed, but not anymore.
I am building a RAG application, but I am just looking for some advice on issues:
1. How to make it answer general questions like greetings, farewell, etc., directly without trying to retrieve them as there is no need to do so? I know I can use LLM to classify the question but I will need an additional API call for that (if the model is openai)
2. I implemented RBAC. If a single question from a user contains two queries and the user only has access to a document that can answer one of them, I retrieve whatever context is closest (enough to answer one of the two queries), but the LLM responds "I don't know". It is failing to use the context to answer at least the part it can. How do I address this?
3. If I ask a specific question, it answers, but if I ask something broad, say "what is the summary of XYZ doc", it fails to answer.
it would be great if someone could help me with the possible approaches to fix this thanks 🙏
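For the first issue above (greetings and farewells), one cheap option that avoids an extra API call is a small local check that runs before retrieval. The sketch below is a deliberately naive heuristic: `SMALL_TALK` and `needs_retrieval` are hypothetical names for this example, the phrase list is illustrative only, and anything it does not recognize still goes through the normal RAG path.

```python
import re

# Illustrative list only; a real application would tune or replace this.
SMALL_TALK = {"hi", "hello", "hey", "thanks", "thank you", "bye", "goodbye"}


def needs_retrieval(message: str) -> bool:
    """Return False for plain greetings or farewells, True for everything else."""
    letters_only = re.sub(r"[^a-z ]", "", message.lower())
    normalized = " ".join(letters_only.split())
    return normalized not in SMALL_TALK
```

When `needs_retrieval` returns `False`, the application can reply directly (or call the model without context) and skip the vector store lookup.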
Love your video! If you allow me a small suggestion, I would have put your face in the top left corner to avoid overlapping the code. Great work!
True
I must say its a superb video. thanks for the clarification.
How would you scale to more than 1 document (in your case 1 transcript). Would you continue to use 1 vector store for all the documents or are there other methods? Great video btw!
Thank you for the really great video!! I think Pinecone updated its console because I could not find the ENV variable value beside my API key.
This video is excellent.
How can we handle running out of memory while splitting documents (loading into a temp file) when we have 10 GB, 100 GB, or more of PDFs or other formats? Please advise, as memory is the main problem.
Also, please make a video on the different types of RAG (like DR RAG, etc.) and on combining RAG with fine-tuning, along with how to evaluate whether a response is reliable.
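For the memory question above, one general approach is to stream the text and yield chunks one at a time instead of loading a whole document before splitting it. The sketch below assumes the text has already been extracted to a plain-text stream (PDFs would need a text-extraction step first); `stream_chunks` and its parameters are hypothetical names for this example, and it splits on raw character counts rather than sentence boundaries.

```python
from typing import Iterator, TextIO


def stream_chunks(handle: TextIO, chunk_size: int = 1000, overlap: int = 100) -> Iterator[str]:
    """Yield overlapping chunks from a text stream without reading it all into memory."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    tail = ""  # last `overlap` characters of the previous chunk
    while True:
        block = handle.read(chunk_size - len(tail))
        if not block:
            break
        chunk = tail + block
        yield chunk
        tail = chunk[-overlap:] if overlap else ""
```

Each yielded chunk can be embedded and written to the vector store immediately, so only one chunk needs to be held in memory at a time.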
Thanks for the video! Perfectly explained. However I did get an error when I tried to transcribe the YT Video:
URLError:
Why that?
Thanks!
Great, thanks, Santiago. What about using gpt4?
Thanks, Santiago! As always, high-value content! Just one quick question: why not match the document chunk size to the LLM's maximum context size?
Super helpful walkthrough. Thank you !
Nice video! Very helpful for beginners! Just one question, in terms of performance isn't it better to create one prompt and one call to model with the context, answer and translation? It would be just one call to OpenAI server instead of two..
For that particular example, yes. One call would be better.
But think beyond that. You may have two separate chains using different models and processes. One call might not be possible, and that’s where chaining different chains might be helpful.
Oh, I see.. Valid argument! Thanks for explanation, it's really great explanation of the concept itself!
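The point made in this thread, that separate chains may use different models or processes and are then composed, can be pictured as plain function composition. The sketch below is not LangChain code: `chain`, `answer`, and `translate` are hypothetical stand-ins written for this example, with each step playing the role of a chain that could be backed by its own model.

```python
from typing import Callable

Step = Callable[[str], str]


def chain(*steps: Step) -> Step:
    """Compose steps so the output of each one feeds the next."""
    def run(value: str) -> str:
        for step in steps:
            value = step(value)
        return value
    return run


def answer(question: str) -> str:
    # Stand-in for a first chain (for example, a RAG chain on one model).
    return f"answer to: {question}"


def translate(text: str) -> str:
    # Stand-in for a second chain (for example, a translation model).
    return text.upper()
```

With this shape, `chain(answer, translate)` behaves as a single callable, and either step can be swapped for a different model without touching the other.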
Thanks for your valuable insights! It has been immensely helpful.
Impressive. What a great tutorial; it's the third one I've watched to learn RAG, and the concepts and the procedure are very well explained. Thank you very much!!