Hi Eduardo, this is a really nice video, thank you. Do you think you could add a citation functionality, so that the user gets reassured about where the information was taken from? Thanks
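Even something simple would help, e.g. appending the sources of the retrieved chunks to the answer (rough sketch only; it assumes each retrieved doc carries a "source" field in its metadata, which may differ in your app):

```python
# Rough sketch: append the sources of the retrieved chunks to the answer.
def answer_with_citations(answer, docs):
    # Assumes each doc has metadata with a "source" key (URL or file path).
    sources = sorted({d.metadata.get("source", "unknown") for d in docs})
    citations = "\n".join(f"- {s}" for s in sources)
    return f"{answer}\n\nSources:\n{citations}"
```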
Very nice. The only challenge with this approach is the total cost of answering each query, and it could run forever in some cases until both LLMs agree or until you get the right relevant information from the search. I think if customers want a 100% guarantee and are not worried about latency, this will work really well.
Indeed, it'll depend on your use case, because in some cases you wouldn't sacrifice the quality of the responses for speed.
Surely this approach becomes more and more viable as the cost of newly released models keeps decreasing by 5x, 10x, etc., as we are currently seeing?
So this multi-shot RAG approach with a new model that's 5x cheaper is still less expensive than a single shot of its more expensive predecessor?
Exactly!
Awesome video. Thank you.
Glad you liked it!
Great video! But how do you break the loop after a few trials if the model gets stuck in an infinite loop during hallucination grading or answer-relevance grading?
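For example, I was imagining something like a retry counter in the graph state (just a sketch; none of these names are from the video):

```python
MAX_RETRIES = 3

def route_after_grading(state: dict, grounded: bool) -> str:
    # Sketch: count regeneration attempts in the state so the graph
    # can't loop forever between grading and generation.
    state["retries"] = state.get("retries", 0) + 1
    if grounded:
        return "useful"
    if state["retries"] >= MAX_RETRIES:
        return "give_up"  # stop regenerating, return the best answer so far
    return "retry"
```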
Thanks, good flow between RAG and web search, thanks!! :)
Thank you. I'm glad you found it interesting!
Great video, very nice
Thank you very much!
I've been searching for a self-correcting system because sometimes the responses I receive from LLMs aren't precise. Thank you so much for your help.
I'm glad it was helpful!
Nice one. Question: what if all the docs are marked as irrelevant chunks by the model? Do you need to query the vector DB again? I guess an improvement may be to include a HyDE model in between to improve the questions and keep trying to get different chunks from the DB?
It'll perform a web search to find the relevant information (the node that has the agent). And yes, HyDE could be an option too.
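Roughly, the routing after document grading looks like this (a simplified sketch, not the exact code from the repo):

```python
# Simplified sketch: if no chunk survived the relevance grading,
# fall back to the web-search node instead of re-querying the vector DB.
def decide_next_step(state: dict) -> str:
    relevant_docs = state.get("documents", [])
    if not relevant_docs:
        # A HyDE-style query rewrite could also be inserted here
        # before re-querying the DB, as you suggested.
        return "web_search"
    return "generate"
```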
Awesome
Thank you!!
Great video! But I have a question I hope you can answer.
Why is it answering so slowly? Is that normal for this architecture, or is there another reason? And can we do something to fix it?
We have 5 LLM calls to generate and grade answers, plus the retriever, plus a web search that runs when the question isn't in the vector store, and we also store the web search results back in the database; all of these steps take time. To make it faster, you can use fewer LLM calls and maybe skip the web search, depending on your use case.
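For example, one way to cut round trips is to grade all retrieved chunks in a single LLM call instead of one call per chunk (illustrative sketch only, assuming a LangChain-style chat model; this is not the repo's code):

```python
# Illustrative: grade every retrieved chunk in one LLM call
# instead of one call per chunk, saving several round trips.
def grade_documents_batched(llm, question: str, docs: list) -> list:
    numbered = "\n".join(f"[{i}] {d}" for i, d in enumerate(docs))
    prompt = (
        f"Question: {question}\n\nChunks:\n{numbered}\n\n"
        "Return the numbers of the chunks relevant to the question, comma-separated."
    )
    reply = llm.invoke(prompt).content
    keep = {int(s) for s in reply.replace(",", " ").split() if s.isdigit()}
    return [d for i, d in enumerate(docs) if i in keep]
```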
Can you make an example using only Local LLMs and Local Agents, so no API Keys (and no costs) are created? That would be amazing!
Yes, I'll have it in mind for the next video!
@@eduardov01 Amazing!!
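In the meantime, the core swap is small, e.g. with Ollama (a sketch, assuming Ollama is running locally and you've already pulled a model with `ollama pull llama3`):

```python
# Rough sketch: swap the API-based LLM for a local one served by Ollama,
# so no API keys or per-call costs are involved.
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3", temperature=0)
print(llm.invoke("Is this answer grounded in the documents? Yes or no.").content)
```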
Could you please share the link here?
Nice video!
Any chance to get access to the excalidraw version of the diagram?
Thanks!
I have a free account in Excalidraw and just have 1 session with all my diagrams. But you can get access to the flowchart using this link: github.com/Eduardovasquezn/advanced-rag-app/blob/main/images/rag.png
Is the Tavily API free? Can I use the Google Search Engine instead?
Yes, you can make 1,000 API calls for free every month.
It's also possible to use Google Search as an agent for this. I have a video explaining step-by-step how to use it: ruclips.net/video/ppGRPWpv9Wc/видео.html
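In LangChain, the Tavily tool is just a few lines (a sketch; it expects the TAVILY_API_KEY environment variable to be set):

```python
# Sketch: Tavily web search via LangChain; needs TAVILY_API_KEY set.
from langchain_community.tools.tavily_search import TavilySearchResults

web_search_tool = TavilySearchResults(max_results=3)
results = web_search_tool.invoke("What is corrective RAG?")  # list of {url, content}
print(results)
```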