Does the LLM first default to checking the additional datastore we gave it for data relevant to the user's prompt? And if it finds relevant data, does it respond without consulting the data it was originally trained on, while if it finds nothing relevant in the datastore, it acts as if RAG wasn't implemented at all and responds from its original training data? Or am I getting it wrong?
You got it. It first pings the corpus for relevant data, retrieves it, and inserts it into the prompt. If there's none, you just get the standard LLM output. Hope that helps.
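That flow can be sketched in a few lines. This is a toy illustration only (the function names, the keyword-overlap retrieval, and the corpus are all made up for this example; real RAG systems use embedding search, not word overlap):

```python
# Toy sketch of the RAG flow: retrieve from a small corpus, and if
# nothing relevant is found, fall back to the plain prompt so the
# LLM answers from its training data alone.

def retrieve(query: str, corpus: list[str], min_overlap: int = 2) -> list[str]:
    """Return corpus entries sharing at least `min_overlap` words with the query."""
    q_words = set(query.lower().split())
    return [doc for doc in corpus
            if len(q_words & set(doc.lower().split())) >= min_overlap]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Insert retrieved context into the prompt; otherwise pass the query as-is."""
    hits = retrieve(query, corpus)
    if hits:
        context = "\n".join(hits)
        return f"Answer using this context:\n{context}\n\nQuestion: {query}"
    return query  # no relevant data: behaves as if RAG weren't implemented

corpus = [
    "The 2023 GDP of Chad was reported in the World Bank dataset.",
    "Internal policy: refunds are processed within 14 days.",
]

print(build_prompt("What was the GDP of Chad in 2023?", corpus))  # context inserted
print(build_prompt("Tell me a joke", corpus))                     # passes through unchanged
```

The key point is that the fallback is just the original prompt: the model itself doesn't know whether retrieval happened, it only sees whatever text ends up in its input.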
Just wanted to clear up my confusion: would I get better results by applying RAG to a fine-tuned model (i.e., one fine-tuned for my field of work), or is RAG on a stock LLM good enough?
Hey Jason, the current best practice is to first try RAG with a stock LLM and see if that works. If not, then consider fine-tuning, because it requires more effort than RAG. Hope that helps.
Best video about this topic by far!! Thank you Manny!
Very kind of you, Max. Thank you!
Clear, thanks!
Great to hear!
Very nice. However, an example would've helped augment the answer, like asking it the GDP of Chad in 2023 when using ChatGPT.
Agree. Thanks for feedback. 😊
thank you for the video George Santos :)
🤣
So does that mean the data needs to fit in the LLM's context window? Or does the data go through some sort of compression?
Correct. The retrieved context still needs to fit into the context window along with the original prompt. As for compression, we can summarize the retrieved context to save space. Hope that helps.