Hey, great video! Just a clarification question, because I'm not sure if I heard right: do we usually take only the single top context for RAG? I thought we usually take the top k, with k around 5-8? If we're using small chunks, e.g. a couple of sentences, couldn't multiple chunks provide useful additional context, in case the very top one doesn't exactly capture the answer?
Thanks for tuning in! Yes, you are correct: typically you take the top-k retrieved results, not just a single chunk, which gives the LLM more context to work with.
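If it helps, here's a minimal sketch of top-k retrieval using sentence-transformers (mentioned later in the thread). The model name, sample chunks, and k value are just illustrative choices, not a recommendation:

```python
# Minimal top-k retrieval sketch (model name, chunks, and k are illustrative).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works

chunks = [
    "RAG retrieves relevant text before generation.",
    "Top-k retrieval returns the k closest chunks.",
    "Chunking splits documents into small pieces.",
]
chunk_vecs = model.encode(chunks, convert_to_tensor=True)

query = "How many chunks does RAG retrieve?"
query_vec = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every chunk, then take the top k.
k = 2
scores = util.cos_sim(query_vec, chunk_vecs)[0]
top_k = scores.topk(k)
context = [chunks[i] for i in top_k.indices]
print(context)  # pass all k chunks to the LLM, not just the single best one
```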
Hey guys, I liked your intro to RAG. I also heard you have a subreddit; you should put a link to it in the video description or somewhere, as I couldn't find it directly.

Anyway, I have a question: how would you optimize retrieval and chunking for working with something like dialogs, to extract their meaning? What direction or advice would make sense for the embeddings? What kind of embedding model would you suggest, and what should I look into on the retrieval side? It sounds easy on the surface, but I've been struggling to get it to retrieve meaningful context: if I go for smaller chunks, at sentence length or at each change of speaker, it usually doesn't retrieve meaningful parts of the conversation. Any advice or reading material would be greatly appreciated. I'm working with LangChain right now and a self-hosted LLM.
Thank you for joining the presentation! The subreddit is www.reddit.com/r/kdbai/... but it's brand new, so there's not much activity yet!

As to your question: recent embedding models can create meaningful vectors even from larger pieces of text, so you could try embedding entire conversations as single chunks. You could also try a method like parent document retrieval or sentence windows (both methods of chunk decoupling), where you retrieve on smaller chunks like sentences and then provide larger texts (the parent documents, or windows around the retrieved sentences) to the LLM for generation; there's a sketch of the idea below.

If you are not getting good retrieval with smaller chunks, try some different embedding models - sentence transformers (huggingface.co/sentence-transformers) could be a good option.
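This isn't a full LangChain pipeline, just a plain-Python sketch of the sentence-window idea on a dialog, assuming sentence-transformers. The model name, window size, and sample turns are made-up illustrations:

```python
# Sketch of sentence-window retrieval on a dialog: retrieve on single
# speaker turns, but give the LLM a window of surrounding turns.
# (Model name, window size, and the sample dialog are illustrative.)
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# One entry per speaker turn; in practice you'd parse your transcript.
turns = [
    "A: Did you ship the release?",
    "B: Not yet, the vector index build failed.",
    "A: What was the error?",
    "B: Out of memory while embedding the long transcripts.",
    "A: Try smaller batches then.",
]
turn_vecs = model.encode(turns, convert_to_tensor=True)

query = "Why did the index build fail?"
scores = util.cos_sim(model.encode(query, convert_to_tensor=True), turn_vecs)[0]
best = int(scores.argmax())

# Retrieval matched one short turn, but the generation step gets the
# window around it, so the LLM sees the surrounding conversation.
window = 1
context = turns[max(0, best - window): best + window + 1]
print("\n".join(context))
```

Parent document retrieval works the same way, except the retrieved sentence expands to its whole parent document (e.g. the full conversation) rather than a fixed window.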
Thanks, guys, for the session. It was really helpful.