Overview of RAG Approaches with Vector Databases

  • Published: 23 Nov 2024

Comments • 5

  • @TusharRathod-li7ql 11 months ago +1

    Thanks, guys, for the session. It was really helpful.

  • @maryamashraf6370 9 months ago +1

    Hey, great video! Just a clarification question, because I'm not sure if I heard right - do we usually only take the single top context for RAG? I thought we usually take top-k, with k at 5-8? If we're taking small chunks, e.g. a couple of sentences, couldn't multiple chunks be useful for additional context, in case the very top one doesn't exactly capture the answer?

    • @RyanSieglerAI 7 months ago

      Thanks for tuning in! Yes, you are correct: typically you will take the top-k retrieved results, not just a single chunk. This provides more context to the LLM.
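
      For anyone curious, here is a minimal sketch of what "take the top-k" can look like in code, using numpy cosine similarity and a sentence-transformers model. The model name, the value of k, and the toy chunks are illustrative assumptions, not anything from the video:

```python
# Hedged sketch: top-k retrieval over a toy corpus. The model name,
# k, and the placeholder chunks are assumptions for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "RAG pairs a retriever with an LLM.",
    "Chunking splits documents into small pieces.",
    "Embeddings map text to vectors.",
    "Vector databases index those vectors.",
    "Top-k search returns the k nearest chunks.",
    "The retrieved chunks are passed to the LLM as context.",
]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

query = "how many chunks should I retrieve?"
q_vec = model.encode([query], normalize_embeddings=True)[0]

k = 5  # commonly 5-8, as the question above suggests
scores = chunk_vecs @ q_vec                 # cosine similarity (vectors are normalized)
top_k = np.argsort(scores)[::-1][:k]        # indices of the k best-scoring chunks
context = "\n\n".join(chunks[i] for i in top_k)  # all k chunks go to the LLM, not just one
```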

  • @photonicdev377 1 year ago +1

    Hey guys, I liked your intro to RAG. I also heard you have a subreddit; you should put a link to it in the video description or somewhere, as I couldn't find it directly.
    Anyway, I have a question:
    How would you optimize retrieval and chunking for something like dialogs, so the embeddings capture the meaning? What direction or advice would make sense? What kind of embedding model would you suggest, and what should I look into on the retrieval side? It sounds quite easy on the surface, but I've been struggling to get it to retrieve meaningful context: if I go for smaller chunks at sentence length, or split at each change of speaker, it usually does not retrieve meaningful parts of the conversation. Any advice or reading material would be greatly appreciated. I'm working with LangChain right now and a self-hosted LLM.

    • @KxSystems 1 year ago

      Thank you for joining the presentation! The subreddit is www.reddit.com/r/kdbai/... but it's brand new, so there's not much activity yet!
      As to your question: recent embedding models are capable of creating meaningful vectors even from larger pieces of text, so you could try embedding entire conversations as single chunks. You could also try a method like a parent document retriever or sentence windows (both forms of chunk decoupling), where you do retrieval on smaller chunks like sentences and then provide larger texts (parent docs, or windows around the retrieved sentences) to the LLM for generation; see the sketch after this reply.
      If you are not getting good retrieval with smaller chunks, try some different embedding models; sentence transformers (huggingface.co/sentence-transformers) could be a good option.
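
      To make the chunk-decoupling idea concrete, here is a rough sketch of sentence-window retrieval in Python: retrieve on turn-sized chunks, then hand the LLM a window of neighboring dialog turns around each hit. The model name, window size, and dialog below are illustrative assumptions, not KX Systems code:

```python
# Hedged sketch of sentence-window retrieval ("chunk decoupling"):
# search over small per-turn chunks, but return a window of
# surrounding turns so the LLM sees the conversational context.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

# A dialog flattened into one chunk per speaker turn (placeholder data).
turns = [
    "Alice: Did the deploy finish?",
    "Bob: Yes, it went out at noon.",
    "Alice: Any errors in the logs?",
    "Bob: Just one timeout, and it's already patched.",
    "Alice: Great, close the ticket then.",
]
turn_vecs = model.encode(turns, normalize_embeddings=True)

def retrieve_with_window(query, k=2, window=1):
    """Top-k turns, each expanded to +/- `window` neighboring turns."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = turn_vecs @ q                       # cosine similarity
    hits = np.argsort(scores)[::-1][:k]
    contexts = []
    for i in hits:
        lo, hi = max(0, i - window), min(len(turns), i + window + 1)
        contexts.append(" ".join(turns[lo:hi]))  # the window, not just the hit
    return contexts

print(retrieve_with_window("were there problems after the release?"))
```

      The design point is that small chunks keep the retrieval signal sharp, while the window restores the surrounding turns that give a dialog its meaning; a parent document retriever applies the same idea but returns whole parent documents instead of windows.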