SubDocument RAG: If You Are NOT Using This, You're OUTDATED Already! (step-by-step LlamaIndex)

  • Published: 19 Jun 2024
  • Wow... AI images made us look 100 years younger. :-)
    In this episode, join Angelina and Mehdi for a discussion of another advanced RAG technique for building production-ready RAG systems, and a walkthrough of LlamaIndex's implementation.
    Who's Angelina: / meetangelina
    Who's Mehdi: / mehdiallahyari
    00:00 Intro
    00:47 What is the problem with Naive RAG?
    01:00 RAG (retrieval augmented generation) Recap
    01:17 What is Chunking?
    02:05 Different methods for Chunking
    03:12 Challenge with traditional Chunking
    03:53 A simple solution
    06:26 How can we do EVEN BETTER?
    10:13 The book analogy
    11:10 Ah, this is how we humans approach the problem too!
    11:59 LlamaIndex Implementation walk-through (Step-by-step Jupyter Notebook)
    16:50 What's good enough for POC?
    🦄 Any specific content you wish to learn from us? Sign up here: noteforms.com/forms/twosetai-...
    🖼️ Blogpost for today: Advanced RAG Technique - SubDoc Summary- open.substack.com/pub/mlnotes...
    📊 Flowchart: RAG Flow Chart: blog.griddynamics.com/content...
    🔨 Implementation: LlamaIndex example: github.com/run-llama/llama_in...
    📬 Don't miss out on the latest updates - Subscribe to our newsletter: mlnotes.substack.com/
    📚 If you'd like to learn more about RAG systems, check out our book on RAG systems: angelinamagr.gumroad.com/
    🕴️ Our consulting firm: We help companies that don't want to miss the boat of the current wave of AI advancement by integrating these solutions into their business operations and products. www.transformaistudio.com/
    Stay tuned for more content! 🎥 Thank you for watching! 🙌
  • Science

Comments • 42

  • @mrchongnoi
    @mrchongnoi 3 months ago +9

    I have been working on a RAG project for an institution in Jakarta. I will share the results from one of my test documents in the next day or two. Cheers

  • @amir.astaneh
    @amir.astaneh 3 months ago +6

    Thank you both for sharing this fantastic solution

  • @deepakwalia9878
    @deepakwalia9878 3 months ago +4

    This is similar to Parent Document Retriever (which is available in langchain) without the summary part. Works well in practical scenarios.

  • @mchl_mchl
    @mchl_mchl 1 month ago +1

    Won't be long before all published materials come with RAG-friendly metadata by default to improve RAG accuracy.

  • @souravbarua3991
    @souravbarua3991 3 months ago

    Thank you for this solution. I will implement it.

  • @TheBestgoku
    @TheBestgoku 3 months ago +3

    amazing explanation.

  • @andaiai
    @andaiai 3 months ago +3

    The problem with AI is its really, extremely fast development. I now have a firm belief that AI is trying to get itself ready: the way humans are developing new techniques daily and coming up with new algorithms, it's hard to tell whether we are developing AI or it is trying to develop itself using us as a base model.
    Last I checked, 2 months ago, RAG was the best way to implement knowledge-based search using LLMs. None of the enterprises have been able to successfully implement a centralized knowledge-base GPT, and here we are with a new technique, SDRAG.
    It's like not doing anything and not implementing any tech is as good as implementing it, because even if you start 5 months down the line, it's all again going to be some new model + some new tech to retrieve results faster + some new way to make it all cheaper, but what you won't have is a real-time implementation of LLMs in enterprises.

    • @TwoSetAI
      @TwoSetAI  3 months ago

      My two cents on this: products may not need to catch up with AI/LLM in real time. The reason is that for certain use cases, users are looking for quality solutions that are generally "good enough" for them. A solution can outlast the pace at which the tech is updating. However, what I'm looking forward to seeing is how fast products/end-user apps are going to change, and what new experiences and new needs are going to be created. They are correlated but not the same.

  • @sun-ship
    @sun-ship 2 months ago +1

    great content

  • @AK-ox3mv
    @AK-ox3mv 3 months ago +3

    Thanks

  • @LahiruFernando
    @LahiruFernando 24 days ago

    The video is awesome. However, I have a question about this concept so I can better understand how to implement it.
    As described in the video, the sub-doc summary is added to the metadata as a summary. From my understanding, metadata search is more like a keyword search. What is the best way to search for the related summaries based on the user query? Are these sub-doc summaries also converted into embeddings and stored?
    Maybe like a hierarchy where certain vectors contain only the sub-doc summaries and another set of vectors contains a reference to the related sub-doc summary, just like how we have primary and foreign keys in relational databases. Is this how it is implemented?
    My second question is: how would this reduce latency? I feel that we need extra querying steps to identify the sub-docs, and then the related chunks.
    Looking forward to an explanation on this one :)
    Great content btw!!

  • @kirilchi
    @kirilchi 3 months ago +3

    Just thought of a good analogy for what humans do. When someone asks a question that can be answered at different levels of generality, we answer: are you asking in general, or about this particular case?
    Same with a query: we can find a matching general context, but the smaller local contexts inside it may not match well. Meanwhile, another general context may not match the query exactly, but a part of it (maybe a sentence) may match very closely and actually provide the answer.
    I think this approach improves search performance (log n if we have n levels of summarization), but we may lose some really good matches as the price for that speed.
    Was actually just thinking today whether there is a good hybrid approach.

    • @TwoSetAI
      @TwoSetAI  3 months ago +2

      Absolutely! You can also consider using your own model to make that cutoff decision point for your “hybrid framework”!

    • @shashank1630
      @shashank1630 3 months ago

      @@TwoSetAI dude, it's not ok - the summary may lose many relevant matches....

    • @MehdiAllahyari
      @MehdiAllahyari 3 months ago +2

      You can think of this approach as having a question about a book. The book has different chapters (sub-documents), and each chapter has sections, etc. (chunks). So you look at the chapters to see which one may have the answer to your question, then you go to that chapter and sift through its sections and paragraphs. This approach works the same way. You can even have multiple levels, i.e. you can divide sub-documents into sub-sub-documents. There is always a tradeoff between accuracy and speed. For a very relevant approach, check this paper: RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval, which proposes multi-level summarization.
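      A minimal sketch of that two-level lookup, assuming a hypothetical embed() helper that wraps whatever embedding model you use; this illustrates the book analogy, not the pack's actual internals:

          import numpy as np

          def embed(text: str) -> np.ndarray:
              # Hypothetical helper: return an embedding vector for `text` using any model.
              raise NotImplementedError

          def cosine(a: np.ndarray, b: np.ndarray) -> float:
              return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

          def two_level_retrieve(query: str, chapters: dict[str, list[str]], top_k: int = 3):
              # `chapters` maps each chapter (sub-document) summary to its list of chunks.
              q = embed(query)
              # Level 1: pick the chapter whose summary best matches the query.
              best_summary = max(chapters, key=lambda s: cosine(q, embed(s)))
              # Level 2: rank only the chunks inside that chapter.
              ranked = sorted(chapters[best_summary],
                              key=lambda c: cosine(q, embed(c)), reverse=True)
              return best_summary, ranked[:top_k]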

    • @MehdiAllahyari
      @MehdiAllahyari 3 months ago

      @@shashank1630 There is always a tradeoff. Nevertheless, research shows the effectiveness of multi-level summarization in RAG. Check this paper: RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

  • @uwepleban3784
    @uwepleban3784 3 months ago

    The example question about Llama-2 pre-training is answered correctly. However, I see no text in the node displayed below the answer, neither in the summary nor in the detail chunk, that mentions pre-training, only text about fine-tuning and performance. Presumably the first source node in the response has the highest similarity value, but here it does not contribute to the answer. Did you rerank the nodes before composing the answer, or is there some other magic at play?

  • @TheBestgoku
    @TheBestgoku 3 months ago +3

    There is so much more room for improvement.

  • @briancase9527
    @briancase9527 3 months ago +4

    I've been playing with PrivateGPT, and it's kind of deficient because, for one reason, it doesn't know about document organization/hierarchy. So its usefulness as-is is limited. This is a step in the right direction. I don't think "tunable parameters" are the solution; the solution should, as was said in the beginning, be to obey the document structure. This is especially true for documentation and papers, where section headers are already defined and easy to parse. The idea shouldn't be to "try different sub-document sizes until you get a good result." That's never going to be useful.
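    A minimal sketch of the structure-aware splitting this comment argues for, assuming the source is Markdown-style text with '#' section headers (the function and parameter names here are illustrative, not from the video):

        import re

        def split_by_headers(markdown_text: str, level: int = 2) -> list[tuple[str, str]]:
            # Split a Markdown document into (header, body) sections at the given header level.
            pattern = re.compile(rf"^({'#' * level})\s+(.*)$", re.MULTILINE)
            matches = list(pattern.finditer(markdown_text))
            sections = []
            for i, m in enumerate(matches):
                start = m.end()
                end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown_text)
                sections.append((m.group(2).strip(), markdown_text[start:end].strip()))
            return sections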

  • @justinduveen3815
    @justinduveen3815 3 months ago +2

    Thank you both for sharing this helpful advice. Have you tried the “bag of words” approach to summarise large documents?

    • @fire17102
      @fire17102 2 months ago

      Can you elaborate? Or link resources? Thanks :)

  • @diego.castronuovo
    @diego.castronuovo 12 days ago

    Using this method, how can we save the embeddings to a ChromaDB instance after using SubDocSummaryPack?
    I can't get the embeddings out to save them in Chroma!

  • @maxlgemeinderat9202
    @maxlgemeinderat9202 3 months ago +3

    nice, is there already a langchain implementation for this?

    • @MehdiAllahyari
      @MehdiAllahyari 3 months ago

      I haven't seen that langchain has implemented the same approach.

  • @jp_coutinho
    @jp_coutinho 1 month ago

    Do we have any evidence for the claim of "lower latency"?
    If I understand correctly, search in a vector DB works by calculating the proximity of the embedded question to the embedded chunks stored (I'll ignore hybrid searches). I don't see how a sub-doc summary in metadata can lower this latency. Is the summary also indexed? If it is indexed, how does the search work? Is it also embedded, so we run two similarity searches, first over the summaries and then over the chunks?
    I don't think this is a valid claim, but I would appreciate an explanation or some benchmark that proves me wrong. I believe that sub-doc summaries improve the quality of the retrieved chunks, but not latency.

  • @pyclassy
    @pyclassy 3 months ago +2

    Awesome approach. Is there anything with langchain, apart from llamaindex?

    • @MehdiAllahyari
      @MehdiAllahyari 3 months ago

      I haven't seen whether langchain has implemented the exact same approach!

  • @sampathsaicharan244
    @sampathsaicharan244 3 months ago +1

    Does it support models other than OpenAI? When trying with llama.cpp and Mistral 7B, it gives me an error.

    • @MehdiAllahyari
      @MehdiAllahyari 3 months ago

      You should be able to use models other than OpenAI. In the function SubDocSummaryPack(), replace the 'llm' parameter with your own model.
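      A minimal sketch of that swap, using a locally served model via Ollama; the constructor arguments shown (llm, embed_model, chunk sizes) follow the reply above and may differ between versions of llama-index-packs-subdoc-summary, so check the signature of the installed pack:

          from llama_index.core import SimpleDirectoryReader
          from llama_index.embeddings.huggingface import HuggingFaceEmbedding
          from llama_index.llms.ollama import Ollama
          from llama_index.packs.subdoc_summary import SubDocSummaryPack

          documents = SimpleDirectoryReader("data/").load_data()

          # Swap the default OpenAI LLM for a locally served model (e.g. Mistral 7B via Ollama).
          local_llm = Ollama(model="mistral", request_timeout=120.0)
          local_embed = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

          pack = SubDocSummaryPack(
              documents,
              parent_chunk_size=8192,  # sub-document size
              child_chunk_size=512,    # retrieval chunk size
              llm=local_llm,           # parameter name per the reply above
              embed_model=local_embed,
          )

          print(pack.run("How was Llama 2 pretrained?"))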

  • @shauryatiwari7462
    @shauryatiwari7462 3 months ago +1

    Does it support LLMs apart from OpenAI models? I am getting an error: OpenAI model not found.

    • @SanjayRoy-vz5ih
      @SanjayRoy-vz5ih 3 months ago +2

      Look at your import statement, and also look at what you have passed for the llm key... if you have not done that already... to start with.

  • @bastabey2652
    @bastabey2652 1 month ago

    Summaries always lose information compared to the original text. Adding metadata to the base chunk can help preserve global context. A summary node might commit to one path in the parent tree; what if the information is distributed among different chunks belonging to different documents?

  • @dibu28
    @dibu28 3 months ago +1

    But how do you create a whole-document summary if the document is larger than the context window?

    • @uwepleban3784
      @uwepleban3784 3 months ago +2

      You can do recursive summarization. First, summarize each chapter, then summarize the chapter summaries.
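      A minimal sketch of that recursive scheme, assuming a hypothetical summarize() call backed by any LLM (and assuming each call returns something shorter than its input):

          def summarize(text: str) -> str:
              # Hypothetical LLM call that returns a short summary of `text`.
              raise NotImplementedError

          def recursive_summary(chunks: list[str], max_chars: int = 12_000) -> str:
              # Summarize each chunk (chapter), then summarize the summaries
              # until the combined text fits in the context window.
              summaries = [summarize(c) for c in chunks]
              combined = "\n".join(summaries)
              if len(combined) <= max_chars:
                  return summarize(combined)
              # Still too long: group the summaries and recurse one level up.
              groups = [summaries[i:i + 5] for i in range(0, len(summaries), 5)]
              return recursive_summary(["\n".join(g) for g in groups], max_chars)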

  • @MuhammadDanyalKhan
    @MuhammadDanyalKhan 3 months ago +1

    Why not add the whole-book summary to the sub-docs?

  • @dhrumil5977
    @dhrumil5977 3 months ago +1

    7:55 RAPTOR?

    • @TwoSetAI
      @TwoSetAI  3 months ago

      We will share a RAPTOR video soon!