Semantic Chunking for RAG

  • Published: 16 Jan 2025

Comments • 67

  • @energyexecs
    @energyexecs 6 months ago +3

    James Briggs is one of my favorites, and I believe I am a "Patreon" member. I spend hundreds of hours listening to about 10 podcasts, studying Large Language Models, Machine Learning and so-called "AI". James Briggs breaks things down into easier-to-understand concepts. Thank you, James Briggs

    • @jamesbriggs
      @jamesbriggs 6 months ago +1

      hey that's awesome, I really appreciate the support!

  • @aaronsmyth7943
    @aaronsmyth7943 8 months ago +12

    At this point, you are practically Captain Chunk.

  • @lalamax3d
    @lalamax3d 8 months ago

    Best I have seen so far for understanding the core concept of chunking, thanks

    • @jamesbriggs
      @jamesbriggs 8 months ago

      glad it was helpful :)

  • @AaronJOlson
    @AaronJOlson 8 months ago +2

    Thank you! I’ve been doing this for a while, but did not have a good name for it.

  • @naromsky
    @naromsky 8 months ago +9

    King of Chunk

    • @jamesbriggs
      @jamesbriggs 8 months ago +5

      a title I have always wanted

  • @AdrienSales
    @AdrienSales 8 months ago

    Excellent content and explanation, especially of the core chunking concepts and challenges. Keep up the work, it's so precious for learning 👍

    • @jamesbriggs
      @jamesbriggs 8 months ago +1

      Glad to hear it helps

  • @xuantungnguyen9719
    @xuantungnguyen9719 8 months ago +3

    Need a video on cross-chunk attention. Wasn’t attention all about key, query and value anyway?

  • @AGI-Bingo
    @AGI-Bingo 8 months ago +4

    Hi James, would you please tell me how you would tackle this one?
    How would you design a real-time updating RAG system? For example, let's say our clients updated some details in a watched doc: I want the old chunks to be removed and the doc rechunked automatically. Have you seen such a pipeline already? No one seems to cover this, and I think it is what sets apart fun projects from actual production systems. Thanks and all the best! Love your channel ❤

    • @shameekm2146
      @shameekm2146 8 months ago +1

      I have achieved this for one of the sources in my RAG bot. The source provides an API to access the data, so I run the embedding script on the delta changes.
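A minimal sketch of what running the embedding step "on the delta changes" could look like. The commenter did not share their script, so everything here is an assumption for illustration: the helper names `fingerprint` and `plan_delta` are hypothetical, and so is the idea of storing one SHA-256 hash per record at indexing time.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable content hash used to detect whether a record changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_delta(indexed: dict[str, str], current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare the last indexed state with the source's current state.

    indexed: record id -> hash stored when the record was last embedded
    current: record id -> text returned by the source API right now
    Returns (ids to embed or re-embed, ids to delete from the index).
    """
    to_embed = [rid for rid, text in current.items() if indexed.get(rid) != fingerprint(text)]
    to_delete = [rid for rid in indexed if rid not in current]
    return to_embed, to_delete
```

Only the ids in the first list need new embeddings, which keeps the cost proportional to the size of the change rather than the size of the source.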

    • @AGI-Bingo
      @AGI-Bingo 8 months ago +2

      @shameekm2146 amazing, would you please open-source it so we can all improve the pipeline as a community? 🌈

    • @rohansingh1057
      @rohansingh1057 2 months ago

      RAG does not mean you "have to use vector embeddings and a vector DB". If you can run APIs to fetch the relevant info, that should be good enough. Use function/tool calling to call the API.
      Otherwise, if you are planning to watch a doc live, you need a pipeline like the following. It will only work if you are making small, frequent changes to the doc:
      Doc changed -> Webhook/trigger to your system -> If a diff is available, use it; if not, compute the diff between the old and new doc
      -> Take the nearby text as a sample and compute embeddings -> Fetch the top N nearby docs from the vector DB (hybrid search works really well here; tune the sparse vector weighting higher than in normal RAG) -> Ask an LLM agent to mark the relevant chunks, or use a reranking model (old chunk as query) -> Delete these old chunks from the vector DB -> Compute embeddings for the new changes -> Upsert the new vectors into the DB.
      There are tons of edge cases that you will run into when running this pipeline, and they vary for each use case, so you will have to handle them accordingly.
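The find, delete, and upsert steps of that pipeline can be sketched roughly as follows. This is a toy under stated assumptions, not a production design: `embed` is a bag-of-words stand-in for a real embedding model, a plain dict stands in for the vector DB, and a similarity threshold stands in for the LLM agent or reranker that would mark which old chunks are stale. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def replace_stale_chunks(store: dict[str, str], old_text: str, new_chunks: dict[str, str],
                         top_n: int = 3, threshold: float = 0.8) -> list[str]:
    """Delete stored chunks that match the changed (old) text, then upsert the new chunks."""
    query = embed(old_text)
    # Fetch the top N nearest stored chunks, using the old text as the query.
    ranked = sorted(store, key=lambda cid: cosine(query, embed(store[cid])), reverse=True)[:top_n]
    # Stand-in for the LLM/reranker step: keep only candidates that are close enough.
    stale = [cid for cid in ranked if cosine(query, embed(store[cid])) >= threshold]
    for cid in stale:
        del store[cid]
    store.update(new_chunks)  # upsert the re-chunked text
    return stale
```

The threshold is the fragile part: set it too low and unrelated chunks get deleted, too high and outdated chunks survive, which is one source of the edge cases mentioned above.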

  • @shameekm2146
    @shameekm2146 8 months ago

    Thank you so much for this. Will test it out on the RAG flow in the company.

    • @jamesbriggs
      @jamesbriggs 8 months ago

      welcome, would love to hear how it goes

  • @jonm691
    @jonm691 6 months ago

    Loved this explanation

  • @rodgerb2645
    @rodgerb2645 8 months ago

    Love all your content sir!

  • @dinoscheidt
    @dinoscheidt 8 months ago +3

    People since GPT2: Simply ask an LLM recursively to please insert "{split}" wherever a topic change etc. happens, according to a summary of the prior text. Get embeddings. Use them to separate and group.
    2024: We would like to introduce a novel concept called Semantic Chunking with a sliding Context……..
    Beginners must be truly lost 😮‍💨
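The marker approach described in this comment can be sketched as below. The LLM call is deliberately left abstract: `mark_topics` is a hypothetical callable that returns the input text with "{split}" inserted at topic changes, and `fake_llm` is a hard-coded stand-in so the example runs without any model. The recursive part (feeding a summary of prior text back into the prompt) is not shown.

```python
from typing import Callable

SPLIT_MARKER = "{split}"


def chunk_with_markers(text: str, mark_topics: Callable[[str], str]) -> list[str]:
    """Ask the marker function to annotate topic changes, then cut on the markers."""
    marked = mark_topics(text)
    return [part.strip() for part in marked.split(SPLIT_MARKER) if part.strip()]


def fake_llm(text: str) -> str:
    # Stand-in for a real LLM call: pretends the topic changes where "Pricing" begins.
    return text.replace(" Pricing", " " + SPLIT_MARKER + " Pricing")
```

With a real model, the output should also be checked against the input, since an LLM may reword or drop text while inserting markers.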

  • @NhatNguyen-bq6jj
    @NhatNguyen-bq6jj 7 months ago

    Can you recommend some articles related to this topic? Thanks!

  • @klik24
    @klik24 8 months ago

    Just what I was trying to learn... awesome mate, thanks

  • @FatherNovelty
    @FatherNovelty 8 months ago +1

    At ~4:40, you mention that you should use the same encoder for the chunking and the encoding. Why? A chunk captures a "single meaning", so why would it matter that the same encoder is used? If you look at the chunking as a clustering algorithm that creates meaningful chunks, then what does it matter that the encoders match? What am I missing?

    • @jamesbriggs
      @jamesbriggs 8 months ago +1

      good point - yes they are capturing the "single meaning" and that single meaning will (hopefully) overlap a lot, but embedding models are not perfect and so they will not align perfectly with each other. It's similar to if someone asked you and me to chunk an article: we'd likely overlap for the majority of the article, but I'm sure there would be differences
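To make the point concrete, here is a toy illustration of how chunk boundaries depend on the encoder. It is not the algorithm from the video or from any library: `semantic_chunks` is a simplified chunker that starts a new chunk whenever the similarity between consecutive sentences drops below a threshold, and `keyword_encoder` builds hypothetical keyword-count "encoders" in place of real embedding models.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: list[str], encode: Callable[[str], list[float]],
                    threshold: float = 0.5) -> list[list[str]]:
    """Group consecutive sentences; split where similarity to the previous one is low."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    prev = encode(sentences[0])
    for sentence in sentences[1:]:
        vec = encode(sentence)
        if cosine(prev, vec) >= threshold:
            chunks[-1].append(sentence)
        else:
            chunks.append([sentence])
        prev = vec
    return chunks


def keyword_encoder(topics: list[set[str]]) -> Callable[[str], list[float]]:
    """Toy encoder: one dimension per topic, counting that topic's keywords."""
    def encode(sentence: str) -> list[float]:
        words = set(sentence.lower().split())
        return [float(len(words & topic)) for topic in topics]
    return encode


# Two toy encoders that disagree on whether "bank" and "loan" share a topic.
encoder_a = keyword_encoder([{"cat", "dog"}, {"bank", "loan"}])
encoder_b = keyword_encoder([{"cat", "dog"}, {"bank"}, {"loan"}])
```

For the sentences "the cat sat", "the dog ran", "the bank opened", "the loan closed", `encoder_a` yields two chunks while `encoder_b` yields three. Chunks built with one encoder are therefore not guaranteed to be "single meaning" units in the vector space of another.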

  • @baskarjayaraman5821
    @baskarjayaraman5821 8 months ago

    Great video. Thanks for posting. I have been thinking of document chunking but using the LLM itself via prompting + k-shot. The approach you show will be cheaper of course but curious to see how these two approaches will compare in terms of any relevant non-cost metrics.

  • @FrankenLab
    @FrankenLab 2 months ago

    @James Briggs Newbie here, was wondering if it was necessary to store the chunk with the vector, it seems like a lot of data duplication and a good way to fill your disk. I like the idea of storing the title, I was thinking about storing the document path and filename also. I haven't been able to find good info about what data besides vectors is also kept in the vector db. I understand that the vectors need to correlate to data, I just don't understand what data is actually represented in the vectors. If you just have an ID and the vectors, can't that ID point back to the document with the content?
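A sketch of the layout this question proposes: keep only an ID, the vector, and a little metadata in the vector index, and let the ID point back to the chunk text stored elsewhere. The vector is just a list of numbers from which the text cannot be read back, so the text has to live somewhere. All names here (`VectorRecord`, `chunk_store`, `index_chunk`, `fetch_text`) are hypothetical, and the dict stands in for whatever database or file store would hold the text.

```python
from dataclasses import dataclass, field


@dataclass
class VectorRecord:
    id: str
    values: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


# Stand-in for an external store (database, object storage, files on disk).
chunk_store: dict[str, str] = {}


def index_chunk(doc_path: str, position: int, text: str,
                vector: list[float], title: str) -> VectorRecord:
    """Store the chunk text externally; the record carries only ID, vector, metadata."""
    chunk_id = f"{doc_path}#{position}"
    chunk_store[chunk_id] = text
    return VectorRecord(id=chunk_id, values=vector, metadata={"title": title, "path": doc_path})


def fetch_text(record: VectorRecord) -> str:
    """Resolve a search hit back to its chunk text through the ID."""
    return chunk_store[record.id]
```

The trade-off is one extra lookup per retrieved chunk in exchange for a smaller index; storing the text in the metadata instead avoids that lookup at the cost of duplication.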

  • @luciolrv
    @luciolrv 8 months ago

    How does Parent Document RAG fit in with your new techniques?

  • @nikhilmaddirala
    @nikhilmaddirala 7 months ago

    What's a good way to use the metadata for retrieval and ranking of the chunks?

  • @scottmiller2591
    @scottmiller2591 8 months ago +2

    "Grab complete thoughts" is an obviously good and expensive thing. Except for tables, for instance.

    • @jamesbriggs
      @jamesbriggs 8 months ago +2

      yeah tables need to be handled differently - doable if you are identifying text vs. table elements in your processing pipeline
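A rough sketch of what handling tables separately could look like, assuming an upstream parser has already labelled each element as text or table. The `Element` type and `chunk_elements` function are hypothetical, and `chunk_text` is whatever text chunker the pipeline uses, passed in as a callable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Element:
    kind: str     # "text" or "table", as labelled by an upstream parser
    content: str


def chunk_elements(elements: list[Element],
                   chunk_text: Callable[[str], list[str]]) -> list[str]:
    """Chunk runs of text with chunk_text; keep every table whole as its own chunk."""
    chunks: list[str] = []
    buffer: list[str] = []

    def flush() -> None:
        # Send the accumulated text run through the text chunker.
        if buffer:
            chunks.extend(chunk_text(" ".join(buffer)))
            buffer.clear()

    for element in elements:
        if element.kind == "table":
            flush()
            chunks.append(element.content)
        else:
            buffer.append(element.content)
    flush()
    return chunks
```

Keeping a table whole avoids splitting a row from its header; very large tables would still need their own splitting rule, which is not covered here.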

  • @GeertBaeke
    @GeertBaeke 8 months ago

    We use a simple combination of Microsoft's Document Intelligence with markdown output and a simple markdown splitter. The improvement is noticeable although the Document Intelligence models do come at an additional cost.
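For readers wondering what "a simple markdown splitter" might be, here is a minimal sketch that starts a new chunk at every markdown heading. It is an assumption about the general idea, not the commenter's actual code: `split_markdown` is a hypothetical helper, and it does not guard against `#` lines inside fenced code blocks.

```python
import re

# A heading is 1-6 "#" characters followed by whitespace and some text.
HEADING = re.compile(r"^#{1,6}\s+\S")


def split_markdown(markdown: str) -> list[str]:
    """Split markdown into sections, each starting at a heading line."""
    sections: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [section for section in sections if section]
```

Because each section keeps its heading, the chunk carries some of the document structure into the embedding.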

    • @jamesbriggs
      @jamesbriggs 8 months ago +2

      yeah it depends on what you need of course. I'm mostly interested in further abstraction and more analytical methods for chunking, not for where it is now but for where this type of experimentation might lead in the future - I could see a few more iterations and improvements to more intelligent doc parsing and chunking becoming increasingly performant - but we'll see

    • @alivecoding4995
      @alivecoding4995 8 months ago

      Do you have a link for this markdown processing? :)
      We are using Document Intelligence as well, but not for layout analysis, yet.

    • @GayathriG-h5h
      @GayathriG-h5h 8 months ago

      @alivecoding4995 you can also use LayoutPDFReader from llmsherpa

  • @fayluu248
    @fayluu248 6 months ago

    Hi James, do you think that the chunking and embedding process in RAG will become unnecessary in the near future, as input token length is no longer a limitation?

    • @jamesbriggs
      @jamesbriggs 6 months ago

      I don’t think the input token length will become unlimited any time soon - but for smaller use cases (fitting within Anthropic limits) where latency and token cost are not important then you can use a pure LLM solution rather than RAG

  • @MrMoonsilver
    @MrMoonsilver 8 months ago

    Amazing video, thank you so much!!

  • @gullyburns1280
    @gullyburns1280 8 months ago

    Another killer video. Great work!

  • @bastabey2652
    @bastabey2652 8 months ago

    Using a high-end LLM like GPT-4, Opus, or Gemini Ultra or Pro might be effective for performing semantic chunking. Google's large context window seems suitable for chunking large files. We need to introduce LLMs into automating the RAG stack

    • @jamesbriggs
      @jamesbriggs 8 months ago +1

      Yeah I’d like to introduce an LLM chunker and see how they compare

    • @bastabey2652
      @bastabey2652 8 months ago

      @jamesbriggs better than any non-LLM chunker.. if we aim to empower users with AI, why not empower the developer? chunking is not easy

  • @MrMoonsilver
    @MrMoonsilver 8 months ago

    Can this be used to create chunks for a training dataset as well? It would be great to chunk a document into 'statements' and use those statements for a dataset. In essence, have an LLM create questions for each of those statements and use those pairs for training. Could you make a video to show how that works?

  • @amantandon-ln9xx
    @amantandon-ln9xx 8 months ago

    I see the #abstract is in the same chunk as the #title. Ideally both should be in different chunks so that the LLM can better understand the semantics.

  • @brianferrell9454
    @brianferrell9454 8 months ago

    Do you think this causes the results to be biased towards smaller chunks? The user will probably query with no more than 10 words, so the most semantically similar results may also be only about 10 words, and the chunks that are 400 tokens wouldn't have as high a score unless you provide more context to the query?

  • @MrDespik
    @MrDespik 8 months ago

    Hi James. Excuse me, maybe I missed it, but how do you handle the fact that when we use semantic chunking we lose the page numbers for chunks? Is it possible to get them using this package?

  • @talesfromthetrailz
    @talesfromthetrailz 8 months ago

    Dude already embedded whole documents of texts into PC haha would've helped a month ago. But awesome thanks for this! 🤘🏾

    • @jamesbriggs
      @jamesbriggs 8 months ago +1

      Maybe for the next project 😅

    • @talesfromthetrailz
      @talesfromthetrailz 8 months ago

      @jamesbriggs quick question man. Is the objective of semantic chunking to achieve broader search results? Or to decrease query times? I'm thinking of it in terms of medium-sized text docs, for example movie summaries and such. Thanks!

  • @trn450
    @trn450 8 months ago

    Great material. 🙏

  • @FDasdana
    @FDasdana 4 months ago

    Does this library also support Ollama, Gemini or HF encoders, or is it only for ChatGPT?

    • @jamesbriggs
      @jamesbriggs 4 months ago

      it supports these encoders github.com/aurelio-labs/semantic-router/tree/main/semantic_router/encoders

  • @botondvasvari5758
    @botondvasvari5758 8 months ago

    And how can I use big models from Hugging Face? I can't load them into memory because many of them are bigger than 15 GB, and some of them are 130 GB+. Any thoughts?

  • @swethak7198
    @swethak7198 5 months ago

    I have a question: I have a document in which many pages reference other pages. Should I group all the related data into the same chunk (for example, if the first page references page 3, should I take the data from both pages and store it in a single chunk)? Is this the only way, or are there any special models for this? Otherwise, please give me some ideas

    • @drosi1994
      @drosi1994 4 months ago

      Hmm, that's an issue you could solve in the retrieval stage rather than in chunking... When you retrieve a chunk, you can check with a fast LLM whether it references other chunks and fetch those as well
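A small sketch of following references at retrieval time. To keep it runnable without a model, a regular expression stands in for the fast LLM check suggested above, and it only recognises phrases like "see page 3" or "refer to page 3". The names `PAGE_REF`, `expand_with_references`, and the `chunks_by_page` lookup are all assumptions made for illustration.

```python
import re

# Stand-in for the LLM check: matches "see page N" / "refer to page N".
PAGE_REF = re.compile(r"\b(?:see|refer to)\s+page\s+(\d+)", re.IGNORECASE)


def expand_with_references(retrieved: list[str],
                           chunks_by_page: dict[int, list[str]]) -> list[str]:
    """Append the chunks of any page that a retrieved chunk refers to."""
    expanded = list(retrieved)
    for chunk in retrieved:
        for page in PAGE_REF.findall(chunk):
            for extra in chunks_by_page.get(int(page), []):
                if extra not in expanded:
                    expanded.append(extra)
    return expanded
```

This follows references one hop only; pages reached this way are not scanned for further references, which keeps the context from growing without bound.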

  • @manslaughterinc.9135
    @manslaughterinc.9135 4 months ago

    Unfortunately, the semantic router has removed this feature, or refactored it in some way.

    • @jamesbriggs
      @jamesbriggs 4 months ago

      hey yes they were deprecated in favour of this ruclips.net/video/7JS0pqXvha8/видео.html

  • @mrchongnoi
    @mrchongnoi 8 months ago

    Why not chunk based on paragraphs, lists, and tables?

  • @maharun
    @maharun 3 months ago +1

    Using the semantic chunker is giving this error even though I'm not using cohere:
    cannot import name 'EmbedResponse_EmbeddingsByType' from 'cohere.types.embed_response'
    How do I solve it? I have already wasted one day on it.. this is so annoying.. plz help.. :)

    • @jamesbriggs
      @jamesbriggs 3 months ago

      cohere did a surprise SDK update and they are a default package in the library (we may change this) - try doing a `pip install -qU semantic-chunkers semantic-router==0.68`
      more info here if needed github.com/aurelio-labs/semantic-router/issues/422

  • @jimmc448
    @jimmc448 8 months ago +1

    My son just asked if you were the Rock

  • @saqqara6361
    @saqqara6361 8 months ago +1

    "What is the title of the document?" -> 99% of RAG pipelines fail, because there is no answer in the document as it is embedded.

    • @jamesbriggs
      @jamesbriggs 8 months ago

      in that case we can try including the title in our chunk, and possibly consider different routing logic for this type of query - when a user asks for metadata about a received document, we trigger a function that identifies the document ID in previously retrieved contexts and uses that to pull in the document metadata, so the LLM can generate the answer from it
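Both ideas in this reply can be sketched in a few lines. This is only an illustration under assumptions: the keyword check in `is_metadata_query` is a crude stand-in for real routing logic, and the function names, the `doc_id` field on retrieved chunks, and the `documents` metadata lookup are all hypothetical.

```python
def with_title(title: str, chunk: str) -> str:
    """Prefix the chunk with its document title before embedding it."""
    return f"{title}\n\n{chunk}"


# Crude stand-in for routing logic: words that suggest a metadata question.
METADATA_HINTS = ("title", "author", "published")


def is_metadata_query(query: str) -> bool:
    lowered = query.lower()
    return any(hint in lowered for hint in METADATA_HINTS)


def metadata_context(doc_meta: dict[str, str]) -> str:
    return "\n".join(f"{key}: {value}" for key, value in doc_meta.items())


def answer_context(query: str, retrieved: list[dict[str, str]],
                   documents: dict[str, dict[str, str]]) -> str:
    """Build the context for the LLM: document metadata or the chunk texts."""
    if is_metadata_query(query) and retrieved:
        # Identify the document from previously retrieved context.
        doc_id = retrieved[0]["doc_id"]
        return metadata_context(documents[doc_id])
    return "\n".join(chunk["text"] for chunk in retrieved)
```

The returned string would then be passed to the LLM as context; generating the final answer is left out of the sketch.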

  • @itzuditsharma
    @itzuditsharma 8 months ago

    I am facing this problem in my Jupyter notebook, please help:
    2024-05-10 10:59:50 WARNING semantic_router.utils.logger Retrying in 2 seconds...