How to Set the Chunk Size in Document Splitter | RAG | LangChain

  • Published: Dec 10, 2024

Comments • 18

  • @engineerprompt
    @engineerprompt  6 months ago

    If you are interested in learning more about how to build robust RAG applications, check out this course: prompt-s-site.thinkific.com/courses/rag

  • @shameekm2146
    @shameekm2146 9 months ago +3

    Thank you so much for pointing this out. I am running a RAG application in a production system. The quality of the documents I work with is not that great, and I have been asked to improve the accuracy of the whole RAG pipeline, so this is very helpful. :)

  • @adnanrizve5551
    @adnanrizve5551 9 months ago +2

    Very informative video, ❤ your style of explanation. Keep sharing more on this topic.

  • @KevinRank
    @KevinRank 9 months ago

    I appreciate these videos. I'm still trying to get this all figured out. I have a new system, and a big reason I got it is to run local models I can share with colleagues (keeping it all local).

  • @engineerprompt
    @engineerprompt  9 months ago

    If you are interested in learning more about the Advanced RAG Course, sign up here: tally.so/r/3y9bb0

  • @stunspot
    @stunspot 9 months ago

    Very nice! I do hope you get into some advanced prompting, though. Good prompting can make a huge difference with RAG.

  • @dibu28
    @dibu28 9 months ago +1

    Did you also evaluate Self-RAG, CRAG, GraphRAG, or SubDocument-RAG (summarizing) to improve answer quality?

  • @TC-Loom
    @TC-Loom 9 months ago +1

    Is anyone, when splitting by paragraph, setting the overlap to be the last sentence (conclusion) of the previous paragraph and the first sentence (introduction/continuation) of the next paragraph? Also, metadata should be keywords produced by a very small local LLM, and then you can make a knowledge graph of the keywords.
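The paragraph-level overlap described in this comment could be sketched roughly as follows. This is only an illustration, not code from the video: the function names `split_sentences` and `chunk_with_sentence_overlap` are hypothetical, paragraphs are assumed to be separated by blank lines, and the regex-based sentence splitter is a naive stand-in for a proper sentence tokenizer.

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def chunk_with_sentence_overlap(text: str) -> list[str]:
    """Return one chunk per paragraph, padded with neighbouring sentences.

    Each chunk is the paragraph itself, preceded by the last sentence of the
    previous paragraph and followed by the first sentence of the next one.
    """
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks = []
    for i, paragraph in enumerate(paragraphs):
        parts = []
        if i > 0:
            # Last sentence of the previous paragraph.
            parts.append(split_sentences(paragraphs[i - 1])[-1])
        parts.append(paragraph)
        if i < len(paragraphs) - 1:
            # First sentence of the next paragraph.
            parts.append(split_sentences(paragraphs[i + 1])[0])
        chunks.append(" ".join(parts))
    return chunks
```

The first and last paragraphs only get overlap on one side, so chunk lengths vary with paragraph length rather than being capped at a fixed size.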

  • @abdalrhmanalkabani8784
    @abdalrhmanalkabani8784 9 months ago +1

    Thank you for sharing the video. I have a question about a PDF document containing numerous tables. I am developing a RAG system and am having trouble extracting information from the tables using standard PDF loaders. I tried GPT-4 on images of the pages, asking it to return the tables as JSON, and that worked, but I am looking for an automated solution. Could you suggest effective methods to improve table content extraction?

  • @RocktCityTim
    @RocktCityTim 9 months ago

    It seems that not only do you ensure your content context is maintained, but you should also see a more economical parsing of the ingested text. Is that correct?

  • @matthiasandreas6549
    @matthiasandreas6549 6 months ago

    Yes please more. Thanks

  • @alexandrupop7461
    @alexandrupop7461 9 months ago

    Appreciate the video! Very interesting.

  • @samcavalera9489
    @samcavalera9489 9 months ago +2

    Thanks bro! I can't wait to take your RAG course. Btw, here's a great tutorial on evaluating 8 different RAG models. It shows a nice comparative analysis of different metrics across different RAG techniques:
    ruclips.net/video/nze2ZFj7FCk/видео.htmlsi=NzgKUeUlTW9ZYn00

  • @alqods80
    @alqods80 9 months ago +1

    Best chunking is using agentic chunking with grouping, but it costs money

    • @maxlgemeinderat9202
      @maxlgemeinderat9202 9 months ago +4

      Not if you are using a local LLM as the agent.

    • @AnonCoder37
      @AnonCoder37 9 months ago

      @maxlgemeinderat9202 interesting, thanks

  • @horacioariash
    @horacioariash 9 months ago +1

    Hi. Thanks for your videos. I remember that your videos normally used a chunk size of 1000 and an overlap of 200 characters. Was that for ChatGPT, Llama, Mixtral, or others? What chunk size and overlap do you recommend for these well-known LLMs?
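For readers unsure what a chunk size of 1000 with an overlap of 200 characters means in practice, here is a minimal sliding-window sketch. It is a simplified stand-in written for this thread, not LangChain's splitter: `split_by_characters` is a hypothetical helper that cuts at fixed character offsets, whereas LangChain's `RecursiveCharacterTextSplitter` (which takes `chunk_size` and `chunk_overlap` parameters) also tries to break at separators such as paragraph and line boundaries.

```python
def split_by_characters(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap their neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, consecutive chunks share 200 characters, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.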

  • @borisrusev9474
    @borisrusev9474 9 months ago

    Can you make a video comparing how chunk size is a trade-off between accuracy and recall?