The Best RAG Technique Yet? Anthropic’s Contextual Retrieval Explained!

  • Published: 21 Sep 2024
  • Anthropic has launched a new retrieval mechanism called contextual retrieval, which combines chunking strategies with re-ranking to significantly improve performance. In this video, I explain how this technique enhances retrieval accuracy, including practical implementation steps and benchmark results. Learn how to optimize your RAG systems by adding contextual embeddings, keyword-based BM25 indexing, and re-ranking to achieve state-of-the-art results. A code sketch of the contextual-embedding step is included below, after the video links.
    LINKS:
    www.anthropic....
    github.com/ant...
    • Is This the End of RAG...
    • Advanced RAG with ColB...
    • Vision-Based RAG Syste...
    💻 RAG Beyond Basics Course:
    prompt-s-site....
    Let's Connect:
    🦾 Discord: / discord
    ☕ Buy me a Coffee: ko-fi.com/prom...
    🔴 Patreon: / promptengineering
    💼Consulting: calendly.com/e...
    📧 Business Contact: engineerprompt@gmail.com
    Become Member: tinyurl.com/y5h...
    💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
    Sign up for the newsletter, localGPT:
    tally.so/r/3y9bb0
    00:00 Introduction to Contextual Retrieval
    00:20 Understanding RAG Systems
    00:55 Combining Semantic and Keyword Search
    01:44 Challenges with Standard RAG Systems
    02:48 Anthropic's Contextual Retrieval Approach
    03:37 Implementing Contextual Retrieval
    07:06 Performance Improvements and Benchmarks
    09:02 Best Practices for RAG Systems
    12:48 Code Example and Practical Implementation
    15:21 Conclusion and Final Thoughts
    All Interesting Videos:
    Everything LangChain: • LangChain
    Everything LLM: • Large Language Models
    Everything Midjourney: • MidJourney Tutorials
    AI Image Generation: • AI Image Generation Tu...
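
For readers who want to try this, here is a minimal sketch of the contextual-embedding step described above, using the Anthropic Python SDK. The model name and prompt wording are illustrative, not the exact ones from Anthropic's cookbook.

```python
# Minimal sketch of contextual retrieval's first step: before embedding,
# prepend each chunk with LLM-generated context situating it in the full
# document. The resulting string is what you embed and index with BM25.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def situate_chunk(document: str, chunk: str) -> str:
    response = client.messages.create(
        model="claude-3-haiku-20240307",  # illustrative; any cheap model works
        max_tokens=150,
        messages=[{
            "role": "user",
            "content": (
                f"<document>\n{document}\n</document>\n\n"
                f"Here is a chunk from that document:\n<chunk>\n{chunk}\n</chunk>\n\n"
                "Write a short context that situates this chunk within the "
                "overall document, to improve search retrieval. "
                "Answer with the context only."
            ),
        }],
    )
    context = response.content[0].text
    return f"{context}\n\n{chunk}"
```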

Comments • 29

  • @tvwithtiffani
    @tvwithtiffani 1 hour ago +5

    For anyone wondering, I did try these methods (contextual retrieval + reranking) with a local model on my laptop. The RAG part works great, but importing new documents takes a while due to chunking, generating summaries, and generating embeddings. Re-ranking on a local model is surprisingly fast and really good with the right model. If you're building an application using RAG, I'd suggest making document ingestion the very first step of your onboarding, because you can then do all of the chunking etc. in the background. The user might expect a real-time drag -> drop -> ask-question workflow, but it won't work like that unless you're using models in the cloud. Also, remember to chunk, summarize, and generate embeddings concurrently, not one chunk after another, as that would of course take longer for your end user.

    • @kenchang3456
      @kenchang3456 27 minutes ago

      Thanks for the follow-up.

    • @TheShreyas10
      @TheShreyas10 13 minutes ago

      Can you share the code if possible?
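
Not the commenter's actual code, but a minimal sketch of the concurrent ingestion approach described above; summarize and embed are hypothetical placeholders for your local model calls.

```python
# Sketch of concurrent ingestion: chunks are summarized and embedded in
# parallel rather than one after another, so a background import finishes
# sooner. `summarize` and `embed` are placeholders for local model calls.
from concurrent.futures import ThreadPoolExecutor

def summarize(chunk: str) -> str:
    raise NotImplementedError("replace with your local-LLM summarization call")

def embed(text: str) -> list[float]:
    raise NotImplementedError("replace with your embedding-model call")

def process_chunk(chunk: str) -> dict:
    contextualized = f"{summarize(chunk)}\n\n{chunk}"
    return {"text": contextualized, "vector": embed(contextualized)}

def ingest(chunks: list[str], workers: int = 8) -> list[dict]:
    # A thread pool keeps the app responsive while documents import in the background.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, chunks))
```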

  • @BinWang-b7f
    @BinWang-b7f 1 hour ago +4

    Sending my best to the little one in the background!

  • @MatichekYoutube
    @MatichekYoutube 2 hours ago +1

    Do you maybe know what is going on in GPT Assistants? Their RAG is really efficient and accurate; they default to 800-token chunks with 400-token overlap, and it seems to work really well. Perhaps they also use some kind of re-ranker? Maybe you know...
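
For reference, a minimal sketch of sliding-window chunking with those reported defaults (800-token chunks, 400-token overlap), using tiktoken for token counting:

```python
# Sliding-window chunking with the defaults the comment mentions:
# 800-token windows that each overlap the previous one by 400 tokens.
import tiktoken

def chunk_text(text: str, size: int = 800, overlap: int = 400) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = size - overlap  # each new window starts 400 tokens after the last
    return [enc.decode(tokens[i:i + size]) for i in range(0, len(tokens), step)]
```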

  • @IAMCFarm
    @IAMCFarm 2 hours ago

    Applying this to local models for large document repos seems like a good combo to increase RAG performance. I wonder how you would optimize for the local environment.

  • @AlfredNutile
    @AlfredNutile 2 hours ago

    Great work!

  • @loicbaconnier9150
    @loicbaconnier9150 3 hours ago +2

    If you want to combine the embedding, BM25, and reranker, just use ColBERT; it's more efficient...

    • @the42nd
      @the42nd 18 minutes ago

      True, but he does mention ColBERT at 09:45.
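
One way to try the ColBERT approach mentioned above is via the RAGatouille wrapper; the index name and documents below are illustrative.

```python
# ColBERT's late-interaction scoring can stand in for the separate
# embedding + BM25 + reranker pipeline. Illustrative usage via RAGatouille.
from ragatouille import RAGPretrainedModel

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
RAG.index(
    collection=["First document text", "Second document text"],
    index_name="contextual_retrieval_demo",
)
results = RAG.search(query="What is contextual retrieval?", k=5)
```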

  • @martinsherry
    @martinsherry 1 hour ago

    Very helpful explanation.

  • @robrita
    @robrita 3 hours ago +3

    Can hear the baby in the background 👶

  • @janalgos
    @janalgos 1 hour ago +1

    How does it compare to HybridRAG?

  • @RedCloudServices
    @RedCloudServices 2 hours ago

    Do you think vision LLMs like ColPali provide more accurate context and results than traditional RAG using text-based LLMs?

  • @limjuroy7078
    @limjuroy7078 29 minutes ago

    What happens if the document contains a lot of images, tables, charts, and so on? Can we still chunk the document in the normal way, like setting a chunk size?

  • @konstantinlozev2272
    @konstantinlozev2272 4 hours ago

    Losing context in RAG is a real issue that can destroy all usefulness.
    I have read that combining chunks with graphs is a way to overcome that,
    but I have not tested it on a use case myself.

    • @NLPprompter
      @NLPprompter 3 hours ago

      I'm interested in why graphs can help an LLM retrieve better.

    • @konstantinlozev2272
      @konstantinlozev2272 2 hours ago +1

      @@NLPprompter My understanding is that graphs condense and formalise the context of a piece of text.
      My use case is a database of case law.
      There are some obvious use cases when a paragraph cites another paragraph from another case.
      But beyond that, I think there is a lot of opportunity in just representing each judgement in a standardised hierarchical format.
      I am not 100% sure how to put it all together from a software engineering perspective, though.
      And maybe one could use a relational database instead of graphs too. 🤔

    • @NLPprompter
      @NLPprompter 2 hours ago +1

      @@konstantinlozev2272 Graphs are indeed fascinating. Maybe I don't really understand how they relate to LLMs, but what makes them interesting is that when grokking happens and a model becomes able to generalize its training, it tends to form patterns over its data, and those patterns are mostly geometric. Really fascinating, although I tried to read that paper and couldn't comprehend it with my little brain... so I do believe graph RAG is somehow meaningful/useful for LLMs.

    • @konstantinlozev2272
      @konstantinlozev2272 1 hour ago +1

      @@NLPprompter I guess it will have to be the LLM working with the API of the knowledge graph via function calling.
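
A minimal sketch of the chunks-plus-graph idea from this thread, using networkx for illustration; the node names and texts are made up. Retrieval returns a matched chunk plus the paragraphs it cites, and a graph database or even a relational table of (source, target) citations would work just as well.

```python
# Chunks are nodes, citations are edges; retrieval expands a matched chunk
# with its graph neighbours to recover the context a bare chunk loses.
import networkx as nx

G = nx.DiGraph()
G.add_node("case_A_para_12", text="paragraph text of case A, para 12")
G.add_node("case_B_para_3", text="paragraph text of case B, para 3")
G.add_edge("case_A_para_12", "case_B_para_3", relation="cites")

def expand_with_citations(chunk_id: str) -> list[str]:
    """Return the matched chunk plus everything it cites, for extra context."""
    texts = [G.nodes[chunk_id]["text"]]
    texts += [G.nodes[n]["text"] for n in G.successors(chunk_id)]
    return texts
```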

  • @megamehdi89
    @megamehdi89 2 hours ago

    Best wishes for the kid in the background

  • @VerdonTrigance
    @VerdonTrigance 3 hours ago

    How did they put a whole doc into the prompt?

    • @vaioslaschos
      @vaioslaschos 3 hours ago

      Most commercial LLMs have a context window of 120k tokens or more. But even if that's not the case, you can just take much bigger chunks as context.

  • @MrAhsan99
    @MrAhsan99 41 minutes ago

    You can name the little one "Ahsan", just in case you are looking for names.

  • @micbab-vg2mu
    @micbab-vg2mu 2 hours ago

    interesting :)

  • @NLPprompter
    @NLPprompter 3 hours ago

    So we are going to have a chunking model, an embedding model, a graph model, and a conversation model... and they can work within a program, called by lines of code, or... they can work freely and fuzzily in an agentic way.
    I imagine a game-dev-style UI: drag and drop a PDF onto them, they get busy working on that file, running around like cute little employees, and when they're done the user can click a PC item and it will... ah, never mind, that would be a waste of VRAM.

  • @jackbauer322
    @jackbauer322 45 minutes ago

    I think the baby in the background disagrees :p

  • @cherepanovilya
    @cherepanovilya 3 hours ago

    old news))

  • @hayho4614
    @hayho4614 2 hours ago

    Maybe speaking with a bit more energy would keep me more engaged.

  • @ZukunftTrieben
    @ZukunftTrieben 3 hours ago +2

    00:30