Advanced RAG 04 - Contextual Compressors & Filters

  • Published: 16 Oct 2024

Comments • 34

  • @MasterBrain182
    @MasterBrain182 1 year ago +4

    Astonishing content, Sam 💯💯 Thanks for sharing your knowledge with us (thanks for the subtitles too 😄). Thumbs up from Brazil 👍👍👍

  • @TheAmit4sun
    @TheAmit4sun 1 year ago +6

    I have found the filters that answer yes or no to not be of much help. For example, I have embeddings of tech docs and embeddings of an order-processing system. When the filter is set and a random query like "can I order pizza with it?" is submitted, the model thinks the context is related to order processing and returns YES, which is totally wrong.

  • @shivamroy1775
    @shivamroy1775 1 year ago +1

    Great content. Thanks for taking the time to make such videos. I've been learning a lot from them.

  • @billykotsos4642
    @billykotsos4642 1 year ago +3

    this is actually an interesting idea...

  • @clray123
    @clray123 1 year ago +8

    So in short, in order to make the new revolutionary AI actually useful, you must meticulously hardcode the thinking it is supposed to be doing for you. Feels almost like crafting the expert systems in the 80's! Imagine the expected explosion in productivity from applying that same process! Or let the AI imagine for you (imagination is what it's really good for).

    • @alchemication
      @alchemication 1 year ago +1

      Yeah. But in some cases I've seen, we don't need that much sophistication and a bare-bones approach works well 😊 peace

    • @eugeneware3296
      @eugeneware3296 1 year ago +6

      RAG is built on retrieval, and retrieval is another word for search. Search is a very hard problem. The difficulty of searching, ranking, and filtering to get a good-quality set of candidate documents to reason over is underestimated. That's where the complexity lies. Vector search doesn't directly solve these issues. Search engines like Google have hundreds of ranking factors, including vector search, re-ranking cross-encoder models, and quality factors. TL;DR - vector search makes for a good demo and proof of concept. For true production systems, there is a lot of complexity and engineering required to make them work in practice.

    • @hidroman1993
      @hidroman1993 1 year ago +2

      LLMs on their own are not the solution to any problem; as always, it's the engineering that brings the actual results.

  • @marshallmcluhan33
    @marshallmcluhan33 1 year ago +2

    Thoughts on the "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection" paper?

    • @samwitteveenai
      @samwitteveenai  1 year ago +3

      Interesting paper. I am currently traveling, but will try to make a video about the paper or show some of the ideas in a project when I get a chance

  • @moonly3781
    @moonly3781 7 months ago

    Thank you for the amazing tutorial! I was wondering, instead of using ChatOpenAI, how can I utilize a Llama 2 model locally? Specifically, I couldn't find any implementation, for example for contextual compression, where you pass compressor = LLMChainExtractor.from_llm(llm) with anything other than ChatOpenAI as the llm. How can I achieve this locally with Llama 2? My use case involves private documents, so I'm looking for solutions using open-source LLMs.
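
    A minimal sketch of one way to do this with a local model, not from the video (the model path and the vectorstore are placeholders; assumes llama-cpp-python and LangChain are installed):

      from langchain.llms import LlamaCpp
      from langchain.retrievers import ContextualCompressionRetriever
      from langchain.retrievers.document_compressors import LLMChainExtractor

      # Load a local Llama 2 model instead of ChatOpenAI (path is illustrative)
      llm = LlamaCpp(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",
                     temperature=0, n_ctx=4096)

      # Same compressor API as in the video, just backed by the local model
      compressor = LLMChainExtractor.from_llm(llm)

      compression_retriever = ContextualCompressionRetriever(
          base_compressor=compressor,
          base_retriever=vectorstore.as_retriever(),  # your existing vectorstore
      )
      docs = compression_retriever.get_relevant_documents("your question here")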

  • @micbab-vg2mu
    @micbab-vg2mu 1 year ago +1

    Thank you for another great video:)

  • @zd676
    @zd676 1 year ago

    First of all, thanks for the great video! As some of the comments have rightfully pointed out, while I see some merit for offline use cases, this will be very challenging for real-time use cases. Also, I'm curious how much this depends on the chosen LLM understanding and following the default prompts. It seems the LLM choice can make it or break it, which is quite brittle.

  • @mungojelly
    @mungojelly 1 year ago +1

    Hm, when you were going over those instructions like "don't change the text, don't alter it, repeat it exactly the same", and how hard it is to convince it to write the same text out, I thought: why make it do that at all? If we just numbered the sentences, it could respond with the numbers of which sentences to include, or something. Maybe that would save output tokens as well as not give it any chance to imagine things.
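
    A rough sketch of that idea, purely illustrative and not from the video (the llm argument is a placeholder for whatever completion model you use; the prompt and parsing are assumptions):

      import re

      def compress_by_sentence_ids(question: str, passage: str, llm) -> str:
          # Number each sentence so the model only has to return indices
          sentences = re.split(r"(?<=[.!?])\s+", passage)
          numbered = "\n".join(f"{i}: {s}" for i, s in enumerate(sentences))
          prompt = (
              f"Question: {question}\n\n{numbered}\n\n"
              "Reply with only the numbers of the sentences relevant to the "
              "question, comma-separated, or NONE."
          )
          reply = llm(prompt)
          ids = [int(n) for n in re.findall(r"\d+", reply)]
          # Rebuild the extract from the original sentences, so nothing is paraphrased
          return " ".join(sentences[i] for i in ids if i < len(sentences))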

  • @wiltedblackrose
    @wiltedblackrose 1 year ago +3

    This is really interesting. My only worry is that this makes it prohibitively slow. The longest part of RAG is often the call to the LLM. I'd be interested if you could review some companies which have faster models than OpenAI but still decent performance.

    • @mungojelly
      @mungojelly 1 year ago +2

      If I were making a chatbot and needed it not to lag before responding, I'd just fake it, like how Windows has twelve different bars go across and various things slowly fade in so it doesn't seem like it's taking forever to boot XD. I'd send the request simultaneously to both the thoughtful process and also a model that just has instructions to respond immediately, echoing the user: "OK, so what you're saying you want is...". Personally I'd even want it to be transparent about what's happening, like saying that it's looking stuff up right now. I'd think of feeding the agent that's keeping the user busy some data about how much we've retrieved and how we've processed it so far, so it can say computery things like "I have discovered 8475 documents relevant to your query, and I am currently filtering and compressing them to find the most relevant information"... But you could also just fake it by pretending you have the answer and you're just a little slow at getting to the point, like stalling for a few seconds with a cookie-cutter disclaimer about how you're just a hapless AI :D

    • @wiltedblackrose
      @wiltedblackrose 1 year ago

      @@mungojelly Aha, cool. But this doesn't make a difference when I use it, e.g., for studying at uni.

    • @mungojelly
      @mungojelly 1 year ago +1

      @@wiltedblackrose If it's for your own use and there are no customers to offend, then you could make it quick and dirty in other ways. Then I'd think of giving random raw retrieved documents to a small, cheap, hallucination-prone model to see if it gets lucky and can answer right away, and then getting answers from progressively slower chains of reasoning. If it were for my own use I'd definitely make it so there's visual feedback about what it found and what it's doing, since if I made it myself, otherwise-obscure visual feedback, with documents flashing by too quickly to read or whatever, would still make sense to me because I'd know exactly what it's doing.

  • @henkhbit5748
    @henkhbit5748 11 months ago

    Thanks for the video about finetuning RAG. Personally I think the Self-RAG solution is more generic because it's embedded in the LLM...

  • @billykotsos4642
    @billykotsos4642 1 year ago +2

    So instead of using an 'Extractive QA model' you prompt an LLM into doing the same thing... amazing how flexible these LLMs are... in this case you are basing your hopes on the model's 'reasoning'....
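
    For comparison, a minimal extractive QA sketch that pulls an answer span without any prompted "reasoning" (the model name is a common public checkpoint and retrieved_text is a placeholder for a retrieved chunk):

      from transformers import pipeline

      # Extracts an answer span directly from the retrieved context
      qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
      result = qa(question="What does contextual compression do?", context=retrieved_text)
      print(result["answer"], result["score"])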

    • @clray123
      @clray123 1 year ago +1

      As long as someone else pays for it...

  • @foobars3816
    @foobars3816 11 months ago

    13:09 Sounds like you should be using an LLM to narrow down that prompt for each case.

  • @shamikbanerjee9965
    @shamikbanerjee9965 1 year ago

    Good ideas Sam 👌

  • @RunForPeace-hk1cu
    @RunForPeace-hk1cu 1 year ago

    Wouldn't it be simpler if you just used a small chunk_size for the initial splitter function when you embed the documents into the vector database?
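
    For reference, that alternative is just a smaller chunk_size at indexing time rather than compressing at query time; a minimal sketch with illustrative values (docs is a placeholder for your loaded documents):

      from langchain.text_splitter import RecursiveCharacterTextSplitter

      # Smaller chunks at embedding time, instead of compressing retrieved chunks
      splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=30)
      chunks = splitter.split_documents(docs)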

  • @luisjoseve
    @luisjoseve 9 months ago

    thanks a lot, keep it up!

  • @theunknown2090
    @theunknown2090 11 months ago

    Thanks for the video.

  • @googleyoutubechannel8554
    @googleyoutubechannel8554 8 months ago

    It seems like there's a huge disconnect in understanding of how state-of-the-art 'RAG' works, e.g. document upload in the ChatGPT 4 UI, versus all the LangChain tutorials etc. on RAG. I feel like the community doesn't understand that OpenAI is getting far better results, and seems to be processing embeddings in a way that's much more advanced than LangChain-based systems do, and that the community isn't even aware that 'LangChain RAG' and 'OpenAI internal RAG' are completely different animals. E.g. it seems uploaded docs are added as embeddings into a ChatGPT 4 query completely orthogonally to the context window, yet all the LangChain examples I see end up returning text from a retriever and shoving this output into the LLM context. I don't think good RAG even works that way...

  • @HazemAzim
    @HazemAzim 1 year ago

    Great. How about cross-encoders and re-ranking?

    • @adriangabriel3219
      @adriangabriel3219 1 year ago

      I use it and my experience is that it improves retrieval a lot! The out-of-fashion SentenceTransformers perform amazingly there!
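
      A minimal cross-encoder re-ranking sketch with sentence-transformers (the model name is a common public checkpoint; query and candidates are placeholders for your first-stage retrieval results):

        from sentence_transformers import CrossEncoder

        reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
        # Score each (query, candidate) pair, then keep the highest-scoring docs
        scores = reranker.predict([(query, doc) for doc in candidates])
        reranked = [d for _, d in sorted(zip(scores, candidates),
                                         key=lambda x: x[0], reverse=True)]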

    • @HazemAzim
      @HazemAzim 1 year ago

      I am doing some benchmark testing on Arabic datasets, and I'm getting super results with ME5 embeddings plus the Cohere reranker on top.

    • @samwitteveenai
      @samwitteveenai  1 year ago +1

      Yes I still have a number more coming in this series.

  • @choiswimmer
    @choiswimmer 1 year ago +1

    Typo in the thumbnail. It's 4 not 5