Great video. A few caveats: when working with documents that can be found online (and so are potentially in the LLM's training data), it isn't clear whether the LLM is picking up the errors from the context (the full text or retrieved chunks) or from memory. This matters if you're building a system and need to be sure it's identifying errors by inferring from the context rather than from what it thinks it knows, for example when your corpora come from proprietary data. If the LLM is leaning on parametric memory more than on the context, quality on proprietary data would be significantly lower, since it has no memory of that data.
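One way to sanity-check this is to plant a contradiction about an entity the model cannot know, so catching it requires reading the context rather than recalling facts. A minimal sketch, assuming the OpenAI Python client; the helper name and the fabricated document are hypothetical:

```python
# Sketch: verify errors are caught from context, not parametric memory.
# The document below contains a made-up entity with an internal contradiction,
# so a correct detection must come from the text itself.
from openai import OpenAI

client = OpenAI()

def find_errors_in_context(document: str, model: str = "gpt-4o") -> str:
    """Ask the model to list inconsistencies using ONLY the supplied text."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "List factual inconsistencies found ONLY in the provided text. "
                        "Do not use outside knowledge."},
            {"role": "user", "content": document},
        ],
    )
    return response.choices[0].message.content

# Fabricated entity: the 140 MW vs 95 MW contradiction can only be spotted
# by comparing the two sentences in context.
doc = (
    "The Zorvex-9 turbine was installed in 2021 and produces 140 MW. "
    "Later that year, engineers reported the Zorvex-9's output of 95 MW "
    "matched its installed capacity."
)
print(find_errors_in_context(doc))
```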
As always, good job Ronan! Any plans for a custom impl of MoA?
Cheers Fabio, what do you mean by MoA? Mixture of Agents?
Yep! Would be very interesting! Tks
Bro is cooking with the thumbnails.
You like those? Or too much… better to go back raw dawg?
@TrelisResearch No, they are awesome! They made me want to click the video immediately.
Wow, this video was amazing. Thank you for showing the pressure testing and the needle-in-a-haystack stuff. Great video!
Thanks, appreciate the comment
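For anyone wanting to reproduce the needle-in-a-haystack test: a minimal sketch, assuming the OpenAI Python client; the filler text, needle, and question are placeholders. It buries one fact at different depths in filler text and checks whether the model can still answer a question about it.

```python
# Sketch: needle-in-a-haystack pressure test at varying insertion depths.
from openai import OpenAI

client = OpenAI()

FILLER = "The quick brown fox jumps over the lazy dog. " * 400
NEEDLE = "The secret passcode for the vault is 7-alpha-19."
QUESTION = "What is the secret passcode for the vault?"

def run_depth_test(depth: float, model: str = "gpt-4o-mini") -> bool:
    """Insert the needle at a fractional depth of the haystack and query it."""
    cut = int(len(FILLER) * depth)
    haystack = FILLER[:cut] + " " + NEEDLE + " " + FILLER[cut:]
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user",
             "content": f"{haystack}\n\nAnswer from the text above: {QUESTION}"},
        ],
    )
    return "7-alpha-19" in response.choices[0].message.content

for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"depth={depth:.2f} -> found: {run_depth_test(depth)}")
```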
Thanks! Very interesting comparison!
Maybe having a custom splitting function that "splits the text by facts" and then a retrieval process just as you demonstrated (split by pages) could be an improvement?
Yeah, definitely, better chunking would help.
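A rough sketch of the "split the text by facts" idea, assuming the OpenAI Python client; the function names and prompt are hypothetical. Each page is rewritten into standalone fact sentences, which then become the retrieval chunks instead of whole pages:

```python
# Sketch: fact-level chunking before retrieval, as an alternative to page chunks.
from openai import OpenAI

client = OpenAI()

def split_by_facts(page_text: str, model: str = "gpt-4o-mini") -> list[str]:
    """Use an LLM to turn a page into one self-contained fact per line."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Rewrite the text as a list of standalone factual "
                        "statements, one per line. No numbering, no commentary."},
            {"role": "user", "content": page_text},
        ],
    )
    return [line.strip()
            for line in response.choices[0].message.content.splitlines()
            if line.strip()]

# Usage: chunk every page into facts, then embed and index the facts
# exactly as in the page-level setup shown in the video.
pages = ["Page 1 text ...", "Page 2 text ..."]
fact_chunks = [fact for page in pages for fact in split_by_facts(page)]
```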
Could you please do a video on Graph RAG / knowledge graphs? Thanks.
Will take another look at Graph RAG; I did get that request before too. I'll add it to my notes.