Check out the RAG Beyond Basics Course: prompt-s-site.thinkific.com/courses/rag
It’d be excellent if you could test GPT-4o and Flash against your RAG and show the results like you did in this video. That would be a nice demonstration of the different capabilities and results, of course including the use of a local LLM.
Yes!
That would be great
Hi, can you do a video on this:
In a typical AI workflow, you might pass the same input tokens over and over to a model. Using the Gemini API context caching feature, you can pass some content to the model once, cache the input tokens, and then refer to the cached tokens for subsequent requests. At certain volumes, using cached tokens is lower cost than passing in the same corpus of tokens repeatedly.
What if Gemma 2 is also able to do this. How could we test this?
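As a back-of-the-envelope illustration of the "at certain volumes" point above, here is a Python sketch comparing cumulative cost with and without caching. The per-token prices below are made-up placeholders for illustration, not Gemini's actual rates, and storage fees are ignored.

```python
def total_cost(n_requests, corpus_tokens, input_price_per_token,
               cached_price_per_token=None):
    """Cumulative input cost for n_requests that all reuse the same corpus.

    Without caching, every request pays the full input price for the corpus.
    With caching, the first request pays full price and subsequent requests
    pay the (lower) cached-token price.
    """
    if cached_price_per_token is None:
        return n_requests * corpus_tokens * input_price_per_token
    first = corpus_tokens * input_price_per_token
    rest = (n_requests - 1) * corpus_tokens * cached_price_per_token
    return first + rest

# Illustrative prices (NOT real Gemini rates): $0.35 per 1M input tokens,
# $0.0875 per 1M cached tokens.
INPUT = 0.35 / 1_000_000
CACHED = 0.0875 / 1_000_000

uncached = total_cost(10, 500_000, INPUT)
cached = total_cost(10, 500_000, INPUT, CACHED)
print(f"uncached ${uncached:.2f} vs cached ${cached:.2f}")
```

With these placeholder numbers, ten requests over the same 500k-token corpus cost roughly three times more without caching, which is the break-even effect the comment describes.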
Impressive model. Thank you for the video.
I think the main benefit of classic RAG so far for me has been citations and clear sourcing (where the LLM can return which page it is using for information). How well does Gemini Flash return this kind of info?
I haven't tested it on multiple files yet but I suspect that should be possible. I will put together a new tutorial on it when I get a chance.
In scientific papers, tables are usually in text format. LaTeX just uses fancy formatting of text to make tables, so table content extraction is not a test of the visual capabilities of a model.
Thanks for your videos and course. You said at the beginning that Gemini 1.5 was only good for small docs; what would you recommend for a large corpus of multi-modal PDF requirements? Would an agentic approach work to break up the PDFs into buckets, with a single agent to combine responses?
What about using Gemini Flash to parse the PDFs into markdown and optimally structure it for LLMs and then embedding for RAG?
Pursuing this idea
@wesleymogaka report back once you do it. Maybe send the YouTuber a link so he can also review it and give you some exposure.
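A minimal sketch of the chunking half of that pipeline, assuming the PDF has already been converted to markdown. The heading convention and splitter here are illustrative choices, not any specific library's API:

```python
import re

def chunk_markdown(md_text, max_chars=2000):
    """Split markdown into heading-delimited chunks for embedding.

    Splits on '## ' section headings so each chunk keeps its local
    structure (tables, lists) intact, then falls back to paragraph
    splits for oversized sections.
    """
    sections = re.split(r"(?m)^(?=## )", md_text)
    chunks = []
    for sec in sections:
        sec = sec.strip()
        if not sec:
            continue
        if len(sec) <= max_chars:
            chunks.append(sec)
        else:
            # Oversized section: fall back to paragraph-level chunks.
            for para in sec.split("\n\n"):
                if para.strip():
                    chunks.append(para.strip())
    return chunks

doc = ("# Title\n\nIntro text.\n\n"
       "## Methods\n\nDetails here.\n\n"
       "## Results\n\n| a | b |\n|---|---|\n| 1 | 2 |")
print(chunk_markdown(doc))
```

Keeping tables inside a single chunk is the main advantage of structure-aware splitting over fixed-size windows: the embedding sees the whole table, not half of it.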
Hi. Can you show us how to get to the UI?
One Q that I missed: when making API calls to our pdf, does our private data become publicly available in any way? Another amazing vid. Really appreciate all the work you put into making great content.
For the free API, Google does say they can use it for training. For the paid API, that doesn't seem to be the case. Now, just like with the other API providers, really it's down to your own comfort level and how much you trust their word :)
your Colab link doesn't work. It doesn't open
love the meta paper choice to scan
Thanks
Thank you 😊
A small number of PDFs means how many? What's your assumption?
As long as they fit in the context, which is 1M tokens, although I would suggest using about 50-70% of that. Using more can result in the lost-in-the-middle problem.
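To put that guideline in concrete terms, here is a rough estimator. The ~500 tokens per PDF page is a common rule of thumb, not a fixed number, and the usable fraction is the 50-70% suggested above:

```python
def pages_that_fit(context_tokens=1_000_000, usable_fraction=0.6,
                   tokens_per_page=500):
    """Rough page budget when reserving part of the context window for the
    prompt and answer, and to avoid lost-in-the-middle degradation."""
    return int(context_tokens * usable_fraction) // tokens_per_page

print(pages_that_fit())                     # 600,000 usable tokens -> 1200 pages
print(pages_that_fit(usable_fraction=0.5))  # 500,000 usable tokens -> 1000 pages
```

So "a small number of PDFs" in practice means on the order of a thousand pages total, under these assumptions.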
I don't like using libraries to parse my PDF files. I found it to be more complex and less robust than writing the parsing services myself. I will definitely give Flash a try though.
Agree, it's worth a shot.
Please run an ad campaign for your channel, as it has the potential to get 500k subscribers in an hour.
Why test Gemini Flash? Does Gemini Pro not work better?
Pro is better but has more limitations for free usage.
thank you so much for this video
Great, I will test it :)
Let me know how it goes
This review is basically pointless. You're running it on one PDF. The whole PDF can easily be dumped into the context (the OpenAI default is 20 × 1,000-token chunks). You should be testing on much larger datasets.
RAG in general has been slowly dying as context increases are combined with cost decreases. On top of that, folks are getting better at compression and database use (LLMs understand SQL, etc.), and agentic flows.
The speed loss and cost of maintaining a vector database just isn't always worth it when I can simply task a flow itself with semantic search and feed the results to whatever needs them.
RAG is not dying. It merely depends on the use case. It was even mentioned several times in this video that this is not a replacement for RAG where there is a large corpus of information (millions of docs). It certainly is evolving, however, and quite rapidly. I would love to get to the point where I can avoid having to parse PDFs and documents completely, and just feed docs to a vision model and have the chunks stored directly in a DB. But getting rid of RAG completely? Nah. Not yet. I would say RAG would only go away if model training reaches a point where, rather than feeding docs into a vector DB, you can just throw them directly at the LLM itself.
I wanted to build a previous-year paper analysis system for my college (engineering). There are 7 departments in total, and all subjects come up to 7*6*8. Can you guide me: fine-tuning or RAG?
For this, my recommendation would be to use RAG.
Cool thanks @@engineerprompt
Great video.
thank you!
Is there demand for RAG in the market?
RAG is the only real application of GenAI at the moment that businesses are actually widely using.
Gemini 1.5 Pro also has this new feature, I think.
Yes, it does. It's relatively more expensive, though, if you put it in production.
Why would you want to pay for a cloud GPT?! Do it yourself.
Check out localGPT for that :)
As usual, I will wait for third parties to verify which of Google's claims are real and which are just another scam.