Thanks again for the great work! I have tested a similar approach with the vision model. It is especially good for PDFs with lots of unstructured data like graphs, plots, pictures, text, etc. One limitation of this approach: when I created a chatbot and wanted to get a hyperlink from within the documents, I couldn't, because the URL of the hyperlink is not visible in the image. That was not a problem when I used markdown with the standard text-based RAG system.
Questions:
- How many PDFs can I upload? Is there any size limit?
- Does the chatbot have a memory of the current conversation? If so, how are you handling it?
Would love a video about the detailed architecture and code explanation. Thanks.
Indeed, this is an amazing project. I'll check out the code and give it a try. Thank you very much for sharing; there's a lot to learn from this one.
Wooohoooo!!! This is so cool! I need more time, I definitely have to test it!!!!
Cool! Is there a context window or any strict limit on the number of pages or images that can be uploaded?
Will try it out.
This is amazing! Thanks, will try it out.
ERROR - models.indexer - Error during indexing: Unable to get page count. Any ideas?
This is awesome. Very grateful. What is your local setup, GPU?
What would be the complexity level of combining Verbi and LocalGPT-Vision? Is this a realistic possibility?
Very nice work
The PDF document format is quite specific, right? So maybe it's possible to compare results just using that formatted content data?
It's closed, owned, and controlled by Adobe, correct?
So why do this?
Any chance you can add the new Mistral Pixtral model to your software? It seems to be the best version of a local model for vision, and it's based on Nemo.
Yes, I think it can be added. Will have a look into it.
If poppler is missing under Windows, use: choco install poppler
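As a quick sanity check (a minimal sketch, not part of the project itself), you can verify that poppler's `pdfinfo` binary is on your PATH before indexing; the "Unable to get page count" error mentioned above is typically raised when it is missing:

```python
import shutil

def poppler_available() -> bool:
    """Return True if poppler's pdfinfo binary is on PATH.

    PDF-to-image conversion fails with 'Unable to get page count'
    when poppler is not installed or not on PATH.
    """
    return shutil.which("pdfinfo") is not None

if __name__ == "__main__":
    if poppler_available():
        print("poppler found")
    else:
        print("poppler missing -- on Windows: choco install poppler")
```

This only checks PATH visibility; if poppler is installed but the check fails, add its `bin` directory to your PATH.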
Great stuff though!! Nice work!
VERY cool!
I like the concept of this, but I don't like the original model selection. Can you add other OpenAI APIs, like 4o?
Yes, will update the list with more models.
I think google-generativeai is misspelled as google-generative-ai in the requirements.txt
Thanks for pointing it out, will fix that
Qwen2.5 VL 72b support?