That's impressive accuracy, thanks for showing this. I wonder how it would do if I wanted to add fields that are use case specific? I'll have to give it a try for sure. Thanks again.
It should be able to handle any fields.
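For example, a prompt can simply list the use case specific fields you want back as JSON. A minimal sketch; the field names and the invoice.png path are placeholders, not from the video:
# A sketch of a Qwen2-VL message asking for use case specific fields as JSON.
# Field names and the image path are placeholders - adjust them to your document type.
prompt = (
    "Extract the following fields from this invoice and return them as JSON: "
    "invoice_number, invoice_date, due_date, supplier_name, total_amount."
)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "invoice.png"},
            {"type": "text", "text": prompt},
        ],
    }
]
The messages list then goes through the processor and model.generate exactly as in the notebook.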
Fantastic! Thanks very much
Thanks 👌
Hi, thank you for your amazing video. Do you know how to fine-tune Qwen2 for this case using our own dataset? Thanks!
Hi, this may be an unpopular opinion, but I believe in most cases fine-tuning is not required. The Qwen2-VL model is general enough to handle various use cases out of the box.
How would this handle a PDF consisting of images/diagrams? E.g. technical documentation.
You can try yourself using sample HF space for this model: huggingface.co/spaces/GanymedeNil/Qwen2-VL-7B
Which OCR do you recommend using along with this model for handwritten data extraction? I used Tesseract, but the results are not promising.
Qwen2 Vision LLM handles OCR out of the box, you don't need a separate OCR step.
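A minimal inference sketch based on the standard transformers usage of Qwen2-VL (the model ID and the invoice.png path are just examples, not from the video):
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load model and processor (Qwen/Qwen2-VL-7B-Instruct is used here as an example).
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "invoice.png"},  # placeholder image path
        {"type": "text", "text": "Extract all text from this document."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

# Generate and decode only the newly generated tokens.
generated_ids = model.generate(**inputs, max_new_tokens=1024)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])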
@@AndrejBaranovskij thank you.
So if I need to do handwritten extraction, how can we achieve that? Do we need to use an OCR, or will it be handled out of the box?
I would also like to know if I can train this model with handwritten docs.
I can share a few docs if required.
@@hsnavas It should work out of the box with the vision LLM, as described in this video.
@@hsnavas Normally you don't need to train a vision LLM, it already knows how to recognize handwritten text.
Hey, great video! I always have the problem that my Colab runs out of memory, even when running on an A100. I also tried your notebook, but it always fails at the same place:
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=1024)
Do you know of any solution?
Hey, I was facing this issue when the input image resolution was too big. It works better when the image is resized to max_width=1250, max_height=1750.
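A small sketch of that resize step with Pillow (the 1250x1750 limits come from the comment above; the file name is a placeholder):
from PIL import Image

def resize_for_vlm(path, max_width=1250, max_height=1750):
    # thumbnail() keeps the aspect ratio and never upscales the image.
    image = Image.open(path)
    image.thumbnail((max_width, max_height), Image.LANCZOS)
    return image

page_image = resize_for_vlm("invoice_page.png")  # placeholder file name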
@@AndrejBaranovskij Thank you very much. I had to split the RAG flow to retrieve the page number in one iteration, and then apply the retrieved image and text to the VLM to generate the answer... and I had to resize to max_width=600, max_height=800, and I was still using 33 of the 40 GB of available RAM.
Do you know how I can reduce the RAM usage?
Still, thanks a lot!
@@cristiantironi296 I don't know about reducing RAM usage. But in general, I always try to use one iteration only - get all the page data with the visual LLM, then process that data without an LLM, using my own code. In the case of a multipage doc, I split it into pages, process each page separately, and merge the results afterwards.
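Roughly, the multipage flow could look like this sketch (extract_page_data stands for a single vision LLM call per page and is hypothetical; the merge is plain Python without any LLM):
from pdf2image import convert_from_path  # requires poppler to be installed

def process_document(pdf_path):
    # Split the multipage PDF into page images.
    pages = convert_from_path(pdf_path, dpi=150)
    results = []
    for page_number, page_image in enumerate(pages, start=1):
        page_image.thumbnail((1250, 1750))        # keep resolution within limits
        data = extract_page_data(page_image)      # hypothetical one-shot VLM extraction per page
        results.append({"page": page_number, "data": data})
    # Merge the per-page results with plain code, no LLM involved.
    return {"pages": results}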
Could you please share the invoice document?
The sample doc is inside the Sparrow repo: github.com/katanaml/sparrow/tree/main/sparrow-ml/llm/data