Document Querying with Qwen2-VL-7B and JSON Output

  • Published: 8 Nov 2024

Comments • 20

  • @kenchang3456 • 1 month ago

    That's impressive accuracy, thanks for showing this. I wonder how it would do if I wanted to add use-case-specific fields? I'll have to give it a try for sure. Thanks again.

  • @harrykekgmail • 1 month ago +1

    Fantastic! Thanks very much

  • @hadyanpratama • 23 days ago

    Hi, thank you for your amazing video. Do you know how to fine-tune Qwen2 for this case using our own dataset? Thanks!

    • @AndrejBaranovskij • 22 days ago +1

      Hi, I may sound unpopular, but I believe in most cases fine-tuning is not required. The Qwen2-VL model is general enough to handle various use cases out of the box.
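
      For reference, a minimal zero-shot sketch (not from the video) based on the standard Qwen2-VL-7B-Instruct usage with transformers and qwen_vl_utils; the image path, JSON field names and prompt wording are placeholder assumptions:

      from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
      from qwen_vl_utils import process_vision_info

      # Load the 7B instruct model and its processor
      model = Qwen2VLForConditionalGeneration.from_pretrained(
          "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
      )
      processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

      # Placeholder document and schema -- swap in your own image and fields
      messages = [{
          "role": "user",
          "content": [
              {"type": "image", "image": "invoice.png"},
              {"type": "text", "text": "Extract invoice_number, invoice_date and "
                                       "total from this document. Respond with JSON only."},
          ],
      }]

      # Build the chat prompt and pack image + text into model inputs
      text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
      image_inputs, video_inputs = process_vision_info(messages)
      inputs = processor(
          text=[text], images=image_inputs, videos=video_inputs,
          padding=True, return_tensors="pt"
      ).to(model.device)

      # Generate and decode only the newly generated tokens
      generated_ids = model.generate(**inputs, max_new_tokens=1024)
      trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
      print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])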

  • @kareemyoussef2304 • 1 month ago

    How would this handle a PDF consisting of images/diagrams, e.g. technical documentation?

    • @AndrejBaranovskij • 1 month ago

      You can try it yourself using the sample HF Space for this model: huggingface.co/spaces/GanymedeNil/Qwen2-VL-7B

  • @hsnavas • 1 month ago

    Which OCR do you recommend using along with this model for handwritten data extraction? I used Tesseract, but the results are not promising.

    • @AndrejBaranovskij • 1 month ago +2

      The Qwen2 vision LLM handles OCR out of the box; you don't need a separate OCR.

    • @hsnavas • 1 month ago

      @AndrejBaranovskij thank you.
      So if I need to do handwritten extraction, how can we achieve that? Do we need to use an OCR, or will it be handled out of the box?

    • @hsnavas • 1 month ago

      I would also like to know if I can train this model with handwritten docs.
      I can share a few docs if required.

    • @AndrejBaranovskij • 1 month ago

      @hsnavas It should work out of the box with the vision LLM, as described in this video.

    • @AndrejBaranovskij • 1 month ago +1

      @hsnavas Normally you don't need to train the vision LLM; it already knows how to recognize handwritten text.
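
      For reference, with the zero-shot sketch above, handwritten extraction is just a prompt change; a hypothetical example (the file name and field names are placeholders):

      messages = [{
          "role": "user",
          "content": [
              {"type": "image", "image": "handwritten_form.png"},  # placeholder scan
              {"type": "text", "text": "Read the handwritten entries and return "
                                       "name, date and amount as JSON."},
          ],
      }]
      # The processor / generate / decode steps stay the same as in the sketch above.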

  • @cristiantironi296 • 1 day ago

    Hey, great video! I always have the problem that my Colab runs out of memory, even when running on an A100. I also tried your notebook, but it always fails at the same point:
    # Inference: Generation of the output
    generated_ids = model.generate(**inputs, max_new_tokens=1024)
    Do you know any solution?

    • @AndrejBaranovskij • 20 hours ago

      Hey, I was facing this issue when the input image resolution was too big. It works better when the image is resized to max_width=1250, max_height=1750.
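
      For reference, a minimal resize sketch with Pillow (not from the video), using the limits mentioned above; thumbnail() preserves the aspect ratio and never upscales:

      from PIL import Image

      MAX_WIDTH, MAX_HEIGHT = 1250, 1750  # limits suggested in the reply above

      def resize_for_inference(path):
          # Downscale the document image so it fits within the limits
          image = Image.open(path)
          image.thumbnail((MAX_WIDTH, MAX_HEIGHT), Image.LANCZOS)
          return image

      doc_image = resize_for_inference("invoice.png")  # placeholder file name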

    • @cristiantironi296 • 18 hours ago

      @AndrejBaranovskij Thank you very much. I had to split the RAG pipeline: retrieve the page number in one iteration, then pass the retrieved image and text to the VLM to generate the answer... and I had to resize to max_width=600, max_height=800, and I was still using 33 of the 40 GB of available RAM.
      Do you know how I can reduce the RAM usage?
      Still, thanks a lot.

    • @AndrejBaranovskij • 16 hours ago

      @cristiantironi296 I don't know about reducing RAM usage. But in general, I always try to use one iteration only: get all page data with the vision LLM, then process that data without the LLM, using my own code. In the case of a multipage doc, I split it into pages, process each page separately, and merge the results afterwards.
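
      For reference, a rough sketch of that page-by-page pattern (not from the video), assuming pdf2image for splitting; extract_page_data is a hypothetical stand-in for the Qwen2-VL call from the earlier sketch:

      from pdf2image import convert_from_path  # requires poppler to be installed

      def extract_page_data(page_image):
          # Hypothetical: run the Qwen2-VL prompt from the earlier sketch on one
          # page image and return the parsed JSON dict.
          ...

      def process_document(pdf_path):
          # Split the multipage PDF into one image per page
          pages = convert_from_path(pdf_path, dpi=200)
          results = []
          for page_number, page_image in enumerate(pages, start=1):
              page_image.thumbnail((1250, 1750))  # stay within the resolution limits above
              results.append({"page": page_number, "data": extract_page_data(page_image)})
          # Merge per-page results outside the LLM, in plain Python
          return results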

  • @harunulrasheedshaik5879 • 1 month ago

    Could you please share the invoice document?

    • @AndrejBaranovskij • 1 month ago

      The sample doc is inside the Sparrow repo: github.com/katanaml/sparrow/tree/main/sparrow-ml/llm/data