Install Microsoft Florence-2 Model Locally - Best for Vision Tasks

  • Published: 27 Sep 2024
  • This video shows how to install Florence-2 locally. Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
    🔥 Buy Me a Coffee to support the channel: ko-fi.com/fahd...
    🔥 Get a 50% discount on any A6000 or A5000 GPU rental using the following link and coupon:
    bit.ly/fahd-mirza
    Coupon code: FahdMirza
    ▶ Become a Patron 🔥 - / fahdmirza
    #florence2 #florence2large
    PLEASE FOLLOW ME:
    ▶ LinkedIn: / fahdmirza
    ▶ RUclips: / @fahdmirza
    ▶ Blog: www.fahdmirza.com
    RELATED VIDEOS:
    ▶ Resource huggingface.co...
    All rights reserved © 2021 Fahd Mirza
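
The "prompt-based approach" mentioned in the description means that the task is selected by a special token at the start of the text prompt. Below is a minimal sketch of that idea. The task tokens are the ones listed on the Florence-2 model card; the names `TASK_PROMPTS`, `TASKS_WITH_TEXT`, and `build_prompt` are hypothetical helpers made up for illustration and are not part of any library.

```python
# Task tokens as listed on the Florence-2 model card (subset).
# The dictionary keys are hypothetical, chosen only for readability.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "more_detailed_caption": "<MORE_DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}

# Tasks in this subset that take extra text after the task token.
TASKS_WITH_TEXT = {"phrase_grounding"}


def build_prompt(task: str, text: str = "") -> str:
    """Return the prompt string for a task: the task token plus optional text."""
    token = TASK_PROMPTS[task]  # raises KeyError for an unknown task
    if task in TASKS_WITH_TEXT:
        if not text:
            raise ValueError(f"task {task!r} needs extra text input")
        return token + text
    if text:
        raise ValueError(f"task {task!r} does not take extra text input")
    return token


if __name__ == "__main__":
    print(build_prompt("more_detailed_caption"))
    print(build_prompt("phrase_grounding", "a green car"))
```

The resulting string is what would be passed as the text input to the model's processor alongside the image; the model loading and generation calls are left out here and are shown in the model card.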

Comments • 31

  • @EM-yc8tv 3 months ago +1

    Freaking love Florence. It was a pain to get working on Windows, but once you get it working, it's marvelous. For some reason, OCR is more accurate in MoreDetailedCaption than it is in the Region OCR...but this is a superb model overall.

    • @fahdmirza 3 months ago

      Thanks for sharing!

    • @xmagcx1 2 months ago

      Could you explain how you got it working on Windows? I downloaded different versions of the library, and installation always throws errors. The solutions in the various forums don't work. @em-yc8tv

  • @divyamchandel8734 3 months ago +1

    I tried this for our OCR use case. It gives decent results. How would you recommend we deploy it and test it at scale in production?

    • @fahdmirza 2 months ago

      Thanks for the feedback.

    • @vijaybhaskar5333 2 months ago

      I have the same question about deploying and testing it at scale in production. I'm having issues with FlashAttn.

  • @nott8476 3 months ago +1

    Can you make a tutorial to use it for videos?

    • @fahdmirza 3 months ago +1

      noted

    • @daryladhityahenry 2 months ago

      Plus one for this. I'm really having a hard time trying to run this on Windows. I keep failing to build Flash Attention 2. I'd really appreciate a tutorial for this on Windows. Thank you!

  • @MrRaycaster 3 months ago +1

    Has anyone used this to catalog images? I have a ton of images that I would love to add meta descriptions to using AI. My Python is rusty, but I will give it a try. Being able to sort by metadata would be huge for large libraries.

    • @fahdmirza 3 months ago

      good use case

    • @adityashinde436 3 months ago

      If you want to give a set of catalog images and a system prompt as input and get descriptions as output, use the Gemini Flash model. It is not free, but it is very cheap and gives good results quickly.

  • @latent-broadcasting 3 months ago

    Is it possible to caption images in batch? This would be great for captioning large datasets

    • @fahdmirza 3 months ago

      yes I guess so

    • @EM-yc8tv 3 months ago

      Most certainly yes. The way I did it was to log to a CSV the filename, various image properties such as height and width, the megapixels (which I calculated), the inference time per image, and the caption. That's for the case of MoreDetailedCaption. If you do Object Detection and go down the CSV route, you'd want a delimiter other than a comma to track the various classes detected, or similarly to keep track of bounding box coordinates.
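
The CSV logging described in the reply above can be sketched roughly as follows. This is an illustration under stated assumptions, not the commenter's actual script: `log_captions`, `detection_fields`, and `caption_fn` are hypothetical names, the caller is assumed to supply each image's width and height (for example from Pillow's `Image.size`), and the tab delimiter and `|` class separator are arbitrary choices that avoid commas.

```python
import csv
import json
import time
from typing import Callable, Iterable, List, TextIO, Tuple

# (filename, width_px, height_px); the caller is assumed to read the
# dimensions from the image file before calling log_captions.
ImageInfo = Tuple[str, int, int]


def log_captions(
    images: Iterable[ImageInfo],
    caption_fn: Callable[[str], str],
    out: TextIO,
    delimiter: str = "\t",
) -> int:
    """Write one row per image and return the number of images logged.

    caption_fn is a hypothetical callable that runs the model on one
    file and returns its caption (e.g. a MoreDetailedCaption result).
    """
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(
        ["filename", "width", "height", "megapixels", "inference_s", "caption"]
    )
    count = 0
    for filename, width, height in images:
        start = time.perf_counter()
        caption = caption_fn(filename)
        elapsed = time.perf_counter() - start  # inference time for this image
        megapixels = width * height / 1_000_000
        writer.writerow(
            [filename, width, height, f"{megapixels:.2f}", f"{elapsed:.3f}", caption]
        )
        count += 1
    return count


def detection_fields(
    labels: List[str], bboxes: List[List[float]]
) -> Tuple[str, str]:
    """Pack object-detection output into two comma-safe CSV fields.

    Classes are joined with '|'; boxes are stored as a JSON string so
    the coordinates can be parsed back later with json.loads.
    """
    return "|".join(labels), json.dumps(bboxes)
```

Python's `csv` module quotes any field that contains the delimiter, so a comma-delimited file would also stay parseable; a tab delimiter simply keeps captions and box lists easier to read in a text editor.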

  • @geniusxbyofejiroagbaduta8665 3 months ago

    Please share the Jupyter notebook.

    • @fahdmirza 3 months ago +1

      It's in the model card; the link is in the video description.

  • @elias-zl6jj 3 months ago

    How does this compare to PaliGemma? Which is better?

    • @fahdmirza 3 months ago

      Both are good with slightly different architecture as explained in their respective videos, thanks. Please also subscribe to the channel.

    • @EM-yc8tv 3 months ago

      I tried out both. In my limited testing, Florence-2 kicks PaliGemma's rear end: it is more accurate and truthful, and over twice as fast. I was getting 7-16 seconds per inference with Florence, and probably 40+ seconds with PaliGemma. This was with a 4 GB 3050 Ti, having both CUDA-enabled TensorFlow and PyTorch for Florence and PaliGemma, respectively.

  • @ishimaro 3 months ago

    What specs/GPU type on Massed Compute did you use for inference in this video?

    • @fahdmirza 2 months ago

      It's in the video description.

    • @aproli90 1 month ago

      @fahdmirza I couldn't find the GPU used for this. Can you share it again, please?

  • @denijane89 3 months ago

    I don't think the OCR of this is very impressive. I tested it on a plot of a function, and it didn't even guess the direction correctly.

    • @fahdmirza 3 months ago

      Thanks for the feedback.

  • @shawnvines2514 3 months ago +2

    I hope they add this to LM Studio soon

    • @fahdmirza 3 months ago +1

      Yeah, that would be good. Please also subscribe to the channel.

    • @shawnvines2514 3 months ago

      @fahdmirza I already am 😁

    • @fahdmirza 3 months ago +1

      Thanks mate, just trying my hand at some marketing :)

    • @bigglyguy8429 3 months ago

      Look at Pinokio; it's an easy way to install this sort of stuff. I have Florence installed, and now I'm trying to find out whether you can ask it questions about an image...