Florence-2: Fine-tune Microsoft’s Multimodal Model

  • Published: Jul 22, 2024
  • Learn how to fine-tune Microsoft's Florence-2, a powerful open-source Vision Language Model, for custom object detection tasks. This in-depth tutorial guides you through setting up your environment in Google Colab, preparing datasets, and optimizing the model using LoRA.
    Chapters:
    - 00:00 Introduction: Unlock the Power of Florence-2
    - 01:09 Getting Started: Prepare for VLM Fine-Tuning
    - 03:55 Florence-2 in Action: Explore Pre-trained Capabilities
    - 07:00 Dataset Deep Dive: PyTorch Data Loading for Florence-2
    - 13:02 LoRA: Optimize Your VLM Training
    - 14:21 Fine-Tuning: Unleash Florence-2's Custom Object Detection
    - 17:30 Model Evaluation: Measure Your VLM's Success
    - 21:37 Florence-2 vs Other Computer Vision Models
    - 24:09 Conclusion and Next Steps
    Resources:
    - Roboflow: roboflow.com
    - 🔴 Community Session July 3rd, 2024 at 08:00 AM PST / 11:00 AM EST / 05:00 PM CET: roboflow.stream
    - ⭐ Notebooks GitHub: github.com/roboflow/notebooks
    - 📓 Florence notebook: colab.research.google.com/git...
    - 🗞 Florence-2 arXiv paper: arxiv.org/abs/2311.06242
    - 🗞 Florence-2 overview blog post: blog.roboflow.com/florence-2
    - 🗞 Florence-2 fine-tuning blog post: blog.roboflow.com/fine-tune-f...
    - 🔗 Florence-2 HF Space: huggingface.co/spaces/gokaygo...
    - 🗞 Mean Average Precision (mAP) blog post: blog.roboflow.com/mean-averag...
    - 🗞 Confusion Matrix blog post: blog.roboflow.com/what-is-a-c...
    Stay updated with the projects I'm working on at github.com/roboflow and github.com/SkalskiP! ⭐
  • Science

Comments • 48

  • @SridharanS-vz7re 20 days ago +5

    How can I train this model on a custom dataset for OCR?

  • @abdshomad 21 days ago +2

    I've been waiting for this tutorial for days.
    Thank you again for being the first to comprehensively review this new model.
    Super excited! 🎉🥳

    • @Roboflow 21 days ago

      As usual you are the first one to comment on the video! Thanks a lot for all the support! 🔥

  • @artem-yw8km 19 days ago +1

    Thank you for this tutorial. I was working on this kind of setup for a couple of days. You definitely could have saved me a lot of time.

    • @Roboflow 19 days ago

      Sorry I didn’t save you time this time.

  • @arifahnurainia272 4 days ago

    Thank you for the video tutorial, you are cool! 👏👏👏
    I hope there will be a version of this tutorial using Jupyter Notebook 😁

  • @jk_c66 19 days ago +1

    Thank you, Roboflow, for providing such nice tutorials for free, with such clear instructions.

  • @SatyamKumar-cb2mt 20 days ago +1

    Thanks a ton for this awesome video! Every single term is explained so clearly; it's super helpful.
    I can't wait to dive into the code and start putting this knowledge to use!

    • @Roboflow 20 days ago +1

      Thanks a lot! I really put in the effort and try not to fall into the bias of assuming that people already know these things.

  • @suphotnarapong355 21 days ago +1

    Thank you

  • @Jordufi 21 days ago +1

    Nice video, as usual

  • @8-P 20 days ago

    For the community session I have a couple of (beginner) questions:
    - The Google Colab notebooks from Roboflow seem to be Linux-based. Is there an easy way to make them work on Windows?
    - In general, how do I download a model (YOLO) to use in a Python app (on Windows)?
    - Are there models that would run real-time video detection on a regular laptop with an integrated GPU?
    - I am planning to use a YOLO model for a sports live stream, but I only have a simple 3-year-old mid-range laptop with me. If the laptop is underpowered, would it be better to send the stream to my desktop PC with an RTX 3060 Ti (8 GB), run the model there, and send the detections back to sync on the laptop?
    - For simple applications, like your real-time sports detection, would it be better to run inference on my own hardware or to look into cloud servers?
    Thank you very much for your tutorials, they help a lot!

  • @VLM234 21 days ago +1

    Very informative video. Thanks for making such a valuable video free of cost. Just one request: when you make tutorials, if possible, try to do inference, training, or fine-tuning on agricultural or satellite data.

    • @Roboflow 21 days ago

      Next time I will try to find some cool datasets from these domains.

  • @dabaizhang-x5b 19 days ago

    Master, could you please tell me if Florence-2 can perform SER (Semantic Entity Recognition) and RE (Relation Extraction) tasks? If so, what should my dataset look like? 🤔

  • @kylewang6704 13 days ago +1

    Do you plan to release a tutorial about fine-tuning Florence-2 on other vision tasks, such as captioning? I wonder if Florence-2 can be tuned so that it does object detection/classification and also provides the reasoning behind its prediction.

    • @Roboflow 12 days ago +1

      That’s quite possible. We will release a keypoint detection video this week, and if no new model comes out we will drop one more Florence-2 tutorial.

  • @nikilragav 12 days ago

    9:35 How did you get this embedding vector projection view for the Roboflow 100 datasets?

  • @3DFinalCut1 18 days ago +1

    Thank you for this very informative video. Something like this helps enormously.
    Dziękuję (thank you)!
    I have a question and maybe someone here can give me a tip. I am looking for a tool that searches for similar images in a folder and shows me the results so that I can clean up the dataset later. I have already found tools that do this, but they mainly work with image hashing methods or use fuzzy matching algorithms. I wonder if there are already tools that use AI to solve this task. Does anyone use such a tool?

    • @Roboflow 18 days ago

      You can make it happen using a CLIP model. We covered that topic here: ruclips.net/video/YxJkE6FvGF4/видео.html
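
      To illustrate the suggestion above, here is a minimal sketch of an embedding-based similarity search. It assumes the image embeddings have already been computed with a CLIP model (that step is not shown). The function names `cosine_similarity` and `find_near_duplicates`, the `threshold` value, and the toy vectors are hypothetical, chosen only for illustration, and are not taken from the linked video.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, threshold=0.95):
    """Return (name_a, name_b, score) for every pair at or above threshold.

    `embeddings` maps a file name to its embedding vector. The threshold
    is an assumed starting point and should be tuned per dataset.
    """
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    # Most similar pairs first, so the likeliest duplicates are reviewed first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Toy 2-D vectors standing in for real CLIP embeddings.
toy_embeddings = {
    "a.jpg": [1.0, 0.0],
    "b.jpg": [0.99, 0.1],
    "c.jpg": [0.0, 1.0],
}
duplicates = find_near_duplicates(toy_embeddings)
for name_a, name_b, score in duplicates:
    print(f"{name_a} ~ {name_b}: {score:.3f}")
```

      The pairwise loop is quadratic in the number of images, which is fine for a folder of a few thousand files; larger collections would likely need an approximate nearest-neighbor index instead.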

  • @kylewang6704 16 days ago +1

    Thank you for the awesome tutorial! How does the detection accuracy compare to YOLO-based models?

    • @Roboflow 16 days ago +1

      We cover that topic in the video ;) Something tells me you didn’t watch till the end.

  • @geniusxbyofejiroagbaduta8665 20 days ago

    Thanks, sir. Please cover fine-tuning for OCR, captioning, and segmentation tasks.

    • @Roboflow 20 days ago

      Did you try running OCR with the pre-trained model?

  • @UhuruSsemakula 18 days ago +1

    Can you please make one for object detection using a web camera?

    • @Roboflow 18 days ago +1

      You mean using Florence-2 and a webcam? Or a webcam in general?

    • @UhuruSsemakula 18 days ago

      Yes, Florence-2 and a webcam.

  • @yj1548 19 days ago +1

    Good video. I'm curious what can be done about misspelled class names in object detection tasks. Do you have any ideas?

    • @Roboflow 19 days ago

      I think you asked me this question on Twitter, but let me answer here as well. 1. Longer training could fix it. 2. Fuzzy class matching, for example using Hamming distance. In the video we filter out anything that is not an exact match.
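
      A minimal sketch of the fuzzy class matching idea from the reply above. Note the swap: the reply mentions Hamming distance, but that is defined only for strings of equal length, so this sketch uses the similarity ratio from Python's standard `difflib` module instead. The helper name `match_class_name`, the `cutoff` value, and the class list are hypothetical and not taken from the video's notebook.

```python
import difflib


def match_class_name(predicted, known_classes, cutoff=0.8):
    """Map a possibly misspelled predicted label to a known class name.

    Returns the closest known class if its similarity ratio is at least
    `cutoff`, otherwise None (the detection can then be discarded).
    """
    # Compare case-insensitively, but return the original spelling.
    lookup = {name.lower(): name for name in known_classes}
    candidates = difflib.get_close_matches(
        predicted.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[candidates[0]] if candidates else None


# Assumed class list, for illustration only.
CLASSES = ["person", "car", "bicycle"]

for label in ["persn", "Car", "zebra"]:
    print(label, "->", match_class_name(label, CLASSES))
```

      A lower cutoff recovers more misspellings but raises the risk of mapping an unrelated label onto a valid class, so it is worth checking the value against real model outputs.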

  • @sandrojunioraraujo3706 18 days ago

    Wonderful tutorial! Could you make a tutorial about how to fine-tune Florence-2 for the segmentation task?

    • @Roboflow 14 days ago

      I'm almost sure I'll create a Google Colab notebook covering this topic. Not sure about a YouTube video.

  • @jimshtepa5423 19 days ago +1

    Can you please upload a recording of the community session for those of us who are in a different time zone or might otherwise miss the call?

    • @Roboflow 19 days ago

      Sure! All our community sessions are available to re-watch on our YT channel.

  • @user-mc7tg4pf3i 19 days ago +2

    Hello sir, thanks for all your videos and efforts. I follow your channel, and I'd like to request a detailed video on fine-tuning a YOLOv5 model for custom image classification.

  • @bladethirst1 21 days ago +1

    Can this be used to grade handwritten PDF math assignments?

    • @Roboflow 20 days ago

      Florence-2 can be really good at OCR processing of handwritten text. Not sure about math equations. We would need to confirm that.

    • @bladethirst1 20 days ago

      @Roboflow {'': 'In this image we can see a book with some text on it.'} This is the test output for a handwritten math problem derivation. Is there some way to get a more detailed caption or the OCR output?

  • @ahmeddiaamaroufi867 10 days ago

    Hey, I've gone through 10 different companies and I still love yours the most.
    I'm excited about your service and also run two YouTube channels with 560k and 280k subscribers. Could we work together to make a video about your service?
    I have some ideas for how I would do the video. Have you done any work with YouTubers in the past?
    I hope we can work together on this collaboration. If you have any questions, feel free to ask.
    Best regards,
    diaa maroufi.

  • @geniusxbyofejiroagbaduta8665 20 days ago +1

    Please, sir, also teach us how to annotate with it.

    • @Roboflow 20 days ago

      You mean how to automatically annotate images?

  • @adrianalbertomarinbalseca7132 7 days ago

    Wow, this looks cool, but does this training actually require an internet connection? :/ What if there isn't one?

  • @richardobiri2642 17 days ago

    What if I want to detect fake and authentic certificates? Any help, please.

    • @Roboflow 14 days ago

      You mean distinguish between authentic and fake certificates? Do you have a dataset for that?

  • @nicolassuarez2933 1 hour ago

    Sorry, but if you do not explain how to fine-tune on real custom data from scratch, the tutorial is almost useless...