Fine-Tune a Multimodal LLM "IDEFICS 9B" for Visual Question Answering

  • Published: 20 Jan 2025

Comments • 34

  • @DreamingofDO
    @DreamingofDO 11 months ago +1

    This is the best image-captioning model I have trained. All those VLMs are yeti water. This is the stuff; thanks for making a notebook.

  • @avinash-manoli
    @avinash-manoli 9 months ago +3

    Thanks

  • @nurusterling8024
    @nurusterling8024 1 year ago +2

    Finally a Multimodal LLM. Thanks a lot

  • @avinash-manoli
    @avinash-manoli 9 months ago +2

    Firstly, thank you for the wonderful video. You are one of the few guys making videos on the stuff that really matters in the LLM space. I followed these steps and fine-tuned the model. I tested the results while in the Jupyter notebook and was getting the expected results. I chose to save the model locally instead of saving it to Hugging Face, and to load the model locally for inference. I get the following error while inferencing a Pokemon card: "The current model class (IdeficsModel) is not compatible with `.generate()`, as it doesn't have a language model head." Any thoughts on how I can overcome this?
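The error above usually means the saved checkpoint was reloaded through a headless class (`AutoModel`/`IdeficsModel`). A minimal sketch of reloading a locally saved fine-tune through `IdeficsForVisionText2Text`, the Transformers class that carries the language-model head and therefore supports `.generate()`; the path and helper name are assumptions, and the import is deferred so the sketch itself has no heavy dependencies:

```python
def load_finetuned(path="./idefics-9b-finetuned"):
    """Reload a locally saved IDEFICS fine-tune with the LM head attached.

    IdeficsModel has no language-model head, so .generate() refuses to run;
    IdeficsForVisionText2Text is the generation-capable class. The import
    is deferred so this file imports cleanly without transformers installed.
    """
    from transformers import AutoProcessor, IdeficsForVisionText2Text

    model = IdeficsForVisionText2Text.from_pretrained(path)
    processor = AutoProcessor.from_pretrained(path)
    return model, processor
```

If the checkpoint was written with `model.save_pretrained(path)` and `processor.save_pretrained(path)`, reloading this way should give a model that `.generate()` accepts.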

  • @digixty545
    @digixty545 7 months ago +2

    Hello! How do I download the model you uploaded to Hugging Face and run inference with it?
    I tried but failed. Can you make a video on that?

  • @AI_Roller_Buster
    @AI_Roller_Buster 4 months ago +1

    How do I save this fine-tuned model locally on my system? Any help?

  • @trapbushali542
    @trapbushali542 11 months ago +2

    Would you please make the same video for LLaVA?

  • @deekshitht786
    @deekshitht786 25 days ago

    You are awesome ❤

  • @SaiKiran-jc8yp
    @SaiKiran-jc8yp 11 months ago

    Nice explanation. But can someone please clarify how it extracts the image from the URL?
    Where is the code that extracts the image from the URL?
    Also, in the fine-tuning process, are we tuning on the URL or on the image itself?
    If it is the image, are we converting it to base64?
    And how can we run inference on a local image path?
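On the URL-vs-image question: a model is never trained on the URL string itself; the URL is fetched and decoded into pixel data before it reaches the processor. A stdlib-only sketch of that fetch step, with an optional base64 encoding (base64 is not required by image processors, which accept decoded images; the helper names are made up for illustration):

```python
import base64
import urllib.request


def fetch_image_bytes(url: str) -> bytes:
    """Download the raw image bytes behind a URL (also handles data: URIs)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def to_base64(image_bytes: bytes) -> str:
    """Encode image bytes as base64 text, only if some pipeline expects text."""
    return base64.b64encode(image_bytes).decode("ascii")
```

For local files you skip the download entirely: open the path directly (e.g. with PIL's `Image.open`) and pass the decoded image to the processor, same as for a fetched URL.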

  • @ayushsinghal28
    @ayushsinghal28 11 months ago

    Can we give multiple images in the prompt for inference?
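For what it's worth, IDEFICS was trained on interleaved image-text sequences, so its processor accepts a single prompt list mixing several images with text. A sketch of building such a prompt; the helper is hypothetical, and the `<end_of_utterance>` marker follows the instruct-model convention, so adjust it if you use the base model:

```python
def build_interleaved_prompt(images, question):
    """Interleave multiple images with text into one prompt list, the shape
    IDEFICS-style processors accept: each element is a string, a PIL image,
    or an image URL."""
    prompt = ["User:"]
    for image in images:
        prompt.append(image)  # appears in the sequence at this position
    prompt.append(question)
    prompt.append("<end_of_utterance>\nAssistant:")
    return prompt
```

The resulting list would then be passed to the processor (e.g. `processor(prompt, return_tensors="pt")`) and on to `model.generate`.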

  • @dr.aravindacvnmamit3770
    @dr.aravindacvnmamit3770 1 year ago

    If we want to train our own model on our own data (say, in .doc format) with the existing model, what format should the data be in? Not for prediction; I need text generation: passing a query and getting a response. Can you help with this?

  • @superspectrum625
    @superspectrum625 7 months ago

    Hi, how do I change the code if I want to fine-tune it without quantization?
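One way to answer the quantization question, sketched under the assumption that the notebook loads the model 4-bit via a `BitsAndBytesConfig`: remove those arguments from `from_pretrained` and pick a dtype instead. The function name is hypothetical and the import is deferred so the sketch stays lightweight:

```python
def load_unquantized(model_name="HuggingFaceM4/idefics-9b"):
    """Load IDEFICS with plain bfloat16 weights, no quantization.

    To fine-tune without quantization, drop the 4-bit/8-bit arguments
    (e.g. quantization_config=BitsAndBytesConfig(...) or load_in_4bit=True)
    and request a dtype instead. Expect much higher GPU memory use.
    """
    import torch
    from transformers import IdeficsForVisionText2Text

    return IdeficsForVisionText2Text.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,  # unquantized half-precision weights
    )
```

If the original code also runs a k-bit preparation helper before attaching LoRA adapters, that call would be dropped as well.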

  • @thisurawz
    @thisurawz 1 year ago

    Really waited for this video. Thank you so much. I have a question: when there is more than one modality, how do you fine-tune (I mean the coding part)? For example, say there are three modalities: text, images, and videos, and each sample in the dataset has images with text, videos with text, or both images and videos with text.

  • @phanikumar3136
    @phanikumar3136 10 months ago

    Great video. Can you do a video on fine-tuning the IDEFICS 9B model on a custom local multimodal dataset?

  • @fintech1378
    @fintech1378 1 year ago

    Can you do a video on prompting and RAG for multimodal LLMs to reduce hallucination, and on how to detect this hallucination automatically?

  • @hemachandhers
    @hemachandhers 10 months ago

    Please put up a video where, after fine-tuning IDEFICS, we push it to the Hub, then download it again from the Hub and run it like the original.

  • @StutiGarg-z5e
    @StutiGarg-z5e 1 year ago

    Please upload a video on fine-tuning Mistral 7B, with information about how to fine-tune it using your own data and what format the data should be in.

  • @thangarajerode7971
    @thangarajerode7971 1 year ago

    Please add a video on how to run inference with the checkpoint pushed to the Hugging Face Hub.

  • @ArtistrystoriesUnleashed45
    @ArtistrystoriesUnleashed45 8 months ago

    Hi, how can I run inference on locally stored images?

  • @ravitanwar9537
    @ravitanwar9537 1 year ago

    How do I add local images as the fine-tuning dataset? Nice video as always.

  • @RICHARDSON143
    @RICHARDSON143 1 year ago +1

    ❤❤❤

  • @byccc3244
    @byccc3244 1 year ago

    You saved my life!

  • @ShubhamKumar-zw7oq
    @ShubhamKumar-zw7oq 1 year ago

    Can you please start a complete LLM course, from scratch through fine-tuning and everything?

    • @akj3344
      @akj3344 1 year ago

      What are you not able to find on his channel?

    • @ShubhamKumar-zw7oq
      @ShubhamKumar-zw7oq 1 year ago

      @akj3344 It's not structured. I am looking for something from scratch, with an end-to-end project.

  • @SMENATH_DEVELOPMENT
    @SMENATH_DEVELOPMENT 1 year ago

    Please make a tutorial on fine-tuning a video generation model.

  • @Vedhar2104
    @Vedhar2104 1 year ago

    Hi, every video is very interesting, and I am also in the same organisation. Can we have a call once? Do you provide any paid course? If yes, what is it? Details please.

  • @narendraparmar1631
    @narendraparmar1631 9 months ago

    Thanks Bro

  • @arslanabid2245
    @arslanabid2245 1 year ago

    Sir, please make a video on how to serve LangChain (chat with PDF) with FastAPI.
    Thanks

    • @AIAnytime
      @AIAnytime 1 year ago

      I already have many videos. Watch my RAG playlist.

  • @user4-j1w
    @user4-j1w 1 year ago

    Finally, thank you

  • @fintech1378
    @fintech1378 1 year ago

    awesome