Fine-Tuning Multimodal LLMs (LLAVA) for Image Data Parsing

  • Published: 21 Nov 2024

Comments • 16

  • @intresting2395
    @intresting2395 2 months ago

    Just came here after seeing your post on LinkedIn, as I follow you there. Going to try it over the weekend.

    • @airoundtable
      @airoundtable 2 months ago

      I hope you enjoy the content!

  • @AryanPoddar-d3w
    @AryanPoddar-d3w 2 months ago

    This is Pure Masterclass!

    • @airoundtable
      @airoundtable 2 months ago

      Thanks! I am glad the video was helpful

  • @terryliu3635
    @terryliu3635 25 days ago

    Hi Farzad, trust you're having a good weekend. Another quick question from me on this demo: which version of PIL are you using? Most of the code worked for me; however, I ran into a small issue while trying to execute "image = dataset[0]["image"]" (under loading the cordv2 dataset). The error message is "module 'PIL.Image' has no attribute 'ExifTags'". Thanks!

    • @airoundtable
      @airoundtable 25 days ago

      Thanks. For that project I used pillow==10.3.0 on Linux.

  • @SofiaHuppertz
    @SofiaHuppertz 29 days ago

    Hi! Thank you very much for this video. I am trying to fine-tune LLAVA on my MacBook M3 Pro using "mps", but I always run out of memory. I am wondering whether I'm doing something wrong or whether it's the Mac's lack of support. Also, I wanted to know where I can train LLAVA for free (maybe Kaggle?). Thank you :)

    • @airoundtable
      @airoundtable 28 days ago

      Hi, I’m glad you liked it! The error you encountered is due to insufficient GPU memory on your machine. Unfortunately, I don't believe there's any free online GPU service capable of training LLAVA. That's why I used HyperStack.
      My suggestion is to choose an affordable GPU provider to train the model. I’ve already shared the steps to set up a VM in HyperStack, which will help you save money if you decide to use that platform.
      Here’s the link to check out their GPU pricing:
      www.hyperstack.cloud/?Influencer&AI%20Round%20Table&Video%201

  • @PareshPawar-y5w
    @PareshPawar-y5w 2 months ago

    What do you suggest for making a Python GUI app using tkinter? Or do you prefer another toolkit? Do you have any video on it? Thank you in advance!!! Big fan of your teaching!!!

    • @airoundtable
      @airoundtable 2 months ago

      Thanks! I haven't used tkinter, and I don't have any videos on it on the channel.

  • @MuhammadAdnan-tq3fx
    @MuhammadAdnan-tq3fx 2 months ago

    Thanks for this informative video. I have a question: how can we perform distributed model training on multiple GPUs? In this video, the training is performed on a single 80 GB GPU. For example, if we want to perform the training on multiple GPUs (e.g., two 48 GB GPUs), then what should we do?

    • @airoundtable
      @airoundtable 2 months ago +1

      The concept is called model sharding, where the model's architecture is distributed over multiple GPUs. I haven't done it with LLAVA, but to understand it, you can have a look at this PyTorch blog:
      pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/
      In PyTorch, the class that does this is called `FullyShardedDataParallel`. You can find more info about it here:
      pytorch.org/docs/stable/fsdp.html
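
A minimal sketch of the wrapping pattern described above, assuming a recent PyTorch build. The model, sizes, and hyperparameters are placeholder assumptions, not the exact setup from the video:

```python
"""Sketch of sharded training with FSDP.

Launch with one process per GPU, e.g.:
    torchrun --nproc_per_node=2 train_fsdp.py
"""
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def train_step():
    dist.init_process_group("nccl")   # one process per GPU under torchrun
    torch.cuda.set_device(dist.get_rank())

    model = nn.Sequential(
        nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)
    )
    # FSDP shards parameters, gradients, and optimizer state across ranks
    model = FSDP(model.cuda())
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).square().mean()
    loss.backward()                   # gradients are reduced per shard
    opt.step()
    dist.destroy_process_group()


if __name__ == "__main__":
    # Under torchrun with GPUs you would call train_step(); here we only
    # confirm the FSDP API is importable in this torch build.
    print("FSDP class:", FSDP.__name__)
```

For a full model like LLAVA, libraries such as Hugging Face Accelerate can drive the same FSDP machinery with less boilerplate.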

  • @raminguyen7940
    @raminguyen7940 1 month ago +1

    I am currently working with this model: LLaVA-v1.6 Mistral 7B. I have my own image dataset, but the images are stored in array format. I would appreciate some guidance on how to convert these images into a suitable input for the model. Below is the code I am using:
    from pprint import pprint

    prompt = "What are the things I should be cautious about when I visit this place? What should I bring with me?"
    max_output_token = 500
    prompt = f"[INST] {prompt} [/INST]"
    inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")
    output = model.generate(**inputs, max_new_tokens=max_output_token)
    response = processor.decode(output[0], skip_special_tokens=True)
    pprint(response)
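
One common way to feed array-format images to the processor is to convert them to PIL images first. A minimal sketch, assuming the arrays are HxWxC NumPy arrays (the random dummy data below only stands in for a real dataset entry):

```python
import numpy as np
from PIL import Image


def array_to_pil(arr):
    """Convert a raw image array to a PIL.Image the processor can consume."""
    arr = np.asarray(arr)
    # Float arrays (e.g. values in [0, 1]) must be rescaled to uint8 [0, 255]
    if arr.dtype != np.uint8:
        arr = (arr * 255).clip(0, 255).astype(np.uint8)
    return Image.fromarray(arr).convert("RGB")


# Dummy 64x64 RGB array standing in for one image from the dataset
dummy = np.random.rand(64, 64, 3)
image = array_to_pil(dummy)
print(image.size, image.mode)  # (64, 64) RGB
```

The resulting `image` can then be passed to `processor(prompt, image, return_tensors="pt")` exactly as in the snippet above.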

  • @divye.ruhela
    @divye.ruhela 1 month ago

    Great video! Subbed! Can you direct me to resources on how one could train LLaVA to add new classes? For instance, teach it to recognize and describe traditional battle poses, or describe dishes with their traditional names, etc.?

    • @airoundtable
      @airoundtable 1 month ago +1

      Thanks. From a technical standpoint, what you want to do is very similar to what I did in the video. I also explained how you need to prepare your data for that scenario, and there is a notebook that gives you hints for data preparation. From there, it is just a matter of passing the right data to the model. You have access to everything you need with this video and the project in my GitHub repository.
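
For reference, LLaVA-style fine-tuning data is typically a JSON list of image/conversation pairs, with an `<image>` token marking where the image goes in the first human turn. A hypothetical single entry (the file name and wording below are made up for illustration):

```python
import json

# Hypothetical training example teaching the model a traditional dish name
example = {
    "id": "dish-0001",
    "image": "images/dish_0001.jpg",  # placeholder path
    "conversations": [
        {"from": "human", "value": "<image>\nWhat dish is shown here?"},
        {
            "from": "gpt",
            "value": "This is khachapuri, a traditional Georgian cheese bread.",
        },
    ],
}

print(json.dumps(example, indent=2))
```

New classes such as battle poses or dish names are then just additional question/answer pairs in this format, paired with the corresponding images.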