Fine-Tuning Multimodal LLMs (LLAVA) for Image Data Parsing

  • Published: 21 Nov 2024

Comments • 16

  • @intresting2395
    @intresting2395 2 months ago

    Just came here after seeing your post on LinkedIn, as I follow you there. Going to try it over the weekend.

    • @airoundtable
      @airoundtable 2 months ago

      I hope you enjoy the content!

  • @AryanPoddar-d3w
    @AryanPoddar-d3w 2 months ago

    This is Pure Masterclass!

    • @airoundtable
      @airoundtable 2 months ago

      Thanks! I am glad the video was helpful

  • @terryliu3635
    @terryliu3635 25 days ago

    Hi Farzad, trust you're having a good weekend. Another quick question from me on this demo: which version of PIL are you using? Most of the code worked for me; however, I ran into a small issue while trying to execute "image = dataset[0]["image"]" (under loading the cordv2 dataset). The error message is "module 'PIL.Image' has no attribute 'ExifTags'". Thanks!

    • @airoundtable
      @airoundtable 25 days ago

      Thanks. For that project I used pillow==10.3.0 on Linux.

  • @SofiaHuppertz
    @SofiaHuppertz 29 days ago

    Hi! Thank you very much for this video. I am trying to fine-tune LLAVA on my MacBook M3 Pro using "mps", but I always run out of memory. I am wondering whether I'm doing something wrong or whether it's the Mac's lack of support. Also, I wanted to know where I can train LLAVA for free (maybe Kaggle?). Thank you :)

    • @airoundtable
      @airoundtable 28 days ago

      Hi, I’m glad you liked it! The error you encountered is due to insufficient GPU memory on your machine. Unfortunately, I don't believe there's any free online GPU service capable of training LLAVA. That's why I used HyperStack.
      My suggestion is to choose an affordable GPU provider to train the model. I’ve already shared the steps to set up a VM in HyperStack, which will help you save money if you decide to use that platform.
      Here’s the link to check out their GPU pricing:
      www.hyperstack.cloud/?Influencer&AI%20Round%20Table&Video%201

  • @PareshPawar-y5w
    @PareshPawar-y5w 2 months ago

    What do you suggest for making a Python GUI app using tkinter? Or do you prefer another toolkit? Do you have any video on it? Thank you in advance!!! Big fan of your teaching!!!

    • @airoundtable
      @airoundtable 2 months ago

      Thanks! I haven't used tkinter, and I don't have any videos on it on the channel.

  • @MuhammadAdnan-tq3fx
    @MuhammadAdnan-tq3fx 2 months ago

    Thanks for this informative video. I have a question: how can we perform distributed model training on multiple GPUs? In this video, the training is performed on a single 80 GB GPU. For example, if we want to perform the training on multiple GPUs (e.g., two 48 GB GPUs), then what should we do?

    • @airoundtable
      @airoundtable 2 months ago +1

      The concept is called model sharding, where the model's architecture is distributed over multiple GPUs. I haven't done it with LLAVA, but to understand it, you can have a look at this PyTorch blog:
      pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/
      In PyTorch, the class that does this is called `FullyShardedDataParallel`. You can find more info about it here:
      pytorch.org/docs/stable/fsdp.html
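
A minimal sketch of the wrapping pattern described above, assuming a recent PyTorch build. The model, sizes, and hyperparameters are placeholder assumptions, not the exact setup from the video:

```python
"""Sketch of sharded training with FSDP.

Launch with one process per GPU, e.g.:
    torchrun --nproc_per_node=2 train_fsdp.py
"""
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def train_step():
    dist.init_process_group("nccl")   # one process per GPU under torchrun
    torch.cuda.set_device(dist.get_rank())

    model = nn.Sequential(
        nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)
    )
    # FSDP shards parameters, gradients, and optimizer state across ranks
    model = FSDP(model.cuda())
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).square().mean()
    loss.backward()                   # gradients are reduced per shard
    opt.step()
    dist.destroy_process_group()


if __name__ == "__main__":
    # Under torchrun with GPUs you would call train_step(); here we only
    # confirm the FSDP API is importable in this torch build.
    print("FSDP class:", FSDP.__name__)
```

For a full model like LLAVA, libraries such as Hugging Face Accelerate can drive the same FSDP machinery with less boilerplate.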

  • @raminguyen7940
    @raminguyen7940 1 month ago +1

    I am currently working with this model: LLaVA-v1.6 Mistral 7B. I have my own image dataset, but the images are stored in array format. I would appreciate some guidance on how to convert these images into a suitable input for the model. Below is the code I am using:
    from pprint import pprint

    prompt = "What are the things I should be cautious about when I visit this place? What should I bring with me?"
    max_output_token = 500
    prompt = f"[INST] {prompt} [/INST]"
    inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")
    output = model.generate(**inputs, max_new_tokens=max_output_token)
    response = processor.decode(output[0], skip_special_tokens=True)
    pprint(response)
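
One common way to feed array-format images to the processor is to convert them to PIL images first. A minimal sketch, assuming the arrays are HxWxC NumPy arrays (the random dummy data below only stands in for a real dataset entry):

```python
import numpy as np
from PIL import Image


def array_to_pil(arr):
    """Convert a raw image array to a PIL.Image the processor can consume."""
    arr = np.asarray(arr)
    # Float arrays (e.g. values in [0, 1]) must be rescaled to uint8 [0, 255]
    if arr.dtype != np.uint8:
        arr = (arr * 255).clip(0, 255).astype(np.uint8)
    return Image.fromarray(arr).convert("RGB")


# Dummy 64x64 RGB array standing in for one image from the dataset
dummy = np.random.rand(64, 64, 3)
image = array_to_pil(dummy)
print(image.size, image.mode)  # (64, 64) RGB
```

The resulting `image` can then be passed to `processor(prompt, image, return_tensors="pt")` exactly as in the snippet above.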

  • @divye.ruhela
    @divye.ruhela 1 month ago

    Great video! Subbed! Can you direct me to resources on how one could train LLaVA to add new classes? For instance, teach it to recognize and describe traditional battle poses, or describe dishes with their traditional names, etc.?

    • @airoundtable
      @airoundtable 1 month ago +1

      Thanks. From a technical standpoint, what you want to do is very similar to what I did in the video. I also explained how you need to prepare your data for that scenario, and there is a notebook that gives you hints for data preparation. From there, it is just a matter of passing the right data to the model. You have access to everything you need with this video and the project in my GitHub repository.
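
For reference, LLaVA-style fine-tuning data is typically a JSON list of image/conversation pairs, with an `<image>` token marking where the image goes in the first human turn. A hypothetical single entry (the file name and wording below are made up for illustration):

```python
import json

# Hypothetical training example teaching the model a traditional dish name
example = {
    "id": "dish-0001",
    "image": "images/dish_0001.jpg",  # placeholder path
    "conversations": [
        {"from": "human", "value": "<image>\nWhat dish is shown here?"},
        {
            "from": "gpt",
            "value": "This is khachapuri, a traditional Georgian cheese bread.",
        },
    ],
}

print(json.dumps(example, indent=2))
```

New classes such as battle poses or dish names are then just additional question/answer pairs in this format, paired with the corresponding images.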