Just came here after seeing the post on LinkedIn, as I follow you there. Going to try it on the weekend!
I hope you enjoy the content!
This is a pure masterclass!
Thanks! I am glad the video was helpful.
Hi Farzad, trust you're having a good weekend. Another quick question from me on this demo: which version of PIL are you using? Most of the code worked for me; however, I ran into a small issue while trying to execute image = dataset[0]["image"] (under loading the CORD-v2 dataset). The error message is: module 'PIL.Image' has no attribute 'ExifTags'. Thanks!
Thanks. For that project I used pillow==10.3.0 on Linux.
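If anyone else hits that error, a quick way to check what you have installed (the missing-attribute error points at an older Pillow, so upgrading to the version above usually resolves it):

import PIL
print(PIL.__version__)
# If this prints something older than 10.3.0, upgrading should fix it:
#   pip install --upgrade "pillow==10.3.0"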
Hi! Thank you very much for this video. I am trying to fine-tune LLaVA on my MacBook M3 Pro using "mps", but I always run out of memory. I am wondering if it's because of something that I'm doing wrong or if it's the Mac's lack of support. Also, I wanted to know where I can train LLaVA for free (maybe Kaggle?). Thank you :)
Hi, I’m glad you liked it! The error you encountered is due to insufficient GPU memory on your machine. Unfortunately, I don't believe there's any free online GPU service capable of training LLaVA. That's why I used HyperStack.
My suggestion is to choose an affordable GPU provider to train the model. I’ve already shared the steps to set up a VM in HyperStack, which will help you save money if you decide to use that platform.
Here’s the link to check out their GPU pricing:
www.hyperstack.cloud/?Influencer&AI%20Round%20Table&Video%201
What do you suggest for making a Python GUI app using tkinter? Or do you prefer another one? Do you have any video on it? Thank you in advance!!! Big fan of your teaching!!!
Thanks! I haven't used tkinter and I don't have any videos on it on the channel.
Thanks for this informative video. I have a question: how can we perform distributed model training on multiple GPUs? In this video, the training is performed on a single 80 GB GPU. For example, if we want to perform the training on multiple GPUs (e.g., two 48 GB GPUs), what should we do?
The concept is called model sharding, where the architecture is distributed over multiple GPUs. I haven't done it with LLaVA, but to understand it you can have a look at this PyTorch blog:
pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/
In PyTorch, the class that does this is called `FullyShardedDataParallel`. You can find more info about it here:
pytorch.org/docs/stable/fsdp.html
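Since I haven't run this with LLaVA myself, treat the following as a minimal sketch of the FSDP wrapping step only (it assumes you launch one process per GPU, e.g. with torchrun --nproc_per_node=2 train.py, and it uses a small stand-in module instead of the real model):

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# One process per GPU; torchrun sets up the rendezvous environment variables.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(512, 512).cuda()  # stand-in for the real LLaVA model
model = FSDP(model)  # parameters are now sharded across the participating GPUs

# From here the training loop is the same as on a single GPU; each rank only
# holds its shard of the weights, which is what lets two 48 GB cards fit a
# model that needed one 80 GB card.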
I am currently working with this model: LLaVA-v1.6 Mistral 7B. I have my own image dataset, but the images are stored in array format. I would appreciate some guidance on how to convert these images into a suitable input for the model. Below is the code I am using:
prompt = "What are the things I should be cautious about when I visit this place? What should I bring with me?"
max_output_token = 500
prompt = f"[INST]
{prompt} [/INST]"
inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=max_output_token)
response = processor.decode(output[0], skip_special_tokens=True)
pprint(response)
I responded to you on LinkedIn
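For anyone else stuck on the same thing: if the images are stored as NumPy arrays, converting them to PIL images before calling the processor is usually all that's needed. A minimal sketch, assuming H x W x 3 uint8 arrays (`image_array` below is an illustrative stand-in for your stored data):

import numpy as np
from PIL import Image

image_array = np.zeros((336, 336, 3), dtype=np.uint8)  # stand-in for your array
image = Image.fromarray(image_array).convert("RGB")
# `image` can now be passed to the processor exactly as in the snippet above.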
Great video! Subbed! Can you direct me to resources on how one could train LLaVA to add new classes to it? For instance, teach it to recognize and describe traditional battle poses, or describe dishes with their traditional names, etc.?
Thanks. From a technical standpoint, what you want to do is very similar to what I did in the video. I also explained how to prepare your data for that scenario in the video, and there is a notebook that gives you hints for data preparation. From there, it is just a matter of passing the right data to the model. You have access to everything you need with this video and the project in my GitHub repository.
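As a rough illustration of what such a training example looks like (the notebook has the exact schema the project uses; the field names here follow the common LLaVA conversation format and the file path is hypothetical):

# One training example pairing an image with the question and the answer
# you want the model to learn for the new concept.
example = {
    "image": "poses/warrior_stance_01.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat battle pose is shown here?"},
        {"from": "gpt", "value": "This is the traditional warrior stance, characterized by ..."},
    ],
}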