Vision Transformer for Image Classification Using transfer learning

Code With Aarohi

16K views

Published: Jan 12, 2025

Comments

@dr.noushathshaffi7515 1 year ago ⁺³
I've been searching for this tutorial for a long time, and I can't express how thankful I am, Aarohi! Your YouTube channel is an absolute gem, and it truly deserves a multitude of subscriptions. The way you effortlessly share your expertise is not only enlightening but also engaging. Keep up the exceptional work!
@CodeWithAarohi 1 year ago
Thank you for your heartwarming comment 🙂
@debjitdas1714 11 months ago ⁺¹
Very informative tutorial, thank you. I have the following questions and doubts:
1) During training, how do I save only the best model after each epoch (e.g. based on the lowest validation loss), and load that best model after training completes, for future use?
2) How do I generate the confusion matrix and also the F1 score, precision, and recall?
3) Finally, how do I identify which test samples are correctly predicted and which are not?
4) After the initial 4-5 epochs, the gap between training loss and test loss (and between train accuracy and test accuracy) keeps increasing, so the model needs further fine-tuning. Please suggest how to do that.
@salihsalur4855 8 months ago
Hello, could you answer question 2? F1 score, precision ... Do you have code for the F1 score ...
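Since questions 2 and 3 come up several times in this thread, here is a minimal sketch of one way to compute them. It is not code from the video: it uses plain Python lists, and the label values are hypothetical (0 = daisy, 1 = dandelion). In practice `y_true` and `y_pred` would be collected by looping over the test dataloader and taking the argmax of the model output.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_metrics(matrix):
    """Return one (precision, recall, f1) tuple per class."""
    n = len(matrix)
    results = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, but wrong
        fn = sum(matrix[c]) - tp                       # true c, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def misclassified_indices(y_true, y_pred):
    """Positions of the test samples the model got wrong (question 3)."""
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]


# Hypothetical labels and predictions for six test images
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=2)
metrics = per_class_metrics(cm)
wrong = misclassified_indices(y_true, y_pred)
print(cm, metrics, wrong)
```

If scikit-learn is installed, its `confusion_matrix` and `classification_report` functions give the same numbers with less code.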
@shounakdas1001 1 year ago
Thanks Aarohi, it is brilliant. A great help for learning ViT.
@CodeWithAarohi 1 year ago
Glad it was helpful!
@soravsingla6574 1 year ago
Hello Ma’am
Your AI and Data Science content is consistently impressive! Thanks for making complex concepts so accessible. Keep up the great work! 🚀 #ArtificialIntelligence #DataScience #ImpressiveContent 👏👍
@CodeWithAarohi 1 year ago
My pleasure 😊
@sanjoetv5748 1 year ago
Please make a landmark detection tutorial using a vision transformer. I greatly need to finish this project, and the task is to detect 13 landmarks using a vision transformer. I can't find any resources that teach how to do landmark detection with a vision transformer. This channel is my only hope.
@soravsingla6574 1 year ago
Code with Aarohi is the best YouTube channel for Artificial Intelligence
#BestChannel #YouTubeChannel #ArtificialIntelligence #CodeWithAarohi #DataScience #Engineering #MachineLearning #DataAnalysis #BestLearning #LearnDataScience #DataScienceCourse #ArtificialIntelligenceCourse #Codewithaarohi #CodeWithAarohi
@ambikajadoonanan2852 1 year ago
Good day. Thank you for this wonderful demo. I have a few questions:
1. Are there any other existing vision transformer models that you know of?
2. How do I go about training a model on images that correspond to nutritional values in a certain column range of a separate Excel database, and outputting the predicted values when the model is applied to a single image? The name of each image is also listed against each value within the Excel file.
Many many thanks in advance for the assistance. :)
@JKaks-gr5zm 10 months ago ⁺³
I am getting the error "ModuleNotFoundError: No module named 'going_modular'" even though the going_modular folder and the notebook are in the same folder. I am working in Colab. Please help, Ma'am.
@Ikramkrt 10 months ago
I have the same problem, but in Jupyter. Did you resolve it?
@Ritam_Goswami_ 7 months ago
I am currently having a problem with epochs not running; training keeps taking a very long time. What should I do?
@Ritam_Goswami_ 7 months ago
Just install the module from the directory in which the module is present, in a different cell.
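For the 'going_modular' error, a minimal sketch of one common workaround: Python only finds the folder if its parent directory is on `sys.path`, and in Colab the notebook's working directory is often not the Drive folder where the files were uploaded. The helper name `add_to_import_path` and the use of `Path.cwd()` are illustrative assumptions; replace `project_dir` with wherever the going_modular folder actually lives.

```python
import sys
from pathlib import Path


def add_to_import_path(folder):
    """Put `folder` at the front of sys.path if it is not already listed."""
    folder = str(Path(folder).resolve())
    if folder not in sys.path:
        sys.path.insert(0, folder)
    return folder


# Assumption: this is the directory that CONTAINS the going_modular folder.
# In Colab it is often a mounted Google Drive path rather than the cwd.
project_dir = Path.cwd()
add_to_import_path(project_dir)

if (project_dir / "going_modular").is_dir():
    print("going_modular found; the import should now resolve")
else:
    print(f"going_modular is not in {project_dir}; check the path")
```

Run this in a cell before the `from going_modular.going_modular import engine` line.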
@aakashyadav1589 3 months ago
How do I predict on a very large dataset? Let's say you have 30,000 images; a for loop over single images will be computationally expensive, so what's the best way to run inference with a pretrained model on large datasets?
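One usual answer is to send images through the model in batches instead of one at a time. The sketch below is an assumption-laden illustration, not code from the video: `predict_paths`, `batched`, and the default batch size are hypothetical names and values, `model` is assumed to be the trained torch module, and `transform` the same preprocessing used in training. torch and Pillow are imported inside the function so the batching helper can be read and tried on its own.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_paths(model, image_paths, transform, device="cpu", batch_size=64):
    """Return one predicted class index per image path."""
    import torch            # imported lazily; requires PyTorch
    from PIL import Image   # requires Pillow

    model = model.to(device).eval()
    predictions = []
    with torch.inference_mode():  # no gradients are needed for inference
        for chunk in batched(image_paths, batch_size):
            images = [transform(Image.open(p).convert("RGB")) for p in chunk]
            logits = model(torch.stack(images).to(device))
            predictions.extend(logits.argmax(dim=1).tolist())
    return predictions


# The batching helper on its own, with placeholder file names
chunks = list(batched(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], 2))
print(chunks)
```

For tens of thousands of images, wrapping the files in a `Dataset` and using a `DataLoader` with several workers would also overlap image loading with GPU work.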
@sathishkumars4463 1 year ago ⁺¹
Awesome upload. How do I save the model or weights so that I can load them and perform inference later?
@danielasefa8087 1 year ago
Thanks so much, I was waiting for this video from you.
@CodeWithAarohi 1 year ago
Hope you like it!
@neelshah1651 1 year ago
Thank you for such great content!!
@CodeWithAarohi 1 year ago
Glad you enjoy it!
@НиколайНовичков-е1э 1 year ago
Thank you! Your video is very informative!
@CodeWithAarohi 1 year ago
Glad it was helpful!
@anishmgeorge207 7 months ago
Madam, I have one doubt. Here we use a pretrained model and train it again on our dataset. So my doubts are: where do we get the pretrained model from, and on which dataset was it trained? Also, after retraining the model on our dataset, the weights will all change, right?
@MaryBrockyn 1 year ago
Hi again, when I print the summary of the Vision Transformer, the input shapes for each layer start with 32. I understand that the very first input [32, 3, 224, 224] means we originally have an image size of 224x224 with 3 colour channels. What does the 32 mean? Is that the batch size, and if so, do I have to change that value if I change my batch size for training?
@CodeWithAarohi 1 year ago
Yes, you are correct! The "32" in the input shape [32, 3, 224, 224] refers to the batch size.
@swatimishra1555 1 month ago ⁺¹
I am getting this error, "ModuleNotFoundError: No module named 'going_modular'", when trying to run it on Google Colab. How do I fix it in Google Colab? Please reply.
@CodeWithAarohi 1 month ago ⁺¹
This is a folder. You can get it from my repo. Place it in the directory where your Jupyter notebook is.
@swatimishra1555 1 month ago ⁺¹
@@CodeWithAarohi thanks a lot
@swatimishra1555 1 month ago ⁺¹
@@CodeWithAarohi How can I use this going_modular in Google Colab? Is there any way?
@CodeWithAarohi 1 month ago ⁺¹
@ Paste it in your Google Drive, where your Colab notebook is.
@swatimishra1555 1 month ago ⁺¹
@@CodeWithAarohi thanks
@Vibhu-ts8dh 8 months ago
Ma'am, how do I save and then load the model? After saving and loading the model, I am not able to get the same predictions. Are there any resources I can refer to in order to learn about it?
@hulkbaiyo8512 1 year ago
I combined your code with my own training code, and added a learning rate scheduler and GPU memory garbage collection. The results and training speed are so much better, with no more worrying about the GPU running out of memory.
@CodeWithAarohi 1 year ago
Sounds great!
@FERNANDOVALLE-ig8gl 8 months ago
Could you add how to calculate the confusion matrix and other metrics, please?
@devavratpro7061 1 year ago
Hi, thanks for your great video. I want to train the model for another input size, such as 448x448. However, the model only takes a 224x224 input size, or it gives an error. How can I make the necessary changes?
@CodeWithAarohi 1 year ago
You'll need to adapt the architecture to accommodate the larger input size.
The key components to modify include:
1- In the original ViT, the input image is divided into non-overlapping patches of size 16x16 pixels. For a 448x448 input size, the patch size must divide 448 evenly. One option is a patch size of 28x28 (448/28 = 16 patches per side).
2- The number of patches depends on the input size and patch size. For 448x448 input and 28x28 patches, you'll have 16x16 = 256 patches. Keeping 16x16 patches would instead give 28x28 = 784 patches.
3- The embedding dimension is a separate hyperparameter. It does not have to change with the patch size or the number of patches, so adjust it only if your task needs it.
4- The number of transformer blocks can also stay the same. A larger input does not strictly require more blocks, though more blocks may give better performance.
Note that once the patch size changes, the pretrained patch-embedding weights no longer fit, so the model in this example starts from random weights.
Example- Using PyTorch and the Hugging Face Transformers ViT model for a 448x448 input size:
import torch
from transformers import ViTConfig, ViTForImageClassification, ViTImageProcessor
# Image processor that resizes inputs to the desired size
image_processor = ViTImageProcessor(size={"height": 448, "width": 448})
# Configure the ViT architecture (randomly initialized, not pretrained)
config = ViTConfig(
    image_size=448,
    patch_size=28,  # Adjusted patch size
    num_labels=1000,  # Adjust the number of output classes
    # Modify other parameters as needed (hidden_size, num_hidden_layers, etc.)
)
model = ViTForImageClassification(config)
# Sanity check: a random 448x448 batch should give logits of shape [1, 1000]
logits = model(pixel_values=torch.rand(1, 3, 448, 448)).logits
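The patch arithmetic in this reply can be checked with a few lines of plain Python. This is only a worked example; the function name `vit_patch_count` is made up for illustration and is not part of any library.

```python
def vit_patch_count(image_size, patch_size):
    """Return (patches_per_side, total_patches) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


for image_size, patch_size in [(224, 16), (448, 28), (448, 16)]:
    per_side, total = vit_patch_count(image_size, patch_size)
    print(f"{image_size}px image, {patch_size}px patches: "
          f"{per_side} x {per_side} = {total} patches")
```

The sequence the transformer sees is one token longer than the patch count when a class token is prepended, so the standard 224/16 setup has 197 tokens.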
@harshavenkatesh4409 1 year ago
"Could not generate a random directory for manager socket": how do I resolve this error?
@Sunil-ez1hx 1 year ago
Thank you so much, ma'am, for this amazing video
@CodeWithAarohi 1 year ago
Thanks for liking
@nandiniloku7747 1 year ago
Thank you, very good explanation. Which pretrained model are you using here? Is it the same as a CNN pretrained model, or are you using only the weights of the pretrained model? Which pretrained model is this?
@CodeWithAarohi 1 year ago
You can check this: github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py In that file, check class ViT_B_16_Weights(WeightsEnum).
@MaryBrockyn 1 year ago
Thanks for the tutorial! Is there a quick way to have all images in a folder classified by the trained model, and to also get the confusion matrix and other metrics for them?
@MaryBrockyn 1 year ago
I am also wondering how to convert the images that I want to classify into the proper input shape. Can you help with that?
Thanks in advance!
@CodeWithAarohi 1 year ago ⁺¹
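from torchvision import transforms
image_size = (224, 224)  # height and width the model in this video expects
image_transform = transforms.Compose(
    [
        transforms.Resize(image_size),
        transforms.ToTensor(),
        # ImageNet channel statistics used by the pretrained weights
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
        ),
    ]
)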
@MaryBrockyn 1 year ago
Thank you! Do you maybe also have an answer for my first question? (Is there a quick way to have all images in a folder classified by the trained model, and to also get the confusion matrix and other metrics such as accuracy, recall and F1 score?) @@CodeWithAarohi
@maharaniizza4601 1 year ago
Hi Ms. Aarohi, thank you so much for your video. May I ask: if I want to add an early-stopping callback, is it correct to modify the epoch loop section of the engine file? Thank you
@CodeWithAarohi 1 year ago
Yes, correct
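A minimal sketch of what such an early-stopping check could look like inside the epoch loop. The class `EarlyStopping`, its `patience` and `min_delta` parameters, and the list of validation losses are all illustrative assumptions; they are not taken from the engine file in the repo.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch and return (is_best, should_stop)."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience


# Hypothetical per-epoch validation losses standing in for a real training loop
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, val_loss in enumerate([0.90, 0.70, 0.72, 0.75, 0.60]):
    is_best, should_stop = stopper.step(val_loss)
    if is_best:
        pass  # a real loop would save the best weights here, e.g. with torch.save
    if should_stop:
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)
```

The `is_best` flag is also a natural place to save the best checkpoint, which covers the "save only the best model" question asked earlier in the comments.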
@mehedihasanshojib5831 1 year ago
I prepared my dataset like you did. But when I try to train, it gives "OSError: Caught OSError in DataLoader worker process 0." and "image file is truncated (40 bytes not processed)". I followed your code exactly and just applied my own dataset. Can you tell me how to fix it?
@harshavenkatesh4409 1 year ago
Did you figure it out?
@sandhyarani-wk4mn 1 year ago
Ma'am, I am getting a "no module found" error when importing engine from going_modular. I have downloaded it and copied it into the directory. Please help, ma'am.
@CodeWithAarohi 1 year ago
Check the location of the going_modular folder and your Jupyter notebook. Both should be under the same folder.
@올라쿤레아요데지오몰 1 year ago
Hi Aarohi, you made it look easy. I have a challenge: I am getting this error: ModuleNotFoundError: No module named 'helper_functions'
@CodeWithAarohi 1 year ago ⁺¹
You can get the helper_functions.py file from here and paste it in your directory: github.com/AarohiSingla/Image-Classification-Using-Vision-transformer
@올라쿤레아요데지오몰 1 year ago
@@CodeWithAarohi Thank you. It worked! One more thing: which activation function did you use, and at what stage did you implement it, please?
@loveofmylifesoumyarashmi9972 1 year ago ⁺¹
Is there code to print the accuracy, F1 score, precision and recall?
@CodeWithAarohi 1 year ago ⁺¹
Will create a separate video on it.
@salihsalur4855 8 months ago
Do you have code for the F1 score, precision...
@ericobeng3139 8 months ago
Thank you for this great video. Can this be applied to video datasets? Or do you have a link to a video on training ViT on a video dataset? Thank you.
@CodeWithAarohi 8 months ago ⁺¹
Yes, ViT can be applied to video datasets. While ViT was initially designed for processing static images, researchers have extended its application to video data by incorporating temporal information.
@YashSharma-le3mo 1 year ago
Hi ma'am, I have CUDA available, but it is giving an assertion error. I am unable to run with CUDA.
@CodeWithAarohi 1 year ago
Check your PyTorch version. Is it compiled with CUDA?
@princekhunt1 1 month ago
Nice tutorial
@CodeWithAarohi 1 month ago
Thanks!
@dipankarporey2171 1 year ago
Could you please make a single video entirely on the "Attention" architecture (including self-attention)? Thank you for these videos.
@CodeWithAarohi 1 year ago
Sure!
@pranavdubal-c9j 10 months ago
I am getting the error "module 'torchvision.models' has no attribute 'ViT_B_16_Weights'":
1 # 1. Get pretrained weights for ViT-Base
----> 2 pretrained_vit_weights = torchvision.models.ViT_B_16_Weights.DEFAULT
3
4 # 2. Setup a ViT model instance with pretrained weights
5 pretrained_vit = torchvision.models.vit_B_16(weights=pretrained_vit_weights).to(device)
AttributeError: module 'torchvision.models' has no attribute 'ViT_B_16_Weights'
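This AttributeError usually points to an older torchvision install: the weights enums such as ViT_B_16_Weights arrived with torchvision 0.13. A small sketch of a version check follows; the helper names are made up for illustration, and in a notebook you would pass `torchvision.__version__` instead of the example strings.

```python
def version_tuple(version):
    """Turn a version string such as '0.12.0+cu113' into (0, 12)."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def has_weights_enum(torchvision_version):
    # The multi-weight API (e.g. ViT_B_16_Weights) shipped with torchvision 0.13
    return version_tuple(torchvision_version) >= (0, 13)


print(has_weights_enum("0.12.0+cu113"))  # False
print(has_weights_enum("0.15.2"))        # True
```

If the check fails, upgrading torchvision (together with a matching torch build) is the usual fix. Separately, the model builder in torchvision is spelled `vit_b_16` in lowercase, so the `vit_B_16` call on the next line of the traceback would likely fail as well.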
@lavanyaravilla1511 21 days ago
Hi Aarohi, I'm trying this code in a Jarvis PyTorch environment and I'm getting this error: "FileNotFoundError: Found no valid file for the classes .ipynb_checkpoints. Supported extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp", even though the path is correct.
@CodeWithAarohi 19 days ago
Your code is treating the .ipynb_checkpoints directory inside your dataset path as a class folder, and it contains no valid image files. Remove that directory from the dataset folders.
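Jupyter creates these hidden .ipynb_checkpoints folders automatically, so they can reappear inside train/test directories. A minimal sketch for clearing them out; the function name `remove_checkpoint_dirs` and the "dataset" path are placeholders, so point it at your own dataset root.

```python
import shutil
from pathlib import Path


def remove_checkpoint_dirs(dataset_root):
    """Delete every .ipynb_checkpoints directory under dataset_root."""
    removed = []
    # sorted() materializes the matches before anything is deleted
    for path in sorted(Path(dataset_root).rglob(".ipynb_checkpoints")):
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed


# Placeholder path; only runs if such a folder exists next to the notebook
if Path("dataset").is_dir():
    print(remove_checkpoint_dirs("dataset"))
```

After this, the image-folder loader should only see the real class folders.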
@soravsingla6574 1 year ago
Well done
@CodeWithAarohi 1 year ago
Thanks
@salmatiru8797 1 day ago
Where did you give the path of the image used for prediction? Please tell me.
@CodeWithAarohi 1 day ago
The test.jpg image is in the same folder as the Jupyter notebook.
@sidharthpisharody 1 year ago
Ma'am, is it possible to implement the paper "GA-Nav: Efficient Terrain Segmentation for Robot Navigation in Unstructured Outdoor Environments"? I tried it, but there is a "ModuleNotFoundError: No module named 'mmcv._ext'" error that I am not able to fix. If you could show it, it would be very helpful.
@CodeWithAarohi 1 year ago
I will try, but after finishing the videos already in the pipeline.
@MaryBrockyn 1 year ago
Hello again,
I am wondering why you are using categorical cross-entropy as the loss function. I tried to use binary cross-entropy instead, as it is a binary classification problem. I used loss_fn = torch.nn.BCELoss(). Somehow it does not work with your model. Do you have any idea why?
@MaryBrockyn 1 year ago
I am receiving this error: "Using a target size (torch.Size([4])) that is different to the input size (torch.Size([4, 2])) is deprecated. Please ensure they have the same size."
@CodeWithAarohi 1 year ago
The reason for using categorical cross-entropy is that it is well-suited for multi-class classification problems, and a two-class problem is simply the smallest such case when the model outputs one logit per class.
@CodeWithAarohi 1 year ago
The error you're encountering indicates a mismatch between the size of your target labels and the size of the model's output. The model outputs two values per image ([4, 2]), while the targets hold one label per image ([4]). BCELoss expects both to have the same shape, with the outputs already passed through a sigmoid.
@MaryBrockyn 1 year ago
@@CodeWithAarohi But we are dealing with a binary problem, and not a multi-class classification problem, right? That's why I assume BCE would be a better loss function.
@MaryBrockyn 1 year ago
Also, my program runs perfectly fine with CrossEntropyLoss(). As soon as I simply change the loss to BCELoss, I get the error.
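To see why CrossEntropyLoss with two output logits is not a worse choice for a binary problem, here is a small worked example in plain Python (no PyTorch). The logit values are hypothetical. It shows that softmax cross-entropy over two logits gives the same loss as sigmoid binary cross-entropy applied to the single logit z1 - z0.

```python
import math


def cross_entropy_two_logits(logits, target):
    """Softmax cross-entropy for one sample: two logits and a class index."""
    log_sum_exp = math.log(sum(math.exp(z) for z in logits))
    return log_sum_exp - logits[target]


def binary_cross_entropy_one_logit(logit, target):
    """Sigmoid followed by binary cross-entropy for one sample."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


logits = [0.3, 1.5]  # hypothetical model output: (class 0, class 1)
for target in (0, 1):
    ce = cross_entropy_two_logits(logits, target)
    bce = binary_cross_entropy_one_logit(logits[1] - logits[0], target)
    print(target, round(ce, 6), round(bce, 6))
```

So switching to a BCE-style loss would mean changing the head to a single output and passing float targets of the same shape, not only swapping the loss function.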
@Shaggysus 10 months ago
Hi, this video is so helpful. I'm facing an issue with helper_functions. How can I resolve it?
@CodeWithAarohi 10 months ago
Download the helper_functions.py file from here and paste it in your working directory: github.com/AarohiSingla/Image-Classification-Using-Vision-transformer
@YashSharma-le3mo 1 year ago
Ma'am, what does it actually mean that you have modified the classifier head and paused all the other layers?
@CodeWithAarohi 1 year ago ⁺¹
Modified the Classifier Head: Modifying the classifier head means that you are changing the architecture or parameters of the top layers responsible for making predictions. This can include adding or removing layers, changing the number of neurons, or making other architectural changes to better suit your specific task.
Paused All Other Layers: "Pausing" or "freezing" layers means that you are preventing the weights of the layers in the feature extraction backbone from being updated during training. In other words, you are keeping these layers fixed and not allowing them to learn new features during fine-tuning.
@YashSharma-le3mo 1 year ago
@@CodeWithAarohi OK ma'am, thank you
@joshuahentinlal205 1 year ago
Ma'am, I have a problem importing engine from going_modular. Can you help, please?
@CodeWithAarohi 1 year ago ⁺¹
Download the going_modular folder from github.com/AarohiSingla/Image-Classification-Using-Vision-transformer and put it in your current working directory.
@joshuahentinlal205 1 year ago
Thanks a lot, Ma'am, it really helped me.
One more enquiry: using your code to train on my dataset of just 2000 images, I have been training for more than an hour but not even 1 epoch has completed. It seems to go on forever. Can you please help? @@CodeWithAarohi
@shrikar7341 11 months ago
If I give the model an image to predict, let's say an image of an orange, but the only classes are dandelion and daisy, what will the prediction be?
@CodeWithAarohi 11 months ago
If you have added a background class for random images that belong to neither of these 2 classes, the model will label the orange image as background. But if you only have these 2 classes, the model will still try to assign one of them to the orange image, so it will not behave accurately in this case.
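One rough alternative to a background class is to reject predictions whose softmax confidence is low. The sketch below is an illustration under assumptions: the logits, the 0.9 threshold, and the function names are all hypothetical, and softmax confidence is not a dependable out-of-distribution detector, since a network can be confidently wrong on images unlike its training data.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits, class_names, threshold=0.9):
    """Return (label, confidence); the label is 'unknown' below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return class_names[best], probs[best]


class_names = ["daisy", "dandelion"]
confident = predict_or_reject([4.0, 0.5], class_names)  # clear winner
unsure = predict_or_reject([0.2, 0.1], class_names)     # close to 50/50
print(confident, unsure)
```

A threshold like this should be tuned on held-out images, including some that belong to neither class.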
@MaryBrockyn 1 year ago
Hello again, how can I save the model so I can use it again later?
@CodeWithAarohi 1 year ago
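You need to do something like this for the torchvision model used in this video.
import torch
import torchvision
from torch import nn
# save model: store only the trained weights (the state_dict)
MODEL_PATH = 'pretrained_vit.pth'
torch.save(pretrained_vit.state_dict(), MODEL_PATH)
# loading model: rebuild the same architecture first, then load the weights
model = torchvision.models.vit_b_16()
# Assumption: the head was replaced by one Linear layer with 2 outputs during training
model.heads = nn.Linear(in_features=768, out_features=2)
model.load_state_dict(torch.load(MODEL_PATH, map_location=DEVICE))
model.to(DEVICE).eval()  # eval() disables dropout so predictions are repeatable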
@MaryBrockyn 1 year ago
Thank you! @@CodeWithAarohi
@rajatchakraborty2058 8 months ago
When I try to predict an image from my dataset, it shows the error "The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0". Can anyone please help?
@CodeWithAarohi 8 months ago
This means that you're trying to perform an operation that requires the two tensors to have the same size along their first dimension, but they don't match.
For example, tensor "a" might have a shape of [4, X], where 4 represents the size of the first dimension. Tensor "b" might have a shape of [3, Y], where 3 represents the size of its first dimension. The error is raised because the size (4) of the first dimension of tensor "a" does not match the size (3) of the first dimension of tensor "b". In an image pipeline, one common cause is an image with 4 channels (for example an RGBA PNG) being normalized with 3-channel mean and std values; converting the image to RGB before the transform may resolve it.
@kadapallanithin 1 year ago
Thanks for the video
@CodeWithAarohi 1 year ago
Welcome
@safiullah353 1 year ago
from going_modular.going_modular import engine
A problem occurs here that I'm unable to handle. Please help me.
@CodeWithAarohi 1 year ago
What error are you getting?
@soravsingla8782 1 year ago
Awesome
@sharifimroz6231 1 year ago
Could you please share the dataset link?
@fatematujjohora6163 1 year ago
How do I install going_modular? Please answer.
@CodeWithAarohi 1 year ago
going_modular is a folder in my repo. You need to put it in your current working directory.
@AbdulQadeerRasooli-l8k 1 year ago
Hi, thanks for your great video. I ran into this error: "ModuleNotFoundError: No module named 'going_modular'". How do I download the going_modular folder from your link? I could not download this folder.
@CodeWithAarohi 1 year ago
You can get the folder from here: github.com/AarohiSingla/Image-Classification-Using-Vision-transformer
@rohitsk5300 1 year ago
How can I extract the trained model for making an app?
@CodeWithAarohi 1 year ago
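import torch
MODEL_PATH = 'pretrained_vit.pth'
# Save the trained weights of the torchvision model (its state_dict)
torch.save(pretrained_vit.state_dict(), MODEL_PATH)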
@rohitsk5300 1 year ago
@@CodeWithAarohi I'm not sure how to make that work. My code is almost the same as the code explained in the video. What exactly should be done to save the model and load it back?
@DataTheory92 1 year ago
Can you make lectures on MLOps, please?
@CodeWithAarohi 1 year ago
Will try
@pifordtechnologiespvtltd5698 10 months ago
Amazing
@CodeWithAarohi 10 months ago
Thanks
@priyanshupandey3148 1 year ago ⁺¹
Please upload the notebooks. They are not there.
@CodeWithAarohi 1 year ago ⁺¹
github.com/AarohiSingla/Image-Classification-Using-Vision-transformer
@priyanshupandey3148 1 year ago
@@CodeWithAarohi Thank you very much!
@cyreneschannel5017 1 year ago
I like your video on image classification with a vision transformer. Can you also make a video on using a vision transformer with a video dataset?
@CodeWithAarohi 1 year ago ⁺¹
Sure
@ericobeng3139 8 months ago
@@CodeWithAarohi Has the video on using ViT for videos already been made?
@shresthjain7557 1 year ago
How do I download that dataset?
@CodeWithAarohi 1 year ago
You can prepare your dataset by creating 2 folders and then putting some images in those folders.
@imrankhan-el2zp 8 months ago
how to resolve this issue???
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 6
4 from torch import nn
5 from torchvision import transforms
----> 6 from helper_functions import set_seeds
ModuleNotFoundError: No module named 'helper_functions'
@CodeWithAarohi 8 months ago
Paste the helper_functions.py file where your Jupyter notebook is.
@SuryaPrakash-mp8bz 19 days ago
Ma'am, please provide a dataset of different fruits, and explain how to download it, what changes we need to make in the code, and how to train the model. Please help me, Ma'am.
@CodeWithAarohi 19 days ago
Download images of different fruits from the internet. Then create a separate folder for each fruit and place the related images in it. Changes in code: 1- Change the path of the dataset. 2- Change the number of classes.
Watch this video again and you will see where I have discussed the number of classes and the dataset.
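With one folder per fruit, the class names and the number of classes follow directly from the folder names. The sketch below mirrors how torchvision's ImageFolder derives them (sorted subfolder names); the "fruits/train" path and the helper name `find_classes` are placeholders for illustration.

```python
import os


def find_classes(directory):
    """Return (classes, class_to_idx) from the subfolder names, sorted."""
    classes = sorted(e.name for e in os.scandir(directory) if e.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx


# Placeholder dataset path; expected layout is fruits/train/<fruit_name>/<images>
train_dir = os.path.join("fruits", "train")
if os.path.isdir(train_dir):
    classes, class_to_idx = find_classes(train_dir)
    print(len(classes), class_to_idx)
```

`len(classes)` is the value to use for the number of outputs of the classifier head. Any stray subfolder is counted as a class too, so keep only fruit folders in the training directory.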
@teetanrobotics5363 1 year ago
Awesome tutorials. Aarohi, could you please make a code tutorial for video super-resolution using ESRGAN?
@CodeWithAarohi 1 year ago
Sure, I will do the video after finishing the videos already in the pipeline.
@cyberhard 1 year ago
Excellent as usual! How well do vision transformers compare with traditional CNNs for image classification?
@DataTheory92 1 year ago
Vision transformers can perform better than CNNs on image tasks, as reported by researchers.
@DataTheory92 1 year ago
For more complex tasks we now have LLMs, which make classic ML and ordinary neural networks look outdated. First understand why a framework was designed and how it operates, then implement it in code. You will understand more.
@PawanKumar-fu2fh 9 months ago
ModuleNotFoundError: No module named 'going_modular'
@CodeWithAarohi 9 months ago
going_modular is a folder. You need to put it in your current working directory, and please check its path.
@hussamsarfraz7952 1 year ago
Thanks a lot
@CodeWithAarohi 1 year ago
Most welcome
@gitgat-wx4vq 9 months ago
import torch
import maxvit
# from .maxvit import MaxViT, max_vit_tiny_224, max_vit_small_224, max_vit_base_224, max_vit_large_224
# Tiny model
network: maxvit.MaxViT = maxvit.max_vit_tiny_224(num_classes=1000)
input = torch.rand(1, 3, 224, 224)
output = network(input)
My purpose is to give an image as input (1, 3, 224, 224) and generate its description as output. How should I do that? What more should I add to this code?
@CodeWithAarohi 9 months ago
To achieve this, you'll need to use a different model architecture and approach, as image classification models like MaxViT are not designed for generating textual descriptions.
