I've been searching for this tutorial for a long time, and I can't express how thankful I am, Aarohi! Your YouTube channel is an absolute gem, and it truly deserves a multitude of subscriptions. The way you effortlessly share your expertise is not only enlightening but also engaging. Keep up the exceptional work!
Thank you for your heartwarming comment 🙂
Very informative tutorial, Thank you. I have the following questions and doubts-
1) During training, how to save the best model only after each epoch, and load that best model after completing training, for future use? (e.g. based on lowest validation loss)
2) How to generate the confusion matrix and also the F-1 Score, Precision, Recall?
3) Finally how to identify actually which test samples are correctly predicted and which test samples are not?
4) After the initial 4-5 epochs, the gap between training loss and test loss (and between train accuracy and test accuracy) keeps increasing, so the model needs further fine-tuning. Please suggest how to do that.
Hello, Could you answer question 2? f-1 Score, precision ... Do you have code to f1 score ...
Thanks Aarohi, it is brilliant. Great Help to learn ViT
Glad it was helpful!
Hello Ma’am
Your AI and Data Science content is consistently impressive! Thanks for making complex concepts so accessible. Keep up the great work! 🚀 #ArtificialIntelligence #DataScience #ImpressiveContent 👏👍
My pleasure 😊
Please make a landmark detection video using a vision transformer. I greatly need to finish this project; the task is to create a 13-landmark detector using a vision transformer, and I can't find any resources that teach how to do landmark detection with a vision transformer. This channel is my only hope.
Code with Aarohi is the best YouTube channel for Artificial Intelligence
#BestChannel #YouTubeChannel #ArtificialIntelligence #CodeWithAarohi #DataScience #Engineering #MachineLearning #DataAnalysis #BestLearning #LearnDataScience #DataScienceCourse #ArtificialIntelligenceCourse
Good day. Thank you for this wonderful demo. I have a few questions:
1. Are there any other existing vision transformer models that you know of?
2. How do I go about training a model on images paired with nutritional values stored in a certain column range of a separate Excel file, so that it outputs the predicted values when applied to a single image? The name of each image is listed next to its values in the Excel file.
Many many thanks in advance for the assistance. :)
I am getting the error "ModuleNotFoundError: No module named 'going_modular'" even though the going_modular folder and the Notebook are under the same folder. I am working in Colab. Please Help Ma'am.
I have the same problem, but in Jupyter. Did you resolve it?
I am currently having problems with epochs not running; it keeps taking a very long time. What should I do?
just install the module from the directory in which the module is present, in a different cell
How do I predict on a very large dataset? Let's say you have 30,000 images; a for loop would be computationally expensive, so what's the best way to run inference with a pretrained model on large datasets?
Awesome upload. How do I save the model or weights which I can load and perform inference later?
Thanks so much ,I was waiting this video from you.
Hope you like it!
Thank you for such a great content!!
Glad you enjoy it!
Thank you! Your video is very informative!
Glad it was helpful!
Madam, I have one doubt...Here we use a pretrained model and we are training the model again with our dataset. So my doubts are from where do we get the pre trained model? And for which dataset the pretrained model got trained? Also, after retraining the model with our dataset, the weights will all get changed right?
Hi again, when I print the summary of the Vision Transformer, the input shapes for each layer start with 32. I understand that the very first input [32, 3, 224, 224] means we originally have an image size of 224x224 with 3 colour channels. What does the 32 mean? Is that the batch size, and if so, do I have to change that value if I change my batch size for training?
Yes, you are correct! The "32" in the input shape [32, 3, 224, 224] refers to the batch size. It is only the example batch size passed to the summary; the model accepts any batch size, so you do not have to change it when you train with a different batch size.
I am getting this error "ModuleNotFoundError: No module named 'going_modular'" when trying to run it on Google Colab. How do I fix it in Google Colab? Please reply.
This is a folder. You can get it from my repo. Place it in the directory where your Jupyter notebook is
@@CodeWithAarohi thanks a lot
@@CodeWithAarohi how can I use this going_modular in google colab, is there any way
Paste it in your Google Drive, in the folder where your Colab notebook is.
@@CodeWithAarohi thanks
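For anyone still seeing the error after copying the folder: in Colab the notebook's working directory is often not the Drive folder, so Python may not look there. Below is a minimal sketch, assuming you have already mounted Drive (with drive.mount from google.colab) and copied the going_modular folder into a project folder. The helper add_project_folder and the path in PROJECT_DIR are hypothetical names for illustration, not part of the repo.

```python
import sys
from pathlib import Path


def add_project_folder(project_dir: str, package: str = "going_modular") -> bool:
    """Put project_dir on sys.path if it contains the given package folder.

    Returns True when the package folder was found, False otherwise.
    """
    root = Path(project_dir).expanduser().resolve()
    if not (root / package).is_dir():
        return False  # folder is missing: copy it from the repo first
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return True


# Hypothetical Colab location; change it to wherever you copied the folder.
PROJECT_DIR = "/content/drive/MyDrive/vit_project"

if add_project_folder(PROJECT_DIR):
    print("going_modular found; the import should now work")
else:
    print(f"going_modular not found inside {PROJECT_DIR}")
```

Run this in a cell before the line that imports engine. If it prints "not found", the folder is not where the path says it is.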
ma'am how do i save and then load the model....since after saving and loading the model, i am not able to get the same predictions..is there any resources i can refer to learn about it
I combined your code with my own training code and added a learning rate scheduler and GPU memory garbage collection. The results and training speed became so much better, without worrying about the GPU running out of memory.
Sounds great!
Could you add how to calculate the confusion matrix and other metrics please?
Hi, thanks for your great video. I want to train the model for some other input size like 448x448. However, the model only takes 224x224 input or gives an error. How can I make the necessary changes?
You'll need to adapt the architecture to accommodate the larger input size.
The key components to modify include:
1- In the original ViT, the input image is divided into non-overlapping patches of size 16x16 pixels. For a 448x448 input size, you can adjust the patch size accordingly. One option is a patch size of 28x28, which divides 448 evenly (448/28 = 16 patches per side).
2- The number of patches depends on the input size and patch size. For 448x448 input and 28x28 patches, you'll have 16x16 = 256 patches.
3- Adjust the embedding dimension to suit your needs. It is a separate hyperparameter (768 in ViT-Base) that does not have to change with the input size; it only has to be divisible by the number of attention heads.
4- You may also want to adjust the number of transformer blocks. This is something to tune rather than a requirement of the larger input.
Note that a model built with a new patch size starts from random weights, because the pretrained 16x16 patch embedding does not fit 28x28 patches.
Example- Using PyTorch and Hugging Face Transformers ViT model for a 448x448 input size:
from transformers import ViTConfig, ViTForImageClassification, ViTImageProcessor

# Image processor that resizes inputs to the desired size
image_processor = ViTImageProcessor(size={"height": 448, "width": 448})

# Model configuration for 448x448 inputs with 28x28 patches
config = ViTConfig(
    image_size=448,
    patch_size=28,  # Adjusted patch size
    num_labels=1000,  # Adjust the number of output classes
    # Modify other parameters as needed (hidden_size, num_hidden_layers, etc.)
)

# Build the model from the configuration (randomly initialised weights)
model = ViTForImageClassification(config)
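To make the patch arithmetic above concrete, here is a small sketch in plain Python. The function name vit_sequence_length is made up for illustration. It assumes a square image, square patches, and one extra class token, as in the standard ViT.

```python
def vit_sequence_length(image_size: int, patch_size: int) -> int:
    """Number of tokens a ViT sees: one per patch plus one class token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1


for image_size, patch_size in [(224, 16), (448, 28), (448, 16)]:
    tokens = vit_sequence_length(image_size, patch_size)
    print(f"{image_size}x{image_size} image, {patch_size}x{patch_size} patches -> {tokens} tokens")
```

Keeping 16x16 patches at 448x448 gives four times as many patches as at 224x224, which makes attention much more expensive. That is one reason to consider a larger patch size.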
could not generate a random directory for manager socket , how do i resolve this error?
Thank you soo much mam for this amazing video
Thanks for liking
Thank you, very good explanation. Which pre-trained model are you using here? Is it the same kind of thing as a CNN pre-trained model, or are you using only the weights of a pre-trained model?
You can check this: github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py. Look at the class ViT_B_16_Weights(WeightsEnum) there; its DEFAULT entry points to weights trained on ImageNet-1K.
Thanks for the tutorial! Is there a quick way to let all images out of a folder get classified by the trained model and to also add the confusion matrix and other metrics therefore?
Also, I am wondering how to convert the images that I want to classify into the proper input shape. Can you help with that?
Thanks in advance!
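from torchvision import transforms

image_size = (224, 224)  # assumed (height, width); ViT-B/16 expects 224x224
image_transform = transforms.Compose(
    [
        transforms.Resize(image_size),
        transforms.ToTensor(),
        # ImageNet mean and std, matching the pretrained weights
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
        ),
    ]
)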
Thank you! Do you maybe also have an answer for my first question? (Is there a quick way to let all images in a folder get classified by the trained model and to also add the confusion matrix and other metrics such as accuracy, recall and F1-score?) @@CodeWithAarohi
Hi Ms.Aarohi, thank you so much for your video. Can I ask, if I want to add callback early stopping, is it correct to modify the file engine in the epoch looping section? Thank you
Yes, correct
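A minimal sketch of what such an early-stopping helper could look like, in plain Python so it can be tried without a GPU. The EarlyStopping class and the loss values are made up for illustration and are not code from the repo. The idea is to call step() once per epoch inside the epoch loop in engine.py and save the model whenever it returns True, which also covers "keep only the best model by validation loss".

```python
class EarlyStopping:
    """Track validation loss; flag improvements and decide when to stop."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest change that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch. Returns True if this epoch is the best so far."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
            return True
        self.bad_epochs += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.bad_epochs >= self.patience


# Made-up validation losses standing in for the values computed each epoch.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.67, 0.60]
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        print(f"epoch {epoch}: new best {loss:.2f} (save the model here)")
    if stopper.should_stop:
        print(f"stopping at epoch {epoch}")
        break
```

With patience=3 the loop above stops before it reaches the last value. A larger patience trades longer training for a better chance of catching a late improvement.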
I prepared my dataset like you did. But when I try to train, it gives "OSError: Caught OSError in DataLoader worker process 0" and "image file is truncated (40 bytes not processed)". I followed your code exactly and just applied my own dataset. Can you tell me how to fix it?
did you figure it out ?
Ma'am, I am getting a "no module found" error when importing engine from going_modular. I have downloaded the folder and copied it into the directory. Please help.
Check the location of the going_modular folder and your Jupyter notebook. Both should be under the same folder.
Hi Aarohi, you made it look easy. I have a challenge: I am getting this error: ModuleNotFoundError: No module named 'helper_functions'
You can get the helper_functions.py file from here and paste it in your directory: github.com/AarohiSingla/Image-Classification-Using-Vision-transformer
@@CodeWithAarohi Thank you. It worked! One more thing, which activation function did you use? and at what stage did you implement it please?
code to print the accuracy , f1 score, precision and recall??
Will create a separate video on it.
Do you have code to f1 score, precision...
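Until there is a video on it, here is a rough sketch of how these metrics can be computed once you have collected the true labels and the predicted labels for the test set as two lists of class indices. The function classification_report and the toy labels are written just for this example. It also returns the indices of the misclassified samples, which answers the "which test samples were wrong" question.

```python
def classification_report(y_true, y_pred, class_names):
    """Confusion matrix plus per-class precision, recall and F1."""
    n = len(class_names)
    # matrix[i][j] counts samples of true class i predicted as class j
    matrix = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1

    report = {}
    for c, name in enumerate(class_names):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp
        fn = sum(matrix[c]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[name] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(matrix[c][c] for c in range(n)) / len(y_true)
    # indices of the samples the model got wrong
    wrong = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
    return matrix, report, accuracy, wrong


# Toy labels: 0 = daisy, 1 = dandelion
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]
matrix, report, accuracy, wrong = classification_report(
    y_true, y_pred, ["daisy", "dandelion"]
)
print(matrix, accuracy, wrong)
print(report)
```

If scikit-learn is installed, sklearn.metrics.confusion_matrix and sklearn.metrics.classification_report give the same numbers with less code.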
Thank you for this great video. Can this be applied to video datasets? or do you have a video link to training ViT on Video dataset? Thank you.
Yes, ViT can be applied to video datasets. While ViT was initially designed for processing static images, researchers have extended its application to video data by incorporating temporal information.
Hi mam
I have Cuda available
But it is giving assertion error
Unable to run with Cuda
Check your PyTorch version: is it compiled with CUDA? print(torch.version.cuda, torch.cuda.is_available()) will tell you.
Nice tutorial
Thanks!
Could you please make one single video completely on "Attention"(including self-attention) architecture? Thank you for these videos.
Sure!
I am getting an error of module 'torchvision.models' has no attribute 'ViT_B_16_Weights'
1 # 1. Get pretrained weights for ViT-Base
----> 2 pretrained_vit_weights = torchvision.models.ViT_B_16_Weights.DEFAULT
3
4 # 2. Setup a ViT model instance with pretrained weights
5 pretrained_vit = torchvision.models.vit_B_16(weights=pretrained_vit_weights).to(device)
AttributeError: module 'torchvision.models' has no attribute 'ViT_B_16_Weights'
Hi Aarohi, I'm trying this code in the JArvis PyTorch environment and I'm getting this error even though the path is correct: FileNotFoundError: Found no valid file for the classes .ipynb_checkpoints. Supported extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp
Every sub-folder of the dataset directory is treated as a class. Jupyter has created a hidden .ipynb_checkpoints folder inside your dataset directory, so it is picked up as a class that contains no valid image files. Delete that folder and the error should go away.
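A small sketch for cleaning this up, assuming the only problem is stray .ipynb_checkpoints folders somewhere under the dataset directory. The function name and the example path are placeholders. It deletes folders, so double-check the path before running it.

```python
import os
import shutil


def remove_checkpoint_dirs(dataset_root: str) -> list:
    """Delete every .ipynb_checkpoints folder under dataset_root.

    Returns the list of removed paths.
    """
    removed = []
    for current, dirnames, _ in os.walk(dataset_root):
        if ".ipynb_checkpoints" in dirnames:
            target = os.path.join(current, ".ipynb_checkpoints")
            shutil.rmtree(target)
            dirnames.remove(".ipynb_checkpoints")  # do not descend into it
            removed.append(target)
    return removed


# Example call (hypothetical path; point it at your own dataset folder):
# print(remove_checkpoint_dirs("data/train"))
```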
Well done
Thanks
Where did you get the path of the image used for prediction? Please tell me.
The test.jpg image is in the same folder as the Jupyter notebook.
Mam is it possible to implement the paper "GA-Nav: Efficient Terrain Segmentation for Robot Navigation in Unstructured Outdoor Environments" I tried it but there is a "ModuleNotFoundError: No module named 'mmcv._ext'" error that I am not able to rectify. If u could show it it would be very helpful
I will try but after finishing the pipelined work.
Hello again,
I am wondering why you are using CategoricalCrossEntropy as the loss function. I tried to use Binary Cross Entropy instead, as it is a binary classification problem. I used loss_fn = torch.nn.BCELoss(). Somehow it does not work with your model. Do you have any idea why?
I am receiving this error: "Using a target size (torch.Size([4])) that is different to the input size (torch.Size([4, 2])) is deprecated. Please ensure they have the same size."
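The reason for using categorical cross-entropy is that it is well-suited for multi-class classification problems, and a binary problem is simply the two-class case: the model here has two output neurons, one per class.
The error you're encountering indicates a mismatch between the size of your target labels and the size of the model's output. BCELoss expects predictions and targets of the same shape, but the model outputs [4, 2] (two logits per image) while the labels have shape [4]. To use a binary loss you would need a single output neuron, float targets of the same shape, and torch.nn.BCEWithLogitsLoss (or a sigmoid before BCELoss).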
@@CodeWithAarohi But we are dealing with a binary problem, and not a multiclass classification problem, right? So thats why I assume a BCE would be a better loss function
Also, my program runs perfectly fine with CrossEntropyLoss(). As soon as I simply change the loss to BCELoss, I get the error.
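On whether BCE would be "better" for a binary problem: with two output neurons, softmax cross-entropy gives exactly the same loss value as binary cross-entropy applied to the difference of the two logits, so nothing is lost by keeping CrossEntropyLoss. The sketch below checks this numerically in plain Python. The two helper functions are written for this example and are not PyTorch calls.

```python
import math


def cross_entropy_two_class(logits, target):
    """Softmax cross-entropy for one sample with two logits [z0, z1]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def bce_with_logit(logit, target):
    """Binary cross-entropy on a single logit (sigmoid applied inside)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


logits = [0.3, 1.7]  # the two outputs the model gives for one image
for target in (0, 1):
    ce = cross_entropy_two_class(logits, target)
    bce = bce_with_logit(logits[1] - logits[0], target)
    print(f"target={target}: CE={ce:.6f}  BCE={bce:.6f}")
```

The two columns printed for each target agree, which is why the two-output model with CrossEntropyLoss is a valid setup for binary classification.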
hi this video so helpful. im facing a issue with the helper_functions. how can i resolve that issue?
Download helper_functions.py file from here and paste it in your working directory: github.com/AarohiSingla/Image-Classification-Using-Vision-transformer
Mam What actually it means that you have modified Classifer head and pause all other layers?
Modified the Classifier Head: Modifying the classifier head means that you are changing the architecture or parameters of the top layers responsible for making predictions. This can include adding or removing layers, changing the number of neurons, or making other architectural changes to better suit your specific task.
Paused All Other Layers: "Pausing" or "freezing" layers means that you are preventing the weights of the layers in the feature extraction backbone from being updated during training. In other words, you are keeping these layers fixed and not allowing them to learn new features during fine-tuning.
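In code, the two steps look roughly like the sketch below. It uses a tiny made-up module (TinyViTLike) as a stand-in for the pretrained ViT so that it runs quickly. The attribute name heads and the sizes are assumptions for this example; with the real model you would use its own head attribute and embedding size.

```python
import torch
from torch import nn


class TinyViTLike(nn.Module):
    """Hypothetical stand-in for a pretrained ViT: a backbone plus a 'heads' layer."""

    def __init__(self, embed_dim: int = 16, num_classes: int = 1000):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 8 * 8, embed_dim), nn.GELU()
        )
        self.heads = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        return self.heads(self.backbone(x))


model = TinyViTLike()

# 1. "Pause" (freeze) every existing layer so its weights are not updated.
for param in model.parameters():
    param.requires_grad = False

# 2. Replace the classifier head; a newly created layer is trainable by default.
class_names = ["daisy", "dandelion"]
model.heads = nn.Linear(in_features=16, out_features=len(class_names))

# 3. Give the optimizer only the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

logits = model(torch.rand(4, 3, 8, 8))
print(logits.shape)  # one row per image, one column per class
```

Only the new head's weight and bias end up in the optimizer, so training is fast and the pretrained features stay as they were.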
@@CodeWithAarohi ok mam Thank you
Maam i have problem in importing engine of going_modular can you help please
Download the going_modular folder from github.com/AarohiSingla/Image-Classification-Using-Vision-transformer and put it in your current working directory.
Thanks a lot, Ma'am, it really helped me.
One more enquiry: using your code to train on my dataset of just 2000 images, I had been training for more than an hour and not even 1 epoch had completed. It seems to run forever. Can you please help? @@CodeWithAarohi
if i were to put an image for prediction lets say an image of orange but the only class headers are dandelion and daisy what will the prediction be?
If you have added a background class for random images that are not part of these 2 classes, the model will label the orange image as background. But if you only have these 2 classes, the model will still try to assign one of them to the orange image, so its prediction will not be meaningful in this case.
Hello again, how can I save the model to use it later on again?
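You need to do something like this. This snippet is for a Hugging Face model (here DetrForObjectDetection) wrapped so that the underlying network is available as model.model; for the torchvision ViT from this video, save model.state_dict() with torch.save and load it back with load_state_dict instead.
from transformers import DetrForObjectDetection

# save model
MODEL_PATH = 'custom-model'
model.model.save_pretrained(MODEL_PATH)
# loading model
model = DetrForObjectDetection.from_pretrained(MODEL_PATH)
model.to(DEVICE)  # DEVICE: e.g. 'cuda' or 'cpu'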
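For a plain PyTorch model such as the torchvision ViT, a common pattern is to save and reload the state_dict. Below is a minimal sketch under stated assumptions: build_model is a hypothetical stand-in for however you created your ViT (same architecture and same number of classes as during training), and the file name is arbitrary. Calling eval() after loading matters; forgetting it is a frequent reason for getting different predictions after a reload.

```python
import os
import tempfile

import torch
from torch import nn


def build_model(num_classes: int = 2) -> nn.Module:
    """Hypothetical stand-in; rebuild your ViT exactly as you did for training."""
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, num_classes))


model = build_model()
save_path = os.path.join(tempfile.gettempdir(), "vit_demo_weights.pth")

# Save only the learned weights (the state_dict), not the whole object.
torch.save(model.state_dict(), save_path)

# Later: recreate the same architecture, then load the weights into it.
loaded = build_model()
loaded.load_state_dict(torch.load(save_path, map_location="cpu"))
loaded.eval()  # switch off dropout etc. before predicting

x = torch.rand(1, 3, 8, 8)
model.eval()
with torch.inference_mode():
    same = torch.allclose(model(x), loaded(x))
print("predictions match:", same)
```

Apply the same preprocessing (resize and normalization) at inference time as during training, otherwise the reloaded model can look "wrong" even though the weights are identical.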
Thank you!@@CodeWithAarohi
When I am trying to predict an image for my dataset it is showing "The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0" error. Can anyone please help
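This means that you're trying to perform an operation that requires the two tensors to have the same size along their first dimension, but they don't match.
For example, tensor "a" might have a shape of [4, X], where 4 represents the size of the first dimension. Tensor "b" might have a shape of [3, Y], where 3 represents the size of its first dimension. The error is raised because the size (4) of the first dimension of tensor "a" does not match the size (3) of the first dimension of tensor "b". When this appears while predicting on a single image, a common cause is an image with 4 channels (for example an RGBA PNG) being normalized with 3-channel mean and std values. Converting the image with Image.open(path).convert("RGB") before applying the transforms usually fixes it.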
Thanks for the video
welcome
from going_modular.going_modular import engine
A problem occurs here that I'm unable to handle. Please help me.
What error are you getting?
Awesome
could you please share the dataset link?
How to install going_modular? plz answer me
going_modular is a folder in my repo. You need to put it in current working directory.
Hi, thanks for your great video. I faced this error: ModuleNotFoundError: No module named 'going_modular'. How do I download the going_modular folder from your link? I could not download this folder.
You can get the folder from here: github.com/AarohiSingla/Image-Classification-Using-Vision-transformer
how can i extract the trained model for making an app??
MODEL_PATH = 'custom-model'
model.model.save_pretrained(MODEL_PATH)
@@CodeWithAarohi I'm not sure .. how to make that work... my code is almost same as the explained code... what should be exactly done to extract it out and loaded it back.....
Can you make lectures on MLops please?
Will try
Amazing
Thanks
Please upload the notebooks. It is not there.
github.com/AarohiSingla/Image-Classification-Using-Vision-transformer
@@CodeWithAarohi Thank you very much!
i like your video using image classification transformer, can you also make a video using vision transformer using video dataset
Sure
@@CodeWithAarohi Please was the video on using ViT for videos already done?
how to download that data set?
You can prepare your dataset by creating 2 folders and then put some images in those folders.
how to resolve this issue???
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 6
4 from torch import nn
5 from torchvision import transforms
----> 6 from helper_functions import set_seeds
ModuleNotFoundError: No module named 'helper_functions'
Paste the helper_functions.py file in the folder where your Jupyter notebook is.
Mam please provide a dataset of different fruits and say how to download it and what changes we need to do in code and how to train the model please help me mam
Download images of different fruits from the internet. Then create a separate folder for each fruit and place the related images in it. Changes in code: 1- Change the path of the dataset. 2- Change the number of classes.
Watch this video again and you will see where I discuss the number of classes and the dataset.
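Once the folders are in place, a quick sanity check is to list the class folders and count the images in each; the number of folders is the number of classes to set in the code. This is a small sketch in plain Python. The function name, the extension list and the example path are assumptions for illustration.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def summarize_dataset(root: str) -> dict:
    """Map each class folder under root to its number of image files."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        # skip files and hidden folders such as .ipynb_checkpoints
        if class_dir.is_dir() and not class_dir.name.startswith("."):
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


# Example call (hypothetical path):
# counts = summarize_dataset("fruits/train")
# num_classes = len(counts)
```

A class with zero or very few images shows up immediately in the returned dictionary, which is easier to spot here than halfway through training.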
Awesome. tutorials. Aarohi, Could you please make a code tutorial for video superresolution using ESRGAN ?
Sure, I will do the video after finishing pipelined videos.
Excellent as usual! How well do vision transformers compare to traditional CNNs for image classification?
Vision transformers can outperform CNNs on image tasks, particularly when they are pre-trained on large datasets; on small datasets trained from scratch, CNNs are often still competitive.
For more complex tasks there are now large language models, which have replaced classical ML and plain neural networks in many applications. First understand why a framework was designed and how it operates, then implement it in code; you will understand more that way.
ModuleNotFoundError: No module named 'going_modular'
going_modular is a folder. You need to put it in your current working directory and please check the path of it.
thnx alot
Most welcome
import torch
import maxvit
# from .maxvit import MaxViT, max_vit_tiny_224, max_vit_small_224, max_vit_base_224, max_vit_large_224
# Tiny model
network: maxvit.MaxViT = maxvit.max_vit_tiny_224(num_classes=1000)
input = torch.rand(1, 3, 224, 224)
output = network(input)
My goal is to give an image of shape (1, 3, 224, 224) as input and generate a text description of it as output. How should I do that, and what should I add to this code?
To achieve this, you'll need to use a different model architecture and approach, as image classification models like MaxViT are not designed for generating textual descriptions. Image captioning needs an encoder-decoder setup: a vision encoder to extract image features and a text decoder, trained on image-caption pairs, to generate the words.