For those new to transfer learning: ideally we would like to freeze all of the layers other than the newly added head layer, and train for n epochs, then unfreeze the preceding layers, and train the entire network using a sliced learning rate, where the parameters of the later layers are updated faster than the parameters of the earlier layers. This is how libraries like fastai handle transfer learning out of the box.
thanks for the tip!
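A rough sketch of the two-stage recipe described in the comment above, in plain PyTorch. The tiny `backbone` and `head` are hypothetical stand-ins for a pretrained network and its new final layer, and the learning rates are made-up illustrative values, not the ones fastai uses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained network: a "backbone" plus a new "head".
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU())
head = nn.Linear(16, 2)
model = nn.Sequential(backbone, head)

# Stage 1: freeze the backbone and train only the head.
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD(head.parameters(), lr=1e-2)

x, y = torch.randn(4, 8), torch.tensor([0, 1, 0, 1])
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

# Stage 2: unfreeze everything and give earlier layers smaller learning rates.
for p in backbone.parameters():
    p.requires_grad = True
optimizer = torch.optim.SGD(
    [
        {"params": backbone[0].parameters(), "lr": 1e-5},  # earliest layer: slowest
        {"params": backbone[2].parameters(), "lr": 1e-4},
        {"params": head.parameters(), "lr": 1e-3},         # head: fastest
    ],
    lr=1e-3,       # default for any group without its own "lr" (assumed value)
    momentum=0.9,
)
group_lrs = [g["lr"] for g in optimizer.param_groups]
print(group_lrs)
```

Parameter groups are the standard `torch.optim` way to give different parts of a model different learning rates, so no extra library is needed for this part.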
Best PyTorch tutorial series available on Internet ! Could you please make a video on "Training models using Tensorboard" as well?
Hi, thank you for the feedback :) Yes this is indeed on my list and I want to do the tutorial in the next few weeks...
The best PyTorch tutorial I've ever seen, very fundamental and good for PyTorch beginners.
thanks, glad you like it!
I'm from Korea. These were the most wonderful lectures ever!! Thank you.
glad you like it! greetings from Germany
Thanks for creating this playlist. It has been really helpful. Well done, and keep going...
Finally, it was a great journey with you bro. :) Thanks a ton
I'm glad you like it :)
I didn't know about this concept before and was super impressed. That is so damn amazing! Nice explanation as always :D
glad you like it! yes transfer learning is a very important concept!
Thank you for this really nice, simple and efficient course which has become my reference for learning PyTorch and practice deep learning.
Thanks so much, bro, it is a really good and helpful explanation.
Thanks!
For those who don't know how the mean and standard deviation are calculated beforehand:
1) apply the transforms.ToTensor() transformation to the dataset
2) load the dataloader
3) keep the batch size as one for the train dataset
data = iter(train_data_loader)
data = next(data)
print(data[0].mean(), data[0].std())
If I use a pretrained network that was trained with certain normalisation factors (mean and std), should I use those values or the values calculated from my dataset with the method explained above? @python engineer
@@zt0t0s Yeah, you should calculate an individual mean and std for each dataset.
Why calculate the mean and std for one batch only?
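On the question above: one batch only gives an estimate. A minimal sketch of computing per-channel statistics over the whole training set by accumulating sums across all batches. The random `images` tensor is a hypothetical stand-in for a dataset that already went through ToTensor.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in for an image dataset already converted to tensors:
# 100 RGB images of size 32x32 with values in [0, 1].
images = torch.rand(100, 3, 32, 32)
labels = torch.zeros(100, dtype=torch.long)
loader = DataLoader(TensorDataset(images, labels), batch_size=16)

# Accumulate per-channel sums over every batch, not just the first one.
n_pixels = 0
channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
for batch, _ in loader:
    n_pixels += batch.size(0) * batch.size(2) * batch.size(3)
    channel_sum += batch.sum(dim=(0, 2, 3))
    channel_sq_sum += (batch ** 2).sum(dim=(0, 2, 3))

mean = channel_sum / n_pixels
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()  # population std per channel
print(mean, std)
```

This yields three values each for mean and std, one per colour channel, which is the shape a per-channel Normalize transform expects.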
Awesome job, thanks!
An LSTMs Pytorch tutorial would be welcome...
Thank you! And thanks for the suggestion, I will add it to my list
@@patloeber Hey sir, great content on PyTorch. When are you making a video on LSTMs and autoencoders?
Excellent course! Danke!
glad you like it :)
The link in the description that says "More about Transfer Learning" is actually the link to the source code that Patrick copied and provided minimal revisions to (including deleting the comments specifying the original author and licensing info). The original code has a BSD license which requires attribution. Saying "more information here" in the description isn't the same as saying "source code from here". Please provide appropriate attribution in your videos and on your github.
Thanks for your teaching. Normally when we transform the train data we use the same transformation on the validation data, but here I see that the train transformation is transforms.RandomResizedCrop(224) while the validation data has transforms.Resize(256). The same goes for the flip, which is applied to the train data but not to the validation data. I am confused?
Hey, appreciate the tutorials, but you keep saying that you explained everything for the training loop in the previous tutorials. However, I find it differs significantly from what you covered so far. E.g. an explanation of what setting the model to training or eval mode is for is completely missing. I am wondering what that is for, since we didn't need it in the previous tutorials.
Also, the line with best_model_wts = copy.deepcopy(model.state_dict()) leaves a question mark for me. Could you clear that up? :)
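A small sketch that may help with both questions above. model.train() and model.eval() switch the behaviour of layers such as dropout and batch norm, and copy.deepcopy takes a snapshot of the weights because state_dict() hands back references to the live tensors. The two-layer model here is a hypothetical stand-in, not the tutorial's ResNet.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

model.train()
train_out = model(x)          # dropout randomly zeroes activations in train mode

model.eval()
with torch.no_grad():
    eval_a = model(x)         # dropout is disabled in eval mode,
    eval_b = model(x)         # so repeated calls give identical outputs

# state_dict() returns references to the live tensors, so a plain assignment
# would keep changing as training continues. deepcopy takes a snapshot.
best_model_wts = copy.deepcopy(model.state_dict())
live_ref = model.state_dict()

with torch.no_grad():
    model[0].weight.add_(1.0)  # simulate further training

snapshot_differs = not torch.equal(best_model_wts["0.weight"], model[0].weight)
live_follows = torch.equal(live_ref["0.weight"], model[0].weight)

model.load_state_dict(best_model_wts)  # restore the snapshot
```

The earlier tutorials could skip the mode switch presumably because their models had no dropout or batch norm layers; a ResNet has batch norm, so the mode matters here.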
Great stuff, but after tutorial 14 about CNNs your code explanation got more chaotic/confusing. Until tutorial 14 the code was perfectly explained. I used to write the code alongside with you until tutorial 14 so that I can have some muscle memory and intuition about how to code NNs. Now I'm a little confused.
Thank you so much!! So clean and clear. One request from my side: please make some videos on transfer learning in NLP, please.
thanks! I will
just wonder why the mean and std are np.array with three elements?
Thanks for sharing. Can you make a video on GANs
Thanks for the effort to put all these videos together, could you tell me where can I get the code of the tutorials ?
Have a nice one,
Aldo
Great Job!!! Thanks. Could you please upload LSTM with "Attention" using Pytorch. Keep it up!!!!
Thanks for the suggestion, I will consider this!
Nice tutorial. It would be nice if you could explain transfer learning on an imbalanced dataset using a sampling method.
Thanks for the suggestion!
Nice tutorial
Thank you!
Thanks for your videos!
Could you make a video, showing how to use nn.Block for building networks instead of nn.Module?
Thanks. I do not know nn.Block. Can you point me to the resource?
@@patloeber the approach is very similar to nn.Module colab.research.google.com/github/d2l-ai/d2l-en-colab/blob/master/chapter_natural-language-processing-pretraining/bert.ipynb#scrollTo=5uDSQyw9j7UC
@@prabaldutta1935 Thanks. I'll have a look at that. But this is not PyTorch code, this is mxnet: from mxnet.gluon import nn, and nn.Block...
dude i love you
thank you :)
I couldn't find any previous video in this playlist that discusses dividing the data into training and evaluation sets. Did I miss anything?
The normal way of doing something like this, as far as I know, is to have *train*, *validation*, and *test* sets. But here I see only train and val. I watched this video several times but I still don't understand whether val stands for 'validation' or for 'evaluate'.
How should I deal with tabular data and transfer learning? I have a model that has already been trained, and suddenly the number of features increases. How do I handle the trained model when the input part (the first, dynamic layer of a dense neural network) changes? Thank you.
Very useful tutorial! Especially appreciate the explanation of how to keep the training from updating pre-trained weights. One question, the validation accuracy here is higher than the training accuracy. I used a different data set and also get a higher accuracy on my validation set. Then with two data sets, same thing. I don't see anything wrong with the code, but that's not the typical expectation no, or am I overlooking something?
Ah, I forgot that the transforms applied to the "train" dataset introduce more variability
Good observation!
Amazing tutorial. Could you also 'freeze' the earlier weights by simply only passing model.fc.parameters() to the optimizer?
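A quick experiment for the question above, using a hypothetical `TinyNet` with an `fc` layer as a stand-in for the pretrained model. It suggests that handing only the head's parameters to the optimizer does keep the other weights fixed, but gradients are still computed for them unless requires_grad is set to False.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained model with a final `fc` layer.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 8)
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))

model = TinyNet()
before = model.features.weight.clone()

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.1)

inputs, labels = torch.randn(4, 8), torch.tensor([0, 1, 0, 1])
loss = nn.functional.cross_entropy(model(inputs), labels)
loss.backward()
optimizer.step()

features_unchanged = torch.equal(before, model.features.weight)  # never updated
grads_still_computed = model.features.weight.grad is not None    # but backprop ran
```

So the optimizer-only approach works for keeping weights fixed, while setting requires_grad = False on the earlier layers additionally skips the gradient computation for them.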
Thanks for sharing. I'm coming from TensorFlow and am amazed at how easy PyTorch is.
One question: when you are predicting, wouldn't it be necessary to apply a sigmoid to the outputs:
_, prediction = torch.max(F.sigmoid(outputs), 1)
Thanks for watching! Good question! It depends on the loss function, because some loss functions in PyTorch already apply an activation function like sigmoid or softmax. In this case nn.CrossEntropyLoss() applies the softmax, that's why we don't need another one. I think I explained this in tutorial #11 or some other one.
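To illustrate the reply above with a small sketch on made-up logits: softmax is monotonic, so taking torch.max on the raw outputs picks the same class as taking it on the softmax outputs, and nn.CrossEntropyLoss on raw logits matches NLLLoss applied to log-softmax outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
outputs = torch.randn(5, 2)              # raw logits for 5 samples, 2 classes
labels = torch.tensor([0, 1, 1, 0, 1])

# The predicted class is the same with or without softmax.
_, preds_logits = torch.max(outputs, 1)
_, preds_softmax = torch.max(F.softmax(outputs, dim=1), 1)
same_predictions = torch.equal(preds_logits, preds_softmax)

# CrossEntropyLoss on raw logits equals NLLLoss on log-softmax outputs.
ce = nn.CrossEntropyLoss()(outputs, labels)
nll = nn.NLLLoss()(F.log_softmax(outputs, dim=1), labels)
print(same_predictions, ce.item(), nll.item())
```

An explicit softmax is only needed at prediction time if you want actual probabilities rather than just the class index.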
Thanks for your videos.
Can you make a video on "Distillation learning in Pytorch", which is somewhat similar to transfer learning.
It follows the Teacher - Student principle.
I will have a look into that
@@patloeber plus one for distillation learning!
This really is an amazing tutorial!! Thank you so much. I have one question: I used a pretrained vision transformer model from timm and fine-tuned the last layer with num_features and our number of classes, but when I run the code it takes a lot more time than normal training!! I don't know what exactly the problem is. Can you help me?
3:39, that is more like a fly than a bee.
How did you calculate the mean and std? I have to apply transfer learning with AlexNet to MNIST data, but how do I calculate the mean and std for that?
In the code, shouldn't labels range from 0 to 1 instead of 1 to 2, and therefore following line be added after labels = labels.to(device): labels=labels-1
Why should it be 0 and 1? I think it won't make a difference in this example, you just need two different class labels
@@patloeber _, preds = torch.max(outputs, 1). doesn't this return preds values as 0 or 1 which we are using in comparing preds and labels in running_corrects += torch.sum(preds == labels.data) ? I was getting error on running the code which went away when I did labels=labels-1
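A small sketch of the situation described above, assuming a custom dataset whose labels are stored as 1 and 2 (the values here are made up). nn.CrossEntropyLoss treats targets as class indices, so with two output units they have to be 0 and 1, which is also what torch.max returns as predictions.

```python
import torch
import torch.nn as nn

outputs = torch.tensor([[2.0, 0.5], [0.1, 1.5]])   # logits for 2 classes
labels_1_2 = torch.tensor([1, 2])                  # labels stored as 1 and 2

labels = labels_1_2 - 1                            # shift to 0 and 1
loss = nn.CrossEntropyLoss()(outputs, labels)

_, preds = torch.max(outputs, 1)                   # preds are column indices: 0 or 1
running_corrects = torch.sum(preds == labels).item()

# With the unshifted labels, index 2 is out of range for 2 classes.
try:
    nn.CrossEntropyLoss()(outputs, labels_1_2)
    raised = False
except (IndexError, RuntimeError):
    raised = True
```

A dataset whose labels already run from 0 to num_classes - 1 does not need the shift.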
Where do you find the attributes of a model, e.g. model.fc or model.fc.in_features?
How do you do a single prediction?
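One possible sketch for a single prediction. The linear `model` is a hypothetical stand-in for the fine-tuned network and `image` for one preprocessed image tensor. The key steps are switching to eval mode, disabling gradients, and adding a batch dimension with unsqueeze(0).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
class_names = ["ants", "bees"]

# Hypothetical stand-in for the fine-tuned network.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, len(class_names)))

image = torch.rand(3, 224, 224)          # one preprocessed image tensor (C, H, W)

model.eval()                             # inference behaviour for dropout/batch norm
with torch.no_grad():                    # no gradient tracking needed
    logits = model(image.unsqueeze(0))   # add batch dimension -> (1, C, H, W)
    pred_idx = logits.argmax(dim=1).item()

predicted_class = class_names[pred_idx]
print(predicted_class)
```

In practice the image would go through the same transforms as the validation data before this step.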
Thank you so much for these tutorials. I have a question regarding the normalization of data. The mean and std you gave are the same for any use case, or they are model-dependent?
They are dataset dependent. You should use the mean and std dev of the training dataset (in this video this was pre-calculated).
@@patloeber Thanks for the reply.
your code only allows to train 2 classes, how about multiple classes?
This is a great tutorial. I just need one clarification: instead of using cross entropy, can we treat it as binary classification, and how can we do this with the same folder structure?
You can use sigmoid as last layer and then BCELoss, same as in logistic regression tutorial #8
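A minimal sketch of the suggestion above, assuming the folder structure still yields class indices 0 and 1. The `features` tensor is a made-up stand-in for the backbone output. The head gets a single output unit, the labels are cast to float, and predictions come from thresholding at 0.5.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
features = torch.randn(4, 16)             # stand-in for backbone features
labels = torch.tensor([0, 1, 1, 0])       # class indices from the two folders

head = nn.Linear(16, 1)                   # one output unit instead of two
probs = torch.sigmoid(head(features))     # shape (4, 1), values in (0, 1)

# BCELoss expects float targets with the same shape as the predictions.
targets = labels.float().unsqueeze(1)
loss = nn.BCELoss()(probs, targets)

preds = (probs > 0.5).long().squeeze(1)   # threshold instead of torch.max

# nn.BCEWithLogitsLoss fuses the sigmoid into the loss and takes raw logits.
loss_fused = nn.BCEWithLogitsLoss()(head(features), targets)
print(loss.item(), loss_fused.item())
```

BCEWithLogitsLoss is often preferred for numerical stability, in which case the model itself outputs raw logits and the sigmoid is only applied at prediction time.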
@@patloeber thanks
Also, I request that you please add a video on class weighting for imbalanced data, for both binary and multi-class classification problems.
thanks for the suggestion!
If I use a Pretrained network, trained with a certain normalisation factors(mean and std), should I use those values or the values calculated with the method explained above from my dataset ?
the values the pretrained network uses
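As a sketch of what that means in practice, assuming the pretrained network is one of torchvision's ImageNet models, which expect the per-channel statistics below. There is one value per colour channel, which is why mean and std have three elements. The broadcasting here mimics what a per-channel Normalize transform does.

```python
import torch

torch.manual_seed(0)

# ImageNet per-channel statistics expected by torchvision's pretrained models.
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])

image = torch.rand(3, 224, 224)           # stand-in for a ToTensor() output

# Reshape to (3, 1, 1) so each channel gets its own mean and std.
normalized = (image - mean[:, None, None]) / std[:, None, None]

# Undo the normalization, e.g. before plotting an image.
restored = normalized * std[:, None, None] + mean[:, None, None]
```

The reasoning is that the frozen layers were trained on inputs scaled this way, so new images should arrive in the same range.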
Hi, How can we plot train acc and loss and val acc and loss in your code?
Thanks
is there any option for audiofolder similar to imagefolder in pytorch?
How can I use pretrained weights like vgg16 in fcn architecture.
Is this correct?
self.conv1_1 = vgg16.features[0]
Please help
is there any option for audiofolder similar to imagefolder in pytorch? plz reply
Brilliant. Can i ask which text editor it is?
VS Code
Why is the train loss higher than the validation loss?
is it? i did not notice...but i guess that can happen during shuffled training...
are the "bee and ant images" a common dataset which we can download to follow your tutorial? Danke :)
Yes you can download it from pytorch official website: download.pytorch.org/tutorial/hymenoptera_data.zip
Hello, I had a few errors and have checked the code 2 times...
FIRST - I had to change the argument "scheduler" at the end to match step_lr_scheduler (not sure if that was correct, but it was not defined previously, so it was either that or change step_lr_scheduler to just "scheduler".) THEN...
SECOND - when i ran it after the above change, I got this error: (took out some of the path for privacy)
['ants', 'bees']
Epoch 0/1
----------
train Loss: 0.6369 Acc: 0.6311
Traceback (most recent call last):
File "/python_engineer/pytorch15-transfer.py", line 137, in
model = train_model(model, criterion, optimizer, scheduler=step_lr_scheduler, num_epochs=2)
File "/python_engineer/pytorch15-transfer.py", line 77, in train_model
for inputs, labels in dataloaders[phase]:
File "/.virtualenvs/torch38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
data = self._next_data()
File "/.virtualenvs/torch38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 403, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/.virtualenvs/torch38/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/.virtualenvs/torch38/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/.virtualenvs/torch38/lib/python3.8/site-packages/torchvision/datasets/folder.py", line 139, in __getitem__
sample = self.transform(sample)
File "/.virtualenvs/torch38/lib/python3.8/site-packages/torchvision/transforms/transforms.py", line 61, in __call__
img = t(img)
File "/.virtualenvs/torch38/lib/python3.8/site-packages/torchvision/transforms/transforms.py", line 212, in __call__
return F.normalize(tensor, self.mean, self.std, self.inplace)
File "/.virtualenvs/torch38/lib/python3.8/site-packages/torchvision/transforms/functional.py", line 280, in normalize
raise TypeError('tensor should be a torch tensor. Got {}.'.format(type(tensor)))
TypeError: tensor should be a torch tensor. Got .
Any ideas? I do not know how to fix this.
Did you compare the code with mine on GitHub? I guess for the second error you have to apply the ToTensor transform (before Normalize) to get an actual tensor from the image.
@@patloeber Thank you, that was the issue! Btw, I am having so much fun with PyTorch thanks in part to your amazing tutorials. Very clear, well constructed, and makes learning it easy and fun!
@@ShermanSitter happy to hear that :)
Can someone help me with this? I tried this but there are some bugs in my case. I am working on speech classification using speech spectrograms. They are two-dimensional, and I think ResNet is trained on RGB images (3 channels)? Please guide.
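Two common workarounds for single-channel inputs, sketched under assumptions: the spectrogram sizes are made up, and `first_conv` is a freshly initialised stand-in with the same hyperparameters as ResNet's first convolution rather than actual pretrained weights. Option A repeats the channel three times. Option B swaps in a 1-channel conv whose weights are the sum of the RGB weights, which gives the same output as option A.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
spectrograms = torch.rand(4, 128, 64)     # batch of 4 spectrograms (freq, time)

# Option A: add a channel dimension and repeat it to get 3 channels.
x = spectrograms.unsqueeze(1)             # (4, 1, 128, 64)
x_rgb = x.repeat(1, 3, 1, 1)              # (4, 3, 128, 64)

# Option B: replace the first conv with a 1-channel version.
# Stand-in for model.conv1 of a ResNet (same hyperparameters, random weights).
first_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
gray_conv = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    # Summing over the input-channel axis keeps the response to gray images.
    gray_conv.weight.copy_(first_conv.weight.sum(dim=1, keepdim=True))

out_a = first_conv(x_rgb)
out_b = gray_conv(x)
print(out_a.shape, out_b.shape)
```

Whether ImageNet features transfer well to spectrograms is a separate question that this sketch does not answer.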
Hi ,heck_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Use if __name__ == '__main__': and also try setting num_workers=0.
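A minimal sketch of that idiom, with a made-up tensor dataset in place of the image folders. Everything that iterates over the DataLoader sits inside a function that is only called under the `if __name__ == "__main__":` guard, so worker processes that re-import the script do not start the training again.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def main(num_workers=0):
    # Hypothetical stand-in dataset: 8 samples with 3 features each.
    dataset = TensorDataset(torch.randn(8, 3), torch.zeros(8, dtype=torch.long))
    loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=num_workers)

    n_batches = 0
    for inputs, labels in loader:   # the training loop would go here
        n_batches += 1
    return n_batches


if __name__ == "__main__":
    # num_workers=0 loads data in the main process and avoids the error entirely.
    print(main(num_workers=0))
```

With the guard in place, num_workers can usually be raised again on platforms that spawn rather than fork worker processes.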
Traceback (most recent call last):
File "15_transfer_learning.py", line 156, in
step_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
NameError: name 'optimizer_ft' is not defined
Please update your sample code, thank you. (should be optimizer, instead of optimizer_ft ?)
Thanks for the hint. I will update this. You can also open an issue on GitHub for such findings
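For reference, a small sketch of the scheduler line with a consistently named optimizer. The linear model, lr=0.001 and momentum=0.9 are assumed values for illustration. With step_size=7 and gamma=0.1 the learning rate is multiplied by 0.1 after every 7 epochs.

```python
import torch
import torch.nn as nn
from torch.optim import lr_scheduler

model = nn.Linear(4, 2)                   # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# The scheduler must receive the same optimizer object that was defined above.
step_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

lrs = []
for epoch in range(15):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()                      # would follow loss.backward() in training
    step_lr_scheduler.step()              # called once per epoch

print(lrs[0], lrs[7], lrs[14])
```

The step size of 7 is a hyperparameter choice, not a requirement of the scheduler.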
Hi, is it necessary to have a test set? As I see it, you created only train and valid datasets.
Thanks
for training and parameter optimization you use train and validation set. the test set should be used afterwards for completely new and unseen data (for example when submitting a kaggle competition)
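If you want all three sets, one way is to split a single dataset with random_split. This is a sketch with a made-up tensor dataset and an arbitrary 70/15/15 split, not what the tutorial does.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in dataset with 100 labelled samples.
dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# A seeded generator makes the split reproducible.
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = random_split(dataset, [70, 15, 15], generator=generator)

print(len(train_set), len(val_set), len(test_set))
```

The train and val sets are used during training and tuning, and the test set is touched only once at the end.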
Why is the first one models.resnet18() and the second one torchvision.models.resnet18()? Is there a difference?
it's the same. I imported "from torchvision import models" so i could use it directly. I should have used it for both instances...
how to do resnet 34?
Big fan of this series, but this video in particular was a bit painful. Code is not explained enough even if you followed the previous videos in the series (especially the training loop is rather different, and normalization of the data is kinda glossed over), and some of the numbers are a bit magic (why do we update the scheduler every 7 epochs?) As others have pointed out, around this video and somewhat in the CNN video, the explanations started degrading a bit. I get you don't wanna re-explain everything that has already been shown, but there really is quite some things here that just appear out of nowhere. Just some (hopefully constructive) criticism.
You literally just copy pasted the tutorial from torch documentation ... you should at least mention that, cause this same example is really well documented on the site ...
Yeah it was very close to the docs article, I referenced it in the description. Should have mentioned it in the video, too...
Bro, my GPU is out of memory. How do I clear GPU memory?