You can also initialize Sequential layer with an OrderedDict if you want to access the layers with names instead of list indices.
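A minimal sketch of what that looks like (the layer names and sizes here are made up):

```python
from collections import OrderedDict
import torch.nn as nn

# Naming the layers lets you access them as attributes later.
model = nn.Sequential(OrderedDict([
    ("fc1", nn.Linear(784, 128)),
    ("act", nn.ReLU()),
    ("fc2", nn.Linear(128, 10)),
]))

print(model.fc1)  # access by name
print(model[0])   # indexing still works too
```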
Converting a list of layers to a Sequential module is not always best. For example, suppose you want to add positional embeddings to each intermediate output - you can't do that with Sequential. But you can use a ModuleList as if it's a vanilla Python list, and the parameters will still be registered, so they're visible to the optimizer and get moved to the GPU with the model.
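A rough sketch of the kind of thing ModuleList allows but Sequential doesn't (the dimensions and the per-layer pos_emb are hypothetical):

```python
import torch
import torch.nn as nn

class Blocks(nn.Module):
    def __init__(self, dim=64, n_layers=4):
        super().__init__()
        # ModuleList registers each layer, so the optimizer sees the
        # parameters and .to('cuda') moves them along with the module.
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_layers))
        # Hypothetical per-layer positional embedding added between layers.
        self.pos_emb = nn.Parameter(torch.zeros(n_layers, dim))

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            # Injecting something between layers is exactly what
            # nn.Sequential can't express.
            x = layer(x) + self.pos_emb[i]
        return x

out = Blocks()(torch.randn(8, 64))
```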
Another tip: use torch.no_grad() or torch.inference_mode() at evaluation time
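A minimal sketch of that, using a stand-in model:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)      # stand-in for a real model
batch = torch.randn(4, 10)

model.eval()                  # disable dropout, freeze batchnorm stats
with torch.inference_mode():  # or torch.no_grad() on older PyTorch
    preds = model(batch)      # no autograd graph is built: less memory, faster
```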
I would run into the need to use gc and clear the cache when running PyTorch Lightning. If allowed to stop on its own, Lightning would clean up after itself, but in preliminary runs you'd often see things going south, or want to stop iterating for some other reason, and breaking the loop before Lightning finished would leave all kinds of garbage (the still-loaded models, etc.) on the GPUs. You could easily verify this with nvidia-smi: the memory was still in use, even though Lightning was supposed to release it. I can't remember if I used exactly your lines, but I ran something to clean up the GPUs. That (along with the relative inflexibility of Lightning) convinced me to stop using it. I don't know if the Lightning devs have cleaned this up in the intervening 4 years, but it's something to keep in mind.
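Something along these lines (a sketch of the general cleanup, not a Lightning-specific API; the model is a stand-in and it needs a CUDA device to run):

```python
import gc
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()  # stand-in for a leftover model

# Drop every Python reference, then reclaim the GPU memory:
del model
gc.collect()              # collect any reference cycles still holding tensors
torch.cuda.empty_cache()  # release cached blocks so nvidia-smi shows them freed
```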
Amazing video! I didn't know adding device='cuda' could make such a difference!
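If I understood the tip the same way, the difference is roughly this (tensor sizes made up):

```python
import torch

# Allocates on the CPU first, then copies the whole tensor to the GPU:
x = torch.zeros(10_000, 10_000).cuda()

# Allocates directly on the GPU - no CPU allocation or host-to-device copy:
y = torch.zeros(10_000, 10_000, device='cuda')
```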
3rd tip: use torch.nn.ModuleList instead of list
Thank you very much, Edan. Hugs from Brazil!
5:24 essential issue! Thank you for sharing.
I am a PhD student focusing on machine learning and I can assure you these are amazing starter tips - I even occasionally forget the .eval one xD
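A quick sketch of why forgetting it bites (toy model, made-up sizes):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(p=0.5))
x = torch.randn(1, 10)

# Forgot .eval(): dropout is still active, so outputs differ per call.
print(model(x))
print(model(x))

model.eval()  # dropout off, batchnorm (if any) uses running stats
print(model(x))
print(model(x))  # now deterministic
```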
Great video! I like to find out what other PyTorch users think about, and these are some helpful "best practices"
Thank you for the video. It could have been better if the display were zoomed in a bit more.
Thanks for the video! For the 6th trick, do you know whether replacing the model's instance frees up memory, or does it accumulate? For example, if I first train a model using the instance name 'example_model' and then train another model with the same name, will I be accumulating both models' memory, or just the last one's? Thanks!
Not 100% sure, but I believe that if you reassign the name, CPython's reference counting frees the old model as soon as its last reference goes away - so you'd hold only the last model, not both. If you have any separate references to it, those would also need to be deleted first (cyclic references are the one case where the cycle collector has to pick the object up later rather than immediately). One caveat: PyTorch's caching allocator may keep the freed GPU memory cached, so nvidia-smi can still show it as used until you call torch.cuda.empty_cache(). (Rough sketch below the thread.)
Ok, thanks for the reply!
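A rough sketch of the reassignment case discussed above (sizes are arbitrary; requires a CUDA device):

```python
import torch
import torch.nn as nn

model = nn.Linear(4096, 4096).cuda()
print(torch.cuda.memory_allocated())  # roughly one model's worth of weights

# Rebinding the name drops the old model's last reference; CPython's
# reference counting frees it right away. Note the new model is built
# before the old one is released, so peak usage briefly touches
# two models' worth of memory.
model = nn.Linear(4096, 4096).cuda()
print(torch.cuda.memory_allocated())  # still ~one model, not two
```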
The timing of the first case is obviously not correct, since you don't synchronize before measuring.
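For anyone who hasn't hit this: CUDA kernels launch asynchronously, so a correct measurement looks roughly like this (matrix size is arbitrary; requires a CUDA device):

```python
import time
import torch

x = torch.randn(4096, 4096, device='cuda')

# Without syncing, the timer can stop before the GPU has actually
# finished the work, making the kernel look nearly instantaneous.
torch.cuda.synchronize()
start = time.perf_counter()
y = x @ x
torch.cuda.synchronize()  # wait for the matmul to really complete
print(f"{time.perf_counter() - start:.4f} s")
```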
Is PyTorch as good as TF for high performance?
Why is deploying in PyTorch not the best option?
Great lectures.
Great video. Keep making videos like this, please
Great tips. Thanks.
Tip number one and already got my like 🤯
Very cool, thanks :)
I don't get coding, but this makes me want to understand it.
Rock solid
super useful
Use PyTorch Lightning and you will avoid 70% of your mistakes.
nice
Subscribed...
Please use a bigger font in the video. The code is taking only 1/3 of the screen width, so there is empty space for a bigger font, for us TV tubers and couch programmers.
Actually, instead of a list, it's better to use nn.ModuleList().