Thank you so much for the great videos, Sebastian! I'm already looking forward to your book becoming available in German-speaking countries as well.
Thanks so much for the kind feedback! It would be great if there ends up being a translation!
00:03 PyTorch buffers are essential for implementing large models
01:39 Instantiating a new causal attention without buffers
03:12 Transferring data to the GPU using PyTorch CUDA
04:56 Optimizing memory usage during forward pass
06:36 Creating the mask as a buffer for efficiency
08:07 Parameters are automatically transferred to the GPU, but plain torch tensors need to be made parameters or buffers to be transferred.
10:05 The mask is made a buffer so it's not learned by the optimizer.
11:50 PyTorch buffers facilitate easy transfer of non-parameter tensors between GPU and CPU
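For anyone skimming along, here is a minimal sketch of the idea the video builds up to (the class and variable names are my own simplification, not the exact code from the video): a causal mask registered via register_buffer moves to the GPU together with the module, but the optimizer never sees it.

```python
import torch
import torch.nn as nn

class CausalAttentionSketch(nn.Module):
    def __init__(self, context_length):
        super().__init__()
        # Register the causal mask as a buffer: it becomes part of the
        # module's state and follows .to(device), but it is not a
        # parameter, so the optimizer never updates it.
        self.register_buffer(
            "mask",
            torch.triu(torch.ones(context_length, context_length), diagonal=1),
        )

    def forward(self, attn_scores):
        # Mask out future positions before the softmax (simplified)
        return attn_scores.masked_fill(self.mask.bool(), -torch.inf)

model = CausalAttentionSketch(4)
print([name for name, _ in model.named_parameters()])  # [] -- mask is not a parameter
print([name for name, _ in model.named_buffers()])     # ['mask']
if torch.cuda.is_available():
    model = model.to("cuda")
    print(model.mask.device)  # cuda:0 -- the buffer moved with the module
```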
Your video was incredibly clear and engaging! Thank you for the awesome explanation!
That's awesome to hear! Glad it was clear and helpful!
Thanks Sebastian! Now I get what buffers are for. Great lecture.
I recently purchased LLMs from Scratch from Manning. It's been an amazing learning experience so far.
Thanks for getting a copy. And I’m really happy to hear that you are getting lots out of the book :)
@@SebastianRaschka is the book released or is it just the pre-order?
@@2dapoint424 It's currently a preorder, but the publisher is wrapping up the layout, so it shouldn't be too long...
Can I purchase it from Manning directly at this point? Please let me know, I am eager to buy it.
@@mainakkundu2103 Yes you could! 😊
I always see this register_buffer code in transformer networks and never thought the reason would be so simple. Thanks for explaining such an often-ignored concept of PyTorch.
Great, I'm glad to hear that I was able to finally shed some light on this 😊
Great Work! I like your LLM notebooks as well!
The man is back more videos please ❤
Thank you very much for this explanation.
Glad to hear it was useful!
Thanks for explaining
Back to basics. Love it. ❤
Actually learned something new. Thanks Sebastian!
Wow thanks @andrei_aksionau! The fact that even you as a PyTorch expert learned something new is probably the biggest compliment 😊
Hi, Sebastian
I really respect what you are doing. I like your GitHub repository - there are a lot of helpful tutorials.
I'm going to buy your next book - Build a Large Language Model (From Scratch).
I have one question: what minimal GPU do you recommend for exploring and running all the examples from your next book?
Thanks @raiszakirdzhanov2148! Actually, you don't need anything powerful -- I made sure all the examples run on minimal hardware. The other day, there was a reader who got it to work on an RTX3060 Laptop GPU with ~6GB of RAM (by decreasing the batch size). That being said, for some chapters, if you don't have a GPU, I would recommend an A10G or L4 GPU, which cost around 50 cents / hour on a cloud platform. I have some recommendations here: github.com/rasbt/LLMs-from-scratch/tree/main/setup#cloud-resources
@@SebastianRaschka thanks a lot!)
Another advantage is that the buffer gets saved in the state_dict when saving the model
Yes, good point! In this case, if you modify the mask during usage, this would be super useful.
@kevindelnoye9641 thanks again for the suggestion, I added a section on this to the code notebook
@@SebastianRaschka great! Thanks for the great tutorials, keep them coming
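To make the point above concrete, here is a minimal, self-contained sketch (the class and file names are made up for illustration) showing that a registered buffer is included in the state_dict and therefore survives a save/load round trip:

```python
import torch
import torch.nn as nn

class WithBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        # persistent=True (the default) means the buffer is part of state_dict
        self.register_buffer("mask", torch.triu(torch.ones(4, 4), diagonal=1))

model = WithBuffer()
print(model.state_dict().keys())  # odict_keys(['mask'])

# A mask modified at runtime is therefore saved and restored with the model
torch.save(model.state_dict(), "with_buffer.pth")
restored = WithBuffer()
restored.load_state_dict(torch.load("with_buffer.pth"))
```

Passing persistent=False to register_buffer would exclude the buffer from the state_dict if that behavior is not wanted.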
Awesome tutorial🔥
Thanks!
More videos please
Great content...thanks a lot
Very useful tip 💪💪
Thanks, glad to hear!
@@SebastianRaschka do you have any book on pytorch coding that would somehow resemble “Deep Learning with Python” from François Chollet?
@@ricardogomes9528 My "Machine Learning with PyTorch and Scikit-Learn" books perhaps: www.amazon.com/Machine-Learning-PyTorch-Scikit-Learn-scikit-learn-ebook-dp-B09NW48MR1/dp/B09NW48MR1/
@@SebastianRaschka thank you for your prompt reply. Hope I can master it 🙏 keep up with the good videos 💪🙏
It's indeed a clean way to do things, but can't we do the same thing by adding them as a parameter and setting .requires_grad = False?
This might achieve the same thing, but at the same time, it would also be a bit more work 😅
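For comparison, a sketch of that alternative (again with made-up names): a frozen nn.Parameter also follows .to(device), but it still shows up in model.parameters(), which is part of the extra bookkeeping mentioned above.

```python
import torch
import torch.nn as nn

class WithFrozenParam(nn.Module):
    def __init__(self):
        super().__init__()
        # The mask still moves with .to(device), but unlike a buffer it
        # appears in model.parameters(), so an optimizer receives it and
        # it is stored as a parameter rather than a buffer.
        self.mask = nn.Parameter(
            torch.triu(torch.ones(4, 4), diagonal=1), requires_grad=False
        )

model = WithFrozenParam()
print([name for name, _ in model.named_parameters()])  # ['mask'] -- still a "parameter"
```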
Cheers, great video. I'd suggest being slightly more concise, but either way, great work.
@putskan This is useful feedback! I also often wish the videos would be more concise, but it's hard to know how long they actually are until the recording is finished, and then it's already too late 😅