Awesome, your explanation style is amazing. I understand it more easily than ever. Thank you.
You're very welcome!
Awesome, as always.
Great to see how you restrict the visual elements to support your message, like the colors. Also, very helpful to see you adding the dimensions as inline comments. It makes following along code much easier. Kudos.
Thank you! Cheers!
I was looking for such a tutorial on GPT last week but did not find a good one. Today your video just showed up, what a legend.
Thank you🙏
Another informative video! I am gonna need to watch it a couple more times to digest all the knowledge. Thank you very much buddy.
Thank you!!!
Found this on Reddit. Seems like I was already a subscriber. I have stumbled upon a gold mine. I am still a beginner!!! These will be useful
Hope you will find it helpful:)
immense thanks for your explanation and useful resource links!!!!!!!
Glad you found it helpful!
Legendary video. Expecting two videos from your side, sir: one on a BERT implementation, and the other on how to implement a research paper's code on our own, just by reading the GitHub code and the paper. These two would be really helpful. Hope I can get that.
Thank you! Thank you for the suggestion:)
I first came to your git repo, then I found you here... Thanks for the explanation
Glad it was helpful!
Thanks a lot for this perfect (I want to emphasise) Perfect video! 👍
I am happy you found it helpful:)
This is awesome! Thank you!
You're very welcome!
What tool did you use for making these nice presentation slides? They look very clean.
Thanks! Inkscape:)
:-)) Great video!
great video! thanks for sharing
Thanks for watching!
👏👏👏
you are just amazing 😊
Thanks, keep up the good work.
Thanks, will do!
Nice haircut:) Great video!
Why can't we use nn.TransformerEncoder and use the mask with that?
Great question:) I think we could! I just tried to implement as many things from scratch as possible to show what is "under the hood":)
@@mildlyoverfitted Okay, I tried it, and weirdly, increasing the number of encoder layers results in a higher minimum loss. Super weird. I replaced the encoder with your decoder and it works perfectly.
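For anyone else trying this: a minimal sketch of what the question is asking, i.e. turning nn.TransformerEncoder into a GPT-style causal block by passing an upper-triangular attention mask. All sizes below are placeholders, not the ones from the video.

```python
import torch
import torch.nn as nn

d_model, n_heads, n_layers, seq_len = 64, 4, 2, 10

layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

# Boolean causal mask: True entries are positions a token may NOT attend to,
# so each token only sees itself and earlier positions.
causal_mask = torch.triu(
    torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1
)

x = torch.randn(2, seq_len, d_model)          # (batch, seq_len, d_model)
out = encoder(x, mask=causal_mask)
print(out.shape)  # torch.Size([2, 10, 64])
```

Note that defaults like post-layer-norm (`norm_first=False`) differ from what GPT uses, which might explain some of the training differences people see.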
Is the difference between the HuggingFace gelu and the PyTorch gelu due to the fact that the activation function is... a Gaussian process?
Hmm, it seems like PyTorch and HuggingFace use different approximations. See paperswithcode.com/method/gelu . Anyway, it is a detail and maybe I shouldn't have bothered:)
Are you interested in making tutorial about data processing and training process of GPT model?
Yeh, potentially! The problem with training models like GPT is that one is probably better off using existing frameworks (e.g. transformers) rather than writing everything yourself. So the format would have to change from "from scratch" to "how to use a 3rd party framework". There is nothing wrong about that of course. It is just that I don't think I could do a better job of explaining it than the official documentation:) But I will definitely consider making videos like these too:)
@@mildlyoverfitted I am thinking to train a GPT-like model. Thank you for your suggestion. I will consider using the transformers model to do it. And it will also be great if you could do a tutorial about it.
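For reference, a hypothetical minimal sketch of what one training step with the transformers library might look like. The config sizes are toy placeholders and no pretrained weights are involved; this is just the shape of the loop, not a recipe.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny randomly initialized GPT-2-style model (placeholder sizes).
config = GPT2Config(
    vocab_size=100, n_positions=32, n_embd=64, n_layer=2, n_head=2
)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Fake batch of token ids; with real data these come from a tokenizer.
input_ids = torch.randint(0, 100, (2, 16))

# For causal LM training, labels are the input ids themselves;
# the model shifts them internally to predict the next token.
loss = model(input_ids=input_ids, labels=input_ids).loss
loss.backward()
optimizer.step()
print(float(loss))
```

For a real run you would wrap this in a proper data pipeline, which is exactly what the official documentation covers well.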
BERT tutorial pls
BERT tutorial pls~