Great, I hope there will be more "from scratch" content like this. Thanks very much!
there will be! thank you :)
I want to learn about this topic but haven't been able to find good resources. This is great for me. Thanks 🙏🙏
thank you :)
Great video with great explanations!
thank youu :)
This is great for me👍🏻
Thank you🙏🏻
Thank you :)
This was a very useful video, thanks!
this is really helpful. thanks!
Thank you :)
you are genius brooo 🤩🤩🤩
Thank you!
😊 does this new implementation include flash attention?
Hey, this is just the vanilla implementation. However, I believe you can enable flash attention while loading the model. I suggest you check the Hugging Face documentation.
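For a pretrained model loaded through the Transformers library, it'd look something like this (a rough sketch, not from the video; "bert-base-uncased" is just an example checkpoint, and FlashAttention-2 additionally needs the `flash-attn` package and a supported GPU):

```python
from transformers import AutoModel

def load_with_flash_attention(checkpoint: str):
    """Load a Hugging Face checkpoint with FlashAttention-2 enabled.

    Assumes the `flash-attn` package is installed and a compatible GPU
    is available; otherwise loading will raise an error.
    """
    return AutoModel.from_pretrained(
        checkpoint,
        attn_implementation="flash_attention_2",  # swap attention backend at load time
        torch_dtype="auto",  # pick the dtype stored in the checkpoint
    )

# Example (downloads weights on first call):
# model = load_with_flash_attention("bert-base-uncased")
```

If flash attention isn't available on your setup, `attn_implementation="sdpa"` (PyTorch's scaled-dot-product attention) is a fallback that still fuses the attention kernels on recent PyTorch versions.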
please make a video on implementing gpt2 using pytorch
I'll try to. Thank you for recommendation!
Hello! Great video, thanks! Is there code where the model is trained, like with a loss and optimizer?
Hi, thank you :) I only did the model for this. Maybe I'll do the full training in the future. If you want to try it yourself, though, I can give a tip: train a BERT as usual (maybe following Hugging Face), but instead of importing the model from a library, use this one. It should work.
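The loop itself is the standard PyTorch pattern. A minimal sketch (my own illustration, not code from the video; `TinyEncoder` is a hypothetical stand-in for the BERT-style model you'd plug in, and the random-token batch is fake data just to show the loss/optimizer wiring):

```python
import torch
from torch import nn

class TinyEncoder(nn.Module):
    """Placeholder for the video's model: maps token ids to vocab logits."""
    def __init__(self, vocab_size: int = 100, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, vocab_size)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.embed(ids))  # (batch, seq, vocab) logits

model = TinyEncoder()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Fake batch: random token ids used as both input and target,
# purely to demonstrate the training step.
ids = torch.randint(0, 100, (4, 16))

for step in range(3):
    logits = model(ids)
    # CrossEntropyLoss expects (N, vocab) logits and (N,) target ids.
    loss = loss_fn(logits.view(-1, logits.size(-1)), ids.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

For a real masked-LM run you'd replace the fake batch with tokenized text, mask a fraction of the input ids, and compute the loss only on the masked positions, but the optimizer/backward/step skeleton stays the same.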