This is probably the clearest explanation I have seen of QKV. Thanks!
This is the best channel on this subject. You actually made the QKV concept click for me in 15 minutes when hours of videos could not. Thank you for not just regurgitating the "Attention Is All You Need" content and actually reworking it into a simpler explanation.
One of the best explanations on RUclips.
Wow, thanks!
The video provides an extremely clear picture of QKV in self-attention.
Incredibly excellent explanation. Really good. Thank you very much.
Glad it was helpful!
Best explanation I can find on the internet. Thank you!
You are born to teach!! Thank you.
Wow, thank you!
I thought the multi-headed aspect of the algorithm was there to allow for small variants in the word positions. I think "Attention Is All You Need" used sine and cosine wave functions to model word proximity in a sentence.
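If it helps anyone: as far as I understand the paper, the sine and cosine functions belong to the positional encoding that is added to the token embeddings, while multi-head attention lets the model attend to different representation subspaces in parallel. A minimal sketch of the sinusoidal encoding, assuming a toy sequence length of 4 and a model dimension of 8:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))."""
    positions = np.arange(seq_len)[:, None]                     # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                          # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                            # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                       # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                       # odd dimensions use cosine
    return pe

print(sinusoidal_positional_encoding(seq_len=4, d_model=8).round(3))
```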
Excellent explanation. Thanks a lot.
Thanks for the great explanation!
Thank you for the explanation of dot-product self-attention and for connecting it with the paper! It would be even better if you could touch on backpropagation, since you mentioned it when discussing how the vectors K, Q, V are selected.
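For anyone wondering how backpropagation touches K, Q, V: a minimal sketch, assuming PyTorch, toy dimensions, and a dummy loss purely for illustration. Q, K, V are produced by learned projection layers (named W_q, W_k, W_v here only for readability), and it is these layers' weights that receive gradients:

```python
import torch

torch.manual_seed(0)
d_model = 8
x = torch.randn(4, d_model)                    # toy embeddings for 4 tokens

# Q, K, V come from learned projections of the same input x
W_q = torch.nn.Linear(d_model, d_model, bias=False)
W_k = torch.nn.Linear(d_model, d_model, bias=False)
W_v = torch.nn.Linear(d_model, d_model, bias=False)

Q, K, V = W_q(x), W_k(x), W_v(x)
scores = Q @ K.T / d_model ** 0.5              # scaled dot-product attention scores
out = torch.softmax(scores, dim=-1) @ V

loss = out.sum()                               # dummy loss just to drive backpropagation
loss.backward()
print(W_q.weight.grad.shape)                   # gradients flow into the projection weights
```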
Excellent video! Thanks a lot!
You are amazing Sir!!
You did a good job, Sir, and thank you.
Glad to help
Thanks a lot for your explanation.
Great explanation! Could you please help explain how the q, k, v tensor values are initialized? I've checked more than 5 videos and none of them have explained where the q, k, v values come from... Thank you!
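For anyone with the same question, a minimal sketch, assuming toy dimensions and randomly initialized weights (the names W_Q, W_K, W_V are just for illustration): q, k, v are not stored values; they are computed by multiplying the token embeddings X by three projection matrices that start out random and are then learned during training:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 8, 8
X = rng.normal(size=(4, d_model))        # embeddings of 4 tokens, one row each

# The three projection matrices start as small random values and are learned by training
W_Q = rng.normal(size=(d_model, d_k)) * 0.1
W_K = rng.normal(size=(d_model, d_k)) * 0.1
W_V = rng.normal(size=(d_model, d_k)) * 0.1

Q, K, V = X @ W_Q, X @ W_K, X @ W_V      # Q, K, V are just projections of X
print(Q.shape, K.shape, V.shape)         # (4, 8) each
```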
The video is fantastic! By the way, based on the beginning, I assumed that you intended to explain how context-aware embeddings can be generated and then proceed to explain Q, K, and V. However, it seems that the embedding part was skipped, and you moved straight to the Q, K, and V concept. Do you have a video for that?
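In case it helps, a minimal sketch of the embedding step that precedes Q, K, V, assuming a hypothetical four-word vocabulary and toy dimensions. The static embeddings looked up here are what self-attention then mixes into context-aware embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"welcome": 0, "how": 1, "are": 2, "you": 3}   # hypothetical tiny vocabulary
d_model = 8

# Static (context-free) embeddings: one learned row per vocabulary entry
embedding_matrix = rng.normal(size=(len(vocab), d_model))

tokens = ["welcome", "how", "are", "you"]
X = embedding_matrix[[vocab[t] for t in tokens]]       # (4, d_model) input to attention
print(X.shape)
# Self-attention then re-weights and mixes these rows, so each output row
# becomes a context-aware embedding of its token.
```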
Brilliant!
❤❤❤
Thanks
You are welcome.
Can you give a numerical example, with sample data, of how Q, K, V get calculated for text like "Welcome how are you"?
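A toy numerical walk-through for "Welcome how are you", assuming made-up 4-dimensional embeddings and random projection matrices (not values from the video). In a real model the embeddings and the W matrices are learned, but the arithmetic is exactly this:

```python
import numpy as np

np.set_printoptions(precision=2, suppress=True)
rng = np.random.default_rng(42)

tokens = ["Welcome", "how", "are", "you"]
d_model = 4

X = rng.normal(size=(len(tokens), d_model))        # toy embeddings, one row per token
W_Q = rng.normal(size=(d_model, d_model))
W_K = rng.normal(size=(d_model, d_model))
W_V = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V
scores = Q @ K.T / np.sqrt(d_model)                # scaled dot products
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
out = weights @ V                                  # context-aware output vectors

for t, w in zip(tokens, weights):
    print(f"{t:>8}: attends to {dict(zip(tokens, w.round(2)))}")
print("output shape:", out.shape)                  # (4, 4)
```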