Self-Attention with Relative Position Representations - Paper explained
- Published: 2 Jul 2024
- We help you wrap your head around relative positional embeddings as they were first introduced in the “Self-Attention with Relative Position Representations” paper.
➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring....
Related videos:
📺 Positional embeddings explained: • Positional embeddings ...
📺 Concatenated, learned positional encodings: • Adding vs. concatenati...
📺 Transformer explained: • The Transformer neural...
Papers:
📄 Shaw, Peter, Jakob Uszkoreit, and Ashish Vaswani. "Self-Attention with Relative Position Representations." In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 464-468. 2018. arxiv.org/pdf/1803.02155.pdf
📄 Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." In Advances in neural information processing systems, pp. 5998-6008. 2017. proceedings.neurips.cc/paper/...
💻 Implementation for Relative Position Embeddings: github.com/AliHaiderAhmad001/...
Outline:
00:00 Relative positional representations
02:15 How do they work?
07:59 Benefits of relative vs. absolute positional encodings
Music 🎵 : Holi Day Riddim - Konrad OldMoney
✍️ Arabic Subtitles by Ali Haidar Ahmad / ali-ahmad-0706a51bb .
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon: / aicoffeebreak
Ko-fi: ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
AICoffeeBreakQuiz: / aicoffeebreak
Twitter: / aicoffeebreak
Reddit: / aicoffeebreak
YouTube: / aicoffeebreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
Great series covering different kinds of positional encodings! Love it!
Such a good explanation! Thank you so much
Such an awesome explanation!! It really helped me understand this concept. Thank you :)
Great explanations of important technical key points in an intuitive way!
Thank you so much! 😁
Thank you for pursuing this series! Love the NLP-related videos as I am not in the field of AI. Always well explained, seriously, props!
Please do more of them 🙌
Hey, thanks! But now I am confused: What's AI is not in the field of AI? 😅
@@AICoffeeBreak I haven't worked with NLP tasks!
@@WhatsAI aaa, you mean "in the field of NLP". I think you typed AI instead of NLP by mistake. Unless you think that NLP == AI. 😅
Really good job, I learned a lot from this series of videos. Could you please list some papers about relative position representations used in graphs? Very grateful!
Thank you very much for these videos! I like your style too :D
Thanks! 😁
👌
Oh how awesome. I've been thinking about positional encodings for images where I have broken the image into grids. I've been wondering whether I should track both the x and y positions, or just treat it like an array and use a single dimension for all segments.
My hypothesis was that the neural net would figure it out either way.
And did they? Ah, you do not have any experimental results yet.
Plot twist: order does not matter anyway (I am half-joking and referring to those papers in NLP showing that language models care unexpectedly little about word order).
Reference: "Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little" by Sinha et al. 2021
arxiv.org/pdf/2104.06644.pdf
Thanks for the video, I have a general suggestion that I think will improve the quality of your videos.
It would be really helpful if you could provide a simple numerical example in addition to the explanation in the video, to help viewers understand the concept better.
Considering your talent for visualization, it would not be hard for you, and it would add a lot to understanding.
Great video as always. ❤
No RoPE video yet, right?
No, sorry. So many topics to cover, so little time.
Great!
Notification squad, where are you?
Quickest comment in the history of this channel. 😂 I pushed the "publish" button just a few seconds ago!
pleeease can you explain the Dual Aspect Collaborative Transformer
Can you please make video on sensor data with transformer and video on hybrid cnn transformers
I think I see an error in some slides.
The slide you show at ruclips.net/video/DwaBQbqh5aE/видео.html and several times earlier than that, seems to be wrong.
The token X3 column seems to have a number pattern for the positional embeddings that doesn't match the patterns in the other columns. It seems it should be a31...a35 instead of a31, a12, a13, a14, a15.
Am I missing something?
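For anyone checking the slide against the paper: in Shaw et al.'s scheme, the entry for token i attending to token j uses the relative offset clip(j - i, -k, k), so row i of the table should read a_{i1} ... a_{i5}, with the first index fixed. A minimal numpy sketch of that offset table (n and k are arbitrary illustrative values):

```python
import numpy as np

n, k = 5, 2  # sequence length, and max relative distance (clipping) as in the paper
idx = np.arange(n)
# offsets[i, j] = clip(j - i, -k, k): the index of the relative embedding used
# when token i attends to token j. Each row follows the same shifted pattern.
offsets = np.clip(idx[None, :] - idx[:, None], -k, k)
print(offsets)
```

Row i = 3, for example, comes out as [-2, -2, -1, 0, 1]: the same pattern as every other row, just shifted.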
I asked for this first, a long time ago, but my comment was not mentioned in the video 😢
Anyway, great explanation as always🎉🥳
Sorry, I could not remember where you made that comment to screenshot it. I only went back to the comments of the first video in the positional encoding series, where I asked if people want to see relative position representations. But you are the reason I was motivated to do the whole encoding series, so thanks!
Hello! Awesome explanation!
I just got a small doubt (hope that someone can explain it).
So, self-attention is itself permutation-invariant unless you use positional encoding.
It makes sense that absolute positional encoding makes the self-attention mechanism permutation-variant. However, I couldn't figure out whether the same happens with relative positional encoding. Because if in relative positional encoding we only care about the distance between the tokens, shouldn't this make the self-attention mechanism permutation-invariant?
So my question is: Does the use of relative positional encoding make the self-attention mechanism permutation-invariant (unlike if we use absolute positional encoding) ?
Thanks for this question, I'm happy I finally find some time to respond to this.
The short answer is: relative positional embeddings do not make / keep the transformer permutation invariant.
In other words, both absolute and relative positional embeddings make the transformer permutation variant.
Take for example a sentence of two tokens A and B. Both relative and absolute encodings assign a different value to the two positions. So exchanging A and B will assign them different vectors.
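The two-token argument above can be checked numerically. The following is a minimal sketch, with hypothetical toy vectors and Shaw et al.'s scoring simplified by dropping the query/key projections:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Hypothetical toy token vectors for A and B.
A, B = rng.normal(size=d), rng.normal(size=d)
# Relative position embeddings w_r for offsets r = j - i in {-1, 0, 1}.
w = {r: rng.normal(size=d) for r in (-1, 0, 1)}

def attention_logits(tokens):
    # Shaw et al.'s scoring e_ij = x_i . (x_j + w_{j-i}), with the
    # W^Q / W^K projections omitted for brevity.
    n = len(tokens)
    return np.array([[tokens[i] @ (tokens[j] + w[j - i]) for j in range(n)]
                     for i in range(n)])

e_AB = attention_logits([A, B])
e_BA = attention_logits([B, A])
# In both orders, A (as query) attends to B (as key), but the relative offset
# flips from +1 to -1, so the score changes: the model "sees" the swap.
print(bool(np.isclose(e_AB[0, 1], e_BA[1, 0])))
```

Since w_{+1} and w_{-1} are different vectors, swapping A and B changes the attention scores: relative encodings are permutation-variant too.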
@@AICoffeeBreak Okay, thanks!!
They are invariant to isomorphisms of the graph. In a path digraph such as for sequences, there are no isomorphisms. However for cyclic path digraphs of K vertices there are K symmetries. For an undirected path graph, we would have two isomorphisms: forwards and backwards.
For a 2D lattice graph, I think mirroring is symmetrical but I’m not sure. This is assuming that you have an undirected graph.
Undirected implies, in terms of the notation in the video, that a_ij = w_{j-i} = w_{i-j} = a_ji.
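A quick numpy check of that last identity, sketched by representing each w only by its offset index: the signed offsets j - i of a directed sequence are antisymmetric, while tying w_r = w_{-r} (using |j - i|) gives the symmetric a_ij = a_ji of the undirected path graph.

```python
import numpy as np

n = 5
i = np.arange(n)
# Signed relative offsets j - i: a directed sequence distinguishes left from right.
signed = i[None, :] - i[:, None]
# Tying w_r = w_{-r}, i.e. indexing by |j - i|, makes the bias symmetric.
unsigned = np.abs(signed)
print((signed == signed.T).all())    # False: antisymmetric, not symmetric
print((unsigned == unsigned.T).all())  # True: a_ij == a_ji
```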
I think there is typo in the matrix at 5:11
Hello!
Good, but you speak too fast
I think it's the perfect speed. You could slow down the video in the RUclips player settings yourself.