Great explanations of important technical key points in an intuitive way!
Thank you so much! 😁
Great series covering different kinds of positional encodings! Love it!
Such a good explanation! Thank you so much
Oh, how awesome. I've been thinking about positional encodings for images where I have broken the image into a grid. I've been wondering whether I should track both the x and y positions, or just treat it like an array and have only one dimension for all segments.
My hypothesis was that the neural net would figure it out either way.
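To make the two options concrete, here is a minimal PyTorch sketch of what I mean (toy sizes, learned embedding tables assumed, all names made up):

```python
import torch
import torch.nn as nn

H, W, d = 4, 4, 32  # toy 4x4 grid of patches, embedding size 32

# Option 1: treat the patches as one flat array with a single 1D position index.
pos_1d = nn.Embedding(H * W, d)
enc_1d = pos_1d(torch.arange(H * W))                # (16, 32), row-major order

# Option 2: track x and y separately with two embedding tables, summed.
pos_y, pos_x = nn.Embedding(H, d), nn.Embedding(W, d)
ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
enc_2d = pos_y(ys.flatten()) + pos_x(xs.flatten())  # (16, 32)

# Either tensor would be added to the patch embeddings before the transformer.
```

If I remember correctly, the ViT paper reported no significant difference between such 1D and 2D-aware variants, and the learned 1D embeddings ended up reflecting the 2D grid structure anyway, which would support my hypothesis.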
And did it? Ah, you do not have any experimental results yet.
Plot twist: order does not matter anyway (I am half-joking and referring to those papers in NLP showing that language models care unexpectedly little about word order).
Reference: Sinha et al., 2021, "Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little", arxiv.org/pdf/2104.06644.pdf
This is incredibly useful. Thank you so much
You're very welcome!
Really good job, I learned a lot from this series of videos. Could you please list some of the papers about relative position encodings used in graphs? Very grateful!
Thank you for pursuing this series! Love the NLP-related videos as I am not in the field of AI. Always well explained, seriously, props!
Please do more of them 🙌
Hey, thanks! But now I am confused: What's AI is not in the field of AI? 😅
@@AICoffeeBreak I haven't worked with NLP tasks!
@@WhatsAI aaa, you mean "in the field of NLP". I think you typed AI instead of NLP by mistake. Unless you think that NLP == AI. 😅
Such an awesome explanation!! It really helped me understand this concept. Thank you :)
Thank you very much for these videos! I like your style too :D
Thanks! 😁
Great video as always. ❤
No RoPE video yet, right?
No, sorry. So many topics to cover, so little time.
Thanks for the video! I have a general suggestion that I think will improve the quality of your videos:
it would be really helpful if you could provide a simple numerical example in addition to the video's explanation, to help viewers better understand the concept.
Considering your talent for visualization, it would not be very hard for you, and it would add a huge amount to understanding.
Great!
Hello! Awesome explanation!
I just have a small doubt (I hope that someone can explain it).
So, self-attention is itself permutation-invariant unless you use positional encoding.
It makes sense that absolute positional encoding makes the self-attention mechanism permutation-variant. However, I couldn't figure out if the same happens with relative positional encoding. Because, if in relative positional encoding we only care about the distance between the tokens, shouldn't this keep the self-attention mechanism permutation-invariant?
So my question is: does the use of relative positional encoding keep the self-attention mechanism permutation-invariant (unlike if we use absolute positional encoding)?
Thanks for this question, I'm happy I finally find some time to respond to this.
The short answer is: relative positional embeddings do not keep the transformer permutation-invariant.
In other words, both absolute and relative positional embeddings make the transformer permutation-variant.
Take for example a sentence of two tokens, A followed by B. Both relative and absolute encodings distinguish the two arrangements: with relative encodings, A attends to B through the embedding for offset +1 and B attends to A through the embedding for offset -1. After swapping the tokens, A attends to B through the offset -1 embedding instead, so the attention scores, and hence the token representations, change.
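If it helps, here is a toy sketch (my own illustration, not from the video) with made-up scalar biases added to the attention logits, showing that swapping A and B changes A's attention distribution; with no positional information at all, the outputs would merely permute along with the tokens:

```python
import torch

torch.manual_seed(0)
A, B = torch.randn(4), torch.randn(4)  # two toy token vectors

# One scalar bias per relative offset j - i; the values are arbitrary,
# the point is only that w[+1] != w[-1].
w = {-1: -2.0, 0: 0.0, +1: 2.0}

def attention(tokens):
    # Dot-product logits plus the relative-position bias for offset j - i.
    n = len(tokens)
    logits = torch.tensor([[(tokens[i] @ tokens[j]).item() + w[j - i]
                            for j in range(n)] for i in range(n)])
    return logits.softmax(dim=-1)

print(attention([A, B]))  # A looks at B through w[+1] ...
print(attention([B, A]))  # ... but after the swap, through w[-1]:
                          # A's attention weights (and thus its output) change.
```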
@@AICoffeeBreak Okay, thanks!!
They are invariant to isomorphisms of the graph. In a path digraph, such as for sequences, there are no non-trivial isomorphisms. However, for cyclic path digraphs of K vertices, there are K symmetries. For an undirected path graph, we would have two isomorphisms: forwards and backwards.
For a 2D lattice graph, I think mirroring is symmetrical but I’m not sure. This is assuming that you have an undirected graph.
Undirected implies, in terms of the notation in the video, that a_ij = w_{j-i} = w_{i-j} = a_ji.
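A quick numpy check of the cyclic case (toy code of mine, not from the video): with the cyclic relative encoding a_ij = w_{(j-i) mod K}, relabeling the vertices by any rotation leaves the matrix unchanged, which is exactly the K symmetries:

```python
import numpy as np

K = 5
rng = np.random.default_rng(0)
w = rng.normal(size=K)  # one value per cyclic offset (j - i) mod K

# a[i, j] = w[(j - i) mod K]: the relative encoding on a cyclic path digraph.
a = np.array([[w[(j - i) % K] for j in range(K)] for i in range(K)])

# Rotating all vertex labels by any shift s is a graph isomorphism,
# and the encoding cannot tell the relabeled graph apart: K symmetries.
for s in range(K):
    rot = np.roll(np.arange(K), s)
    assert np.allclose(a, a[np.ix_(rot, rot)])
```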
Pleeease, can you explain the Dual-Aspect Collaborative Transformer?
👌
Notification squad, where are you?
Quickest comment in the history of this channel. 😂 I pushed the "publish" button just a few seconds ago!
I think there is a typo in the matrix at 5:11.
Can you please make a video on sensor data with transformers, and a video on hybrid CNN-transformers?
I think I see an error in some slides.
The slide you show at ruclips.net/video/DwaBQbqh5aE/видео.html, and several times earlier than that, seems to be wrong.
The token X3 column seems to have a number pattern for the positional embeddings that doesn't match the patterns in the other columns. It seems it should be a31...a35 instead of a31, a12, a13, a14, a15.
Am I missing something?
I asked for this first, a long time back, but my comment was not mentioned in the video 😢
Anyway, great explanation as always🎉🥳
Sorry, I could not remember where you made that comment to screenshot it. I only went to the comments of the first video in the positional encoding series, where I asked if people want to see relative position representations. But you are the reason I was motivated to do the whole encoding series, so thanks!
🎉❤
Hello!
Good, but you speak too fast
I think it's the perfect speed. You could slow down the video in the RUclips player settings yourself.