Self-Attention with Relative Position Representations - Paper explained

  • Published: 2 Jul 2024
  • We help you wrap your head around relative positional embeddings as they were first introduced in the “Self-Attention with Relative Position Representations” paper.
    ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring....
    Related videos:
    📺 Positional embeddings explained: • Positional embeddings ...
    📺 Concatenated, learned positional encodings: • Adding vs. concatenati...
    📺 Transformer explained: • The Transformer neural...
    Papers:
    📄 Shaw, Peter, Jakob Uszkoreit, and Ashish Vaswani. "Self-Attention with Relative Position Representations." In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 464-468. 2018. arxiv.org/pdf/1803.02155.pdf
    📄 Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." In Advances in neural information processing systems, pp. 5998-6008. 2017. proceedings.neurips.cc/paper/...
    💻 Implementation for Relative Position Embeddings: github.com/AliHaiderAhmad001/... (a minimal code sketch of the idea follows at the end of this description)
    Outline:
    00:00 Relative positional representations
    02:15 How do they work?
    07:59 Benefits of relative vs. absolute positional encodings
    Music 🎵 : Holi Day Riddim - Konrad OldMoney
    ✍️ Arabic Subtitles by Ali Haidar Ahmad / ali-ahmad-0706a51bb .
    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
    Patreon: / aicoffeebreak
    Ko-fi: ko-fi.com/aicoffeebreak
    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    🔗 Links:
    AICoffeeBreakQuiz: / aicoffeebreak
    Twitter: / aicoffeebreak
    Reddit: / aicoffeebreak
    RUclips: / aicoffeebreak
    #AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research
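
To complement the implementation linked above, here is a minimal, single-head sketch of the mechanism from Shaw et al. (2018): every query/key pair (i, j) looks up a learned embedding indexed by the clipped relative distance j - i, which is added to the key (and, optionally, the value) before the usual attention computation. This is only an illustrative sketch, not the linked implementation; the function names and the unbatched, single-head setup are simplifications.

```python
import torch
import torch.nn.functional as F


def relative_position_bucket(seq_len, max_distance):
    """Return, for every query/key pair (i, j), the index of the clipped relative distance j - i."""
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]                              # rel[i, j] = j - i
    return rel.clamp(-max_distance, max_distance) + max_distance   # shifted into 0 .. 2*max_distance


def relative_self_attention(x, w_q, w_k, w_v, rel_k_emb, rel_v_emb, max_distance):
    """Single-head self-attention with relative position representations (after Shaw et al., 2018).

    x:                    (L, d) token vectors
    w_q, w_k, w_v:        (d, d) projection matrices
    rel_k_emb, rel_v_emb: (2*max_distance + 1, d) tables of learned vectors a^K and a^V,
                          one vector per clipped relative distance
    """
    L, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v                # (L, d) each

    idx = relative_position_bucket(L, max_distance)    # (L, L) distance indices
    a_k, a_v = rel_k_emb[idx], rel_v_emb[idx]          # (L, L, d) each

    # Scores: e_ij = q_i . (k_j + a^K_ij) / sqrt(d)
    logits = (q @ k.T + torch.einsum('id,ijd->ij', q, a_k)) / d ** 0.5
    attn = F.softmax(logits, dim=-1)                   # (L, L) attention weights

    # Outputs: z_i = sum_j attn_ij * (v_j + a^V_ij)
    return attn @ v + torch.einsum('ij,ijd->id', attn, a_v)
```

In a full model, the two tables would be nn.Embedding modules shared across attention heads (one of the paper's efficiency measures), and everything would carry a batch dimension.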

Comments • 37

  • @nicohambauer · 2 years ago · +7

    Great series covering different kinds of positional encodings! Love it!

  • @amphivalo · 2 years ago · +3

    Such a good explanation! Thank you so much

  • @user_2439 · 1 year ago · +1

    Such an awesome explanation!! It really helped me understand this concept. Thank you :)

  • @hkazami · 10 months ago · +2

    Great explanations of important technical key points in an intuitive way!

  • @WhatsAI · 2 years ago · +9

    Thank you for pursuing this series! Love the NLP-related videos as I am not in the field of AI. Always well explained, seriously, props!
    Please do more of them 🙌

    • @AICoffeeBreak · 2 years ago · +1

      Hey, thanks! But now ai am confused: What's AI is not in the field of AI? 😅

    • @WhatsAI · 2 years ago · +2

      @@AICoffeeBreak I haven't worked with NLP tasks!

    • @AICoffeeBreak · 2 years ago · +1

      @@WhatsAI aaa, you mean "in the field of NLP". I think you typed AI instead of NLP by mistake. Unless you think that NLP == AI. 😅

  • @jqd3589 · 2 years ago · +2

    Really good job, I've learned a lot from this series of videos. Could you please list some papers about relative positions used in graphs? Very grateful!

  • @SuilujChannel · 1 year ago · +3

    Thank you very much for these videos! I like your style too :D

  • @hannesstark5024 · 2 years ago · +6

    👌

  • @justinwhite2725 · 2 years ago · +5

    Oh how awesome. I've been thinking about positional encodings for images where I have broken the image into grids. I've been wondering exactly whether I should track both the x and y positions, or just treat it like an array and only have one dimension for all segments (a small sketch of both options follows after this thread).
    My hypothesis was that the neural net would figure it out either way.

    • @AICoffeeBreak · 2 years ago · +4

      And did they? Ah, you do not have any experimental results yet.
      Plot twist: order does not matter anyway (I am half-joking and referring to those papers in NLP showing that language models care unexpectedly little about word order).
      Reference: "Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little" by Sinha et al. 2021
      arxiv.org/pdf/2104.06644.pdf
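
Regarding the grid question above: the two options the commenter describes can be compared by giving each image patch either a single flattened position index, or a separate row and column index whose embeddings are summed. A minimal sketch with hypothetical sizes, not taken from the video or the paper:

```python
import torch
import torch.nn as nn

H, W, d = 8, 8, 64                           # an 8x8 grid of patches, embedding size 64
rows = torch.arange(H).repeat_interleave(W)  # row index of each of the H*W patches
cols = torch.arange(W).repeat(H)             # column index of each patch

# Option 1: a single 1D table over the flattened positions 0 .. H*W - 1
flat_table = nn.Embedding(H * W, d)
pos_1d = flat_table(rows * W + cols)         # (H*W, d)

# Option 2: separate row/column tables, summed, keeping the 2D structure explicit
row_table, col_table = nn.Embedding(H, d), nn.Embedding(W, d)
pos_2d = row_table(rows) + col_table(cols)   # (H*W, d)
```

Option 2 uses H + W position vectors instead of H * W and makes the grid structure explicit; with option 1 the network has to recover that structure from the data, which is essentially the commenter's hypothesis.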

  • @seyeeet8063 · 2 years ago · +4

    Thanks for the video. I have a general suggestion that I think would improve the quality of your videos:
    it would be really helpful if you could provide a simple numerical example in addition to the explanation, to help viewers better understand the concept.
    Considering your talent for visualization, it would not be hard for you, and it would be a huge help for understanding.

  • @mkamp · 8 months ago · +2

    Great video as always. ❤
    No RoPE video yet, right?

    • @AICoffeeBreak · 8 months ago · +3

      No, sorry. So many topics to cover, so little time.

  • @AliAhmad-vm2pk · 6 months ago · +1

    Great!

  • @ambujmittal6824 · 2 years ago · +6

    Notification squad, where are you?

    • @AICoffeeBreak · 2 years ago · +5

      Quickest comment in the history of this channel. 😂 I pushed the "publish" button just a few seconds ago!

  • @hassenhadj7665 · 2 years ago · +2

    Pleeease, can you explain the Dual-Aspect Collaborative Transformer?

  • @gouthaamiramesh714 · 1 year ago

    Can you please make a video on sensor data with transformers, and a video on hybrid CNN-transformer models?

  • @ScottzPlaylists · 8 months ago

    I think I see an error in some slides.
    The slide you show at ruclips.net/video/DwaBQbqh5aE/видео.html (and several times earlier than that) seems to be wrong.
    The token X3 column seems to have a number pattern for the positional embeddings that doesn't match the patterns in the other columns. It seems it should be a31...a35 instead of a31, a12, a13, a14, a15.
    Am I missing something?

  • @DANstudiosable · 2 years ago · +4

    I asked for this first, long ago, but my comment was not mentioned in the video 😢
    Anyway, great explanation as always 🎉🥳

    • @AICoffeeBreak · 2 years ago · +5

      Sorry, I could not remember where you made that comment, so I could not screenshot it. I only went through the comments of the first video in the positional encoding series, where I asked whether people wanted to see relative position representations. But you are the reason I was motivated to do the whole encoding series, so thanks!

  • @diogoaraujo495 · 1 year ago · +2

    Hello! Awesome explanation!
    I just have a small doubt (I hope that someone can explain it).
    So, self-attention is itself permutation-invariant unless you use positional encoding.
    It makes sense that absolute positional encoding makes the self-attention mechanism permutation-variant. However, I couldn't figure out if the same happens with relative positional encoding. Because, if in relative positional encoding we only care about the distance between the tokens, shouldn't this make the self-attention mechanism permutation-invariant?
    So my question is: does the use of relative positional encoding make the self-attention mechanism permutation-invariant (unlike if we use absolute positional encoding)?

    • @AICoffeeBreak · 1 year ago · +1

      Thanks for this question; I'm happy I finally found some time to respond to it.
      The short answer is: relative positional embeddings do not make / keep the transformer permutation invariant.
      In other words, both absolute and relative positional embeddings make the transformer permutation variant.
      Take, for example, a sentence of two tokens A and B. Both relative and absolute encodings assign a different value to the two positions, so exchanging A and B will assign them different vectors (a small numerical check follows at the end of this thread).

    • @diogoaraujo495 · 1 year ago · +1

      @@AICoffeeBreak Okay, thanks!!

    • @ludvigericson6930 · 1 year ago

      They are invariant to automorphisms of the graph. In a path digraph, such as for sequences, there are no non-trivial automorphisms. However, for cyclic path digraphs of K vertices there are K symmetries. For an undirected path graph, we would have two automorphisms: forwards and backwards.

    • @ludvigericson6930 · 1 year ago

      For a 2D lattice graph, I think mirroring is a symmetry, but I'm not sure. This is assuming that you have an undirected graph.

    • @ludvigericson6930 · 1 year ago

      Undirected implies, in terms of the notation in the video, that a_ij = w_{j-i} = w_{i-j} = a_ji.
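
To make the two-token example from the answer above concrete, here is a small numerical check. It reuses the relative_self_attention sketch from the end of the video description (hypothetical names, random weights): if self-attention with relative encodings were permutation invariant, feeding the tokens in the order B, A would simply return the A, B output with its two rows swapped.

```python
import torch

torch.manual_seed(0)
d, clip = 4, 2
x = torch.randn(2, d)                                  # two tokens: row 0 = A, row 1 = B
params = dict(w_q=torch.randn(d, d), w_k=torch.randn(d, d), w_v=torch.randn(d, d),
              rel_k_emb=torch.randn(2 * clip + 1, d),  # one vector per clipped distance
              rel_v_emb=torch.randn(2 * clip + 1, d),
              max_distance=clip)

z_ab = relative_self_attention(x, **params)            # order A, B
z_ba = relative_self_attention(x.flip(0), **params)    # order B, A

# Permutation invariance would mean z_ba is just z_ab with its rows swapped.
# Almost surely False here: the learned vector for distance +1 differs from the one for -1,
# so "B after A" and "B before A" are encoded differently.
print(torch.allclose(z_ab, z_ba.flip(0)))
```

If the distance tables were symmetric (the same vector for +1 and -1), the check would pass for this two-token example, which is exactly the undirected case a_ij = w_{j-i} = w_{i-j} = a_ji described in the reply just above.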

  • @neilteng4161 · 1 year ago

    I think there is a typo in the matrix at 5:11.

  • @wibulord926 · 2 years ago · +2

    heello

  • @nurullahates4585 · 2 years ago

    Good, but you speak too fast

    • @SuilujChannel · 1 year ago · +6

      I think it's the perfect speed. You could slow down the video in the RUclips player settings yourself.