Self-Attention with Relative Position Representations - Paper explained

  • Published: Jan 13, 2025

Comments • 41

  • @hkazami · 1 year ago · +2

    Great explanations of important technical key points in an intuitive way!

  • @nicohambauer · 3 years ago · +7

    Great series covering different kinds of positional encodings! Love it!

  • @amphivalo · 2 years ago · +3

    Such a good explanation! Thank you so much

  • @justinwhite2725 · 3 years ago · +5

    Oh how awesome. I've been thinking about positional encodings for images where I have broken the image into grids. I've been wondering whether I should track both the x and y positions or just treat it like an array and have only one dimension for all segments.
    My hypothesis was that the neural net would figure it out either way.

    • @AICoffeeBreak · 3 years ago · +4

      And did they? Ah, you do not have any experimental results yet.
      Plot twist: order does not matter anyway (I am half-joking and referring to those papers in NLP showing that language models care unexpectedly little about word order).
      Reference: "Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little" by Sinha et al. 2021
      arxiv.org/pdf/2104.06644.pdf
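On the image-grid question above, a small sketch (our own illustration, not from the video) of what a purely 1D relative scheme cannot see once a grid of patches is flattened row by row. With a hypothetical grid width `W = 4`, two horizontally adjacent patches and the "row wrap" pair (last patch of one row, first patch of the next) get the same flattened distance `j - i = 1`, while a 2D offset `(dy, dx)` keeps them apart. The function names are hypothetical.

```python
def flat_index(row: int, col: int, width: int) -> int:
    """Row-major index of a patch in a grid flattened into a 1D sequence."""
    return row * width + col


def offset_1d(p: tuple[int, int], q: tuple[int, int], width: int) -> int:
    """Relative distance j - i after flattening (what a 1D scheme sees)."""
    return flat_index(*q, width) - flat_index(*p, width)


def offset_2d(p: tuple[int, int], q: tuple[int, int]) -> tuple[int, int]:
    """Relative offset (dy, dx) that a 2D scheme would keep."""
    return (q[0] - p[0], q[1] - p[1])


W = 4  # hypothetical grid width (patches per row)

neighbors = ((0, 0), (0, 1))   # horizontally adjacent patches
row_wrap = ((0, W - 1), (1, 0))  # end of row 0 and start of row 1: far apart in the image

print(offset_1d(*neighbors, W), offset_1d(*row_wrap, W))  # 1 1
print(offset_2d(*neighbors), offset_2d(*row_wrap))        # (0, 1) (1, -3)
```

Whether a network learns to disambiguate such pairs from content alone is exactly the empirical question raised in the comment; the sketch only shows that the 1D relative position itself cannot distinguish them.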

  • @subhadipnandi294 · 6 months ago · +1

    This is incredibly useful. Thank you so much

  • @jqd3589 · 2 years ago · +2

    Really good job, I learned a lot from this video series. Could you please list some of the papers about using relative positions in graphs? Very grateful.

  • @WhatsAI · 3 years ago · +9

    Thank you for pursuing this series! Love the NLP-related videos as I am not in the field of AI. Always well explained, seriously, props!
    Please do more of them 🙌

    • @AICoffeeBreak · 3 years ago · +1

      Hey, thanks! But now I am confused: What's AI is not in the field of AI? 😅

    • @WhatsAI · 3 years ago · +2

      @@AICoffeeBreak I haven't worked with NLP tasks!

    • @AICoffeeBreak · 3 years ago · +1

      @@WhatsAI aaa, you mean "in the field of NLP". I think you typed AI instead of NLP by mistake. Unless you think that NLP == AI. 😅

  • @user_2439 · 2 years ago · +1

    Such an awesome explanation!! It really helped me understand this concept. Thank you :)

  • @JuliusUnscripted · 2 years ago · +3

    Thank you very much for these videos! I like your style too :D

  • @mkamp · 1 year ago · +2

    Great video as always. ❤
    No RoPE video yet, right?

    • @AICoffeeBreak · 1 year ago · +3

      No, sorry. So many topics to cover, so little time.

  • @seyeeet8063 · 3 years ago · +4

    Thanks for the video. I have a general suggestion that I think would improve the quality of your videos.
    It would be really helpful if you could provide a simple numerical example in addition to the explanation in the video, to help viewers understand the concept better.
    Considering your talent for visualization, it would not be very hard for you and would help a lot with understanding.

  • @AliAhmad-vm2pk · 1 year ago · +1

    Great!

  • @diogoaraujo495 · 2 years ago · +2

    Hello! Awesome explanation!
    I just have a small doubt (hope that someone can explain it).
    So, self-attention is itself permutation-invariant unless you use positional encoding.
    It makes sense that absolute positional encoding makes the self-attention mechanism permutation-variant. However, I couldn't figure out whether the same happens with relative positional encoding. Because, if in relative positional encoding we only care about the distance between the tokens, shouldn't this make the self-attention mechanism permutation-invariant?
    So my question is: Does the use of relative positional encoding make the self-attention mechanism permutation-invariant (unlike if we use absolute positional encoding)?

    • @AICoffeeBreak · 2 years ago · +1

      Thanks for this question, I'm happy I finally found some time to respond to it.
      The short answer is: relative positional embeddings do not make / keep the transformer permutation invariant.
      In other words, both absolute and relative positional embeddings make the transformer permutation variant.
      Take for example a sentence of two tokens A and B. Both relative and absolute encodings assign a different value to the two positions. So exchanging A and B will assign them different vectors.
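To make the reply concrete, here is a minimal sketch of self-attention with relative position representations in the style of Shaw et al. (2018). It assumes identity query/key/value projections, tiny 1-dimensional token vectors, and made-up relative embeddings `w_k` and `w_v`, looked up as `w[clip(j - i, -k, k)]` and added to the keys and values. Strictly speaking, attention without positions is permutation-equivariant (swapping the inputs swaps the outputs); the sketch checks that this holds for plain attention and breaks once the relative terms are added.

```python
import math

Vector = list[float]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def rel_term(table, i: int, j: int, k: int, d: int) -> Vector:
    """Return w[clip(j - i, -k, k)], or a zero vector when no table is given."""
    if table is None:
        return [0.0] * d
    return table[max(-k, min(k, j - i))]


def relative_attention(xs: list[Vector], w_k=None, w_v=None, k: int = 1) -> list[Vector]:
    """Single-head self-attention with identity Q/K/V projections (an assumption).

    e_ij = x_i . (x_j + a^K_ij) / sqrt(d),  z_i = sum_j alpha_ij (x_j + a^V_ij)
    """
    n, d = len(xs), len(xs[0])
    outputs = []
    for i in range(n):
        logits = [dot(xs[i], add(xs[j], rel_term(w_k, i, j, k, d))) / math.sqrt(d)
                  for j in range(n)]
        alphas = softmax(logits)
        z = [0.0] * d
        for j in range(n):
            value = add(xs[j], rel_term(w_v, i, j, k, d))
            z = add(z, [alphas[j] * v for v in value])
        outputs.append(z)
    return outputs


def close(a: list[Vector], b: list[Vector]) -> bool:
    return all(math.isclose(x, y, abs_tol=1e-12) for u, v in zip(a, b) for x, y in zip(u, v))


A, B = [1.0], [0.0]                      # two toy 1-d token vectors
w_k = {-1: [0.5], 0: [0.0], 1: [-0.5]}   # made-up relative key embeddings
w_v = {-1: [-1.0], 0: [0.0], 1: [1.0]}   # made-up relative value embeddings

plain_ab, plain_ba = relative_attention([A, B]), relative_attention([B, A])
rel_ab, rel_ba = relative_attention([A, B], w_k, w_v), relative_attention([B, A], w_k, w_v)

print(close(plain_ba, plain_ab[::-1]))  # True: swapping A and B just swaps the outputs
print(close(rel_ba, rel_ab[::-1]))      # False: relative positions make the order matter
```

The clipping distance `k` reflects the paper's idea that exact relative positions beyond some distance matter less; with `k=1`, every pair in this two-token example already falls inside the window.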

    • @diogoaraujo495 · 1 year ago · +1

      @@AICoffeeBreak Okay, thanks!!

    • @ludvigericson6930 · 1 year ago

      They are invariant to isomorphisms of the graph. In a path digraph such as for sequences, there are no isomorphisms. However for cyclic path digraphs of K vertices there are K symmetries. For an undirected path graph, we would have two isomorphisms: forwards and backwards.

    • @ludvigericson6930 · 1 year ago

      For a 2D lattice graph, I think mirroring is symmetrical but I’m not sure. This is assuming that you have an undirected graph.

    • @ludvigericson6930 · 1 year ago

      Undirected implies, in terms of the notation in the video, that a_ij = w_{j-i} = w_{i-j} = a_ji.
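The graph-symmetry argument in this thread can be checked numerically. The sketch below is a toy setup of our own, not the video's. It uses scalar tokens, identity projections, only the key term of the relative representation, and made-up tables `w_directed`, `w_undirected`, `w_cyclic`. Attention is equivariant to a transformation when transforming the input just transforms the output the same way. With distances taken mod K (a cyclic path digraph), rotations qualify. With the undirected distance |j - i| (so w_{j-i} = w_{i-j}), reversal qualifies. With signed, clipped distances (a path digraph), neither holds for these inputs.

```python
import math


def attend(xs, w, distance):
    """Scalar self-attention (identity projections): token i scores token j with
    x_i * (x_j + w[distance(i, j)]) and returns the softmax-weighted mean of x_j."""
    out = []
    for i, q in enumerate(xs):
        logits = [q * (x + w[distance(i, j)]) for j, x in enumerate(xs)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        out.append(sum(e / total * x for e, x in zip(exps, xs)))
    return out


K = 4
xs = [0.5, -1.0, 1.5, 0.2]  # toy scalar tokens


def directed(i, j):
    """Path digraph: signed distance j - i, clipped to [-2, 2]."""
    return max(-2, min(2, j - i))


def undirected(i, j):
    """Undirected path graph: |j - i| clipped to 2, so w_{j-i} = w_{i-j}."""
    return min(2, abs(j - i))


def cyclic(i, j):
    """Cyclic path digraph on K vertices: (j - i) mod K."""
    return (j - i) % K


# Made-up relative key embeddings (one scalar per distance).
w_directed = {-2: -0.9, -1: 0.4, 0: 0.0, 1: 1.3, 2: -0.2}
w_undirected = {0: 0.0, 1: 0.4, 2: -0.9}
w_cyclic = {0: 0.0, 1: 1.3, 2: -0.2, 3: 0.4}


def rotate(seq, s=1):
    """Cyclic shift: the element at position i moves to (i + s) mod len(seq)."""
    return seq[-s:] + seq[:-s]


def reverse(seq):
    return seq[::-1]


def equivariant(w, distance, transform):
    """True if transforming the input transforms the output in the same way."""
    lhs = attend(transform(xs), w, distance)
    rhs = transform(attend(xs, w, distance))
    return all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(lhs, rhs))


print(equivariant(w_cyclic, cyclic, rotate))           # True
print(equivariant(w_undirected, undirected, reverse))  # True
print(equivariant(w_directed, directed, reverse))      # False
print(equivariant(w_directed, directed, rotate))       # False
```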

  • @hassenhadj7665 · 2 years ago · +2

    Pleeease, can you explain the Dual Aspect Collaborative Transformer?

  • @hannesstark5024 · 3 years ago · +6

    👌

  • @ambujmittal6824 · 3 years ago · +6

    Notification squad, where are you?

    • @AICoffeeBreak · 3 years ago · +5

      Quickest comment in the history of this channel. 😂 I pushed the "publish" button just a few seconds ago!

  • @neilteng4161 · 1 year ago

    I think there is a typo in the matrix at 5:11.

  • @gouthaamiramesh714 · 2 years ago

    Can you please make a video on transformers for sensor data, and one on hybrid CNN-transformers?

  • @ScottzPlaylists · 1 year ago

    I think I see an error in some slides.
    The slide you show at ruclips.net/video/DwaBQbqh5aE/видео.html, and several times earlier than that, seems to be wrong.
    The token X3 column seems to have a number pattern for positional embeddings that doesn't match the patterns in the other columns. It seems it should be a31...a35 instead of a31,a12,a13,a14,a15.
    Am I missing something?
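A quick way to check the pattern discussed in this comment: in Shaw et al.'s formulation the representation for the pair (i, j) is a_ij = w_clip(j-i, k), so the entries for each token are the same sequence of w's shifted by one position. The sketch below assumes 5 tokens labelled x1 to x5 as in the comment and a clipping distance k = 2 (our choice). It prints each a_ij label next to the w index it uses. In this notation the entries for token x3 are a31 to a35. Which way the slide lays out rows and columns cannot be checked from here.

```python
def clip(x: int, k: int) -> int:
    """clip(x, k) = max(-k, min(k, x))."""
    return max(-k, min(k, x))


def relative_matrix(n: int, k: int) -> list[list[int]]:
    """Entry [i][j] is the relative index j - i (clipped) used for a_ij, 0-based."""
    return [[clip(j - i, k) for j in range(n)] for i in range(n)]


n, k = 5, 2
rel = relative_matrix(n, k)

for i, row in enumerate(rel, start=1):
    labels = " ".join(f"a{i}{j}" for j in range(1, n + 1))
    indices = " ".join(f"w{r:+d}" for r in row)
    print(f"x{i}: {labels}  ->  {indices}")
# The x3 line reads: x3: a31 a32 a33 a34 a35  ->  w-2 w-1 w+0 w+1 w+2
```

Each diagonal of the matrix is constant (it depends only on j - i), which is what makes a stray a12 in the x3 entries stand out.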

  • @DANstudiosable · 3 years ago · +4

    I asked for this first, long ago, but my comment was not mentioned in the video 😢
    Anyway, great explanation as always🎉🥳

    • @AICoffeeBreak · 3 years ago · +5

      Sorry, I could not remember where you made that comment to screenshot it. I only went to the comments of the first video in the positional encoding series, where I asked if people want to see relative position representations. But you are the reason I was motivated to do the whole positional encoding series, so thanks!

  • @wilfredomartel7781 · 4 months ago · +1

    🎉❤

  • @wibulord926 · 2 years ago · +2

    heello

  • @nurullahates4585 · 2 years ago

    Good, but you speak too fast

    • @JuliusUnscripted · 2 years ago · +6

      I think it's the perfect speed. You can slow down the video yourself in the YouTube player settings.