Positional embeddings in transformers EXPLAINED | Demystifying positional encodings.

  • Published: Oct 1, 2024

Comments • 223

  • @anonymousanon4822
    @anonymousanon4822 1 year ago +32

    I found no explanation for this anywhere, and when reading the paper I missed the detail that each token's positional encoding consists of multiple values (calculated by different sine functions). Your explanation and visual representation finally made me understand! Fourier transforms are genius and I'm amazed by how many different areas they show up in.
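
A minimal NumPy sketch of the sinusoidal scheme the comment refers to (from "Attention Is All You Need"): every position gets a whole vector of sine and cosine values at geometrically spaced frequencies. The function name and sizes here are illustrative, not from the video.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix: one positional vector per position."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # even dimension indices
    angle_rates = 1.0 / (10000 ** (dims / d_model))    # one frequency per dim pair
    angles = positions * angle_rates                   # (seq_len, d_model/2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=50, d_model=512)
print(pe.shape)    # (50, 512): each position is encoded by 512 values
print(pe[3, :6])   # a few of the many sine/cosine values that encode position 3
```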

  • @adi331
    @adi331 3 years ago +20

    +1 for more vids on positional encodings.

  • @Phenix66
    @Phenix66 3 years ago +47

    Great stuff :) Would love to see more of that, especially for images or geometry!

  • @ausumnviper
    @ausumnviper 3 years ago +5

    Great explanation !! And Yes Yes Yes.

  • @hannesstark5024
    @hannesstark5024 3 years ago +8

    + 1 for video on relative positional representations!

  • @sqripter256
    @sqripter256 11 months ago +10

    This is the most intuitive explanation of positional encoding I have come across. Everyone out there explains how to do it, even with code, but not the why, which is more important.
    Keep this up. You have earned my subscription.

  • @syedmustahsan4888
    @syedmustahsan4888 23 days ago +2

    Very good explanation. Amazing.
    Thank you very much, madam.

  • @khursani8
    @khursani8 3 years ago +5

    Thanks for the explanation
    Interested to know about rotary position embedding

  • @yimingqu2403
    @yimingqu2403 3 years ago +11

    love how the "Attention is all you need" paper appears with an epic-like bgm

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      It wasn't on purpose, but it is funny -- in hindsight 😅🤣

  • @PenguinMaths
    @PenguinMaths 3 years ago +6

    This is a great video! Just found your channel and glad I did, instantly subscribed :)

  • @ConsistentAsh
    @ConsistentAsh 3 years ago +6

    I was browsing through some channels after first stopping on Sean Cannell's and I noticed your channel. You got a great little channel building up here. I decided to drop by and show some support. Keep up the great content and I hope you keep posting :)

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      Thanks for passing by and for the comment! I appreciate it!

  • @nicohambauer
    @nicohambauer 3 years ago +6

    Sooo good!

  • @magnuspierrau2466
    @magnuspierrau2466 3 years ago +9

    Great explanation of the intuition of positional encodings used in the Transformer!

  • @hedgehog1962
    @hedgehog1962 2 years ago +2

    Really, thank you! Your video is just amazing!

  • @tonoid117
    @tonoid117 3 years ago +9

    What a great video! I'm studying for my Ph.D. in NLU, so this came in very handy. Thank you very much and greetings from Ensenada, Baja California, Mexico :D!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      Thanks, thanks for visiting from so far away! Greetings from Heidelberg Germany! 👋

  • @444haluk
    @444haluk 3 years ago +13

    This video is a clear explanation of why you shouldn't add your positional encoding but concatenate it.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +6

      Extra dimensions dedicated exclusively to encode position! Sure, but only if you have some extra to share. 😅

    • @444haluk
      @444haluk 3 years ago +2

      @@AICoffeeBreak this method relocates the embeddings in a specific direction in the embedding space, so that the new position in the relevant embedding cluster has "another" meaning to other words of the "same kind" (say there is another instance of the same word later). But that place should be reserved for other semantics, else the space is literally filled with "second position" coffee, "tenth position" me, "third position" good, etc. This can go wrong in soooo many ways. Don't get me wrong, I am a clear-cut "Chinese Room Experiment" guy; I don't think you can translate "he is a good doctor" before imagining an iconic low-resolution male doctor and recalling a memory of satisfaction and admiration of consummatory reward, but again, the "he" in "he did it again" and "man, he did it again" should literally have the same representation in the network to start discussing things.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +7

      You are entirely right. I was short in my comment because I commented on the same issue in Cristian Garcia's comment. But there is no way you would have seen it, so I will copy paste it here: 😅
      "Concatenating has the luxury of extra, exclusive dimensions dedicated to positional encoding with the upside of avoiding mixing up semantic and positional information. The downside is, you can afford those extra dimensions only if you have capacity to spare.
      So adding the positional embeddings to initial vector representations saves some capacity by using it for both semantic and positional information, but with the danger of mixing these up if there is no careful tuning on this (for tuning, think about the division by 10000 in the sine formula in "attention is all you need")."

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +6

      And you correctly read between the lines, because this was not explicitly mentioned in the video. In the video I explained what a balancing act it is between semantic and positional information, but you identified the solution: if adding them up causes such trouble, then... let's don't! 😂

    • @blasttrash
      @blasttrash 4 months ago +1

      @@AICoffeeBreak New to AI, but what do you mean by the word "capacity"? Do you mean RAM? Do you mean that if we concat positional encodings to the original vector instead of adding, it will take up more RAM/memory and therefore make the training process slow?
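
A small sketch of the two options debated in this thread, adding versus concatenating positional vectors, assuming we already have token embeddings; the array names, sizes, and the use of random vectors are illustrative, not from the video.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_pos = 10, 512, 64

tok_emb = rng.normal(size=(seq_len, d_model))    # semantic token embeddings
pos_full = rng.normal(size=(seq_len, d_model))   # positional vectors, same width
pos_small = rng.normal(size=(seq_len, d_pos))    # smaller positional vectors

# Option 1: add. Semantics and position share the same d_model dimensions,
# so no extra width is needed, but the two signals get mixed together.
x_added = tok_emb + pos_full                               # shape (10, 512)

# Option 2: concatenate. Position gets exclusive dimensions and stays cleanly
# separated from semantics, at the cost of a wider input (and wider layers
# acting on it) downstream.
x_concat = np.concatenate([tok_emb, pos_small], axis=-1)   # shape (10, 576)

print(x_added.shape, x_concat.shape)
```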

  • @harshkumaragarwal8326
    @harshkumaragarwal8326 3 years ago +3

    great explanation :)

  • @conne637
    @conne637 3 years ago +2

    Great content! Can you do a video about Tabnet please? :)

  • @elinetshaaf75
    @elinetshaaf75 3 years ago +5

    great explanation of positional embeddings. Just what I need.

  • @markryan2475
    @markryan2475 3 years ago +5

    Great explanation - thanks very much for sharing this.

  • @helenacots1221
    @helenacots1221 1 year ago +6

    amazing explanation!!! I have been looking for a clear explanation on how the positional encodings actually work and this really helped! thank you :)

  • @sharepix
    @sharepix 3 years ago +4

    Letitia's Explanation Is All You Need!

  • @erikgoldman
    @erikgoldman 2 years ago +2

    this helped me so much!! thank you!!!

  • @EpicGamer-ux1tu
    @EpicGamer-ux1tu 2 years ago +2

    Great video, many thanks!

  • @garisonhayne668
    @garisonhayne668 3 years ago +5

    Dang it, I learned something and my morning coffee isn't even finished.
    It's going to be one of *those* days.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      Sounds like a good day to me! 😅
      Wish you a fruitful day!

  • @justinwhite2725
    @justinwhite2725 3 years ago +5

    In another video I've seen, apparently it doesn't matter whether positional embeddings are learned or static. It seems as though the rest of the model makes accurate deductions regardless.
    This is why I was not surprised that Fourier transforms seem to work nearly as well as self-attention.

    • @meechos
      @meechos 2 years ago

      Could you please elaborate using an example maybe?

  • @woddenhorse
    @woddenhorse 2 years ago +2

    Multi-Dimensional Spurious Correlation Identifying Beast 🔥🔥
    That's what I am calling transformers from now on

  • @andyandurkar7814
    @andyandurkar7814 1 year ago +2

    Just an amazing explanation ...

  • @DerPylz
    @DerPylz 3 years ago +6

    Thanks, as always, for the great explanation!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      It was Ms. Coffee Bean's pleasure! 😅

  • @gopikrish999
    @gopikrish999 3 years ago +4

    Thank you for the explanation! Can you please make a video on the positional information in Gated Positional Self-Attention in the ConViT paper?

  • @ugurkap
    @ugurkap 3 years ago +6

    Explained really well, thank you 😊

  • @Cross-ai
    @Cross-ai 8 months ago +2

    This is the best and most intuitive explanation of positional embeddings. THANK YOU so much for this video. Btw: what software did you use to create these lovely animations?

    • @AICoffeeBreak
      @AICoffeeBreak  8 months ago +2

      Thanks, glad you like it! For everything but Ms. Coffee Bean, I use the good old PowerPoint (morph and redraw functionality FTW). The rest is Adobe Premiere or kdenlive (video editing software).

  • @karimedx
    @karimedx 3 years ago +3

    Nice explanation

  • @SyntharaPrime
    @SyntharaPrime 2 years ago +2

    Great explanation - it might be the best. I think I finally figured it out. I highly appreciate it.

  • @gauravchattree5273
    @gauravchattree5273 2 years ago +4

    Amazing content. After seeing this, all the articles and research papers make sense.

  • @WhatsAI
    @WhatsAI 3 years ago +7

    Super clear and amazing (as always) explanation of sines and cosines positional embeddings! 🙌

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      Thanks! Always happy when you visit!

  • @kevon217
    @kevon217 1 year ago +3

    Super intuitive explanation, nice!

  • @yyyang_
    @yyyang_ 1 year ago +5

    I've read numerous articles explaining positional embeddings so far. However, this is surely the greatest & clearest ever.

  • @matt96920
    @matt96920 2 years ago +4

    Excellent! Great work!

  • @oleschmitter55
    @oleschmitter55 11 months ago +2

    So helpful! Thank you a lot!

  • @avneetchugh
    @avneetchugh 1 year ago +2

    Awesome, thanks!

  • @deepk889
    @deepk889 3 years ago +5

    I had my morning coffee with this and will make it a habit!

  • @rahulchowdhury3722
    @rahulchowdhury3722 2 years ago +2

    You've got a solid understanding of the mathematics of signal processing.

  • @saurabhramteke8511
    @saurabhramteke8511 2 years ago +2

    Hey, great explanation :). Would love to see more videos.

  • @full-stackmachinelearning2385
    @full-stackmachinelearning2385 2 years ago +2

    BEST AI channel on YouTube!!!!!

  • @ColorfullHD
    @ColorfullHD 5 months ago +1

    Lifesaver! Thank you for the explanation.

  • @Titu-z7u
    @Titu-z7u 3 years ago +5

    I feel lucky to have found your channel. Simply amazing ❤️

  • @timoose3960
    @timoose3960 3 years ago +4

    This was so insightful!

  • @CristianGarcia
    @CristianGarcia 3 years ago +9

    Thanks Letitia! A vid on relative positional embeddings would be nice 😃
    Implementations seem a bit involved, so I've never used them in my toy examples.

    • @CristianGarcia
      @CristianGarcia 3 years ago +2

      Regarding this topic, I've seen positional embeddings sometimes being added and sometimes being concatenated with no real justification for either 😐

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +4

      Concatenating has the luxury of extra, exclusive dimensions dedicated to positional encoding with the upside of avoiding mixing up semantic and positional information. The downside is, you can have those extra dimensions only if you have capacity to spare.
      So adding the positional embeddings to initial vector representations saves some capacity by using it for both semantic and positional information with the danger of mixing these up if there is no careful tuning on this (for tuning, think about the division by 10000 in the sine formula in "attention is all you need").

  • @noorhassanwazir8133
    @noorhassanwazir8133 2 years ago +2

    Nice, madam... what a video!... Outstanding.

  • @omniscienceisdead8837
    @omniscienceisdead8837 2 years ago +2

    you are a genius!!

  • @gemini_537
    @gemini_537 6 months ago +1

    Gemini: This video is about positional embeddings in transformers.
    The video starts with an explanation of why positional embeddings are important. Transformers are a type of neural network that has become very popular for machine learning tasks, especially when there is a lot of data to train on. However, transformers do not process information in the order that it is given. This can be a problem for tasks where the order of the data is important, such as language translation. Positional embeddings are a way of adding information about the order of the data to the transformer.
    The video then goes on to explain how positional embeddings work. Positional embeddings are vectors that are added to the input vectors of the transformer. These vectors encode the position of each element in the sequence. The way that positional embeddings are created is important. The embeddings need to be unique for each position, but they also need to be small enough that they do not overwhelm the signal from the original data.
    The video concludes by discussing some of the different ways that positional embeddings can be created. The most common way is to use sine and cosine functions. These functions can be used to create embeddings that are both unique and small. The video also mentions that there are other ways to create positional embeddings, and that these methods may be more appropriate for some types of data.

  • @ai_station_fa
    @ai_station_fa 2 years ago +3

    Awesome. Thank you for making this great explanation. I highly appreciate it.

  • @jayktharwani9822
    @jayktharwani9822 1 year ago +1

    great explanation. really loved it. Thank you

  • @jayk253
    @jayk253 1 year ago +1

    Amazing explanation! Thank you so much !

  • @clementmichaud724
    @clementmichaud724 1 year ago +1

    Very well explained! Thank you so much!

  • @20Stephanus
    @20Stephanus 2 years ago +2

    "A multi-dimensional, spurious correlation identifying beast..." ... wow. Douglas Adams would be proud of that.

  • @jayjiyani6641
    @jayjiyani6641 2 years ago +1

    Very intuitive. I knew there is sine/cosine positional encoding, but I only really got it here. 👍👍

  • @klammer75
    @klammer75 1 year ago +1

    This is an amazing explanation! Tku!!!🤓🥳🤩

  • @zhangkin7896
    @zhangkin7896 2 years ago +2

    Really great!

  • @smnvrs
    @smnvrs 1 year ago +1

    Thanks, your videos helped the most

  • @huonglarne
    @huonglarne 2 years ago +2

    This explanation is incredible

  • @nozimamurodova3574
    @nozimamurodova3574 3 years ago +3

    Thanks for the explanation. But one question: do we also need to use this positional encoding for other inputs, like security or anomaly detection datasets, which consist of only observation states with corresponding timestamps?

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +4

      If the timestamps are already in the input vector, then no.
      But if you have the observation states only, you need to give the order (the timestamps) to the transformer model, otherwise it would be completely ignorant of the fact that the states it processes are actually a sequence. :)
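
A rough sketch of the two cases from this reply, for a sequence of observation states; the names, shapes, and the random stand-in for a learned positional table are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, d_state = 8, 16

states = rng.normal(size=(n_steps, d_state))            # observation states only
timestamps = np.arange(n_steps, dtype=float)[:, None]   # e.g. seconds since start

# Case 1: timestamps are already part of the input vector, so the model can
# read the order directly and no separate positional encoding is needed.
x_with_time = np.concatenate([states, timestamps], axis=-1)   # (8, 17)

# Case 2: states only. Add (or concatenate) positional vectors so the
# transformer is not blind to the order of the sequence.
pos_table = rng.normal(size=(n_steps, d_state))         # stand-in for learned embeddings
x_with_pe = states + pos_table                          # (8, 16)

print(x_with_time.shape, x_with_pe.shape)
```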

  • @preadaptation
    @preadaptation 1 year ago +2

    Thanks

  • @bdennyw1
    @bdennyw1 3 years ago +5

    Nice explanation! I’d love to hear more about multidimensional and learned position encodings

  • @aasthashukla7423
    @aasthashukla7423 11 months ago +1

    Thanks Letitia, great explanation

  • @googlable
    @googlable 1 year ago +2

    Bro
    Where have you been hiding all this time?
    This is next level explaining

  • @parthvashisht9555
    @parthvashisht9555 18 days ago +1

    I couldn't find a satisfying explanation anywhere. This video finally made me understand things in a bit more detail, especially the use of sine and cosine functions across multiple dimensions.
    Thank you! You're awesome.

  • @kryogenica4759
    @kryogenica4759 2 years ago +3

    Make Ms. Coffee Bean spill the beans on positional embeddings for images

  • @Galinator9000
    @Galinator9000 2 years ago +2

    These videos are priceless, thank you!

  • @sborkes
    @sborkes 3 years ago +3

    I really enjoy your videos 😄!
    I would like a video about using transformers with time-series data.

  • @johannreiter1087
    @johannreiter1087 1 year ago +1

    Awesome video, thanks :)

  • @DeepakKori-vn8zr
    @DeepakKori-vn8zr 3 months ago +1

    OMG, such an amazing video to explain positional embeddings...

  • @ylazerson
    @ylazerson 2 years ago +2

    Just watched this again for a refresher; the best video out there on the subject!

  • @tanmaybhayani
    @tanmaybhayani 4 months ago +1

    Amazing! This is the best explanation for positional encodings period. Subscribed!!

  • @maxvell77
    @maxvell77 3 months ago +1

    Most insightful explanation I have found on this subject so far. I was looking for it for days... Thank you! Keep going, you rock!

    • @AICoffeeBreak
      @AICoffeeBreak  3 months ago

      Thank you a lot! Also for the super thanks!

  • @alphabetadministrator
    @alphabetadministrator 5 months ago +1

    Hi Letitia. Thank you so much for your wonderful video! Your explanations are more intuitive than almost anything else I've seen on the internet. Could you also do a video on how positional encoding works for images, specifically? I assume they are different from text because images do not have the sequential pattern text data have. Thanks!

    • @AICoffeeBreak
      @AICoffeeBreak  5 months ago +1

      Thanks for the suggestion. I do not think I will get to do this in the next few months. But the idea of image position embeddings is that those representations are most often learned. The gist of it is to divide the image into patches, let's say 9, and then to number them from 1 to 9 (from the top-left to the bottom-right). Then let gradient descent learn better representations of these addresses.
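
A minimal PyTorch sketch of the patch idea described in this reply (ViT-style learned position embeddings); the toy image size, patch size, and module names are illustrative assumptions.

```python
import torch
import torch.nn as nn

img = torch.randn(1, 3, 48, 48)            # toy image: batch 1, 3 channels, 48x48
patch, d_model = 16, 64                    # 16x16 patches -> a 3x3 grid of 9 patches

# Flatten each 16x16x3 patch into a vector and project it to d_model dimensions.
unfold = nn.Unfold(kernel_size=patch, stride=patch)
patches = unfold(img).transpose(1, 2)      # (1, 9, 768)
proj = nn.Linear(patch * patch * 3, d_model)
patch_emb = proj(patches)                  # (1, 9, 64)

# One learnable positional vector per patch index 0..8 (top-left to bottom-right);
# gradient descent shapes these "address" vectors during training.
pos_emb = nn.Parameter(torch.zeros(1, 9, d_model))
x = patch_emb + pos_emb                    # transformer input carrying position info

print(x.shape)                             # torch.Size([1, 9, 64])
```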

  • @ylazerson
    @ylazerson 3 years ago +2

    amazing video - rockin!

  • @jiaxuanchen8652
    @jiaxuanchen8652 7 months ago +1

    Thanks for the explanation! But why, after we add the position vector to the word vector, is the position information still preserved?

    • @AICoffeeBreak
      @AICoffeeBreak  7 months ago +1

      It's complicated because it's super high-dimensional. But on a high level, adding a vector to another vector shifts its position in each dimension a bit. We always add the same vector at the first position, so the transformer learns that the first position is always shifted around in that way. Maybe it is not recognisable in all dimensions all the time, but it has 768 dimensions to pick up the pattern. You should not underestimate neural networks, which are correlation discovery machines.
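
A small numeric illustration of this reply: even after positional vectors are added to (random) word embeddings, the position can still be read back out, because the same shift is applied at a given position every time. The random vectors and sizes here are illustrative assumptions (the paper uses sinusoids).

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 20, 768

pos = rng.normal(size=(seq_len, d_model))        # one fixed positional vector per position

# Many different "sentences" of random word embeddings, each shifted by `pos`.
words = rng.normal(size=(100, seq_len, d_model))
inputs = words + pos

# Match each shifted embedding against the positional table: the true position
# wins essentially every time, i.e. the positional pattern survives the addition.
scores = inputs @ pos.T                          # (100, seq_len, seq_len)
recovered = scores.argmax(axis=-1)               # best-matching position per token
accuracy = (recovered == np.arange(seq_len)).mean()
print(f"position recovered for {accuracy:.0%} of tokens")
```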

  • @yusufani8
    @yusufani8 2 years ago +2

    Probably the clearest explanation of positional encoding :D

  • @pypypy4228
    @pypypy4228 5 months ago +1

    This was awesome! I don't have a complete understanding but it definitely pushed me to the side of understanding. Did you make a video about relative positions?

    • @AICoffeeBreak
      @AICoffeeBreak  5 months ago +2

      Yes, I did! ruclips.net/video/DwaBQbqh5aE/видео.html

  • @aterribleyoutuber9039
    @aterribleyoutuber9039 9 months ago +1

    This was very intuitive, thank you very much! Needed this, please keep making videos

  • @amirhosseinramazani757
    @amirhosseinramazani757 2 years ago +3

    Your explanation was great! I got everything I wanted to know about positional embeddings. Thank you :)

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +2

      Awesome, thanks for the visit! ☺️

  • @OleksandrAkimenko
    @OleksandrAkimenko 1 year ago +1

    so helpful, appreciate it!

  • @adeepak7
    @adeepak7 7 months ago +1

    Very good explanation!! Thanks for this 🙏🙏

    • @AICoffeeBreak
      @AICoffeeBreak  7 months ago +1

      Thank You for your wonderful message!

  • @BaronSpartan
    @BaronSpartan 7 months ago +1

    I loved your simple and explicit explanation. You've earned a sub and like!

  • @kris2k10
    @kris2k10 7 months ago +1

    Before watching this, I always used to be confused about positional encoding.

    • @AICoffeeBreak
      @AICoffeeBreak  7 months ago +1

      Wow, this is so great to hear!

  • @MaximoFernandezNunez
    @MaximoFernandezNunez 1 year ago +1

    I finally understand the positional encoding! Thanks

  • @nitinkumarmittal4369
    @nitinkumarmittal4369 8 months ago +1

    Loved your explanation, thank you for this video!

  • @johngrabner
    @johngrabner 2 years ago +1

    Miss Coffee Bean, excellent visualization. Is it also possible that gradient descent will cause some embedding dimensions to become insignificant, to allow the position embedding to dominate those dimensions? If so, then adding position could be thought of as shifting the representation of words like 'king', and/or forcing the representation of 'king' into fewer dimensions to allow for an orthogonal position, or a combo of both. I always thought it was the latter, but have never seen a study on the subject.

  • @deepshiftlabs
    @deepshiftlabs 2 years ago +2

    Brilliant video. This was the best explanation of positional encodings I have seen. It helped a TON!!!

    • @deepshiftlabs
      @deepshiftlabs 2 years ago

      I also make AI videos. I am more into the image side (convolutions and pooling), so it was great to see more AI educators.

  • @montgomerygole6703
    @montgomerygole6703 1 year ago +1

    Wow, thanks so much! This is so well explained!!

  • @bleacherz7503
    @bleacherz7503 9 months ago +1

    Why are the words initially represented as vectors of length 512?

    • @AICoffeeBreak
      @AICoffeeBreak  9 months ago +1

      The vectors could be longer or shorter than that. 512 was just an example from the most common architecture back then (BERT base). The longer the vectors, the more parameters the model has. The larger the model, the better its performance, but also the slower its forward passes.

  • @arpitqw1
    @arpitqw1 1 year ago +1

    Thanks for such an explanation. You're awesome!

  • @antoniomajdandzic8462
    @antoniomajdandzic8462 3 years ago +2

    love your explanations !!!

  • @raoufkeskes7965
    @raoufkeskes7965 8 months ago +2

    The most brilliant positional encoding explanation EVER. That was a GOD-level explanation.

  • @yonahcitron226
    @yonahcitron226 1 year ago +3

    amazing stuff! so clear and intuitive, exactly what I was looking for :)

    • @AICoffeeBreak
      @AICoffeeBreak  1 year ago +2

      Thanks for watching and appreciating! 😊

  • @mbrochh82
    @mbrochh82 1 year ago +3

    This is probably the best explanation of this topic on YouTube! Great work!

  • @xv0047
    @xv0047 1 year ago +1

    Good explanation.

  • @lucaslee6412
    @lucaslee6412 4 months ago +1

    I love you, Coffee! Your nice video solved my problem!