The Attention Mechanism in Large Language Models

  • Published: 20 Sep 2024
  • Attention mechanisms are crucial to the huge boom LLMs have recently had.
    In this video you'll see a friendly pictorial explanation of how attention mechanisms work in Large Language Models.
    This is the first of a series of three videos on Transformer models.
    Video 1: The attention mechanism at a high level (this one)
    Video 2: The attention mechanism with math: • The math behind Attent...
    Video 3: Transformer models • What are Transformer M...
    Learn more in LLM University! llm.university

Comments • 178

  • @arvindkumarsoundarrajan9479
    @arvindkumarsoundarrajan9479 8 months ago +46

    I have been reading the "attention is all you need" paper for like 2 years. Never understood it properly like this ever before😮. I'm so happy now🎉

  • @FawadMahdi-o2h
    @FawadMahdi-o2h 11 days ago +1

    This was hands down the best explanation I've seen of attention mechanisms and multi head attention --- the fact I'm able to use these words in this sentence means I understand it

  • @Compsci-v6q
    @Compsci-v6q 8 days ago +1

    This channel is underrated, your explanations are the best among other channels

  • @RG-ik5kw
    @RG-ik5kw 1 year ago +38

    Your videos in the LLM uni are incredible. Builds up true understanding after watching tons of other material that was all a bit loose on the ends. Thank you!

  • @GrahamAnderson-z7x
    @GrahamAnderson-z7x 4 months ago +5

    I love your clear, non-intimidating, and visual teaching style.

    • @SerranoAcademy
      @SerranoAcademy  4 months ago +1

      Thank you so much for your kind words and your kind contribution! It’s really appreciated!

  • @EricMutta
    @EricMutta 9 months ago +18

    Truly amazing video! The published papers never bother to explain things with this level of clarity and simplicity, which is a shame because if more people outside the field understood what is going on, we may have gotten something like ChatGPT about 10 years sooner! Thanks for taking the time to make this - the visual presentation with the little animations makes a HUGE difference!

  • @malikkissoum730
    @malikkissoum730 10 months ago +15

    Best teacher on the internet, thank you for your amazing work and the time you took to put those videos together

  • @gunjanmimo
    @gunjanmimo 1 year ago +9

    This is one of the best videos on YouTube for understanding ATTENTION. Thank you for creating such outstanding content. I am waiting for the upcoming videos of this series. Thank you ❤

  • @calum.macleod
    @calum.macleod 1 year ago +11

    I appreciate your videos, especially how you can apply a good perspective to understand the high level concepts, before getting too deep into the maths.

  • @apah
    @apah 1 year ago +4

    So glad to see you're still active Luis! You and StatQuest's Josh Starmer really are the backbone of more ML professionals than you can imagine

  • @JyuSub
    @JyuSub 6 months ago +3

    Just THANK YOU. This is by far the best video on the attention mechanism for people that learn visually

  • @sayamkumar7276
    @sayamkumar7276 1 year ago +10

    This is one of the clearest, simplest and the most intuitive explanations on attention mechanism.. Thanks for making such a tedious and challenging concept of attention relatively easy to understand 👏 Looking forward to the impending 2 videos of this series on attention

  • @pruthvipatel8720
    @pruthvipatel8720 1 year ago +7

    I always struggled with KQV in attention paper. Thanks a lot for this crystal clear explanation!
    Eagerly looking forward to the next videos on this topic.

  • @aadeshingle7593
    @aadeshingle7593 1 year ago +4

    One of the best intuitions for understanding multi-head attention. Thanks a lot!❣

  • @bobae1357
    @bobae1357 6 months ago +3

    Best description ever! Easy to understand. I've struggled to understand attention; finally I can say I know it!

  • @nealdavar939
    @nealdavar939 5 months ago +1

    The way you break down these concepts is insane. Thank you

  • @drdr3496
    @drdr3496 7 months ago +3

    This is a great video (as are the other 2) but one thing that needs to be clarified is that the embeddings themselves do not change (by attention @10:49). The gravity pull analogy is appropriate but the visuals give the impression that embedding weights change. What changes is the context vector.
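
    For concreteness, here is a minimal numpy sketch of the distinction made in the comment above (toy sizes, random weights, all names made up for illustration): attention only reads the embeddings X and produces a separate array of context vectors, so X itself is never modified.

      import numpy as np

      rng = np.random.default_rng(0)
      n_tokens, d = 3, 4
      X = rng.normal(size=(n_tokens, d))      # token embeddings (what the video shows moving)
      X_before = X.copy()

      # hypothetical query/key/value projections (learned in a real model)
      W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

      def softmax(z):
          e = np.exp(z - z.max(axis=-1, keepdims=True))
          return e / e.sum(axis=-1, keepdims=True)

      A = softmax((X @ W_q) @ (X @ W_k).T / np.sqrt(d))   # attention weights
      context = A @ (X @ W_v)                             # context vectors: a new array

      print(np.array_equal(X, X_before))                  # True: the embeddings were only read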

  • @anipacify1163
    @anipacify1163 7 months ago +1

    Omg this video is on a whole new level . This is prolly the best intuition behind the transformers and attention. Best way to understand. I went thro' a couple of videos online and finally found the best one . Thanks a lot ! Helped me understand the paper easily

  • @amoghjain
    @amoghjain 9 months ago +2

    Thank you for making this video series for the sake of a learner and not to show off your own knowledge!! Great anecdotes and simple examples really helped me understand the key concepts!!

  • @TheMircus224
    @TheMircus224 9 months ago +1

    These videos where you explain the transformers are excellent. I have gone through a lot of material; however, it is your videos that have allowed me to understand the intuition behind these models. Thank you very much!

  • @saeed577
    @saeed577 7 months ago +2

    THE best explanation of this concept. That was genuinely amazing.

  • @FireMach-uo9st
    @FireMach-uo9st 1 month ago

    He makes the most difficult topics so easy to understand. Thank you

  • @mohameddjilani4109
    @mohameddjilani4109 11 months ago +1

    I really enjoyed how you give a clear explanation of the operations and the representations used in attention

  • @ajnbin
    @ajnbin 8 months ago +1

    Fantastic !!! The explanation itself is a piece of art.
    The step by step approach, the abstractions, ... Kudos!!
    Please more of these

  • @yairbh
    @yairbh 2 months ago

    Great explanation with the linear transformation matrices. Thanks!

  • @MikeTon
    @MikeTon 8 months ago

    This clarifies embedding matrices:
    - In particular, the point about how a book isn't just a RANDOM array of words; matrices are NOT a RANDOM array of numbers
    - The visualization of the transform and shearing really drives home the V, Q, K aspect of the attention matrix that I have been STRUGGLING to internalize
    Big, big thanks for putting together this explanation!

  • @mayyutyagi
    @mayyutyagi 2 months ago

    Amazing video... Thanks, sir, for this pictorial representation and for explaining this complex topic in such an easy way.

  • @ccgarciab
    @ccgarciab 6 months ago +2

    This is such a good, clear and concise video. Great job!

  • @arulbalasubramanian9474
    @arulbalasubramanian9474 10 months ago +1

    Great explanation. After watching a handful of videos, this one really makes it easy to understand.

  • @abu-yousuf
    @abu-yousuf 10 months ago +1

    Amazing explanation, Luis. Can't thank you enough for your amazing work. You have a special gift for explaining things. Thanks.

  • @docodemo727
    @docodemo727 9 months ago +1

    This video really teaches you the intuition, much better than the others I went through that just throw formulas at you. Thanks for the great job!

  • @epistemophilicmetalhead9454
    @epistemophilicmetalhead9454 4 months ago +1

    Word embeddings
    Vectorial representation of a word. The values in a word embedding describe various features of the words. Similar words' embeddings have a higher cosine similarity value.
    Attention
    The same word may mean different things in different contexts. How similar the word is to other words in that sentence will give you an idea as to what it really means.
    You start with an initial set of embeddings and take into account different words from the sentence and come up with new embeddings (trainable parameters) that better describe the word contextually. Similar/dissimilar words gravitate towards/away from each other as their updated embeddings show.
    Multi-head attention
    Take multiple possible transformations to potentially apply to the current embeddings and train a neural network to choose the best embeddings (contributions are scaled by how good the embeddings are)
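
    Expanding on the summary above, here is a minimal numpy sketch of multi-head attention (toy sizes, with random weights standing in for learned projections): each head applies its own Q, K, V transformations, and an output projection scales and mixes the heads' contributions.

      import numpy as np

      rng = np.random.default_rng(1)
      n_tokens, d_model, n_heads = 5, 8, 2
      d_head = d_model // n_heads

      X = rng.normal(size=(n_tokens, d_model))   # input embeddings

      def softmax(z):
          e = np.exp(z - z.max(axis=-1, keepdims=True))
          return e / e.sum(axis=-1, keepdims=True)

      heads = []
      for _ in range(n_heads):
          # each head gets its own linear transformations of the embeddings
          W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
          Q, K, V = X @ W_q, X @ W_k, X @ W_v
          A = softmax(Q @ K.T / np.sqrt(d_head))   # how much each word attends to every other word
          heads.append(A @ V)                      # this head's contextualized vectors

      # the output projection learns how much each head's contribution should count
      W_o = rng.normal(size=(d_model, d_model))
      output = np.concatenate(heads, axis=-1) @ W_o
      print(output.shape)                          # (5, 8): one contextualized vector per token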

  • @kevon217
    @kevon217 1 year ago +1

    Wow, clearest example yet. Thanks for making this!

  • @agbeliemmanuel6023
    @agbeliemmanuel6023 1 year ago +3

    Wooow thanks so much. You are a treasure to the world. Amazing teacher of our time.

  • @karlbooklover
    @karlbooklover 1 year ago +2

    best explanation of embeddings I've seen, thank you!

  • @rikiakbar4025
    @rikiakbar4025 2 months ago

    Thanks Luis, I've been following your content for a while. This video about the attention mechanism is very intuitive and easy to follow

  • @VenkataraoKunchangi-uy4tg
    @VenkataraoKunchangi-uy4tg 4 months ago

    Thanks for sharing. Your videos are helping me in my job. Thank you.

  • @soumen_das
    @soumen_das 1 year ago +2

    Hey Luis, you are AMAZING! Your explanations are incredible.

  • @dr.mikeybee
    @dr.mikeybee 1 year ago +2

    Nicely done! This gives a great explanation of the function and value of the projection matrices.

  • @pranayroy
    @pranayroy 7 months ago +1

    Kudos to your efforts in clear explanation!

  • @kafaayari
    @kafaayari 1 year ago

    Well, the gravity example is how I finally understood this after a long time. You are a true legend.

  • @dragolov
    @dragolov 1 year ago +1

    Deep respect, Luis Serrano! Thank you so much!

  • @eddydewaegeneer9514
    @eddydewaegeneer9514 5 months ago

    Great video and a very intuitive explanation of the attention mechanism

  • @DeepakSharma-xg5nu
    @DeepakSharma-xg5nu 6 months ago

    I did not even realize this video is 21 minutes long. Great explanation.

  • @cyberpunkdarren
    @cyberpunkdarren 6 months ago

    Very impressed with this channel and presenter

  • @iliasp4275
    @iliasp4275 3 months ago +1

    Excellent video. Best explanation on the internet !

  • @唐伟祚-j4v
    @唐伟祚-j4v 6 months ago

    It's so great, I finally understand these QKVs; they bothered me for so long. Thank you so much!!!

  • @hyyue7549
    @hyyue7549 8 months ago +3

    If I understand correctly, the transformer is basically an RNN model which got intercepted by a bunch of different attention layers. The attention layers redo the embeddings every time a new word comes in; the new embeddings are calculated based on the current context and the new word, then the embeddings are sent to the feed-forward layer and behave like the classic RNN model.

    • @lohithArcot
      @lohithArcot 26 days ago

      Can anyone confirm this?

  • @sari54754
    @sari54754 9 months ago +1

    The easiest-to-understand video on the subject that I've seen.

  • @hkwong74531
    @hkwong74531 8 months ago

    I subscribed to your channel immediately after watching this video, the first video I've watched from your channel but also the first that made me understand why embeddings need to be multi-headed. 👍🏻👍🏻👍🏻👍🏻

  • @satvikparamkusham7454
    @satvikparamkusham7454 1 year ago

    This is the most amazing video on "Attention is all you need"

  • @JorgeMartinez-xb2ks
    @JorgeMartinez-xb2ks 9 months ago

    The best video I've seen on the subject. Thank you very much for this great work.

  • @orcunkoraliseri9214
    @orcunkoraliseri9214 6 months ago

    Wooow. Such a good explanation for embedding. Thanks 🎉

  • @sathyanukala3409
    @sathyanukala3409 7 months ago

    Excellent explanation. Thank you very much.

  • @tanggenius3371
    @tanggenius3371 2 months ago

    Thanks, the explanation is so intuitive. I finally understood the idea of attention.

  • @Cdictator
    @Cdictator 2 months ago

    This is an amazing explanation! Thank you so much 🎉

  • @BhuvanDwarasila-y8x
    @BhuvanDwarasila-y8x 2 days ago

    Thank you so much for the attention to the topic!

  • @perpetuallearner8257
    @perpetuallearner8257 1 year ago +1

    You're my fav teacher. Thank you Luis 😊

  • @caryjason4171
    @caryjason4171 5 months ago

    This video helps to explain the concept in a simple way.

  • @orcunkoraliseri9214
    @orcunkoraliseri9214 6 months ago

    I watched a lot about attention. You are the best. Thank you, thank you. I am also learning from you how to explain a subject 😊

  • @davutumut1469
    @davutumut1469 1 year ago +1

    amazing, love your channel. It's certainly underrated.

  • @ignacioruiz3732
    @ignacioruiz3732 6 months ago

    Outstanding video. Amazing to gain intuition.

  • @arshmaanali714
    @arshmaanali714 1 month ago

    Superb explanation❤ please make more videos like this

  • @LuisOtte-pk4wd
    @LuisOtte-pk4wd 7 months ago

    Luis Serrano, you have a gift for explaining! Thank you for sharing!

  • @RamiroMoyano
    @RamiroMoyano 1 year ago +1

    This is amazingly clear! Thank you for your work!

  • @justthefactsplease
    @justthefactsplease 6 months ago +1

    What a great explanation on this topic! Great job!

  • @Omsip123
    @Omsip123 3 months ago +1

    Outstanding, thank you for this pearl of knowledge!

  • @erickdamasceno
    @erickdamasceno 1 year ago +2

    Great explanation. Thank you very much for sharing this.

  • @maysammansor
    @maysammansor 7 months ago

    you are a great teacher. Thank you

  • @bananamaker4877
    @bananamaker4877 10 months ago +1

    Explained very well. Thank you so much.

  • @HoussamBIADI
    @HoussamBIADI 2 months ago

    Thank you for this amazing explanation

  • @jayanthAILab
    @jayanthAILab 6 months ago

    Wow wow wow! I enjoyed the video. Great teaching sir❤❤

  • @alijohnnaqvi6383
    @alijohnnaqvi6383 7 months ago +1

    What a great video man!!! Thanks for making such videos.

  • @bbarbny
    @bbarbny 3 months ago

    Amazing video, thank you very much for sharing!

  • @sukhpreetlotey1172
    @sukhpreetlotey1172 6 months ago

    First of all, thank you for making these great walkthroughs of the architecture. I would really like to support your effort on this channel. Let me know how I can do that. Thanks

    • @SerranoAcademy
      @SerranoAcademy  6 months ago

      Thank you so much, I really appreciate that! Soon I'll be implementing subscriptions, so you can subscribe to the channel and contribute (also get some perks). Please stay tuned, I'll publish it here and also on social media. :)

  • @drintro
    @drintro 7 months ago

    Excellent description.

  • @ThinkGrowIndia
    @ThinkGrowIndia 1 year ago +1

    Amazing! Loved it! Thanks a lot Serrano!

  • @tvinay8758
    @tvinay8758 1 year ago

    This is a great explanation of the attention mechanism. I have enjoyed your maths for machine learning course on Coursera. Thank you for creating such wonderful videos

  • @debarttasharan
    @debarttasharan 1 year ago +1

    Incredible explanation. Thank you so much!!!

  • @WhatsAI
    @WhatsAI 1 year ago +1

    Amazing explanation Luis! As always...

  • @SulkyRain
    @SulkyRain 8 months ago

    Amazing explanation 🎉

  • @mostinho7
    @mostinho7 9 months ago +1

    7:00 Even with word embeddings, words can be missing context and there's no way to tell, like the word apple: are you talking about the company or the fruit?
    Attention matches each word of the input with every other word, in order to transform it or pull it towards a different location in the embedding based on the context. So when the sentence is “buy apple and orange” the word orange will cause the word apple to have an embedding or vector representation that’s closer to the fruit
    8:00
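
    A rough sketch of the apple/orange point in the note above, assuming made-up 2-D embeddings (a "fruit" axis and a "tech" axis): each word's new vector is a similarity-weighted average of the words around it, so "orange" pulls "apple" toward the fruit side while "phone" would pull it toward the company side.

      import numpy as np

      # hypothetical toy embeddings: axis 0 = "fruitness", axis 1 = "techness"
      emb = {
          "apple":  np.array([0.5, 0.5]),   # ambiguous on its own
          "orange": np.array([1.0, 0.1]),   # clearly a fruit
          "phone":  np.array([0.1, 1.0]),   # clearly tech
          "buy":    np.array([0.2, 0.2]),   # neutral
      }

      def contextualize(sentence):
          # move each word toward the words it is most similar to in this sentence
          new = {}
          for w in sentence:
              sims = np.array([emb[w] @ emb[v] for v in sentence])   # similarity scores
              weights = np.exp(sims) / np.exp(sims).sum()            # normalize to sum to 1
              new[w] = sum(wt * emb[v] for wt, v in zip(weights, sentence))
          return new

      print(contextualize(["buy", "apple", "orange"])["apple"])   # pulled toward the fruit side
      print(contextualize(["buy", "apple", "phone"])["apple"])    # pulled toward the tech side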

  • @赵赵宇哲
    @赵赵宇哲 8 months ago

    This video is really clear!

  •  6 months ago

    My comment is just an array of letters for our algorithmic gods... Good stuff.

  • @shashankshekharsingh9336
    @shashankshekharsingh9336 4 months ago

    thank you sir 🙏, love from india💌

  • @vishnusharma_7
    @vishnusharma_7 1 year ago

    You are great at teaching Mr. Luis

  • @bankawat1
    @bankawat1 1 year ago

    Thanks for the amazing videos! I am eagerly waiting for the third video. If possible, please do explain how the K, Q, V matrices are used on the decoder side. That would be a great help.

  • @benhargreaves5556
    @benhargreaves5556 8 months ago

    Unless I'm mistaken, I think the linear transformations in this video incorrectly show the 2D axes as well as the object changing position; in fact, the 2D axes would stay exactly the same, with only the 2D object rotating around them, for example.

  • @bengoshi4
    @bengoshi4 1 year ago

    Yeah!!!! Looking forward to the second one!! 👍🏻😎

  • @TemporaryForstudy
    @TemporaryForstudy 1 year ago

    oh my god, I never understood V, K, Q as matrix transformations before, thanks Luis, love from India

  • @today-radio-in-the-zone
    @today-radio-in-the-zone 4 months ago

    Thanks for your great effort to make people understand it. However, I would like to ask one thing: you explained that V is the scores, but scores of what? My opinion is that V is the key vector, so that V maps the QKᵀ matrix back to vector space again. Please make it clear for better understanding. Thanks!

  • @divikchoudhary8873
    @divikchoudhary8873 4 months ago

    This is just Gold!!!!!

  • @notprof
    @notprof 1 year ago

    Thank you so much for making these videos!

  • @JimKuo-t9h
    @JimKuo-t9h 1 month ago

    best definition ever !!!

  • @muhammadsaqlain3720
    @muhammadsaqlain3720 10 months ago

    Thanks my friend.

  • @SergeyGrebenkin
    @SergeyGrebenkin 5 months ago

    At last someone explained the meaning of Q, K and V. I read the original article and it just says "OK, let's have 3 additional matrices Q, K and V to transform the input embedding" ... What for? Thanks for the explanation, this video really helps!

  • @aaalexlit
    @aaalexlit 11 months ago

    That's an awesome explanation! Thanks!

  • @neelkamal3357
    @neelkamal3357 6 days ago +1

    Sir, please tell us why you did a shear transformation. I watched it like 10 times but still didn't get it.

  • @serkansunel
    @serkansunel 7 months ago

    Excellent job

  • @neelkamal3357
    @neelkamal3357 10 days ago +1

    I didn't get why we add a linear transformation. Earlier, too, we had embeddings in other planes, so why do a shear transformation? Please, someone answer