All The Math You Need For Attention In 15 Minutes

  • Published: Nov 14, 2024

Comments • 32

  • @ritvikmath
    @ritvikmath  A month ago +7

    Ongoing Notes:
    1. I should note that the concept of which words pay attention to which others doesn't always line up with our human expectations. In this video, I frequently claim that "meal" should attend to "savory" and "delicious" but if you look at the attention weights matrix at 9:25, "meal" attends the most to "savory" but not so much to "delicious". In reality, the model is going to do what it needs to do to excel at next word prediction, which might mean taking a different approach to setting the attention layer weights than what our human brains would "neatly expect". Still, the illustration of "meal" attending to "savory" and "delicious" is usually correct, but I wanted to clarify that it's not guaranteed and that's not a bad thing.
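
To make that point concrete, here is a minimal NumPy sketch of how one reads a row of an attention weights matrix. The numbers below are made up for illustration (they are not the weights shown at 9:25 in the video): each row is a distribution over the words in the sentence, and the largest entry in a row is the word attended to most.

```python
import numpy as np

# Toy sentence and a made-up attention weights matrix:
# rows = query word, columns = key word, each row sums to 1.
words = ["the", "savory", "delicious", "meal"]
weights = np.array([
    [0.70, 0.10, 0.10, 0.10],   # "the"
    [0.05, 0.60, 0.15, 0.20],   # "savory"
    [0.05, 0.15, 0.60, 0.20],   # "delicious"
    [0.05, 0.55, 0.15, 0.25],   # "meal" attends most to "savory"
])

meal_row = weights[words.index("meal")]
for w, a in zip(words, meal_row):
    print(f"meal -> {w}: {a:.2f}")
print("meal attends most to:", words[int(np.argmax(meal_row))])
```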

  • @kumaraman147
    @kumaraman147 21 days ago +1

    Best video I have ever seen for explaining the attention mechanism, and now attention is finally clear to me ❤

  • @YourDailyR
    @YourDailyR A month ago

    This channel is gold!!!

  • @eingram
    @eingram A month ago +3

    Perfect timing, learning about this in class right now!

  • @tantzer6113
    @tantzer6113 A month ago +5

    Question: LLMs obviously 1) account for hierarchies of concepts/abstractions and 2) perform complicated, decision-tree-like logical operations on those concepts (and words). Having read about attention and watched a dozen videos on it, I have never encountered an explanation of how attention can do these things. My guess is that the stacking of attention layers is instrumental in all of this, but I have seen no discussion or explanation of it.

    • @ScilentE
      @ScilentE A month ago

      I'm not sure if I would say LLMs "obviously" do those two things, but they are certainly emergent behaviors due to increases in compute. Scaling laws are pretty cool!
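
On the stacking question above, one rough intuition is that after the first attention layer every position already mixes in context from other positions, so the second layer's queries and keys are computed from contextualized vectors rather than raw word embeddings. A toy NumPy sketch under made-up dimensions and random weights (single head, no residuals, layer norms, or feed-forward layers), just to show the composition:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                         # embedding size, arbitrary for this sketch
X = rng.normal(size=(4, d))   # 4 toy word embeddings

def self_attention(X, d):
    # One single-head self-attention layer with random projection matrices.
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ V

layer1 = self_attention(X, d)        # mixes raw embeddings
layer2 = self_attention(layer1, d)   # mixes already-contextualized vectors
print(layer2.shape)                  # (4, 8): same shape, richer context
```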

  • @rubncarmona
    @rubncarmona A month ago +1

    great job. I've been studying the subject by myself and had missed the visualization of vector sums in the value space. thanks for posting.
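
The "vector sums in the value space" idea mentioned above boils down to this: a word's output is the attention-weighted sum of everyone's value vectors, so it lands in the region spanned by the vectors it attends to. A tiny sketch with made-up 2-D value vectors and a made-up weights row:

```python
import numpy as np

# Made-up 2-D value vectors for four words.
V = np.array([
    [0.1, 0.0],   # "the"
    [0.9, 0.2],   # "savory"
    [0.2, 0.9],   # "delicious"
    [0.5, 0.5],   # "meal"
])
# Made-up attention weights for the query word "meal" (sums to 1).
w_meal = np.array([0.05, 0.55, 0.15, 0.25])

# Output for "meal" is a weighted sum of value vectors, pulled toward "savory".
print(w_meal @ V)
```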

  • @Omsip123
    @Omsip123 A month ago +1

    Liked, subscribed, and commented. This is pure gold!

  • @Darkev77
    @Darkev77 A month ago +2

    Oof, I really needed this a while ago, finally!

    • @ritvikmath
      @ritvikmath  A month ago

      Sorry to be late but I hope it was worth it!

  • @softerseltzer
    @softerseltzer A month ago

    Great explanation, loved it!

  • @bin4ry_d3struct0r
    @bin4ry_d3struct0r A month ago

    Fantastic explanation! For the next videos in this series, please touch upon the role of the residual connection. I'm still iffy on what it's doing.
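
Not covered in this video, but as a rough sketch while waiting for that one: in a typical transformer block the residual connection simply adds the attention output back onto the block's input, so the block learns an update to each word's vector rather than a replacement (and gradients get a direct path through the addition). Layer normalization is omitted here; the attention layer is passed in as any function of the right shape.

```python
import numpy as np

def residual_attention_block(X, attention_layer):
    """X: (seq_len, d) inputs; attention_layer: any (seq_len, d) -> (seq_len, d) function."""
    # The residual connection: input plus the attention output.
    return X + attention_layer(X)

# Toy usage with a placeholder attention layer:
X = np.ones((4, 8))
print(residual_attention_block(X, lambda X: 0.1 * X)[0])
```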

  • @horoshuhin
    @horoshuhin A month ago

    yessssss. let's talk about those in the next videos. this is a great channel for the way you explain things. I don't know if it's too far ahead but it would be awesome to see some small code examples too.

  • @jessicatran5467
    @jessicatran5467 A month ago

    thank you for these videos !!!

  • @vedantvashi9051
    @vedantvashi9051 A month ago +1

    Can you do a video on how inputs (e.g. words, videos, audio) are tokenized into vectors?

  • @juanluisesteban7394
    @juanluisesteban7394 A month ago

    This is great. Thanks

  • @sonaliganguli6553
    @sonaliganguli6553 A month ago +3

    I waited for it for months..

    • @ritvikmath
      @ritvikmath  A month ago +1

      sorry for the wait! hope it is worth it 😎

  • @TheGhostWinner
    @TheGhostWinner A month ago

    So the attention values for a word in the attention matrix are in the rows? What do the columns represent? I always imagined this matrix to be like a covariance matrix, but by the looks of it I couldn't be more wrong.
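
For what it's worth, the usual convention: rows index the query word (the one doing the attending), columns index the key word (the one being attended to), and the softmax is taken across each row, so rows sum to 1 while columns generally do not. The matrix is also not symmetric, which is one way it differs from a covariance matrix. A quick check under those assumptions, with random scores standing in for the query-key dot products:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))   # stand-in for raw query-key scores over 4 words
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax

print(weights.sum(axis=1))              # rows (queries) each sum to 1
print(weights.sum(axis=0))              # columns (keys) need not sum to 1
print(np.allclose(weights, weights.T))  # False: not symmetric, unlike a covariance matrix
```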

  • @TechWithAbee
    @TechWithAbee A month ago

    ❤thanks

  • @radionnazmiev546
    @radionnazmiev546 A month ago

    Amazing video! Would be nice to see how it's actually calculated on a small, few-word sentence.
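
Until that video exists, here is a rough end-to-end sketch of the calculation on a three-word sentence, with made-up 2-D embeddings and made-up projection matrices (single head, no masking). The steps are the ones from the video: project to queries, keys, and values, take scaled dot products, softmax each row, then form weighted sums of the value vectors.

```python
import numpy as np

np.set_printoptions(precision=2, suppress=True)

# Made-up 2-D embeddings for a three-word sentence.
words = ["savory", "delicious", "meal"]
X = np.array([[1.0, 0.2],
              [0.8, 0.6],
              [0.3, 1.0]])

# Made-up query, key, and value projection matrices.
Wq = np.array([[1.0, 0.0], [0.0, 1.0]])
Wk = np.array([[0.5, 1.0], [1.0, 0.5]])
Wv = np.array([[1.0, 0.5], [0.5, 1.0]])

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(X.shape[1])                                # scaled dot products
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
output = weights @ V                                                  # weighted sums of values

print("scores:\n", scores)
print("attention weights (rows sum to 1):\n", weights)
print("outputs:\n", output)
```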