How large language models work, a visual intro to transformers | Chapter 5, Deep Learning

  • Published: 30 Sep 2024

Comments • 2.5K

  • @3blue1brown
    @3blue1brown  6 месяцев назад +936

    Edit: The finalized version of the next chapter is out ruclips.net/video/eMlx5fFNoYc/видео.html
    Early feedback on video drafts is always very important to me. Channel supporters always get a view of new videos before their release to help inform final revisions. Join at 3b1b.co/support if you’d like to be part of that early viewing group.

    • @bbrother92
      @bbrother92 6 месяцев назад +9

      @3Blue1Brown thanks for explaining these things - it is very hard for a web programmer to understand math

    • @JohnSegrave
      @JohnSegrave 6 месяцев назад +17

      Grant, this is so good! I've worked in ML about 8 years and this is one of the best descriptions I've seen. Very nicely done. Big 👍👍👍

    • @deker0954
      @deker0954 6 месяцев назад +1

      Is this worth understanding?

    • @bbrother92
      @bbrother92 6 месяцев назад +2

      @@JohnSegrave sir, could you recommend a video analysis framework or any video description model?

    • @didemyldz1317
      @didemyldz1317 6 месяцев назад +2

      Could you share the name of the model that is used for text-to-speech generation ? Me and my teammate are working on a Song Translator as a senior design project. This might be very helpful. Thanks in advance :)

  • @DynestiGTI
    @DynestiGTI 6 месяцев назад +2693

    Grant casually uploading the best video on Transformers on YouTube

    • @drgetwrekt869
      @drgetwrekt869 6 месяцев назад +10

      i was expecting froggin electromagnets to be honest :-)

    • @brandonmoore644
      @brandonmoore644 6 месяцев назад +17

      This video was insanely good!

    • @shoam2103
      @shoam2103 6 месяцев назад +5

      Even having a basic understanding of what it is, this was still extremely helpful!

    • @yigitpolat
      @yigitpolat 6 месяцев назад +4

      yeah but it did not talk about transformers in this chapter

    • @stefchristensen47
      @stefchristensen47 5 месяцев назад

      I wish I could retweet this post.

  • @JustinLe
    @JustinLe 6 месяцев назад +5486

    here's to hoping this is not an April fools

    • @anuragpranav
      @anuragpranav 6 месяцев назад +663

      it is - you would be a fool to not watch this video

    • @tinku-n8n
      @tinku-n8n 6 месяцев назад +103

      It's 2nd April here

    • @TheUnderscore_
      @TheUnderscore_ 6 месяцев назад +21

      @@anuragpranav Even if you already know the subject? 😂

    • @me0101001000
      @me0101001000 6 месяцев назад +91

      @@TheUnderscore_ it's never a bad idea to review what you know

    • @anuragpranav
      @anuragpranav 6 месяцев назад +67

      @@TheUnderscore_ you are almost certainly limiting what you might know with that approach

  • @Silent_Knife
    @Silent_Knife 6 месяцев назад +1213

    The return of the legend! This series is continuing, that is the best surprise on YouTube; thanks Grant, you have no idea how much the young population of academia is indebted to you.

    • @kikiroy5178
      @kikiroy5178 6 месяцев назад +7

      I'm 26, young engineer. Thinking the same. Well said.

    • @youonlytubeonce
      @youonlytubeonce 6 месяцев назад +6

      I liked your comment because I'm sure you're right but don't be ageist! 😊 Us olds love him too!

    • @samad.chouihat4222
      @samad.chouihat4222 6 месяцев назад

      Young and seniors alike

    • @robertwilsoniii2048
      @robertwilsoniii2048 4 месяца назад

      And by logic, Grant is indebted to his 2015 era Stanford education. That was a high point in the faculty and curriculum in general there.

  • @newxceo
    @newxceo 6 месяцев назад +103

    Those who watched more than once gather here 😂

  • @tempo511
    @tempo511 6 месяцев назад +743

    The fact that the meaning behind tokens is embedded into this 12,000-dimensional space, and that you get relationships in terms of coordinates and directions that exist across topics, is mind-blowing. Like, Japan -> sushi being similar to Germany -> bratwurst is just so darn neat

    • @amarissimus29
      @amarissimus29 6 месяцев назад

      And it makes the absurdly ham fisted model tampering behind debacles like the Gemini launch look even more absurd. I can hear the troglodytes mobbing in the nth dimension.

    • @dayelu2679
      @dayelu2679 6 месяцев назад +10

      I came to this realization a long time ago; since then I've wanted to find isomorphic structures of concepts across different disciplines

    • @TheKoekiemonster1234
      @TheKoekiemonster1234 6 месяцев назад

      @@dayelu2679🤓

    • @stefchristensen47
      @stefchristensen47 5 месяцев назад +26

      You can actually try this out in your nearest large language model, like ChatGPT, CoPilot, Gemini, or Mistral. Just ask it to do vector math on the words. Since there isn't a predefined vector word calculus in English, the LLM defaults to just using a version of its own internal representation, and so it can eke out pretty good results. I was able to duplicate Hitler - Germany + Italy = Mussolini and sushi - Japan + Germany = sausage (or bratwurst, both score highly) in GPT-3.5-Turbo Complete.
      It also figured out sushi - Japan + Lebanon = shawarma; sushi - Japan + Korea = kimchi; Hitler - Germany + Spain = Franco; and Hitler - Germany + Russia = Stalin.

    • @Flako-dd
      @Flako-dd 5 месяцев назад +10

      Super disappointed. The German Sushi is called Rollmops.
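
The word-vector arithmetic this thread is playing with can be reproduced outside an LLM with classic word embeddings. A minimal sketch, assuming the gensim package and its downloadable "word2vec-google-news-300" vectors are available; the exact nearest neighbours (bratwurst or otherwise) will vary with the embedding model.

    import gensim.downloader as api

    # Classic word2vec vectors (not a GPT's own embedding matrix); large download on first run.
    vecs = api.load("word2vec-google-news-300")

    # most_similar does the arithmetic sushi - Japan + Germany and returns the
    # vocabulary words whose vectors lie closest to the result.
    for word, score in vecs.most_similar(positive=["sushi", "Germany"],
                                         negative=["Japan"], topn=5):
        print(f"{word:20s} {score:.3f}")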

  • @iau
    @iau 6 месяцев назад +2731

    I graduated from Computer Science in 2017. Back then, the cutting edge of ML was Recurrent Neural Networks, on which I based my thesis. This video (and I'm sure the rest of this series) just allowed me to catch up on years of advancements in so little time.
    I cannot describe how important your teaching style is to the world. I've been reading articles, blogs, papers on embeddings and these topics for years now and I never got it quite like I got it today. In less than 30 minutes.
    Imagine a world in which every teacher taught like you. We would save millions and millions of man hours every hour.
    You truly have something special with this channel and I can only wish more people started imitating you with the same level of quality and care. If only this became the standard. You'd deserve a Nobel Prize for propelling the next thousand Nobel Prizes.

    • @lucascorreaaa
      @lucascorreaaa 6 месяцев назад +38

      Second that!

    • @kyo250996
      @kyo250996 6 месяцев назад +44

      Same, I did a thesis on word vectorization back in 2017 and no one ever talked about how the vector of a word gives rise to meaning and context when you generate phrases.
      Too bad, since no one was interested in ML back then; I leaned toward web development and dropped ML :(

    • @iankrasnow5383
      @iankrasnow5383 6 месяцев назад +22

      Funny enough, the other 6 videos in this series all came out in 2017, so you probably didn't miss much.

    • @XMysticHerox
      @XMysticHerox 6 месяцев назад +23

      Well transformers were first developed in 2017 so it was the cutting edge exactly when you graduated ^^

    • @rock_sheep4241
      @rock_sheep4241 6 месяцев назад +2

      This is explained in layman's terms, but in reality it is more complicated than this

  • @billbill1235
    @billbill1235 6 месяцев назад +1669

    I was trying to understand ChatGPT through videos and texts on the Internet. I always said: I wish 3b1b would release a video about it, it's the only way for someone inexperienced to understand, and here it is. Thank you very much for your contributions to YouTube!!

    • @lmao8207
      @lmao8207 6 месяцев назад +20

      No, even the other videos are kinda meh, even if you're not inexperienced, because they don't go in depth. I feel here people get a nice understanding of the concepts captured by the models instead of just the architecture of the models

    • @goldeer7129
      @goldeer7129 6 месяцев назад

      It's kind of true, but if I had to recommend a good place to actually understand transformers and even other machine learning things I would definitely recommend StatQuest, its levels of clearly explaining what's going on are very high. But I'm also very excited to see how 3B1B is going to render all that visually as always

    • @himalayo
      @himalayo 6 месяцев назад +2

      I was also just looking into transformers due to their extreme takeover in computer vision!

    • @baconheadhair6938
      @baconheadhair6938 6 месяцев назад +3

      shoulda just asked chatgpt

    • @ironmancloud9759
      @ironmancloud9759 6 месяцев назад +1

      NLP specialization by Andrew covered everything 😅

  • @parenchyma
    @parenchyma 6 месяцев назад +412

    I don't even know how many times I'm going to rewatch this.

    • @chlodnia
      @chlodnia 6 месяцев назад

      True

    • @RaoBlackWellizedArman
      @RaoBlackWellizedArman 6 месяцев назад +8

      3B1B doesn't need to be saved in the Watch Later folder because all his videos are worth watching later.

    • @synthclub
      @synthclub 6 месяцев назад +1

      What will you set your weights n biases to?

    • @oofsper
      @oofsper 6 месяцев назад

      same

    • @arthurgames9610
      @arthurgames9610 Месяц назад

      Me fr

  • @tomasretamalvenegas9294
    @tomasretamalvenegas9294 5 месяцев назад +6

    CHILE MENTIONED 🇨🇱🇨🇱❤️❤️🇨🇱🇨🇱🇨🇱 COME TO SANTIAGO GRANT!!!

  • @mahdimoradkhani6610
    @mahdimoradkhani6610 4 месяца назад +17

    The genius in what you do is taking complicated concepts and making them easy to digest. That's truly impressive!

  • @shaqtaku
    @shaqtaku 6 месяцев назад +126

    I can't believe Sam Altman has become a billionaire just by multiplying some matrices

    • @Dr.Schnizzle
      @Dr.Schnizzle 5 месяцев назад +34

      You'd be surprised at how many billionaires got there from multiplying some matrices

    • @tiborsaas
      @tiborsaas 5 месяцев назад +7

      It's too much reduction, he added value on a higher level. But yeah, when you look deep enough, everything stops looking like magic.

    • @FinnishSuperSomebody
      @FinnishSuperSomebody 5 месяцев назад +2

      @@tiborsaas And that is a good thing in many cases; it casts away illogical fears when you understand that there is no magic or thinking behind this. In practice it is just an overhyped guessing machine for what word normally might come after X.

    • @kylev.8248
      @kylev.8248 5 месяцев назад

      @@FinnishSuperSomebody this concept comes from 2017. We should actually be very, very worried and keep a close eye on the progress that AI is making. The amount of progress they have made since the 2017 paper 📝 “Attention is all you need” is insane.

    • @TheRevAlokSingh
      @TheRevAlokSingh 5 месяцев назад

      He doesn’t own any shares in OpenAI. His money is from before

  • @nicholaitukanov1162
    @nicholaitukanov1162 6 месяцев назад +408

    I have been working on transformers for the past few years and this is the greatest visualization of the underlying computation that I have seen. Your videos never disappoint!!

    • @brian8507
      @brian8507 6 месяцев назад +3

      So if we "stop" you... then we avoid judgement day? We should meet for coffee

    • @giacomobarattini1130
      @giacomobarattini1130 6 месяцев назад +17

      ​@@brian8507 "judgement day" 😭

    • @beProsto
      @beProsto 6 месяцев назад

      ​@@brian8507 bro's got underlying psychological issues

    • @talharuzgarakkus7768
      @talharuzgarakkus7768 6 месяцев назад

      I agree with you. Visualization is the perfect way to understand the transformer architecture, specifically the attention mechanism

    • @jawokenn8766
      @jawokenn8766 6 месяцев назад

      @@giacomobarattini1130 it's later than you think

  • @chase_like_the_bank
    @chase_like_the_bank 6 месяцев назад +405

    You *must* turn the linguistic vector math bit into a short. -Japan+sushi+germany=bratwurst is pure gold.

    • @XMysticHerox
      @XMysticHerox 6 месяцев назад +5

      I am slightly offended it did not result in "Fischbrötchen".

    • @marshmellominiapple
      @marshmellominiapple 6 месяцев назад +5

      @@XMysticHerox It was trained in English words only.

    • @XMysticHerox
      @XMysticHerox 6 месяцев назад +11

      @@marshmellominiapple ChatGPT supports 95 languages. Not all equally well. But as a German yes it works just as well with german as it does with english.

    • @-Meric-
      @-Meric- 6 месяцев назад +5

      @@marshmellominiapple Word2Vec and other vector embeddings of words like glove or whatever don't care about language. They don't "understand" the meaning of the words, they just eventually find patterns in unstructured data to create the embeddings. It works in any language and GPT has a ton of other languages in its training data

    • @stefchristensen47
      @stefchristensen47 5 месяцев назад +10

      You can actually try this out in your nearest large language model, like ChatGPT, CoPilot, Gemini, or Mistral. Just ask it to do vector math on the words. Since there isn't a predefined vector word calculus in English, the LLM defaults to just using a version of its own internal representation, and so it can eke out pretty good results. I was able to duplicate Hitler - Germany + Italy = Mussolini and sushi - Japan + Germany = sausage (or bratwurst, both score highly) in GPT-3.5-Turbo Complete.
      It also figured out sushi - Japan + Lebanon = shawarma; sushi - Japan + Korea = kimchi; Hitler - Germany + Spain = Franco; and Hitler - Germany + Russia = Stalin.

  • @HarrisonBorbarrison
    @HarrisonBorbarrison 6 месяцев назад +2

    It'll be a long time until I eventually understand this. I thought the word embedding vector 3D diagram was cool though.

  • @haorancheng4870
    @haorancheng4870 5 месяцев назад +12

    I listened to my professor explaining the crazy equation of softmax for a semester already, and you explained it so well with how temperature also plays a role there. Big RESPECT!
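
For reference, the softmax-with-temperature being praised here is only a few lines of code. A minimal sketch (my own, not the video's), keeping T strictly positive; larger T flattens the distribution, smaller T sharpens it toward the largest logit.

    import numpy as np

    def softmax_with_temperature(logits, T=1.0):
        """Turn raw logits into probabilities; T > 0 controls how peaked they are."""
        z = np.asarray(logits, dtype=float) / T
        z -= z.max()              # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    logits = [2.0, 1.0, 0.1]
    print(softmax_with_temperature(logits, T=1.0))  # moderate spread
    print(softmax_with_temperature(logits, T=5.0))  # flatter: more "creative" sampling
    print(softmax_with_temperature(logits, T=0.1))  # nearly all mass on the top logit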

  • @Mutual_Information
    @Mutual_Information 6 месяцев назад +455

    Grant shows just how creative you can get with linear algebra. Who would have guessed language (?!) was within its reach?

    • @abrokenmailbox
      @abrokenmailbox 6 месяцев назад

      Look up "Word2Vec", it's an interestingly explored idea.

    • @Jesin00
      @Jesin00 6 месяцев назад +62

      Linear algebra would not be enough, but a nonlinear activation function (even one as simple as max(x, 0)) makes it enough to approximate anything you want just by adding more neurons!

    •  6 месяцев назад +10

      Given words are descriptors and numbers are just arbitrarily precise adjectives... aka descriptions...

    • @Mutual_Information
      @Mutual_Information 6 месяцев назад +3

      @@Jesin00 Yes, lin alg alone isn't enough.

    • @psychic8872
      @psychic8872 6 месяцев назад +1

      Well ML uses linear algebra and he just explains it

  • @shubhamz2464
    @shubhamz2464 6 месяцев назад +86

    This series should continue. I thought it was dead after the 4th video. Lots of love and appreciation for your work

  • @xiangzhang5279
    @xiangzhang5279 6 месяцев назад +13

    I have always been blown away by how great your visualization is for explaining ML concepts. Thanks a lot!

  • @fvsfn
    @fvsfn Месяц назад +9

    I am a math teacher and one of my classes is about AI. I am making watching this mini-series a mandatory requirement. This is just what my students need. Thanks for the exceptional quality of the content on your channel.

    • @StacyMcCabe
      @StacyMcCabe Месяц назад +1

      What class and grade do you teach?

    • @fvsfn
      @fvsfn Месяц назад +1

      It is a master 2 class on the mathematical foundations of AI.

  • @jortand
    @jortand 6 месяцев назад +31

    Dammit, nice April Fools' joke - I got fooled into learning something.

  • @yashizuko
    @yashizuko 6 месяцев назад +59

    It's astonishing, amazing that this kind of info and explanation quality is available for free; this is way better than a university would explain it

    • @lonnybulldozer8426
      @lonnybulldozer8426 6 месяцев назад +1

      Universities are buildings. Buildings can't talk. Therefore, they cannot explain.

  • @avishshah2186
    @avishshah2186 6 месяцев назад +60

    You made my day!! This topic was taught at my grad school and I needed some intuition today and you have uploaded the video!!! It seems you heard me!! Thanks a ton!! Please upload a video on Vision Transformers, if possible

  • @CODE7X
    @CODE7X 6 месяцев назад +7

    I'm in high school, and I only knew broken pieces of how it works, but you really connected all the pieces together and added the missing ones

  • @anearthian894
    @anearthian894 6 месяцев назад +1

    Imagine completing a 30 min video in 2 hr and then thinking "That was very efficient".

  • @PiercingSight
    @PiercingSight 6 месяцев назад +59

    Straight up the best video on this topic. The idea that the dimensions of the embedding space represent different properties of a token that can be applied across tokens is just SO cool!

    • @JonnySolomon
      @JonnySolomon 6 месяцев назад +1

      i felt that

    • @MagicGonads
      @MagicGonads 6 месяцев назад +1

      orienting and ordering the space (called the 'latent' space) so that the most significant directions come first is called 'principal component analysis' (useful for giving humans the reins to some degree since we get to turn those knobs and see something interesting but vaguely predictable happen)

    • @andrewdunbar828
      @andrewdunbar828 6 месяцев назад

      I agree. I started writing about that in a comment about 2 seconds into the video, before I knew how well he was going to cover it, since it's usually glossed over way too much in other introductions to these topics.

  • @lewebusl
    @lewebusl 6 месяцев назад +203

    This is heaven for visual learners. Animations are correlated smoothly with the intended learning point ...

    • @gorgolyt
      @gorgolyt 5 месяцев назад +14

      There's no such thing as visual learners. Other than the blind, all humans are visual creatures. It's heaven for anyone who wants to learn.

    • @lewebusl
      @lewebusl 5 месяцев назад +5

      @@gorgolyt You are right. Humans get input from 5 senses, but 90 percent of the brain's receptors are directly connected to the optic and auditory nerves. That is where the visual dominates the other senses ... For blind people the auditory dominates...

    • @rinkashikachi
      @rinkashikachi 3 месяца назад +3

      @@lewebusl you said an obvious fact and then made a nonsensical bs conclusion out of it. there are no visual learners and it is a proven scientific fact

    • @HydrogenAlpha
      @HydrogenAlpha 3 месяца назад

      @@gorgolyt Yeah Veritasium did an excellent video debunking the pop-science nonsense behind this very commonly held misconception / fake science.

  • @keesdekarper
    @keesdekarper 6 месяцев назад +190

    This video is gonna blow up. The visualizations will help many people that aren't familiar with NN's or Deep Learning to at least grasp a little bit what is happening under the hood. And with the crazy popularity of LLM's nowadays, this will for sure interest a lot of people

    • @TheScarvig
      @TheScarvig 6 месяцев назад +3

      As someone who gave a lot of fellow students lessons in STEM classes, I can tell you that the sheer amount of numbers arranged in matrices will immediately shut down the average person's brain...

    • @lesselp
      @lesselp 5 месяцев назад

      No, normal people just want to party.

  • @bhayovah
    @bhayovah 4 месяца назад +2

    The whole freaking thing is built out of stereotypifications!

  • @kalashshah6234
    @kalashshah6234 6 месяцев назад +5

    This is absolutely one of the best videos for explaining the workings of LLMs. Love the visualisation and the innate ease with which the concepts were explained.
    Hats off!!

  • @z-beeblebrox
    @z-beeblebrox 6 месяцев назад +5

    3blue1brown released a normal video today. So did Numberphile. So did nearly all the channels in my subs. There's no wacky bullshit on the google homepage. No stupid gimmick feature in Maps. Have we done it? Have we finally killed off the lamest holiday? Is it finally dead?

  • @1bird_d
    @1bird_d 6 месяцев назад +208

    I always thought when people in the media say, "NO ONE actually understands how chat GPT works" they were lying, but no one was ever able to explain it in layman's terms regardless. I feel like this video is exactly the kind of digestible info that people need, well done.

    • @alexloftus8892
      @alexloftus8892 6 месяцев назад +117

      Machine learning engineer here - plenty of people understand how the architecture of chatGPT works on a high level. When people in the media say that, what they mean is that nobody understands the underlying processing that the parameters are using to go from a list of tokens to a probability distribution over possible next tokens.

    • @kevinscales
      @kevinscales 6 месяцев назад +76

      It's not a lie, it's just not very precise. No one can tell you exactly why one model decided the next word is "the" while another decided the next word is "a" and in that sense no one understands how a particular model works. The mechanism for how you train and run the model are understood however.

    • @lolololo-cx4dp
      @lolololo-cx4dp 6 месяцев назад +7

      @@kevinscales yeah, just like any deep ANN

    • @metachirality
      @metachirality 6 месяцев назад +45

      Think of it as the difference between knowing how genetics and DNA and replication works vs. knowing why a specific nucleotide in the human genome is adenine rather than guanine.
      There is an entire field of machine learning research dedicated to understanding how neural nets work beyond the architecture called AI interpretability.

    • @KBRoller
      @KBRoller 6 месяцев назад +9

      No one fully understands what the learned parameters mean. Many people understand the process by which they were learned.

  • @DaxSudo
    @DaxSudo 6 месяцев назад +87

    Writing my first academically published paper on AI right now, and I have to say, as an engineer in this space, this is one of the most complete and well-nuanced explanations of these tools. The gold, nay platinum, standard for educational content on this topic for decades to come.

  • @claudiazeng5668
    @claudiazeng5668 3 месяца назад +14

    I am a non-AI software engineer and I’ve been watching multiple transformer and LLM talks from OpenAI, Stanford online, NLP PhDs, and even some AI founding researchers. Some with code, some with the encoder-decoder diagram, some with Attention is all you need paper, some with ML histories. Still, visualization helps the best when incorporating everything in mind. It’s just beautiful, and love the way you organize the academic terminologies. Salute to this series 100%!

  • @jessylikesithard
    @jessylikesithard 6 месяцев назад +5

    This is by far the most organized explanation i've seen about transformers.

  • @Kargalagan
    @Kargalagan 6 месяцев назад +78

    I wish i had a friend as passionate as this channel is. It's like finding my family I've always wanted to have

    • @katech6020
      @katech6020 6 месяцев назад +3

      I wish the same thing

    • @sumedh-girish
      @sumedh-girish 5 месяцев назад +7

      become friends already you both

    • @TheXuism
      @TheXuism 5 месяцев назад +1

      here we are 3b1bro now

    • @cagataydemirbas7259
      @cagataydemirbas7259 5 месяцев назад +1

      Lets become friends

    • @NishantSingh-zx3cd
      @NishantSingh-zx3cd 4 месяца назад +2

      Be that friend to the younger people in your family.

  • @TheMuffinMan
    @TheMuffinMan 6 месяцев назад +101

    Im a mechanical engineering student, but I code machine learning models for fun. I was telling my girlfriend just last night that your series on dense neural networks is the best to gain an intuitive understanding on the basic architecture of neural networks. You have no idea what a pleasant surprise it was to wake up to this!

    • @baconheadhair6938
      @baconheadhair6938 6 месяцев назад

      good man

    • @keesdekarper
      @keesdekarper 6 месяцев назад +6

      It doesn't have to be just for fun. I was also in Mechanical Engineering, picked a master in control theory. And now I get to use Deep learning and NN's for intelligent control systems. Where you learn a model or a controller by making use of machine learning

  • @codediporpal
    @codediporpal 6 месяцев назад +21

    18:45 This is the clearest layman explanation of how attention works that I've ever seen. Amazing.
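
For readers who want the 18:45 idea in code as well as pictures, here is a bare-bones single-head causal attention sketch in NumPy. It is a toy illustration of scaled dot-product attention, not the video's or GPT-3's actual implementation.

    import numpy as np

    def causal_attention(X, Wq, Wk, Wv):
        """Single-head scaled dot-product attention over token embeddings X (seq_len, d_model)."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])            # how much each token attends to each other token
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores[mask] = -np.inf                             # causal mask: no looking at later tokens
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)                 # row-wise softmax
        return w @ V                                       # weighted sum of value vectors

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))                            # 4 tokens, 8-dimensional embeddings
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    print(causal_attention(X, Wq, Wk, Wv).shape)           # (4, 8)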

  • @zix2421
    @zix2421 2 месяца назад +1

    It’s so interesting; I hope one day I’ll be able to make a GPT by myself

  • @jaafars.mahdawi6911
    @jaafars.mahdawi6911 5 месяцев назад +8

    Man! You never fail to enlighten, entertain, and inspire us, nor do we get enough of your high-quality, yet very digestible, content! Thank you, Grant!

    • @setarehami23
      @setarehami23 2 месяца назад

      Shame on Ruhollah Khomeini! He destroyed my country. He is a terrorist; a wolf in sheep's clothing.

  • @ai_outline
    @ai_outline 6 месяцев назад +49

    We need more Computer Science education like this! Amazing 🔥

    • @examforge
      @examforge 6 месяцев назад +5

      Honestly I hope that in the future, AI can produce such great content. This will probably take a couple more years, but I guess it's possible. Even better: you get your own curriculum based on your strengths and weaknesses. For me this would be a combination of fireship and 3blue1brown content...

  • @Skyace13
    @Skyace13 6 месяцев назад +20

    So you’re telling me computer models can quantify “a few” or “some” based on how close the value is to a given number word, from its usage in the training data?
    I love this

    • @andrewdunbar828
      @andrewdunbar828 6 месяцев назад +1

      Well, a bit.

    • @XMysticHerox
      @XMysticHerox 6 месяцев назад +7

      Well it can encode any semantic meaning only really limited by the number of parameters and quality of training data.

    • @gpt-jcommentbot4759
      @gpt-jcommentbot4759 6 месяцев назад +2

      @@XMysticHerox quantity*

  • @lucasamadsen
    @lucasamadsen 6 месяцев назад +21

    2 years ago I started studying transformers, backpropagation and the attention mechanism. Your videos were a cornerstone of my understanding of those concepts!
    And now, partially thanks to you, I can say: “yeah, relatively smooth to understand”

  • @junjalapeno7773
    @junjalapeno7773 6 месяцев назад +2

    As someone working in IT, I can imagine how much blood, sweat and tears were involved in coding and testing this, the number of meetings and arguments with the product owner, project manager and the management to come up with this complex and beautiful infrastructure. We can't even deploy a fucking CRM in peace

  • @moleculist7978
    @moleculist7978 4 месяца назад +1

    Anyone claiming this is simplistic does not understand the idea of abstraction/specification. This video gives you an abstract frame of reference from which you can hang multiple layers of specification to whatever degree you want. This is a critically important way of thinking about anything.

  • @owenleynes7086
    @owenleynes7086 6 месяцев назад +11

    this channel is so good at making math interesting; all my friends think I'm wack for enjoying math videos, but it's not hard to enjoy when you make them like this

  • @punkdigerati
    @punkdigerati 6 месяцев назад +17

    I appreciate that you explain tokenization correctly and the usefulness of simplifying it. Many explanations skip all that and just state that the tokens are words.

    • @pw7225
      @pw7225 6 месяцев назад +3

      Apart from the fact that tokens CAN actually be longer than a word, too. :) Sub-word token does not mean that tokens must be smaller than a word.

    • @ratvomit874
      @ratvomit874 5 месяцев назад

      There is a related idea here in how Roombas navigate houses. They clearly are forming a map of your house in their memory, but there is no guarantee they see it the same way we do i.e. the different zones they see in your house may not correspond nicely to the actual rooms in the house. In the end, though, it doesn't really matter, as long as the job gets done correctly

  • @eloyfernandez8668
    @eloyfernandez8668 6 месяцев назад +8

    The best video explaining the transformer architecture that I've seen so far... and there are really good videos covering this topic. Thank you!!

  • @EranM
    @EranM 2 месяца назад +2

    My guy, 3Blue1Brown, your content is an invaluable piece of knowledge to mankind, especially for ML practitioners like myself. I cannot express my gratitude enough. Don't grow old please. If you find that impossible, please come up with a way to embed yourself in a neural network for the next generations of humanity. I trust you to figure it out!

  • @ogginger
    @ogginger 5 месяцев назад +4

    You are such an AMAZING teacher. I feel like you've really given thought to the learner's perception and are kind enough to take the time and address asides and gotchas while you meticulously build components and piece them together, all with a very natural progression that's moving towards "something" (hopefully comprehension). Thank you so much for your time, effort, and the quality of your work.

  • @cone10ceramics
    @cone10ceramics 6 месяцев назад +7

    I know the material of this chapter very well. Still, I watched it in its entirety just for the pleasure of watching a masterful presentation, the restful and authoritative cadence of the voice, and the gorgeous animation. Well done, Grant, yet again.

  • @jerryanyu8467
    @jerryanyu8467 6 месяцев назад +10

    Thank you! You're so late 3Blue1Brown, it took me 10 hours of videos + blogs last year to understand what a transformer is! This is the long-awaited video! I'm sending this to all my friends.

  • @André-b3w
    @André-b3w 6 месяцев назад +10

    So sad that so many people think AI picks bits of text and images directly from data and just makes a collage...

  • @y337
    @y337 5 месяцев назад +10

    This guy taught me how to build a neural network from scratch, I was waiting for this video, I even posted a request for it in the subreddit for this channel. I’m very glad this finally exists

  • @Clubexify
    @Clubexify 5 месяцев назад +1

    At minute 13:55 it is said that the word vectors have a dimension of 12,288. It seems to be the old Davinci embedding model, but I'm curious about the high dimensionality, as newer models such as V3 Large from OpenAI offer output dimensions of 3072. All the other top open-source models offer dimensions in the 2000-4000 range. Is the old embedding model that superior, or way too big?

  • @connorgoosen2468
    @connorgoosen2468 6 месяцев назад +8

    This couldn't have come at a better time for me! I'm very excited for this continuation of the series. Thanks Grant!

  • @StephaneDesnault
    @StephaneDesnault 6 месяцев назад +5

    Thank you so much for the immense work and talent that goes into your videos!

  • @MaxGuides
    @MaxGuides 6 месяцев назад +5

    Amazing work, your simple explanations in other videos in this series really helped me get a better understanding of what my masters classes were covering. Glad to see you’re continuing this series! ❤

  • @christianquintili
    @christianquintili 5 месяцев назад +4

    This video is an act of democracy. Thank you

  • @vikrambhutani
    @vikrambhutani 6 месяцев назад +3

    I love the 3Blue1Brown series. The linear algebra series was really state-of-the-art and recognized globally by AI enthusiasts like myself. Now hot topics such as Transformers and GenAI - this is really the best explanation by far. It's short and precise, and that's what we want.

  • @SidharthSisawesome
    @SidharthSisawesome 6 месяцев назад +15

    The idea of describing a vector basis as a long list of questions you need to answer is exactly the teaching tool I needed in my kit!! I love that perspective!

  • @scolton
    @scolton 6 месяцев назад +7

    Most exciting part of my week by far

  • @LambdaMotivation
    @LambdaMotivation 6 месяцев назад +8

    I wish I had you as a teacher. You make math so much more fun than I know it already❤

  • @thatasianboii
    @thatasianboii 6 месяцев назад +1

    I was watching the other videos (on the same series) and I wished there was a video about GPTs, especially with the recent hype around AI. Perfect timing!

  • @j.d.4697
    @j.d.4697 3 месяца назад +1

    🤔 So it's like multi-dimensional neurons that reflect patterns.
    Should be the same way the human brain works, with different concepts representing different dimensions.
    This makes me think these GPTs are like specialized human brain parts, and that combining different GPTs trained in the same areas a human brain was "trained" for should somewhat resemble a human mind.
    Because a human brain is "interdisciplinary" in how it constructs "pictures" of the surrounding reality.

  • @RyNiuu
    @RyNiuu 6 месяцев назад +4

    ok, you read my mind. From all of the channels, I am so glad, it's you explaining Transformers.

  • @ranajakub
    @ranajakub 6 месяцев назад +4

    this is the best series from you by far. excited for its revival

  • @ahmedivy
    @ahmedivy 6 месяцев назад +5

    Without watching I can say that this is going to be the best transformers video on yt

  • @LLMTokenizationID
    @LLMTokenizationID 5 месяцев назад +1

    Today, artificial intelligence is well developed. A lot of programs and various engineering technologies are coming out. I would say Sam Altman and Greg Brockman are the ones who brought artificial intelligence to this day. You should really ask Sam, Greg, and Ilya about artificial intelligence.

  • @eyalhamtsany9472
    @eyalhamtsany9472 6 месяцев назад +1

    Great video! (as always...) A question about the softmax with Temperature: if you use the Temperature (T) as a dividing factor - how can you set it to 0?

  • @BobbyL2k
    @BobbyL2k 6 месяцев назад +15

    As an ML researcher this is an amazing video ❤. But please allow me to nitpick a little at 21:45
    It’s important to note that while the “un-embedding layer” of a Transformer typically has a different set of weights from the embedding layer, in OpenAI’s GPT model each vector for each word in the un-embedding layer is exactly the same vector as the one in the embedding layer.
    This is not the case for Transformer models whose output is in a different domain than the input (e.g., translating to a different language), but the video is specifically talking about GPT. This is the implementation detailed in the “Improving Language Understanding by Generative Pre-Training” paper by OpenAI.
    Reusing the weights makes sense here because each vector from the embedding is a sort of “context-free” representation of the word, so there is no need to learn another set of weights.
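
The weight sharing described above is usually called weight tying. A minimal PyTorch-style sketch of the idea (an illustration only, with a reduced model width, not OpenAI's code):

    import torch
    import torch.nn as nn

    vocab_size, d_model = 50257, 768   # GPT-3 uses 12,288 dims; smaller here to keep the toy light

    token_embedding = nn.Embedding(vocab_size, d_model)         # token id -> vector
    unembedding = nn.Linear(d_model, vocab_size, bias=False)    # vector -> logits over the vocabulary

    # Weight tying: the same (vocab_size, d_model) matrix maps ids to vectors on the way in
    # and scores vectors against every token on the way out.
    unembedding.weight = token_embedding.weight

    ids = torch.tensor([[464, 3280]])        # two made-up token ids
    h = token_embedding(ids)                 # (1, 2, d_model)
    logits = unembedding(h)                  # (1, 2, vocab_size)
    print(logits.shape)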

  • @bridgeon7502
    @bridgeon7502 6 месяцев назад +4

    Hang on, I thought this series was done! I'm delighted!

  • @viola_case
    @viola_case 6 месяцев назад +39

    Deep learning is back baby!

    • @kevinscales
      @kevinscales 6 месяцев назад +5

      A short 6 year 5 month wait!

  • @ichbuttermirdenlachs
    @ichbuttermirdenlachs 6 месяцев назад +1

    24:28 setting T=0 would lead to a divide by 0, though, or am I seeing it wrong? Or does GPT-3 catch that and correct it to something like 0.0..1?
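
In practice the formula is never evaluated at T = 0; implementations typically treat it as the limit and simply pick the highest-logit token (greedy decoding), or clamp T to a tiny positive value. A small sketch of that convention (an assumption about common implementations, not something stated in the video):

    import numpy as np

    def sample_next_token(logits, T):
        logits = np.asarray(logits, dtype=float)
        if T == 0:                            # the T -> 0 limit puts all probability on the max logit,
            return int(np.argmax(logits))     # so take the argmax instead of dividing by zero
        z = (logits - logits.max()) / T
        p = np.exp(z) / np.exp(z).sum()
        return int(np.random.choice(len(p), p=p))

    print(sample_next_token([2.0, 1.0, 0.1], T=0))    # always index 0
    print(sample_next_token([2.0, 1.0, 0.1], T=1.0))  # usually 0, sometimes 1 or 2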

  • @eventyonwhite686
    @eventyonwhite686 2 месяца назад +1

    amazingly well-structured information! Loved every rich second :3

  • @dima13693
    @dima13693 6 месяцев назад +3

    I usually gloss over your videos as they get more technical. But whatever you did this time kept me hooked the whole time.

  • @Astronomer6573
    @Astronomer6573 6 месяцев назад +4

    Your explanation tends to always be the best! Love how you visualise all these.

  • @actualBIAS
    @actualBIAS 6 месяцев назад +7

    OH MY GOODNESS
    Your timing is just right! I'm learning about deep neural nets and transformers will be my next topic this week.
    I'M SO EXCITED, I JUST CAN'T HIDE IT!
    I'M ABOUT TO LOSE MY MIND AND I THINK I LIKE IT!

  • @didemyldz1317
    @didemyldz1317 5 месяцев назад +1

    Could you share the name of the model that is used for text-to-speech generation ? Me and my teammate are working on a Song Translator as a senior design project. This might be very helpful. Thanks in advance :)

  • @samarthsingla8184
    @samarthsingla8184 5 месяцев назад +1

    A quick doubt: it was mentioned that during training, each of the columns of the final matrix is used to predict what comes next. Does that mean that the first column predicts the first word, the second column predicts the next to next word and so on? If yes, then during actual prediction tasks, shouldn't we predict using the first column instead of the last, since the last would actually represent a very far away word?
    Thanks for the amazing explanations as always!
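
One way to see the answer: during training the targets are just the input tokens shifted left by one, so position t is trained to predict token t+1, and thanks to causal masking position t only sees tokens up to t. At generation time the last position is therefore the one whose prediction reflects the entire prompt. A toy sketch of the bookkeeping (the embedding-plus-linear stand-in below is hypothetical and only there to make the shapes run; a real model would be a causal transformer):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, d_model = 100, 16
    embed = nn.Embedding(vocab_size, d_model)   # stand-in for a real causal language model
    head = nn.Linear(d_model, vocab_size)

    tokens = torch.tensor([[11, 42, 7, 19, 11]])     # (batch=1, seq_len=5), made-up ids
    logits = head(embed(tokens))                     # (1, 5, vocab_size): one column per position

    # Training: position t is scored against token t+1, so every position is a
    # next-token example and the whole sequence is learned from in parallel.
    loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab_size),
                           tokens[:, 1:].reshape(-1))

    # Generation: only the last position has "seen" the whole prompt, so its
    # column is the distribution for the token that follows the prompt.
    next_token = logits[:, -1].argmax(dim=-1)
    print(loss.item(), next_token.item())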

  • @davidm2.johnston684
    @davidm2.johnston684 6 месяцев назад +6

    Hello 3b1b, I wanted to say a huge thank you for this specific video. This was exactly what I've been needing. Every now and again, I thought to myself, as someone who's been interested in machine learning for my whole adult life, that I should really get a deep understanding of how a transformer works, to the point that I could implement a functional, albeit not efficient, one myself.
    Well, I'm on my way to that, this is at least a great introduction (and knowing your channel I really mean GREAT), and I really wanted to thank you for that!
    I know this is not much, but I'm not in a position to support this channel in a more meaningful way at the moment.
    Anyways, take care, and thanks again!

    • @3blue1brown
      @3blue1brown  6 месяцев назад +12

      I'm glad you enjoyed. In case somehow you haven't already come across them, I'd recommend the videos Andrej Karpathy does on coding up a GPT. In general, anything he makes is gold.

  • @Jackson_Zheng
    @Jackson_Zheng 6 месяцев назад +13

    YOU DID IT!
    I emailed you about this video idea about 8 months ago and I've been patiently waiting for you to release this since!

    • @enpassant-d3y
      @enpassant-d3y 6 месяцев назад +1

      wow, great idea!

    • @melihozcan8676
      @melihozcan8676 6 месяцев назад +4

      YOU DID IT JACKSON! I texted you to email him this idea about 9 months ago. Now the bab- video is there!

  • @minds_and_molecules
    @minds_and_molecules 6 месяцев назад +6

    The different sampling has to do with the search algorithm, like beam search, or any search involving topk or some tally of probabilities for the final score of the output. Any temperature will not change that the most probable token is the most probable token, so in a greedy search the temperature does not affect the output. This is a very common misconception, I'm a bit disappointed that it was slightly misleading here.

    • @alfredwindslow1894
      @alfredwindslow1894 6 месяцев назад +1

      agree, what he said wasn’t logically complete and didn’t really make sense because of it

    • @minds_and_molecules
      @minds_and_molecules 6 месяцев назад +1

      To be clear, rest of the video was great!

  • @matty-oz6yd
    @matty-oz6yd 3 месяца назад +1

    I highly recommend transcribing the video and then asking questions about it afterwards with a long-context model like Claude. Ask it to test you, or let your curiosity drive your exploration.

  • @krupeshparmar5858
    @krupeshparmar5858 5 месяцев назад +1

    If there are about 170,000 words currently in use in the English language, how are there only 50,257 tokens being used here? Not to mention the tokens generated for other languages.
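
The short answer is that the 50,257 entries are sub-word pieces, not whole words: common words get a single token, rarer words are assembled from a few pieces, so a fixed vocabulary can spell out any text (including other languages, just less efficiently). A quick sketch, assuming the tiktoken package and its GPT-2/GPT-3 byte-pair encoding are available:

    import tiktoken

    enc = tiktoken.get_encoding("gpt2")   # the BPE vocabulary GPT-3 also uses
    print(enc.n_vocab)                    # 50257

    for word in ["the", "transformer", "bratwurst", "antidisestablishmentarianism"]:
        ids = enc.encode(word)
        pieces = [enc.decode([i]) for i in ids]
        print(f"{word!r} -> {len(ids)} token(s): {pieces}")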

  • @dhruvshah3909
    @dhruvshah3909 6 месяцев назад +5

    I started my deep learning journey from your original videos on deep learning. They inspired me to work in this field. I am about to start my first internship as a researcher in this field. Thank you 3blue1brown for this.

    • @dhruvshah3909
      @dhruvshah3909 6 месяцев назад +3

      Also, this is the best video that I have seen out of the many hundreds I watched when I was stuck in tutorial hell on many of these concepts

    •  6 месяцев назад

      Just in time to be replaced by them >:).

  • @pradachan
    @pradachan 3 месяца назад +1

    I like the fact that you are also a vegetarian. Thank you for all the effort you have put into making these types of videos. love you 💖💖

  • @sfulibarri
    @sfulibarri 6 месяцев назад +1

    I've been using Copilot and ChatGPT while programming at work for almost a year now. It seemed truly magical and transformative at first but over time that faded away and it became clearer that it's really just autocomplete on steroids. Which is useful for sure and a real time saver for certain kinds of work but this video confirmed for me the impression I've had that this is not *the* ai revolution that many people (typically people running ai startups) claim it is. These models will certainly impact society and the economy in meaningful ways but they are fundamentally not intelligent in any sense. They are not even on the path to AGI, they simply cannot be by their very nature.

  • @americankid7782
    @americankid7782 5 месяцев назад +2

    The idea that ChatGPT is just using probability to figure out what a Helpful Chatbot would say in response is quite funny to me.

  • @raymondannas4496
    @raymondannas4496 2 месяца назад +1

    My wife uses chat gpt. I am a mountain man. She told me about it, I asked for a plot line and script for the continuation of the X Files, it delivered.

  • @michaelhughes6634
    @michaelhughes6634 5 месяцев назад +1

    How does AI deal with uncertainty? For example, in softmax, e^1 is an irrational number, which means computers have to approximate it; do this for billions of calculations and the uncertainty grows fast. How do computers minimise these uncertainties so that between input and output you don't get rubbish?

  • @MassDefibrillator
    @MassDefibrillator 5 месяцев назад +1

    One thing to keep in mind is that the kinds of data access and manipulation you are doing to show these directions cannot be done by the AI. The AI cannot access a specific weight and then perform manipulations on it. At any particular point, the only values the AI can access are ones that have gone through all the preceding steps of weight1*input1 + weight2*input2, without any way to reverse the maths and access individual weights or input values.
    So, while it might appear to us that the AI has generated a conceptual class of femaleness, it cannot access this in a general way, like a human can. It can only access this vector by taking in an input that leads to utilising this vector at some point in the process: what it can access is entirely input dependent. This is what gives AI the kind of input-specific tunnel vision that it gets. One nasty version shown was that if you start inputting text in a way characteristic of how a black person in a particular area might speak, the model starts outputting racist nonsense. But if you create the same input without the phrasal differences, no such racist output arises. So it has this vector space of racism, but it cannot access it in a general way, like the conceptual systems humans have. Instead, its ability to access this is entirely dependent on what kind of input it gets.

  • @muayyadalsadi
    @muayyadalsadi 6 месяцев назад +1

    23:57 for T I typically use half the sum of the top two logits (called best vs second best). Think of it as normalizing logits by how far the top two are apart.

  • @ReeshavGhosh200
    @ReeshavGhosh200 6 месяцев назад +1

    Algorithm: _takes 2 secs to figure out_
    Implementation:

  • @meeeeeeauuuuuuuu
    @meeeeeeauuuuuuuu 5 месяцев назад +1

    Watch this video multiple times if you are confused. There is an answer somewhere; you just need another iteration, just like a Transformer, to figure out what is going on.

  • @zhoudan4387
    @zhoudan4387 5 месяцев назад +1

    “Baked into the last vector of the sequence” - but why have the previous vectors of the sequence not been discarded during the feed-forward process? Maybe they are used again in the decoder stack?

  • @HumanBeing11011
    @HumanBeing11011 4 месяца назад +1

    This guy deserves my university fees. After my death my son will not inherit all my life long earnings.

  • @amineaitsaidi5919
    @amineaitsaidi5919 5 месяцев назад +1

    I watched the video at 2x speed, and it did something to me - I feel dizzy.

  • @greatestgrasshopper9210
    @greatestgrasshopper9210 28 дней назад +1

    Why do the subtitles have every language but English lol

  • @djayjp
    @djayjp 6 месяцев назад +1

    I can't find the other Chapters on your channel. Could you create a playlist please?