Let's build GPT: from scratch, in code, spelled out.

  • Published: 18 Dec 2024

Comments • 2.6K

  • @fgfanta
    @fgfanta 1 year ago +7282

    Imagine being between your job at Tesla and your job at OpenAI, being a tad bored and, just for fun, dropping on YouTube the best introduction to deep learning and NLP from scratch so far, for free. Amazing people do amazing things even as a hobby.

    • @crimpers5543
      @crimpers5543 1 year ago +188

      He's probably bored at both of those jobs. Once people get to high-level director positions, they are far removed from the trenches of code. Lots of computer scientists have a passion for actually writing and explaining code, not just managing things.

    • @aaronhpa
      @aaronhpa 1 year ago +112

      And yet people still say socialism isn't viable, when most of the great stuff on the internet was done for free, without expectation of compensation.

    • @shyvanatop4777
      @shyvanatop4777 1 year ago +132

      @@aaronhpa free market developed the skills but sure man

    • @aaronhpa
      @aaronhpa 1 year ago +53

      @@shyvanatop4777 Did it? I think the hard work and dedication of all these people did, not the ability to sell it.

    • @jayakrishnankr7501
      @jayakrishnankr7501 1 year ago +85

      @@aaronhpa, it's all about incentives. Why would you ever do anything if you could get anything without effort? In a fictional utopia socialism might be viable, but human beings don't work like that. For example, this platform came about because of capitalism. I think achieving a balance between the two would be best: the platform came from capitalism, the content here is socialism, maybe something like that.

  • @8LFrank
    @8LFrank 1 year ago +4681

    Living in a world where a world-class top guy posts a 2-hour video for free on how to make such cutting-edge stuff. I've barely started this tutorial, but first I just wanted to say thank you, mate!

    • @FobosLee
      @FobosLee 1 year ago +48

      Wait. It's him! I didn't understand at first. Thought it was some random IT YouTuber.

    • @DavitBarbakadze
      @DavitBarbakadze 1 year ago +7

      How did it go?

    • @Tinjinladakh
      @Tinjinladakh 1 year ago +2

      hey jake, what should I do before learning programming? Are all basic languages the same or different? Should I learn only Python?

    • @ChrisSmith-lk2vq
      @ChrisSmith-lk2vq 1 year ago +1

      Totally agree!!

    • @atlantic_love
      @atlantic_love 1 year ago +13

      "Cutting edge"? The only cutting will be your job. Think before getting your panties all wet. The only people excited for this crap are investors, employers and failed programmers looking for some sort of edge.

  • @softwaredevelopmentwiththo9648
    @softwaredevelopmentwiththo9648 1 year ago +1984

    Thank you for taking the time to create these lectures. I am sure it takes a lot of time and effort to record and cut these. Your effort to level up the community is greatly appreciated. Thanks Andrej.

  • @BAIR68
    @BAIR68 1 year ago +355

    I am a college professor, and I am learning GPT from Andrej. Every time I watch this video, I not only learn the content but also how to deliver any topic effectively. I would vote him the "Best AI teacher on YouTube". Salute to Andrej for his outstanding lectures.

    • @noadsensehere9195
      @noadsensehere9195 11 months ago

      which university?

    • @bohaning
      @bohaning 10 months ago

      Hey, I'd like to introduce you to my AI learning tool, Coursnap, designed for youtube courses! It provides course outlines and shorts, allowing you to grasp the essence of 1-hour in just 5 minutes. Give it a try and supercharge your learning efficiency!

    • @ocanehauncanedichieilcane
      @ocanehauncanedichieilcane 7 months ago

      please don't

    • @Dom-zy1qy
      @Dom-zy1qy 3 months ago

      @@bohaning Damn, YouTube integrated your SaaS into YouTube natively

  • @ShihgianLee
    @ShihgianLee 1 year ago +42

    This lecture answers ALL my questions from the 2017 "Attention Is All You Need" paper. I was always curious about the code behind the Transformer. This lecture quenched my curiosity with a Colab to tinker with. Thank you so much for your effort and time in creating the lecture to spread the knowledge!

  • @JainPuneet
    @JainPuneet 1 year ago +847

    Andrej, I cannot comprehend how much effort you have put into making these videos. Humanity is thankful to you for making these publicly available and educating us with your wisdom. One thing is to know the stuff and apply it in a corporate setting; another is to use it to educate millions for free. This is one of the best kinds of charity a CS major can do. Kudos to you and thank you so much for doing this.

    • @vicyt007
      @vicyt007 1 year ago +9

      Making this video is super simple for a specialist like him. It’s like creating a Hello World program for a computer scientist.

    • @JainPuneet
      @JainPuneet 1 year ago +34

      @@vicyt007 I beg to differ. I am from the area, and I can imagine how much time he must have spent offline to come up with the right abstractions.

    • @vicyt007
      @vicyt007 1 year ago +1

      @@JainPuneet I agree that it took him some time to make this video, but I don’t believe it was a tough task.

    • @hpmv
      @hpmv 1 year ago +16

      @@vicyt007 People who have expertise in an area aren't always good teachers. Being able to show others how it works in an organized, easy-to-understand manner is very tricky. On the surface it looks easy, but if you try making a video like this yourself, chances are you'll find it much harder than you think.

    • @vicyt007
      @vicyt007 1 year ago +1

      @@hpmv I know it was not an easy task, but at least he knows what he is saying; it's just a matter of explaining concepts. He was a teacher for a long time, so it's his job, which he is doing for free here!
      But in my opinion, this video did not target people with zero knowledge of maths / ML / AI / Python, because in that case you must admit it is quite hard to understand. Yet it was watched by nearly 2M people, most of whom are not skilled enough to follow it. Briefly, I think this video targeted skilled people but was watched by everybody. Why not?

  • @antopolskiy
    @antopolskiy 1 year ago +175

    It is difficult to comprehend how lucky we are to have you teaching us. Thank you, Andrej.

  • @jamesfraser7394
    @jamesfraser7394 1 year ago +867

    Wow! I knew nothing and now I am enlightened! I actually understand how this AI/ML model works now. As a nearly-70-year-old who just started playing with Python, I am a living example of how effective this lecture is. My humble thanks to Andrej Karpathy for allowing me to see into and understand this emerging new world.

    • @RichardCampbell-l9j
      @RichardCampbell-l9j 1 year ago +69

      Good for you, youngster. 75 here, and I will be doing this kind of thing till I drop... Still running my technology company and doing contract work. Cheers.

    • @mrcharm767
      @mrcharm767 1 year ago +10

      What makes you learn these at the age of 70?

    • @jamesfraser7394
      @jamesfraser7394 1 year ago +21

      @@mrcharm767 I want to analyze more stocks, the way I would, in a shorter time. ;)

    • @fawzishafei5565
      @fawzishafei5565 1 year ago +6

      @@mrcharm767 The sky is the limit.....!

    • @fmailscammer
      @fmailscammer 1 year ago +11

      I’m always excited to learn new things, hope I’m still learning at 70!

  • @fslurrehman
    @fslurrehman 1 year ago +163

    I knew only Python, math, and the definitions of NN, GA, ML, and DNN. In 2 hours, this lecture has not only given me an understanding of the GPT model, but also taught me how to read AI papers and turn them into code, how to use PyTorch, and tons of AI definitions. This is the best lecture and practical application on AI, because it not only gives you an idea of DNNs, but also gives you code directly from research papers and a final product. Looking forward to more lectures like these. Thanks Andrej Karpathy.

  • @aojiao3662
    @aojiao3662 1 year ago +26

    The clearest, most intuitive, and best explained Transformer video I've ever seen. Watched it as if it were a TV show; that's how down-to-earth this video is. Shoutout to the man, the legend.

  • @gokublack4832
    @gokublack4832 1 year ago +279

    Wow! Having the ex-lead of ML at Tesla make tutorials on ML is amazing. Thank you for producing these resources!

    • @SzTz100
      @SzTz100 1 year ago +7

      I know, I couldn't believe it.

    • @VultureGamerPL
      @VultureGamerPL 1 year ago +20

      Can you believe it? God bless this man and I'm not even religious!

    • @cane870
      @cane870 1 year ago +5

      @@VultureGamerPL cringe

    • @lookupverazhou8599
      @lookupverazhou8599 1 year ago +13

      @@cane870 Cope.

    • @learnomics
      @learnomics 7 months ago +2

      @@VultureGamerPL Not only ex-lead of ML at Tesla. He is also a cofounder of OpenAI.

  • @meghanaiitb
    @meghanaiitb 1 year ago +69

    What a feeling! Just finished sitting on this for the weekend, building along and finally understanding Transformers. More than anything, a sense of fulfilment. Thanks Andrej.

  • @rafaelsouza4575
    @rafaelsouza4575 1 year ago +223

    I was always scared of the Transformer diagram. Honestly, I never understood how such a schema could make sense until this day, when Andrej enlightened us with his super teaching power. Thank you so much! Andrej, please save the day again by doing one more class, about Stable Diffusion!! Please, you are the best!

  • @DavidAttwater
    @DavidAttwater 1 year ago +38

    I cannot thank you enough for this material. I've been a spoken-language technologist for 20 years, and this plus your micrograd and makemore videos has given me a graduate-level update in less than 10 hours. Astonishingly well-prepared and presented material. Thank you.

  • @themenon
    @themenon 5 months ago +5

    Thanks for this well explained and wonderful series! Hope you will cover quantization for people with low-power GPUs.

  • @NicholasRenotte
    @NicholasRenotte 1 year ago +104

    This is AMAZING! You're an absolute legend for sharing your knowledge so freely like this, Andrej! I'm finally getting some time to get into Transformer architectures, and this is a brilliant deep dive; going to spend the weekend walking through it!! Thank you🙏🏽

    • @varunahlawat9013
      @varunahlawat9013 1 year ago +1

      Waiting for your take on this too!

    • @eliotharreau7627
      @eliotharreau7627 1 year ago +1

      Hi Nicholas, I don't understand all this code. I just have one question: is it working?? And is it like ChatGPT? Thanks, bro.

    • @kyriakospelekanos6355
      @kyriakospelekanos6355 1 year ago +1

      @@eliotharreau7627 This is a demonstration of HOW ChatGPT works

    • @eliotharreau7627
      @eliotharreau7627 1 year ago

      @@kyriakospelekanos6355 I think it is not only how ChatGPT works, but code that can do things LIKE ChatGPT. That's why I'm surprised!!! Thank you anyway.

    • @satoshinakamoto5710
      @satoshinakamoto5710 1 year ago

      bro can't wait for your video on this!

  • @yusufsalk1136
    @yusufsalk1136 1 year ago +552

    The best notification ever.

  • @JoseLopez-ox7sq
    @JoseLopez-ox7sq 1 year ago +184

    This is simply fantastic. I think it would be beneficial for learners to see the actual process of training, the graphs in W&B, and how they can try to train something like this themselves.

    • @AndrejKarpathy
      @AndrejKarpathy  1 year ago +187

      makes sense, potentially the next video, this one was already getting into 2 hours so I wrapped things up, would rather not go too much over movie length.

    • @jdejota1029
      @jdejota1029 1 year ago +77

      @@AndrejKarpathy Please don't worry about going over movie length; I enjoyed every minute of the video. It's the first time I've attended an in-depth class on what's under the hood of a model.

    • @nikitaandriievskyi3448
      @nikitaandriievskyi3448 1 year ago +26

      @@AndrejKarpathy I think people would watch these videos even if they were 10 hours long, so don't worry about making them too long :)

    • @patpearce8221
      @patpearce8221 1 year ago +9

      @@AndrejKarpathy don't listen to these sycophants. Size matters.

  • @riochuong105
    @riochuong105 6 months ago +2

    Thanks again for the great lecture. I was able to follow line by line and train it on Lambda Labs with little effort. Hope to buy you a coffee for all this hard work. Off to the next 4-hour GPT-2 repro 🧠🏋

  • @RemKim
    @RemKim 1 year ago +3

    I suggest watching this video multiple times in order to understand how Transformers work. This is by far the best hands-on explanation + example.

  • @amazedsaint
    @amazedsaint 1 year ago +900

    All other youtube videos: There is this amazing thing called ChatGPT
    Andrej: Hold my beer 🍺
    Seriously - we really appreciate your time and effort to create this Andrej. This will do a lot of good for humanity - by making the core concepts accessible to mere mortals.

    • @syedshoaibshafi4027
      @syedshoaibshafi4027 1 year ago

      you can do it more easily using an LSTM

    • @zuu2051
      @zuu2051 1 year ago +16

      @@syedshoaibshafi4027 Are you really saying that out loud? Dude is still living in 2010 🤣

    • @kevinremmy5812
      @kevinremmy5812 1 year ago

      lit😅

    • @redsnflr
      @redsnflr 1 year ago +2

      Mere mortals with at least basic programming and Python knowledge, but yes.

    • @kemalware4912
      @kemalware4912 1 year ago +1

      🍺

  • @rcuzzy
    @rcuzzy 1 year ago +90

    Andrej, I know there are probably a million other things you could be working on or efforts you could put your mind towards, but seriously, thank you for these videos. They are important, they matter, and they are providing many of us with a foundation from which to learn, build, and understand A.I., and to develop these models further. Thank you again, and please keep doing these.

    • @reinhodl7377
      @reinhodl7377 1 year ago +4

      Seriously, Andrej is just so very kind in his way of explaining things. His Shakespeare LSTM article way back ("The Unreasonable Effectiveness of Recurrent Neural Networks") was what got me seriously into ML in the first place. And while I've since (professionally) moved to development work unrelated to ML/AI, this is the exact kind of thing that hooks me back in. Andrej knows people watching this are not idiots and doesn't treat them as such, but at the same time fully understands how opaque even basic AI concepts can be if all you ever really interact with is pre-trained models. There's tons of value in explaining this stuff in such a practical way.

  • @rw-kb9qv
    @rw-kb9qv 1 year ago +7

    I think this style of teaching is much better than a lecture with PowerPoint and a whiteboard. This way you can actually see what the code is doing instead of guessing what all the math symbols mean. So thank you very much for this video!

    • @13thbiosphere
      @13thbiosphere 1 year ago

      By 2030 this will be the dominant method of learning... vastly more efficient... Any university failing to embrace this method will crumble

  • @I_am_who_I_am_who_I_am
    @I_am_who_I_am_who_I_am 8 months ago +69

    I did something like this in 1993. I took a long text and calculated the probability of one word (I worked with words, not tokens) following another by parsing the full text.
    And I successfully created a single-layer perceptron parrot which could spew almost meaningful sentences.
    My professors told me I should not pursue the neural network path because it was practically abandoned. I never trusted them. I'm glad to see neural networks' glorious comeback.
    Thank you Andrej Karpathy for what you have done for our industry and humanity by popularizing this.
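
The approach this commenter describes, counting how often each word follows another and then sampling from those counts, is a bigram language model, essentially the baseline Andrej builds first in the video (over characters rather than words). A minimal sketch in Python; the toy corpus is a made-up placeholder, not from the video:

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, how often each other word follows it."""
    words = text.split()
    counts = defaultdict(Counter)
    for w1, w2 in zip(words, words[1:]):
        counts[w1][w2] += 1
    return counts

def sample_next(counts, word):
    """Sample the next word in proportion to observed bigram counts."""
    followers = counts[word]
    choices = list(followers.keys())
    weights = list(followers.values())
    return random.choices(choices, weights=weights, k=1)[0]

# Toy corpus: after "the", the model has seen "cat" twice and "mat" once,
# so "cat" is sampled about 2/3 of the time.
counts = train_bigram("the cat sat on the mat and the cat ran")
next_word = sample_next(counts, "the")
```

Replacing this count table with a learned model that predicts the same next-token distribution is the first step the lecture takes on the way to the full Transformer.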

  • @mlock1000
    @mlock1000 10 months ago +6

    I only just noticed that this is set up in a perfect two-column layout, so a person can have the script/notebook they are working on side by side with yours and not have to jump around at all. And it's clean and clutter-free. That is some classy action; my deepest respect and gratitude.

    • @Milark
      @Milark 7 months ago

      Now that's a level of detail I hadn't noticed

    • @vaibhavgupta006
      @vaibhavgupta006 2 months ago

      What is that? I didn't get it

  • @miladaghajohari2308
    @miladaghajohari2308 1 year ago +4

    These videos are awesome. I have been doing DL research for 3 years, but the way you explain things is so pleasing that I sat through the whole 2 hours. Kudos to you Andrej.

  • @TheWayfarer1
    @TheWayfarer1 1 year ago +14

    "Andrej, your willingness to share your knowledge and insights on YouTube is truly inspiring. Your passion for teaching and helping others understand complex concepts is evident in your videos, and it's clear that you have a drive to make a positive impact in the field of AI. Keep up the amazing work, and thank you for making this knowledge accessible to all!" P.S. this comment was generated using GPT

  • @rockapedra1130
    @rockapedra1130 1 year ago +9

    This is fantastic. I am amazed that Andrej takes so much of his time to impart this incredibly valuable knowledge for free to all and sundry. He is not only a top researcher but also a fantastic communicator. We have gotten used to big corporations hoarding knowledge and talent to become exploitative monopolies but every so often, humanity puts forth a gem like Mr. Karpathy to keep us all from going head first into the gutter. Thank you!!!

  • @ajitkirpekar4251
    @ajitkirpekar4251 2 months ago

    I have read this paper and its variants so many times over, and yet this is BY FAR the best, most comprehensive tutorial on it I have ever experienced. I applaud Andrej for really nailing all of the different components in a very structured way, such that it doesn't overwhelm the way it did/does for most people who pound their head at it. I will be recommending this video to anyone and everyone, not just practitioners of NLP, ML, or data science.

  • @Grey_197
    @Grey_197 1 year ago +12

    Broke my back just to finish this video in a single sitting. It's a lot to take in at once; I think I'll have to implement it bit by bit over the span of a day to actually assimilate everything.
    I am very happy with the lecture/tutorial, waiting for more. The time and effort in making this video possible is highly admirable and respectable.
    Thank you Andrej.

  • @coemgeincraobhach236
    @coemgeincraobhach236 1 year ago +11

    Day 2 of implementing this down, about one more evening to go I think. Thanks so much for this! I spent so long down the rabbit hole of CNNs that it's really refreshing to try a completely different type of model. No way I could have done it without a lecture of this quality! Legend

  • @WannabeALU
    @WannabeALU 1 year ago +46

    I don't have words to describe how grateful I am to you and the work you are doing. Thank you!

    • @klauszinser
      @klauszinser 1 year ago +3

      The world has got a very good teacher back. Very appreciated.

    • @RKELERekhaye
      @RKELERekhaye 1 year ago

      Fantastic video Andrej, you're the best and so nice.😊

  • @chung-shienwang6248
    @chung-shienwang6248 1 year ago +12

    Can't be more grateful. We're literally living in the best of times because of you! Thank you so much

  • @iantaggart3064
    @iantaggart3064 11 months ago +2

    The first ten minutes alone taught me more than a quick google search could. You're good at this.

  • @PrakharSrivastav
    @PrakharSrivastav 1 year ago +10

    Truly phenomenal to live in an age where we can learn all this for free from experts like you. Thank you so much Andrej for your contribution. What a gift you have given.

  • @lkothari
    @lkothari 1 year ago +7

    This was incredible, Andrej! I really appreciate how you intersperse teaching a concept with coding and building step by step. This is the first of your videos that I have watched, and I can't wait to watch all the others.

  • @aureliencobb199
    @aureliencobb199 1 year ago +8

    You give us these lectures for free; I do not know how to thank you. Great job explaining NNs to us so clearly.

  • @kunalll24
    @kunalll24 11 days ago +43

    anyone from Harkirat Singh's video?

  • @clamr6122
    @clamr6122 8 months ago +1

    I've watched a lot of explanations of Transformers and this is easily the best. You are a gifted teacher.

  • @zechordlord
    @zechordlord 1 year ago +12

    Thanks so much for making this! I could grasp about 80% of everything with my programming background and a little university-level machine learning, but it no longer feels like magic. This format of hands-on coding, along with the thought process behind it, is way better than reading a paper and trying to piece things together.

  • @lipingxiong1376
    @lipingxiong1376 1 year ago +27

    Thank you so much for creating such valuable content. A few years ago, I watched your 2016 Stanford computer vision course, which was instrumental in helping me understand backpropagation and other important neural network concepts. Andrew Ng's courses initially led me into the world of machine learning, but I find your videos to be equally educational, focused on fundamental concepts, and presented in a very accessible way. I've also been following your blog and was thrilled to learn about your new YouTube channel. Your dedication to creating these resources is truly appreciated.
    Growing up in rural China, I didn't have many opportunities to learn outside of textbooks. But now, thanks to people like you, I find myself swimming in a sea of knowledge. Thank you for making such a significant impact on my learning journey.
    BTW, I edited this with ChatGPT to make me sound more like a native speaker. :)

    • @eva__4380
      @eva__4380 1 year ago +3

      Similar experience here. I too watched Stanford's computer vision and NLP and a few other courses a while back. I also did the lectures on linear algebra, calc, probability, stats, etc. from MIT OCW to get a strong grasp of the fundamentals. Without YouTube it wouldn't be possible for me to have access to such high-quality education

    • @raghulponnusamy9034
      @raghulponnusamy9034 1 year ago

      Can you please share that link with me @eva__4380

  • @khalobert1588
    @khalobert1588 1 year ago +41

    I think this man is a singularity, because the world has not seen such a combination of talent and good character. Thanks mate 🙏

  • @tamilselvan9942
    @tamilselvan9942 1 year ago +3

    This is "insane amount of knowledge packed in a video of 2 hours". Hats Off Man!!

  • @sifar__
    @sifar__ 3 months ago

    The calmness with which you deliver these complex topics makes everything more intuitive and easy to understand. Kudos to you, Andrej!

  • @Marius12358
    @Marius12358 1 year ago +24

    I'm enjoying this whole series so much, Andrej. These videos make me understand neural networks much better than anything so far in my Bachelor's. As an older student with a large incentive to be time-efficient, this has been a godsend. Thank you so much!! :D

  • @artukikemty
    @artukikemty 1 year ago +15

    Amazing. Watching these videos, I can still believe in humankind; seeing a guy like Andrej sharing his knowledge and his time with the rest of the world is something that we do not see every day. Thanks for posting it!

    • @jwalk121
      @jwalk121 1 year ago

      He's a very good teacher, but there are still islands

  • @aistamp
    @aistamp 1 year ago +5

    Welcome to YouTube in 2023, where one of the top AI researchers is just casually making videos explaining in detail how to build some of the best ML models. Seriously though, these videos are amazing!

  • @BlckshpWll
    @BlckshpWll 6 months ago

    I'm gonna say this is the best Transformer tutorial in the world, not one of the best. Easy and straightforward to understand, with detailed step-by-step explanations, even for people with very limited ML context. This video is such a treasure

  • @rangilanaoermajhi1820
    @rangilanaoermajhi1820 1 year ago +14

    Just gone through all of his videos: MLP, gradients, and of course the backprop :), finally finishing with the Transformer model (the decoder part). As we all know, Andrej is a hero of deep learning, and we are very blessed to get this much rich content for free on YouTube, and from a teacher like him. Fascinating stuff from a fascinating contributor in the field of AI 🙏

  • @juxyper
    @juxyper 1 year ago +6

    I have some experience understanding the maths behind all this stuff, but I kind of had problems advancing to creating and training models; these videos are a godsend. Big thanks

  • @thegrumpydeveloper
    @thegrumpydeveloper 1 year ago +7

    So happy to see Andrej back teaching more. His articles before Tesla were so illuminating and distilled complicated concepts into things we could all learn from. A true art. Amazing to see videos too.

  • @SchultzC
    @SchultzC 1 year ago +4

    From CS231n and RL Pong to this… there is something special about the way you break down and explain things. I have benefited immensely from it, and I'm obviously not the only one. Thank You!

  • @calincraiu3761
    @calincraiu3761 5 months ago

    I am speechless - your talent for communicating complex ideas in detail in an understandable way, and re-explaining them with metaphors, is incredible. I feel like I'm genuinely learning more watching these videos and coding along than I ever had in uni. Thank you so much for creating this content!

  • @priyasengar9683
    @priyasengar9683 3 months ago

    I started this gem in the morning, which was one of the best things I did today. Huge respect to you, Andrej; there are very few people like you who provide such valuable content to people all around the world for free. Thank you very much

  • @NPT95
    @NPT95 1 year ago +93

    Wow. I thought you were gonna use the Transformers library, but you essentially built the entire Transformer architecture from scratch. Well done!!

    • @gokulakrishnanr8414
      @gokulakrishnanr8414 9 months ago

      Thanks! Yeah, it was a fun challenge building the Transformer from scratch. Glad you're enjoying the video!

  • @alankarmisra
    @alankarmisra 1 year ago +4

    I found an intuitive explanation of Query/Key/Value in Batool Haider's video, which said that Q x K.T / |Q||K| is basically computing the cosine similarity between Q and K, which is higher if the vectors are pointing in the same or a similar direction; that is what yields the "affinity". This Q x K.T product then becomes a mask over V that says which V we should focus on, which is why Q x K.T x V yielding high values for correct predictions becomes the target for the neural network, and it learns to do just that. And because it (indirectly) pushes the vectors for C towards similarity, strongly connected items end up "closer" together in the embedding space. If this intuition is incorrect, I'm happy to hear how, so I can learn.
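
One caveat to the intuition above: the Transformer in the video scales the raw dot products by 1/sqrt(d_k), as in "Attention Is All You Need", rather than normalizing by |Q||K| as true cosine similarity would. The gist is the same, though: a query pointing in a direction similar to a key produces a large score, and after a softmax that score becomes a large weight on that key's value. A minimal single-query sketch in plain Python, with toy vectors chosen purely for illustration:

```python
import math

def softmax(xs):
    # Shift by the max for numerical stability, then normalize to sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector:
    weights = softmax(q . k / sqrt(d)); output = weighted sum of values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

query = [1.0, 0.0]                  # toy query pointing "right"
keys = [[1.0, 0.0], [0.0, 1.0]]     # first key aligned with the query
values = [[10.0, 0.0], [0.0, 10.0]]
output, weights = attend(query, keys, values)
# The aligned key gets the larger weight, so the output leans toward values[0].
```

In the video the same computation runs over whole batches of queries as matrix multiplies, with a causal mask so each position only attends to earlier ones.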

  • @curatorsshelf393
    @curatorsshelf393 1 year ago +4

    Andrej, thank you so much for sharing your knowledge and expertise. I've been following your video series, and it has been truly amazing. I remember you saying in one of your interviews that preparing a 1-hour video takes more than 10 hours. I cannot thank you enough for what you are doing!

  • @wsss6088
    @wsss6088 1 year ago

    Thank you Andrej! I'm an engineer in the autonomous driving industry. I learned DL from your CS231n assignments in 2017, the best computer science course (online or offline) I've taken. I really admire your work at Tesla: from LKA -> NoA -> FSD beta in 3 years, pioneering the whole industry in engineering and ideology (Software 2.0, BEV, one model). And again, this video is the best Transformer intro course on YouTube. I feel very fortunate to have followed you in this AI era.

  • @sampsonleo7475
    @sampsonleo7475 7 months ago

    This is truly a step-by-step tutorial for building a Transformer system. So impressed by the way you teach! Very clear and very easy to follow. You are a highly talented educator!

  • @eggcorn0x
    @eggcorn0x 1 year ago +5

    Thank you for taking the time and effort to share this, Andrej! This is of great help in lifting the veil of abstractions that made it all seem inaccessible, and in opening up that world to the ML/AI uninitiated like me. I don't understand all of it yet, but I'm now oriented, and you've given me a lot of threads I can pull on.

  • @karanacharya18
    @karanacharya18 1 year ago +4

    Absolutely amazing lecture. Thank you so much Andrej! I finally understand attention and Transformers. "Code is the ultimate truth." The way you set the stage and explain the concepts and the code is brilliant.

  • @8eck
    @8eck 1 year ago +5

    Reward model and reinforcement learning using that reward model would be super cool to learn. Thank you for the current lecture!

  • @vibhorgoel8355
    @vibhorgoel8355 10 months ago

    We are grateful that talented people like you believe in teaching and helping! This is an amazing video: clear, precise, and it brings a tough topic to a layperson! So much to learn about how to make technical videos.

  • @christianhetling3793
    @christianhetling3793 1 year ago +4

    Hey Andrej, I greatly appreciate you making these videos. Next semester I am taking the course Machine Learning for NLP. I think these kinds of implementation videos are incredible for learning a subject deeply

  • @GPTBot1123
    @GPTBot1123 1 year ago +45

    I've watched this 3 times and I only understand about 80% of it 😂, a testament to how great Andrej is at explaining these models. I'm not a programmer by trade, so a lot of this is totally foreign to me.

    • @TheNewton
      @TheNewton 7 months ago

      Yeah, there's some good explanation and build-up in this video, but some of it gets really dense really quickly, and it goes back to feeling like reading an inscrutable math research paper.

  • @ProductivityMo
    @ProductivityMo 1 year ago +23

    Thank you Andrej! I can't imagine the amount of time and effort it took to put this 2-hour video together! Very, very educational in breaking down how GPT is constructed. Would love to see a follow-up on tuning the model to answer questions at small scale!

  • @ComPuPur
    @ComPuPur 6 months ago +1

    Grateful for the times we are living in and the easy access to information that we can enjoy. Thanks for sharing your knowledge, much appreciated!

  • @dikshantgupta22
    @dikshantgupta22 8 months ago +1

    Amazing work. I was struggling to understand Transformers at both the theoretical and practical level, but thanks to this brilliant lecture, I struggle no more.

  • @judahb3ar
    @judahb3ar 1 year ago +4

    The students at Stanford who had Andrej as a professor are incredibly lucky; he’s an excellent teacher, breaking down complex topics with high precision and fluidity.

  • @1gogo76
    @1gogo76 1 year ago +4

    Andrej is pure genius wrapped in a humble person 🙌

  • @IllIl
    @IllIl 1 year ago +7

    Dude, thank you so much for this. It was a seriously awesome dive into the implementation, with great explanations along the way. I've read/watched a lot of ML content, and this has got to be one of the clearest lectures I've come across, even better than the usual famous online uni lectures. Thank you! (And I'll be rewatching it too! :)

  • @avishakeadhikary
    @avishakeadhikary 7 месяцев назад +2

    Its amazing how Andrej is such a polite guy. Thanks for sharing this amazing content. :)

  • @GigaFro
    @GigaFro 6 месяцев назад

    Thoroughly enjoyed this Andrej! :) Keep on teaching... Amazing to imagine that 4.3 Million people watched it. Your impact is huge!

  • @haleemaramzan
    @haleemaramzan Год назад +4

    I built this same thing alongside watching the lecture, and loved it! I'm trying to get better at understanding and coding these concepts, and this was extremely helpful. Thank you so much :)

  • @MihaiNicaMath
    @MihaiNicaMath Год назад +9

    Just finished watching this (at 2x speed). I love how hands on this is...every other tutorial I have seen always has a step where they say "its roughly like this...." but this one really shows you what is actually needed to make it work. Looking forward to trying this on some fun problems!

  • @JamesBradyGames
    @JamesBradyGames Год назад +58

    What a wonderful gift to the world. Amazing tutorial. Again. Thank you!

    • @AlexanderEgeler
      @AlexanderEgeler Год назад +2

      James! So funny to see your comment here :-) Hope all is well ...

    • @JamesBradyGames
      @JamesBradyGames Год назад +1

      @@AlexanderEgeler small world! 🙂

  • @ArunKumar-iz8bi
    @ArunKumar-iz8bi Год назад +12

    Thanks a lot Andrej for making such good videos that explain core concepts of neural nets. It would be really helpful if you could make a tutorial/video on the entire workflow and the structured thought process you would follow to train a neural network end to end( to arrive at the final model to be used for production). I mean given a problem statement, how would you train a neural network to solve it , how do you design the experiments to choose the right set of hyperparameters and so on. A hands on tutorial video which would demonstrate this process would definitely help a lot of practitioners trying to use neural networks to solve interesting problems

  • @RogerBarraud
    @RogerBarraud Год назад +4

    To my great surprise I understood most of this at at least a conceptual level.
    [Probably helps that I watched Stanford EE263 and MIT Gilbert Strang Linear Algebra videos already 🙂]
    Thanks very much for this, Andrej!

  • @PeterKim-no8cr
    @PeterKim-no8cr Год назад +4

    Your tutorial finally made me understand what self-attention is. Amazing tutorial and thank you for making these videos!
    Just as a suggestion, using C as the channel dimension can be confusing to follow when cross-referencing the pytorch documentation on the cross entropy function. There, C is used as the # of classes. As I was reimplementing your bigram model on my custom dataset with a vocab size larger than the channel size, I ran into IndexErrors. It's worth re-emphasizing that the last dimension expected by the cross entropy function is not the channel size but the # of classes we're trying to predict i.e. the vocab size.
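    To make that shape point concrete, here's a minimal sketch (assuming PyTorch; `B`, `T`, and `vocab_size` are illustrative values, not the lecture's exact code) showing that `F.cross_entropy` expects the number of classes — the vocab size — as the last logits dimension after flattening batch and time:

    ```python
    import torch
    import torch.nn.functional as F

    B, T, vocab_size = 4, 8, 65                 # batch, time, number of classes (vocab size)
    logits = torch.randn(B, T, vocab_size)       # (B, T, vocab_size) from the model
    targets = torch.randint(0, vocab_size, (B, T))

    # cross_entropy wants (N, num_classes) logits and (N,) integer targets,
    # so flatten the batch and time dimensions into one:
    loss = F.cross_entropy(logits.view(B * T, vocab_size), targets.view(B * T))
    print(loss.item())
    ```

    If the last dimension were the embedding/channel size instead of the vocab size, the targets could index past it, which is exactly the IndexError described above.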

  • @sr3090
    @sr3090 Год назад +1

    Thank you Andrej for this wonderful session. I'm a tech enthusiast and wanted to understand how GPT works, and came across your video. I have always found the research papers difficult to comprehend and never understood how they actually get implemented. Your video completely changed that. You are such a good teacher and make things so easy to understand. Your fan club just got a new member!! :)

  • @petervogt8309
    @petervogt8309 Год назад +1

    Nothing new in this comment. Just want to say 'thank you!' for this amazing tutorial, ...and all the others! The completeness, the information density and pace, the choice of examples and language.... Everything is *just right* , delivered right from the heart and the mind!! Thank you so much Andrej, for taking your time to educate and inspire all of us.

  • @linkin543210
    @linkin543210 Год назад +4

    andrej is single handedly putting the open in openai

  • @Rydn
    @Rydn Год назад +9

    1:41:03 Just for reference. This training took 3 hours, 5 minutes on an 2020 M1 Macbook Air. You can use the "mps" device instead of cuda or cpu.
    It's not great, but it's not that bad if you are just trying stuff out.
    Thank you for your great videos!

    • @timhold2016
      @timhold2016 2 месяца назад

      I have an M2 Pro; the CPU was much faster for me

  • @fooger
    @fooger Год назад +15

    As always, fantastic video and sharing... Would be really cool if you had a part II on this showing how we could use PPO/RL to do the fine-tuning part of some basic interactive flow. It doesn't have to be like ChatGPT (Q/A). Thank you so much Andrej for such an amazing video!

  • @kleber_sampaio
    @kleber_sampaio 2 месяца назад

    Dear Andrej, thank you very much for preparing this material for us! It is certainly a valuable asset. I was able to clarify many of the questions I had regarding the architecture of LLMs just by looking at your code, as you explained.

  • @abhisekpanigrahi-qx3dg
    @abhisekpanigrahi-qx3dg 7 месяцев назад

    The explanation of such difficult concepts is so simple! Your channel deserves a lot more attention.

  • @carykh
    @carykh Год назад +10

    Thanks for posting this lesson so freely on the internet, Andrej!
    Man, all this AI educational content on RUclips recently makes me want to get back into doing AI experiments

    • @midnightwa4261
      @midnightwa4261 Год назад

      Well i have watched all your video's... i think its time for more 😆

  • @JohnVanderbeck
    @JohnVanderbeck Год назад +5

    ChatGPT feels like more than just a large language model to me. It seems to have, or at least projects, an understanding of concepts that I wouldn't expect a pure language model to have.

  • @ankile
    @ankile Год назад +4

    It would be incredibly cool to see a very simple implementation of the second fine-tuning phase! Good lessons in RL to be had for sure :)

  • @Mercedes-Scott
    @Mercedes-Scott 9 месяцев назад

    People like you continue to encourage many of us that everything is not about money, considering how much this beautiful lecture would have cost somewhere else! Beautiful!

  • @bensphysique6633
    @bensphysique6633 5 месяцев назад

    This was the most amazing programming lecture I have ever watched. Usually I listen to 1.5-2.0x, but now I must have put it to 1x speed, and probably have to rewatch it multiple times. Thank you so much for it, Andrej. Awesome work!

  • @Gaurav-pq2ug
    @Gaurav-pq2ug 4 месяца назад +9

    - 00:00 🤖 ChatGPT is a system that allows interaction with an AI for text-based tasks.
    - 02:18 🧠 The Transformer neural network from the "Attention is All You Need" paper is the basis for ChatGPT.
    - 05:46 📊 NanoGPT is a repository for training Transformers on text data.
    - 07:23 🏗 Building a Transformer-based language model with NanoGPT starts with character-level training on a dataset.
    - 10:11 💡 Tokenizing involves converting raw text to sequences of integers, with different methods like character-level or subword tokenizers.
    - 13:36 📏 Training a Transformer involves working with chunks of data, not the entire dataset, to predict sequences.
    - 18:43 ⏩ Transformers process multiple text chunks independently as batches for efficiency in training.
    - 22:59 🧠 Explaining the creation of a token embedding table.
    - 24:09 🎯 Predicting the next character based on individual token identity.
    - 25:19 💡 Using negative log likelihood loss (cross entropy) to measure prediction quality.
    - 26:44 🔄 Reshaping logits for appropriate input to cross entropy function.
    - 28:22 💻 Training the model using the optimizer Adam with a larger batch size.
    - 31:21 🏗 Generating tokens from the model by sampling via softmax probabilities.
    - 34:38 🛠 Training loop includes evaluation of loss and parameter updates.
    - 41:23 📉 Using `torch.no_grad()` for efficient memory usage during evaluation.
    - 45:59 🧮 Tokens are averaged out to create a one-dimensional vector for efficient processing
    - 47:22 🔢 Matrix multiplication can efficiently perform aggregations instead of averages
    - 50:27 🔀 Manipulating elements in a multiplying matrix allows for incremental averaging based on 'ones' and 'zeros'
    - 54:51 🔄 Introduction of softmax helps in setting interaction strengths and affinities between tokens
    - 58:27 🧠 Weighted aggregation of past elements using matrix multiplication aids in self-attention block development
    - 01:02:07 🔂 Self-attention involves emitting query and key vectors to determine token affinities and weighted aggregations
    - 01:05:13 🎭 Implementing a single head of self-attention involves computing queries and keys and performing dot products for weighted aggregations.
    - 01:10:10 🧠 Self-attention mechanism aggregates information using key, query, and value vectors.
    - 01:11:46 🛠 Attention is a communication mechanism between nodes in a directed graph.
    - 01:12:56 🔍 Attention operates over a set of vectors without positional information, requiring external encoding.
    - 01:13:53 💬 Attention mechanisms facilitate data-dependent weighted sum aggregation.
    - 01:15:46 🤝 Self-attention involves keys, queries, and values from the same source, while cross-attention brings in external sources.
    - 01:17:50 🧮 Scaling the attention values is crucial for network optimization by controlling variance.
    - 01:21:27 💡 Implementing multi-head attention involves running self-attention in parallel and concatenating results for improved communication channels.
    - 01:26:36 ⚙ Integrating communication and computation in Transformer blocks enhances network performance.
    - 01:28:29 🔄 Residual connections aid in optimizing deep networks by facilitating gradient flow and easier training.
    - 01:32:16 🧠 Adjusting Channel sizes in the feed forward network can affect validation loss and lead to potential overfitting.
    - 01:32:58 🔧 Layer Norm in deep neural networks helps optimize performance, similar to batch normalization but normalizes rows instead of columns.
    - 01:35:19 📐 Implementing Layer Norm in a Transformer involves reshuffling layer norms in pre-norm formulation for better results.
    - 01:37:12 📈 Scaling up a neural network model by adjusting hyperparameters like batch size, block size, and learning rate can greatly improve validation loss.
    - 01:39:30 🔒 Using Dropout as a regularization technique helps prevent overfitting when scaling up models significantly.
    - 01:51:21 🌐 ChatGPT undergoes pre-training on internet data followed by fine-tuning to become a question-answering assistant by aligning model responses with human preferences.
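    The "mathematical trick" behind the 47:22-58:27 entries above — weighted aggregation over the past via a lower-triangular matrix multiply and softmax — can be sketched like this (a minimal PyTorch illustration in the spirit of the lecture, not its exact code):

    ```python
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    B, T, C = 1, 4, 2
    x = torch.randn(B, T, C)

    # Lower-triangular mask: token t may only attend to tokens 0..t.
    tril = torch.tril(torch.ones(T, T))
    wei = torch.zeros(T, T).masked_fill(tril == 0, float('-inf'))
    wei = F.softmax(wei, dim=-1)   # each row becomes a uniform average over the visible past
    out = wei @ x                  # (T, T) @ (B, T, C) -> (B, T, C) via batch broadcasting

    # Row t of `out` is the running mean of x[:, :t+1]:
    assert torch.allclose(out[0, 2], x[0, :3].mean(dim=0))
    ```

    In full self-attention the zeros in `wei` are replaced by data-dependent query-key dot products, so the averaging weights are learned rather than uniform.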

    • @BeaternPlays
      @BeaternPlays 2 месяца назад

      The hero we need, but do not deserve. Thank you.

  • @neilpatel7156
    @neilpatel7156 Год назад +11

    Consider pairing this lecture with Andrej's more recent lecture from Stanford's CS25 course on transformers. For those of you who got to this video through the makemore series, this lecture fills in some gaps in some ways when jumping from makemore to transformers, covering the what and why of the attention and transformer architecture:
    ruclips.net/video/XfpMkf4rD6E/видео.htmlsi=RuOEaN-VBGCI96pm.
    Thank you Andrej for an incredible series of videos! As a fellow computer scientist (with limited exposure to AI), this series has brought me back to grad school/undergrad, re-discovering the passion and joy of learning, thinking about concepts in off time, and getting hands-on with the exercises. I've been recommending these to my friends and team.

  • @Itz_ashishyadav
    @Itz_ashishyadav 10 дней назад +9

    who came here after Harkirat video

  • @asatorftw
    @asatorftw 10 месяцев назад +1

    Andrej you are a blessing for all people who want to learn more about AI. Hands down the best explanations, clear instructions and general genuinity. Hope I can thank you in person one day!

  • @slayermm1122
    @slayermm1122 3 месяца назад +1

    I watched it 10 times and built my very first LLM. Thanks Andrej for the kind sharing.

  • @LibriDaSalvare
    @LibriDaSalvare 5 дней назад +4

    Can someone please tell me how necessary it is to watch Andrej's previous videos on neural networks? (I have never used PyTorch before)

  • @Treeskrub
    @Treeskrub 11 месяцев назад +3

    That's crazy 🐢 That's actually crazy 🐢 That's messed up 🐢

  • @瀬尾結月瀬尾結月
    @瀬尾結月瀬尾結月 11 месяцев назад +5

    vedal-sama bring me here

    • @joshuan.
      @joshuan. 11 месяцев назад

      Lol I was hoping I'd find these comments here

  • @yigalkassel8456
    @yigalkassel8456 8 месяцев назад

    It's probably the best RUclips video I've ever seen.
    Such down-to-earth explanations, yet going really deep into the cutting-edge tech of transformers.
    WOW
    Thank you for that!