This series is the most interesting resource for DL I've come across, being a junior ML engineer myself. To be able to watch such a knowledgeable domain expert as Andrej explain everything in the most understandable way is a real privilege. A million thanks for your time and effort; looking forward to the next one and hopefully many more.
I love how we are all so stressed and worried that Andrej might grow apathetic to his YouTube channel, so everyone wants to be extra supportive 😆 Really shows how awesome of a communicator he is.
I was literally thinking about that when I saw this comment.
Unfortunately he did it :(
@jordankuzmanovik5297 Hopefully he comes back.
haha, so true!
I really, really appreciate you putting in the work to create these lectures. I hope you can really feel the weight of the nearly hundred thousand humans who pushed through 12 hours of lectures on this because you've made it accessible. And that's just so far. These videos are such an incredible gift. Half of the views are me, because I needed to watch each one so many times in order to understand what's happening, because I started from so little. Also, it's super weird how different you are from other YouTubers and yet how likable you become as a human during this series. You are doing this right, and I appreciate it.
As an independent deep learning undergrad student, your videos help me a lot. Thank you Andrej, never stop this series.
We're on the same road!
Love the series as well! Coding through all of it. Would love to get together with people to replicate deep learning papers, like Andrej does here, to learn faster and not by myself.
@tanguyrenaudie1261 I'm in the same boat as well. Do you have a Discord or something where we can talk further?
@Anri Lombard @ Nervous Hero
@raghavravishankar6262 Andrej does have a server; we could meet there and then start our own. My handle is vady. (with a dot) if anyone wants to add me, or ping me in Andrej's server.
A notification for a new Andrej video feels like a new season of Game of Thrones just dropped at this point.
I experimented a bit with the MLP with 1 hidden layer and managed to scale it up to match your fancy hierarchical model. :)
Here is what I got:
MLP(105k parameters):
block_size = 10
emb_dim = 18
n_hidden = 500
lr = 0.1 # used the same learning rate decay as in the video
epochs = 200000
mini_batch = 32
lambd = 1 # added L2 regularization
seed = 42
Training error: 1.7801
Dev error: 1.9884
Test error: 1.9863 (I checked this only because I was worried that somehow I had overfitted the dev set)
Some examples generated from the model that I kinda liked:
Angelise
Fantumrise
Bowin
Xian
Jaydan
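Not the original poster, but for anyone wanting to reproduce this, here's a minimal sketch of what that setup might look like in PyTorch (the init scales and the exact form of the L2 penalty are my guesses, not the commenter's actual code):

import torch
import torch.nn.functional as F

# hyperparameters from the comment above
block_size, emb_dim, n_hidden, vocab_size = 10, 18, 500, 27
lambd = 1.0  # L2 strength

g = torch.Generator().manual_seed(42)
C  = torch.randn((vocab_size, emb_dim), generator=g)
W1 = torch.randn((block_size * emb_dim, n_hidden), generator=g) * (5/3) / (block_size * emb_dim)**0.5
b1 = torch.randn(n_hidden, generator=g) * 0.01
W2 = torch.randn((n_hidden, vocab_size), generator=g) * 0.01
b2 = torch.zeros(vocab_size)
parameters = [C, W1, b1, W2, b2]
print(sum(p.nelement() for p in parameters))    # 104513, i.e. the ~105k quoted above
for p in parameters:
    p.requires_grad = True

def loss_fn(Xb, Yb):                             # Xb: (batch, block_size) indices, Yb: (batch,) targets
    emb = C[Xb].view(Xb.shape[0], -1)            # (batch, block_size*emb_dim)
    h = torch.tanh(emb @ W1 + b1)                # (batch, n_hidden)
    logits = h @ W2 + b2                         # (batch, vocab_size)
    l2 = sum((p**2).mean() for p in (W1, W2))    # one plausible way to add the L2 term
    return F.cross_entropy(logits, Yb) + lambd * l2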
What's the formula to calculate the number of parameters of an MLP model?
@oklm2109 You just add up the trainable parameters of every layer.
If the model contains only fully connected layers (aka Linear in PyTorch or Dense in TF), the number of parameters of each layer is:
n_weights = n_in * n_hidden_units
n_biases = n_hidden_units
n_params = n_weights + n_biases = (1 + n_in) * n_hidden_units
n_in: number of inputs (think of it as the number of outputs, or hidden units, of the previous layer).
This formula is valid for Linear layers; other types of layers may have different formulas.
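A quick sanity check of that formula in code, using the layer sizes from the parent comment (the embedding table is counted as vocab_size * emb_dim; these sizes are just the example above, nothing canonical):

def linear_param_count(n_in, n_out, bias=True):
    # (1 + n_in) * n_out when bias=True, matching the formula above
    return n_in * n_out + (n_out if bias else 0)

emb = 27 * 18                               # embedding table: vocab_size * emb_dim
hidden = linear_param_count(10 * 18, 500)   # block_size*emb_dim -> n_hidden
out = linear_param_count(500, 27)           # n_hidden -> vocab_size
print(emb + hidden + out)                   # 104513, i.e. the ~105k above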
I'd say it's slightly unfair to compare models with different block sizes, because the block size influences not only the number of parameters but also the amount of information given as input.
@Zaphod42Breeblebrox out of curiosity, what do your losses and examples look like without the L2 regularization?
Also, love the username :P
^ Update: I just tried this architecture locally and got the following without L2 regularization:
train 1.7468901872634888
val 1.9970593452453613
How were you able to tell that there was overfitting to the training set?
Some examples:
arkan.
calani.
ellizee.
coralym.
atrajnie.
ndity.
dina.
jenelle.
lennec.
laleah.
thali.
nell.
drequon.
grayson.
kayton.
sypa.
caila.
jaycee.
kendique.
javion.
Please don't stop making these videos, they are gold!
Andrej, thanks a lot for the video! Please do not stop continuing the series. It's an honor to learn from you.
Thank you so much for creating this video lecture series. Your passion for this topic comes through so vividly in your lectures. I learned so much from every lecture and especially appreciated how the lectures started from the foundational concepts and built up to the state-of-the-art techniques. Thank you!
Best resource by far for this content. Please keep making more of these; I feel I'm learning a huge amount from each video.
Thank you so much for these videos. I really enjoy these deep dives, things make so much more sense when you're hand coding all the functions and running through examples. It's less of a black box and more intuitive. I hope this comment will encourage you to keep this going!
Absolutely love this series Andrej sir... It not only teaches me stuff but gives me confidence to work even harder to share whatever I know already.. 🧡
Can't wait for part 6! So clear and I can follow step by step. Thanks so much
I rarely comment on videos, but thank you so much for this series, Andrej. I love learning, and I learn many things that interest me. The reason I say that is that I have experienced a lot of tutors/educators over time. And for what it's worth, I'd like you to know you're truly gifted when it comes to your understanding of AI development and communicating that understanding.
Your work ethic and happy personality really move me. Respect to you, Andrej, you are great.🖖
All the makemore lessons have been awesome Andrej! Huge thanks for helping me understand better how this world works.
I love this series so much :) it has profoundly deepened my understanding of neural networks and especially backpropagation. Thank you
Hey Andrej! I hope you continue and give us the RNN, GRU & Transformer lectures as well! The ChatGPT one is great, but I feel like we missed the story in the middle and jumped ahead because of ChatGPT.
The ChatGPT lecture is the Transformer lecture... And regarding RNNs, I don't see why anyone would still use them...
Transformers, yes. But it's not like anyone will build bigrams either; it's about learning concepts like BPTT from the roots.
@kshitijbanerjee6927 Bigrams and MLPs help you understand Transformers (which are the SOTA)... Anyway, IMO it would be a waste of time to create a lecture on RNNs, but if the majority want it, then maybe he should do it... I don't care.
Fully disagree that it's not useful.
I think the concepts of how they came up with unrolling and BPTT, and the gates used to solve long-term memory problems, are invaluable for appreciating and understanding why transformers are such a big deal.
@@SupeHero00 RNNs are coming back due to SSMs like Mamba.
Learned a lot of practical tips and theoretical knowledge of why we do what we do and also the history of how Deep Learning evolved. Thanks a lot for this series. Requesting you to continue the series.
Thank you Andrej for creating this series. It has been very helpful. I just hope you get the time to continue with it.
This is the best deep learning course I've followed! Even better than the one on Coursera. Thanks!
Thank you so much Andrej for the series, it helps me a lot. You are one of the reasons I was able to get into ML and build a career there. I admire your teaching skills!
I didn't get why the sequence dim has to be part of the batch dimension, and I didn't hear Andrej talk about it explicitly, so here is my reasoning:
The sequence dimension is an additional batch dimension because the output before batch norm is created by a linear layer with (32, 4, 20) @ (20, 68) + (68), which performs the matrix multiplication only with the last dimension (.., .., 20) and in parallel over the first two. So, the matrix multiplication is performed 32 * 4 times with (20) @ (20, 68). Thus, it's the same as having a (128, 20) @ (20, 68) calculation, where 32 * 4 = 128 is the batch dimension. So the sequence dimension is effectively treated as if it were a "batch dimension" in the linear layer, and it must be treated that way in batch norm too.
(would be great if someone could confirm)
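Not Andrej, but a quick numerical check seems to confirm this reasoning (shapes taken from the comment: batch 32, 4 groups per example, 20 = 2*10 fused embedding dims, 68 hidden units):

import torch

x = torch.randn(32, 4, 20)
W, b = torch.randn(20, 68), torch.randn(68)

out = x @ W + b                                       # the linear layer broadcasts over the first two dims
flat = (x.view(32 * 4, 20) @ W + b).view(32, 4, 68)
print(torch.allclose(out, flat))                      # True: (32, 4) really acts as one batch of 128

# so batch norm statistics should be taken over dims (0, 1), i.e. over all 128 rows per channel:
mean = out.mean(dim=(0, 1), keepdim=True)             # shape (1, 1, 68)
var = out.var(dim=(0, 1), keepdim=True)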
So far THE BEST lecture series I have come across on YouTube. Alongside learning neural networks in this series, I have learned more PyTorch than from watching a 26-hour PyTorch video series by another YouTuber.
Just wanna say thank you for sharing your experience -- love this from-scratch series starting from first principles!
When I did the mean() trick at ~8:50 I let out an audible gasp! That was such a neat trick, going to use that one in the future.
Yes! I've been telling everyone about these videos. I've been checking every day whether you posted the next video. Thank you.
I was worried I was going to have to wait a couple of months for the next video as I finished part 4 just last week. Can't wait to get into this one, thanks a lot for this series Andrej
These videos are amazing, please never stop making this type of content!
This is truly the best dl content out there. Most courses just focus on the theory but lack deep understanding.
Enjoying these videos so much. Using them to refresh most of what I've forgotten about Python and to begin playing with PyTorch. The last time I did this stuff myself was with C# and CNTK. Now going back to rebuild and rerun old models and data (much faster even, and "better" results). Thank you.
Every time a new video is out it's like Christmas for me! Please don't stop doing this, best ML content out there.
Please continue, I really like this series. You are an awesome teacher!
My best way to learn is to learn from one of the most experienced people in the field. Thanks for everything Andrej.
Very grateful for these. An early endearing moment was in the Spelled-Out Intro when you took a moment to find the missing parentheses for 'print.'
Totally amazed by the amount of good work you put in. You've helped a lot of people Andrej. Keep up the good work
My favorite way to start a Monday morning is to wake up to a new lecture in Andrej's masterclass :)
Finally Completed this one. As always thank you Andrej for your generosity! Next I will practice through all five parts again and learn how to accelerate the training process by using GPUs.
Andrej you are the absolute greatest. Keep making your videos. Anxiously waiting to implement Transformers with you
Was looking forward to this one. Thanks, Andrej!
Please continue this series, Sir Andrej. You are the savior!
Thanks so much Andrej! Hope to see a Part 6
Thanks so much for this series, I feel like this is the most important skill I might ever learn and it’s never been more accessible than in your lectures. Thank you!
How cool is it that anyone with an internet connection has access to such a great teacher? (answer: very)
Thank you, Andrej! Looking forward to the rest of the series!
I've been using this stepped learning rate schedule and I've consistently been getting slightly better training and validation losses. With this I got a 1.98 val loss:
lr = 0.1 if i < 100000 else (0.01 if i < 150000 else 0.001)
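For anyone curious how that schedule behaves over the run, a tiny self-contained check (the 200k total steps are my assumption, matching the training length used in the lecture):

def lr_at(i):
    return 0.1 if i < 100000 else (0.01 if i < 150000 else 0.001)

for step in (0, 99_999, 100_000, 149_999, 150_000, 199_999):
    print(step, lr_at(step))   # 0.1 up to 100k steps, 0.01 up to 150k, then 0.001
# inside the training loop you would simply replace the fixed lr with lr_at(i)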
Thank you so much for these lectures ! Can you please make a video on the “experimental harness” you mention at the end of the video? It would be super helpful and informative.
This guy is literally my definition of wholesomeness. Again, Thank you, Andrej!
@Andrej thank you for making this. Please continue making such videos. It really helps beginners like me. If possible, could you please make a series on how actual development and production is done?
Thanks again Andrej! Love these videos! Dream come true to watch and learn these!
Thanks for all you do to help people! Your helpfulness ripples throughout the world!
Thanks again! lol
Great video again Andrej, keep up the good work and thank you as always!
I just found your YouTube channel, and this is just amazing. Please do not stop making these videos, they are incredible.
Great series! I really enjoy the progress and good explanations.
Andrej, thanks for all you do for us. You're the best.
Beautifully explained as always - thanks. It shows how much passion you have to come up with these awesome videos. We are all blessed!
This is philanthropy! I love you man!
Absolutely awesome stuff Andrej. Thank you for doing this.
The sentence that Andrej said at 49:26 made me realize something, something very deep. 🔥
So I finished this lecture series. I was expecting RNN/LSTM/GRU, but they weren't there; still, I learnt a lot throughout that I can definitely continue on my own. Thanks Andrej.
Deliberate errors in the right spots... Your lectures are great.
How can we help you keep putting these treasures out, Andrej? I think the expected value of helping hundreds of thousands of ML practitioners improve their understanding of the building blocks might be proportional to (or even outweigh) the value of your individual contributions at OpenAI. That's not to say that your technical contributions are not valuable; on the contrary, I'm using their value as a point of comparison because I want to emphasise how amazingly valuable I think your work on education is. A useful analogy would be to ask which ended up having more impact on our progress in the field of physics: Richard Feynman's lectures, which motivated many to pursue science and improved everyone's intuitions, OR his individual contributions to the field? At the end of the day it's not about one or the other but just finding the right balance given the effective impact of each and, of course, your personal enjoyment.
Thank you Sir. Have been waiting for this.
Hi Andrej, thank you for taking the time to create these videos. In this video, for the first time, I'm having difficulties understanding what the model is actually learning. I've watched it twice and tried to understand the WaveNet paper, but that isn't really helping. Given an example input "emma", where the following character is supposed to be ".", why is it beneficial to create a hidden layer to process "em", "ma", and then "emma"? Are we essentially encoding that, given a 4-character word, IF the first two characters are "em" it is likely that the 5th character is ".", no matter what the third and fourth characters are? In other words, would this implementation probably assign a higher probability that "." is the fifth character after an unseen name, e.g. "emli", simply because it starts with the bigram "em"? Thanks in advance, Dimitri.
Loved this series. Would you please be willing to continue it so we get to work through the rest of CNN, RNN, and LSTM? Thanks!
Every video another solid pure gold bar
Andrej, we all love you. You're amazing!
Incredible video, this helps a lot. Thank you for the videos; I especially loved your Stanford videos on machine learning from scratch, showing how to do it without any libraries like TensorFlow and PyTorch. Keep going, and thank you for helping hungry learners like me!!! Cheers 🥂
Thank you for this beautiful tutorial 🔥
*Abstract*
This video continues the "makemore" series, focusing on improving the character-level language model by transitioning from a simple multi-layer perceptron (MLP) to a deeper, tree-like architecture inspired by WaveNet. The video delves into the implementation details, discussing PyTorch modules, containers, and debugging challenges encountered along the way. A key focus is understanding how to progressively fuse information from input characters to predict the next character in a sequence. While the video doesn't implement the exact WaveNet architecture with dilated causal convolutions, it lays the groundwork for future explorations in that direction. Additionally, the video provides insights into the typical development process of building deep neural networks, including reading documentation, managing tensor shapes, and using tools like Jupyter notebooks and VS Code.
*Summary*
*Starter Code Walkthrough (**1:43**)*
- The starting point is similar to Part 3, with minor modifications.
- Data generation code remains unchanged, providing examples of three characters to predict the fourth.
- Layer modules like Linear, BatchNorm1D, and Tanh are reviewed.
- The video emphasizes the importance of setting BatchNorm layers to training=False during evaluation.
- Loss function visualization is improved by averaging values.
*PyTorchifying Our Code: Layers, Containers, Torch.nn, Fun Bugs (**9:19**)*
- Embedding table and view operations are encapsulated into custom Embedding and Flatten modules.
- A Sequential container is created to organize layers, similar to torch.nn.Sequential.
- The forward pass is simplified using these new modules and container.
- A bug related to BatchNorm in training mode with single-example batches is identified and fixed.
*Overview: WaveNet (**17:12**)*
- The limitations of the current MLP architecture are discussed, particularly the issue of squashing information too quickly.
- The video introduces the WaveNet architecture, which progressively fuses information in a tree-like structure.
- The concept of dilated causal convolutions is briefly mentioned as an implementation detail for efficiency.
*Implementing WaveNet (**19:35**)*
- The dataset block size is increased to 8 to provide more context for predictions.
- The limitations of directly scaling up the context length in the MLP are highlighted.
- A hierarchical model is implemented using FlattenConsecutive layers to group and process characters in pairs.
- The shapes of tensors at each layer are inspected to ensure the network functions as intended.
- A bug in the BatchNorm1D implementation is identified and fixed to correctly handle multi-dimensional inputs.
*Re-training the WaveNet with Bug Fix (**45:25**)*
- The network is retrained with the BatchNorm1D bug fix, resulting in a slight performance improvement.
- The video notes that PyTorch's BatchNorm1D has a different API and behavior compared to the custom implementation.
*Scaling up Our WaveNet (**46:07**)*
- The numbers of embedding and hidden units are increased, leading to a model with 76,000 parameters.
- Despite longer training times, the validation performance improves to 1.993.
- The need for an experimental harness to efficiently conduct hyperparameter searches is emphasized.
*Experimental Harness (**46:59**)*
- The lack of a proper experimental setup is acknowledged as a limitation of the current approach.
- Potential future topics are discussed, including:
- Implementing dilated causal convolutions
- Exploring residual and skip connections
- Setting up an evaluation harness
- Covering recurrent neural networks and transformers
*Improve on My Loss! How Far Can We Improve a WaveNet on This Data? (**55:27**)*
- The video concludes with a challenge to the viewers to further improve the WaveNet model's performance.
- Suggestions for exploration include:
- Trying different channel allocations
- Experimenting with embedding dimensions
- Comparing the hierarchical network to a large MLP
- Implementing layers from the WaveNet paper
- Tuning initialization and optimization parameters
I summarized the transcript with Gemini 1.5 Pro.
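For anyone skimming the summary: the FlattenConsecutive layer mentioned under "Implementing WaveNet" is roughly the following (a reconstruction from memory of the lecture, so treat the details as approximate):

import torch

class FlattenConsecutive:
    # groups n consecutive time steps into the channel dimension
    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        B, T, C = x.shape
        x = x.view(B, T // self.n, C * self.n)   # fuse n neighbours: (B, T/n, C*n)
        if x.shape[1] == 1:                      # at the top of the tree, drop the singleton time dim
            x = x.squeeze(1)
        self.out = x
        return self.out

    def parameters(self):
        return []

# e.g. a batch of 8 character embeddings of size 10 gets fused in pairs:
print(FlattenConsecutive(2)(torch.randn(32, 8, 10)).shape)   # torch.Size([32, 4, 20])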
Um, can I find Part 6 somewhere? (RNN, LSTM, GRU...) I was under the impression that the next video in the playlist is about building GPT from scratch.
That was a great playlist, easy to understand and very helpful, thank you very much!!
Haven't watched this video (yet), but I'm wondering if Andrej discussed WaveNet vs transformers. I know that the WaveNet paper came out around the same time as Attention Is All You Need. It seems like both WaveNet and transformers can do sequence prediction/generation, but transformers have taken off. Is that because of transformers' better performance in most problem domains? Does WaveNet still outperform transformers in certain situations?
great video, been learning a ton from you recently. thank you andrej!
I am subscribing, Andrej, just to support someone from our country, Slovakia. Even though I don't understand anything from the video >D
Been waiting for awhile. Thankyouuu !!
Hi Andrej. Is there going to be an RNN, LSTM, GRU video? Or maybe even a part 2 on the topic of WaveNet, with the residual connections?
NumPy / torch / tf tensor reshaping always feels like handwavy magic.
Finally finished all the lectures, and I realized I have a weak understanding of the math, and of dimensionality and the operations over it. Anyway, thank you for helping out with the rest of the concepts and practices; I now better understand how backprop works, what it is doing, and what for.
Jon Krohn has a full playlist on the algebra and calculus to cover before starting machine learning.
Please keep these coming!
Much appreciated, Andrej. Your tutorials are gem!
Bro pls release part 6, I know you are busy with chatgpt but the word needs you
Andrej is literally the bridge between worried senior engineers and the world of gen ai.
Thank you! Love the series! Helped me a lot with my learning experience with PyTorch
Thanks for continuing this fantastic series.
At 38:00, it sounds like we compared two architectures, both with 22k parameters and an 8-character window:
* 1 layer, full connectivity
* 3 layers, tree-like connectivity
In a single layer, full connectivity outperforms partial connectivity.
But partial connectivity uses fewer parameters, so we can afford to build more layers.
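Back-of-the-envelope arithmetic for that comparison (my own numbers: the tree widths match the ~22k hierarchical model as I remember it, while the flat hidden width is a hypothetical value chosen to land near the same budget; batch-norm gains/biases are ignored):

def linear(n_in, n_out):                 # weights + biases of a fully connected layer
    return n_in * n_out + n_out

vocab, n_embd, block = 27, 10, 8
emb = vocab * n_embd

h_flat = 200                             # hypothetical width for the flat MLP
flat = emb + linear(block * n_embd, h_flat) + linear(h_flat, vocab)

h = 68                                   # hierarchical model width, fusing characters in pairs over 3 layers
tree = emb + linear(2 * n_embd, h) + linear(2 * h, h) + linear(2 * h, h) + linear(h, vocab)

print(flat, tree)                        # 21897 22193 -- both land in the low 20k range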
Awesome! Well explained, and it's clear what's being done. Please keep making these fantastic videos!!!
Thanks. Very helpful and intuitive.
I rarely finish an entire episode. Hey Andrej 👌
Great content, Andrej! Keep them coming!
Thanks for the masterclass!!!!!! ... By the way, I found you through an interview of geohotz with Lex... I heard you like to teach, and they are right about that statement :)
Can't wait for this next step in the process!
Thank you so much for the series. I recently started it and it's the best thing on all of YouTube. Keep it up.
Can we use WaveNet for LLM training? Interested in its performance.
AI Devil is back. Thanks for the video @Andrej Karpathy.
Thanks Andrej, this course is awesome for base building.
I'm learning so much. I really appreciate the lucidity and simplicity of your approach. I do have a question. Why not initialize running_mean and running_var to None and then set them on the first batch? That would seem to be a better approach than to start them at zero and would be consistent with making them exponentially weighted moving averages - which they are except for the initialization at 0.0.
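Not Andrej, but here's a sketch of the lazy-initialization variant described above, written on top of a lecture-style batch norm (simplified to 2-D inputs, gamma/beta omitted for brevity; this is just my illustration of the idea, not a claim about what PyTorch does internally):

import torch

class BatchNorm1dLazy:
    def __init__(self, eps=1e-5, momentum=0.001):
        self.eps, self.momentum = eps, momentum
        self.training = True
        self.running_mean = None                  # set from the first batch instead of zeros
        self.running_var = None

    def __call__(self, x):                        # x: (batch, dim)
        if self.training:
            mean = x.mean(0, keepdim=True)
            var = x.var(0, keepdim=True)
            with torch.no_grad():
                if self.running_mean is None:     # first batch: just copy the batch statistics
                    self.running_mean = mean.clone()
                    self.running_var = var.clone()
                else:                             # afterwards: the usual exponential moving average
                    self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
                    self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean, var = self.running_mean, self.running_var
        return (x - mean) / torch.sqrt(var + self.eps)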
Thanks for the content & explanations Andrej and have a great time in Kyoto :)
Please don't stop the series😢
I have a challenging question (for me :) ). I made a very simple network which takes x as an input and produces y as an output. The network looks like this: y = sin(ax + b), where a and b are learnable variables. The training data is built from sin(3x + 4312). The loss function is mean squared error. Using the usual approaches I couldn't make it work! What do you think the problem is?
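My best guess, with a small reproduction (the data range, init, and optimizer are details I filled in myself): the loss as a function of a is wildly oscillatory, so plain gradient descent from a random init almost always lands in a local minimum far from a = 3. Fitting the frequency of a sinusoid by gradient descent on MSE is a classic hard case.

import torch

torch.manual_seed(0)
x = torch.linspace(-3, 3, 512)
y = torch.sin(3 * x + 4312)                          # the target function from the comment

a = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
opt = torch.optim.SGD([a, b], lr=0.01)

for step in range(5000):
    loss = ((torch.sin(a * x + b) - y) ** 2).mean()  # mean squared error
    opt.zero_grad()
    loss.backward()
    opt.step()

print(a.item(), b.item(), loss.item())               # typically does not recover a ≈ 3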
Dear Andrej, your work is amazing. We are here to share and have a beautiful world all together, and you are doing that.
If you could make a video about convolutional NNs, ImageNet-winning architectures, or anything deep-learning related to vision, that would be great.
Thank you!
SO EXCITED TO SEE THIS POSTED LEEEEETS GOOOOOOOO