Building makemore Part 5: Building a WaveNet

  • Published: 7 Nov 2024

Comments • 201

  • @antonclaessonn
    @antonclaessonn 1 year ago +88

    This series is the most interesting resource for DL I've come across, being a junior ML engineer myself. Being able to watch such a knowledgeable domain expert as Andrej explain everything in the most understandable way is a real privilege. A million thanks for your time and effort; looking forward to the next one and hopefully many more.

  • @khisrowhashimi
    @khisrowhashimi 1 year ago +215

    I love how we are all so stressed and worried that Andrej might grow apathetic to his YouTube channel, so everyone wants to be extra supportive 😆 Really shows how awesome of a communicator he is.

    • @TL-fe9si
      @TL-fe9si 1 year ago +2

      I was literally thinking about that when I saw this comment

    • @jordankuzmanovik5297
      @jordankuzmanovik5297 1 year ago +4

      Unfortunately he did :(

    • @isaac10231
      @isaac10231 1 year ago

      @jordankuzmanovik5297 Hopefully he comes back.

    • @Тима-щ2ю
      @Тима-щ2ю 8 days ago

      haha, so true!

  • @GlennGasner
    @GlennGasner 1 year ago +22

    I really, really appreciate you putting in the work to create these lectures. I hope you can really feel the weight of the nearly hundred thousand humans who pushed through 12 hours of lectures on this because you've made it accessible. And that's only counting up to now. These videos are such an incredible gift. Half of the views are me, because I needed to watch each one so many times in order to understand what's happening, because I started from so little. Also, it's super weird how different you are from other YouTubers and yet how likable you become as a human during this series. You are doing this right, and I appreciate it.

  • @nervoushero1391
    @nervoushero1391 1 year ago +111

    As an independent deep learning undergrad student, your videos help me a lot. Thank you Andrej. Never stop this series.

    • @anrilombard1121
      @anrilombard1121 1 year ago +7

      We're on the same road!

    • @tanguyrenaudie1261
      @tanguyrenaudie1261 1 year ago +1

      Love the series as well! Coding through all of it. Would love to get together with people to replicate deep learning papers, like Andrej does here, to learn faster and not by myself.

    • @raghavravishankar6262
      @raghavravishankar6262 1 year ago +1

      @tanguyrenaudie1261 I'm in the same boat as well. Do you have a Discord or something where we can talk further?

    • @raghavravishankar6262
      @raghavravishankar6262 1 year ago

      @Anri Lombard @ Nervous Hero

    • @Katatonya
      @Katatonya 7 months ago

      @raghavravishankar6262 Andrej does have a server; we could meet there and then start our own. My handle is vady. (with a dot) if anyone wants to add me, or ping me in Andrej's server.

  • @crayc3
    @crayc3 1 year ago +34

    A notification for a new Andrej video guide feels like a new season of Game of Thrones just dropped at this point.

  • @Zaphod42Beeblebrox
    @Zaphod42Beeblebrox 1 year ago +54

    I experimented a bit with the MLP with 1 hidden layer and managed to scale it up to match your fancy hierarchical model. :)
    Here is what I got:
    MLP (105k parameters):
    block_size = 10
    emb_dim = 18
    n_hidden = 500
    lr = 0.1 # used the same learning rate decay as in the video
    epochs = 200000
    mini_batch = 32
    lambd = 1 ### added L2 regularization (a rough sketch of this term is below)
    seed is 42
    Training error: 1.7801
    Dev error: 1.9884
    Test error: 1.9863 (I checked this only because I was worried that somehow I had overfitted the dev set)
    Some examples generated from the model that I kinda liked:
    Angelise
    Fantumrise
    Bowin
    Xian
    Jaydan
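
    A rough sketch of how the L2 term (lambd) mentioned above could enter a makemore-style training step. This is a minimal illustration, not the commenter's actual code; the names model, parameters, Xb, Yb, and lr are placeholders standing in for the lecture notebook's variables:

        import torch
        import torch.nn.functional as F

        def training_step(model, parameters, Xb, Yb, lr=0.1, lambd=1.0):
            # forward pass: data loss on the minibatch
            logits = model(Xb)
            data_loss = F.cross_entropy(logits, Yb)
            # L2 regularization: penalize the squared magnitude of all parameters,
            # averaged per parameter so the strength does not scale with model size
            reg_loss = lambd * sum((p ** 2).mean() for p in parameters)
            loss = data_loss + reg_loss
            # backward pass and simple SGD update, as in the lectures
            for p in parameters:
                p.grad = None
            loss.backward()
            for p in parameters:
                p.data += -lr * p.grad
            return loss.item()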

    • @oklm2109
      @oklm2109 1 year ago

      What's the formula to calculate the number of parameters of an MLP model?

    • @amgad_hasan
      @amgad_hasan 1 year ago +2

      @oklm2109 You just add up the trainable parameters of every layer.
      If the model contains only fully connected layers (aka Linear in PyTorch or Dense in TF), the number of parameters for each layer is:
      n_weights = n_in * n_hidden_units
      n_biases = n_hidden_units
      n_params = n_weights + n_biases = (1 + n_in) * n_hidden_units
      n_in: number of inputs (think of it as the number of outputs (or hidden units) of the previous layer).
      This formula is valid for Linear layers; other types of layers may have different formulas. A small sketch checking it against PyTorch follows below.
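
      A minimal check of the formula above, assuming PyTorch; the layer sizes are arbitrary, chosen only for illustration:

          import torch.nn as nn

          # tiny MLP with made-up sizes: 30 inputs -> 200 hidden -> 27 outputs
          mlp = nn.Sequential(
              nn.Linear(30, 200),   # (30 + 1) * 200 = 6200 parameters
              nn.Tanh(),
              nn.Linear(200, 27),   # (200 + 1) * 27 = 5427 parameters
          )

          by_formula = sum((l.in_features + 1) * l.out_features
                           for l in mlp if isinstance(l, nn.Linear))
          by_counting = sum(p.numel() for p in mlp.parameters() if p.requires_grad)
          print(by_formula, by_counting)  # 11627 11627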

    • @glebzarin2619
      @glebzarin2619 1 year ago +1

      I'd say it's slightly unfair to compare models with different block sizes, because the block size influences not only the number of parameters but also the amount of information given as input.

    • @vhiremath4
      @vhiremath4 5 months ago

      @Zaphod42Beeblebrox out of curiosity, what do your losses and examples look like without the L2 regularization?
      Also, love the username :P

    • @vhiremath4
      @vhiremath4 5 months ago +1

      ^ Update - just tried this architecture locally and got the following without L2 regularization:
      train 1.7468901872634888
      val 1.9970593452453613
      How were you able to validate that there was overfitting on the training dataset?
      Some examples:
      arkan.
      calani.
      ellizee.
      coralym.
      atrajnie.
      ndity.
      dina.
      jenelle.
      lennec.
      laleah.
      thali.
      nell.
      drequon.
      grayson.
      kayton.
      sypa.
      caila.
      jaycee.
      kendique.
      javion.

  • @RishikeshS-nv7ol
    @RishikeshS-nv7ol 6 months ago +12

    Please don't stop making these videos, these are gold!

  • @timelapseguys4042
    @timelapseguys4042 1 year ago +21

    Andrej, thanks a lot for the video! Please do not stop the series. It's an honor to learn from you.

  • @rajeshparekh
    @rajeshparekh 10 months ago +1

    Thank you so much for creating this video lecture series. Your passion for this topic comes through so vividly in your lectures. I learned so much from every lecture and especially appreciated how the lectures started from the foundational concepts and built up to the state-of-the-art techniques. Thank you!

  • @maestbobo
    @maestbobo 1 year ago +10

    Best resource by far for this content. Please keep making more of these; I feel I'm learning a huge amount from each video.

  • @hintzod
    @hintzod 1 year ago +9

    Thank you so much for these videos. I really enjoy these deep dives, things make so much more sense when you're hand coding all the functions and running through examples. It's less of a black box and more intuitive. I hope this comment will encourage you to keep this going!

  • @vivekpadman5248
    @vivekpadman5248 1 year ago +6

    Absolutely love this series Andrej sir... It not only teaches me stuff but gives me confidence to work even harder to share whatever I know already.. 🧡

  • @mipmap256
    @mipmap256 1 year ago +7

    Can't wait for part 6! So clear and I can follow step by step. Thanks so much

  • @Jmelly11
    @Jmelly11 1 month ago

    I rarely comment on videos, but thank you so much for this series, Andrej. I love learning, and I learn many things that interest me. The reason I say that is that I have experienced a lot of tutors/educators over time. And for what it's worth, I'd like you to know you're truly gifted when it comes to your understanding of AI development and communicating that understanding.

  • @S0meM0thersson
    @S0meM0thersson 1 month ago

    Your work ethic and happy personality really move me. Respect to you, Andrej, you are great.🖖

  • @willr0073
    @willr0073 2 months ago

    All the makemore lessons have been awesome Andrej! Huge thanks for helping me understand better how this world works.

  • @brittaruiters6309
    @brittaruiters6309 1 year ago +5

    I love this series so much :) it has profoundly deepened my understanding of neural networks and especially backpropagation. Thank you

  • @kshitijbanerjee6927
    @kshitijbanerjee6927 1 year ago +37

    Hey Andrej! I hope you continue and give us the RNN, GRU & Transformer lectures as well! The ChatGPT one is great, but I feel like we missed the story in the middle and jumped ahead because of ChatGPT.

    • @SupeHero00
      @SupeHero00 1 year ago +1

      The ChatGPT lecture is the Transformer lecture.. And regarding RNNs, I don't see why anyone would still use them...

    • @kshitijbanerjee6927
      @kshitijbanerjee6927 1 year ago +6

      Transformers, yes. But it's not like anyone will build bigrams either; it's about learning concepts like BPTT etc. from the roots.

    • @SupeHero00
      @SupeHero00 1 year ago +1

      @kshitijbanerjee6927 Bigrams and MLPs help you understand Transformers (which are the SOTA).. Anyway, IMO it would be a waste of time creating a lecture on RNNs, but if the majority want it, then maybe he should do it.. I don't care

    • @kshitijbanerjee6927
      @kshitijbanerjee6927 1 year ago +5

      Fully disagree that it's not useful.
      I think the concepts of how they came up with unrolling and BPTT, and the gates used to solve long-term memory problems, are invaluable for appreciating and understanding why transformers are such a big deal.

    • @attilakun7850
      @attilakun7850 11 months ago +1

      @SupeHero00 RNNs are coming back due to SSMs like Mamba.

  • @sakthigeek2458
    @sakthigeek2458 8 months ago

    Learned a lot of practical tips and theoretical knowledge of why we do what we do and also the history of how Deep Learning evolved. Thanks a lot for this series. Requesting you to continue the series.

  • @aanchalagarwal6886
    @aanchalagarwal6886 1 year ago +2

    Thank you Andrej for creating this series. It has been very helpful. I just hope you get the time to continue with it.

  • @aurelienmontmejat1077
    @aurelienmontmejat1077 1 year ago +2

    This is the best deep learning course I've followed! Even better than the one on Coursera. Thanks!

  • @ShinShanIV
    @ShinShanIV 1 year ago +1

    Thank you so much Andrej for the series, it helps me a lot. You are one of the reasons I was able to get into ML and build a career there. I admire your teaching skills!
    I didn't get why the sequence dim has to be part of the batch dimension, and I didn't hear Andrej talk about it explicitly, so here is my reasoning:
    The sequence dimension is an additional batch dimension because the output before batch norm is created by a linear layer with (32, 4, 20) @ (20, 68) + (68), which performs the matrix multiplication only over the last dimension (.., .., 20) and in parallel over the first two. So the matrix multiplication is performed 32 * 4 times with (20) @ (20, 68). Thus it's the same as having a (128, 20) @ (20, 68) calculation, where (32 * 4) = 128 is the batch dimension. So the sequence dimension is effectively treated as a "batch dimension" in the linear layer and must be treated that way in batch norm too.
    (would be great if someone could confirm; a quick numerical check is sketched below)
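
    A quick numerical check of the reasoning above (a minimal sketch; the shapes are taken from the comment):

        import torch

        # a Linear layer applied to a (32, 4, 20) input matmuls only the last
        # dimension, which is the same as folding the sequence dimension into
        # the batch dimension first
        x = torch.randn(32, 4, 20)
        W = torch.randn(20, 68)
        b = torch.randn(68)

        out_3d = x @ W + b                                     # shape (32, 4, 68)
        out_2d = (x.view(32 * 4, 20) @ W + b).view(32, 4, 68)  # fold, matmul, unfold
        print(torch.allclose(out_3d, out_2d))                  # True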

  • @NarendraBME
    @NarendraBME 10 months ago

    So far THE BEST lecture series I have come across on YouTube. Alongside learning about neural networks in this series, I have learned more PyTorch than I did by watching a 26-hour PyTorch video series from another YouTuber.

  • @cktse_jp
    @cktse_jp 9 months ago

    Just wanna say thank you for sharing your experience -- love this from-scratch series starting from first principles!

  • @AndrewOrtman
    @AndrewOrtman 1 year ago +3

    When I did the mean() trick at ~8:50 I let out an audible gasp! That was such a neat trick; going to use that one in the future.

  • @eustin
    @eustin 1 year ago +3

    Yes! I've been telling everyone about these videos. I've been checking every day whether you posted the next video. Thank you.

  • @Leon-yp9yw
    @Leon-yp9yw 1 year ago +3

    I was worried I was going to have to wait a couple of months for the next video as I finished part 4 just last week. Can't wait to get into this one, thanks a lot for this series Andrej

  • @phrasedparasail9685
    @phrasedparasail9685 10 days ago

    These videos are amazing, please never stop making this type of content!

  • @stracci_5698
    @stracci_5698 1 year ago +1

    This is truly the best DL content out there. Most courses just focus on the theory but lack deep understanding.

  • @WarrenLacefield
    @WarrenLacefield 1 year ago +1

    Enjoying these videos so much, using them to refresh most of what I've forgotten about Python and to begin playing with PyTorch. The last time I did this stuff myself was with C# and CNTK. Now going back to rebuild and rerun old models and data (much faster, and even "better" results). Thank you.

  • @1knmd
    @1knmd 1 year ago +2

    Every time a new video is out it's like Christmas for me! Please don't stop doing this; best ML content out there.

  • @timowidyanvolta
    @timowidyanvolta 1 year ago +3

    Please continue, I really like this series. You are an awesome teacher!

  • @gurujicoder
    @gurujicoder 1 year ago

    My best way to learn is to learn from one of the most experienced people in the field. Thanks for everything, Andrej.

  • @brianwhite9137
    @brianwhite9137 1 year ago

    Very grateful for these. An early endearing moment was in the Spelled-Out Intro when you took a moment to find the missing parentheses for 'print.'

  • @panagiotistseles1118
    @panagiotistseles1118 1 year ago

    Totally amazed by the amount of good work you put in. You've helped a lot of people Andrej. Keep up the good work

  • @stanislawcronberg3271
    @stanislawcronberg3271 1 year ago +1

    My favorite way to start a Monday morning is to wake up to a new lecture in Andrej's masterclass :)

  • @kimiochang
    @kimiochang 1 year ago

    Finally completed this one. As always, thank you Andrej for your generosity! Next I will practice through all five parts again and learn how to accelerate the training process by using GPUs.

  • @thanikhurshid7403
    @thanikhurshid7403 1 year ago +2

    Andrej you are the absolute greatest. Keep making your videos. Anxiously waiting to implement Transformers with you

  • @milankordic
    @milankordic 1 year ago +3

    Was looking forward to this one. Thanks, Andrej!

  • @art4eigen93
    @art4eigen93 1 year ago

    Please continue this series, Sir Andrej. You are the savior!

  • @AlienLogic775
    @AlienLogic775 1 year ago +2

    Thanks so much Andrej! Hope to see a Part 6

  • @ephemer
    @ephemer 1 year ago

    Thanks so much for this series, I feel like this is the most important skill I might ever learn and it’s never been more accessible than in your lectures. Thank you!

  • @meisherenow
    @meisherenow 10 months ago

    How cool is it that anyone with an internet connection has access to such a great teacher? (answer: very)

  • @timandersen8030
    @timandersen8030 1 year ago +1

    Thank you, Andrej! Looking forward to the rest of the series!

  • @shouryamann7830
    @shouryamann7830 1 year ago +3

    I've been using this stepped learning rate schedule and I've been consistently getting slightly better training and validation losses. With this I got a 1.98 val loss.
    lr = 0.1 if i < 100000 else (0.01 if i < 150000 else 0.001)

  • @kaushik333ify
    @kaushik333ify 1 year ago +4

    Thank you so much for these lectures ! Can you please make a video on the “experimental harness” you mention at the end of the video? It would be super helpful and informative.

  • @bensphysique6633
    @bensphysique6633 2 months ago

    This guy is literally my definition of wholesomeness. Again, Thank you, Andrej!

  • @ayogheswaran9270
    @ayogheswaran9270 1 year ago

    @Andrej thank you for making this. Please continue making such videos. It really helps beginners like me. If possible, could you please make a series on how actual development and production are done?

  • @uncoded0
    @uncoded0 1 year ago +2

    Thanks again Andrej! Love these videos! Dream come true to watch and learn from these!
    Thanks for all you do to help people! Your helpfulness ripples throughout the world!
    Thanks again! lol

  • @SK-ke8nu
    @SK-ke8nu 17 days ago

    Great video again Andrej, keep up the good work and thank you as always!

  • @nikitaandriievskyi3448
    @nikitaandriievskyi3448 1 year ago

    I just found your YouTube channel, and this is just amazing. Please do not stop making these videos, they are incredible.

  • @flwi
    @flwi 1 year ago +1

    Great series! I really enjoy the progress and good explanations.

  • @ERRORfred2458
    @ERRORfred2458 1 year ago

    Andrej, thanks for all you do for us. You're the best.

  • @sunderrajan6172
    @sunderrajan6172 1 year ago

    Beautifully explained as always - thanks. It shows how much passion you have to come up with these awesome videos. We are all blessed!

  • @thehazarika
    @thehazarika 1 year ago +1

    This is philanthropy! I love you man!

  • @yanazarov
    @yanazarov 1 year ago +1

    Absolutely awesome stuff Andrej. Thank you for doing this.

  • @enchanted_swiftie
    @enchanted_swiftie 1 year ago

    The sentence that Andrej said at 49:26 made me realize something, something very deep. 🔥

  • @Abhishekkumar-qj6hb
    @Abhishekkumar-qj6hb 1 year ago

    So I finished this lecture series. I was expecting RNN/LSTM/GRU, but they were not there; still, throughout it I learnt a lot that I can definitely continue on my own. Thanks Andrej.

  • @kemalware4912
    @kemalware4912 1 year ago

    Deliberate errors in just the right spots... Your lectures are great.

  • @sam.rodriguez
    @sam.rodriguez 1 year ago +1

    How can we help you keep putting these treasures out, Andrej? I think the expected value of helping hundreds of thousands of ML practitioners improve their understanding of the building blocks might be comparable to (or even outweigh) the value of your individual contributions at OpenAI. That's not to say that your technical contributions are not valuable; on the contrary, I'm using their value as a point of comparison because I want to emphasise how amazingly valuable I think your work on education is. A useful analogy would be to ask which ended up having more impact on our progress in the field of physics: Richard Feynman's lectures, which motivated many to pursue science and improved everyone's intuitions, or his individual contributions to the field? At the end of the day it's not about one or the other but just finding the right balance given the effective impact of each and, of course, your personal enjoyment.

  • @veeramahendranathreddygang1086
    @veeramahendranathreddygang1086 1 year ago +2

    Thank you Sir. Have been waiting for this.

  • @studiostaufer
    @studiostaufer 1 year ago +1

    Hi Andrej, thank you for taking the time to create these videos. In this video, for the first time, I'm having difficulties understanding what the model is actually learning. I've watched it twice and tried to understand the WaveNet paper, but that isn't really helping. Given an example input “emma“, the following character is supposed to be “.“, why is it beneficial to create a hidden layer to process “em“, “ma“, and then “emma“? Are we essentially encoding that given a 4 character word, IF the first two characters are “em“ it is likely that the 5th character is “.“, no matter what the third and fourth characters are? In other words, this implementation would probably assign a higher probability that “.“ is the fifth character after an unseen name, e.g. “emli“, simply because it starts with the bigram “em“? Thanks in advance, Dimitri.

  • @mellyb.1347
    @mellyb.1347 7 months ago +1

    Loved this series. Would you please be willing to continue it so we get to work through the rest of CNN, RNN, and LSTM? Thanks!

  • @kindoblue
    @kindoblue 1 year ago

    Every video is another solid bar of pure gold.

  • @pablofernandez2671
    @pablofernandez2671 1 year ago

    Andrej, we all love you. You're amazing!

  • @ThemeParkTeslaCamping360
    @ThemeParkTeslaCamping360 1 year ago

    Incredible video, this helps a lot. Thank you for the videos; I especially loved your Stanford videos on machine learning from scratch, showing how you do it without any libraries like TensorFlow and PyTorch. Keep going and thank you for helping hungry learners like me!!! Cheers 🥂

  • @chineduezeofor2481
    @chineduezeofor2481 5 months ago +1

    Thank you for this beautiful tutorial 🔥

  • @wolpumba4099
    @wolpumba4099 6 months ago

    *Abstract*
    This video continues the "makemore" series, focusing on improving the character-level language model by transitioning from a simple multi-layer perceptron (MLP) to a deeper, tree-like architecture inspired by WaveNet. The video delves into the implementation details, discussing PyTorch modules, containers, and debugging challenges encountered along the way. A key focus is understanding how to progressively fuse information from input characters to predict the next character in a sequence. While the video doesn't implement the exact WaveNet architecture with dilated causal convolutions, it lays the groundwork for future explorations in that direction. Additionally, the video provides insights into the typical development process of building deep neural networks, including reading documentation, managing tensor shapes, and using tools like Jupyter notebooks and VS Code.
    *Summary*
    *Starter Code Walkthrough (**1:43**)*
    - The starting point is similar to Part 3, with minor modifications.
    - Data generation code remains unchanged, providing examples of three characters to predict the fourth.
    - Layer modules like Linear, BatchNorm1D, and Tanh are reviewed.
    - The video emphasizes the importance of setting BatchNorm layers to training=False during evaluation.
    - Loss function visualization is improved by averaging values.
    *PyTorchifying Our Code: Layers, Containers, Torch.nn, Fun Bugs (**9:19**)*
    - Embedding table and view operations are encapsulated into custom Embedding and Flatten modules.
    - A Sequential container is created to organize layers, similar to torch.nn.Sequential.
    - The forward pass is simplified using these new modules and container.
    - A bug related to BatchNorm in training mode with single-example batches is identified and fixed.
    *Overview: WaveNet (**17:12**)*
    - The limitations of the current MLP architecture are discussed, particularly the issue of squashing information too quickly.
    - The video introduces the WaveNet architecture, which progressively fuses information in a tree-like structure.
    - The concept of dilated causal convolutions is briefly mentioned as an implementation detail for efficiency.
    *Implementing WaveNet (**19:35**)*
    - The dataset block size is increased to 8 to provide more context for predictions.
    - The limitations of directly scaling up the context length in the MLP are highlighted.
    - A hierarchical model is implemented using FlattenConsecutive layers to group and process characters in pairs.
    - The shapes of tensors at each layer are inspected to ensure the network functions as intended.
    - A bug in the BatchNorm1D implementation is identified and fixed to correctly handle multi-dimensional inputs.
    *Re-training the WaveNet with Bug Fix (**45:25**)*
    - The network is retrained with the BatchNorm1D bug fix, resulting in a slight performance improvement.
    - The video notes that PyTorch's BatchNorm1D has a different API and behavior compared to the custom implementation.
    *Scaling up Our WaveNet (**46:07**)*
    - The number of embedding and hidden units are increased, leading to a model with 76,000 parameters.
    - Despite longer training times, the validation performance improves to 1.993.
    - The need for an experimental harness to efficiently conduct hyperparameter searches is emphasized.
    *Experimental Harness (**46:59**)*
    - The lack of a proper experimental setup is acknowledged as a limitation of the current approach.
    - Potential future topics are discussed, including:
    - Implementing dilated causal convolutions
    - Exploring residual and skip connections
    - Setting up an evaluation harness
    - Covering recurrent neural networks and transformers
    *Improve on My Loss! How Far Can We Improve a WaveNet on This Data? (**55:27**)*
    - The video concludes with a challenge to the viewers to further improve the WaveNet model's performance.
    - Suggestions for exploration include:
    - Trying different channel allocations
    - Experimenting with embedding dimensions
    - Comparing the hierarchical network to a large MLP
    - Implementing layers from the WaveNet paper
    - Tuning initialization and optimization parameters
    I summarized the transcript with Gemini 1.5 Pro. (A rough code sketch of the FlattenConsecutive grouping and the BatchNorm1D fix mentioned above follows below.)
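
    A rough sketch of the two pieces called out in the summary above: a FlattenConsecutive-style module that groups consecutive characters, and a BatchNorm1d that also averages over the sequence dimension for 3D inputs (the bug fix around 45:25). This is a minimal reconstruction in the spirit of the lecture's custom modules, not the exact notebook code:

        import torch

        class FlattenConsecutive:
            # groups n consecutive positions into the channel dim,
            # e.g. (B, 8, C) -> (B, 4, 2*C) for n=2, so the next Linear
            # fuses pairs of characters instead of the whole context at once
            def __init__(self, n):
                self.n = n
            def __call__(self, x):
                B, T, C = x.shape
                x = x.view(B, T // self.n, C * self.n)
                if x.shape[1] == 1:      # squeeze out a spurious time dim of 1
                    x = x.squeeze(1)
                self.out = x
                return self.out
            def parameters(self):
                return []

        class BatchNorm1d:
            # batch norm that also averages over the middle (sequence)
            # dimension when the input is 3D
            def __init__(self, dim, eps=1e-5, momentum=0.1):
                self.eps, self.momentum, self.training = eps, momentum, True
                self.gamma, self.beta = torch.ones(dim), torch.zeros(dim)
                self.running_mean, self.running_var = torch.zeros(dim), torch.ones(dim)
            def __call__(self, x):
                if self.training:
                    dim = (0, 1) if x.ndim == 3 else 0   # treat the seq dim as batch
                    xmean = x.mean(dim, keepdim=True)
                    xvar = x.var(dim, keepdim=True)
                    with torch.no_grad():
                        self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * xmean
                        self.running_var = (1 - self.momentum) * self.running_var + self.momentum * xvar
                else:
                    xmean, xvar = self.running_mean, self.running_var
                self.out = self.gamma * (x - xmean) / torch.sqrt(xvar + self.eps) + self.beta
                return self.out
            def parameters(self):
                return [self.gamma, self.beta]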

  • @zz79ya
    @zz79ya 9 months ago +2

    Um, can I find Part 6 somewhere? (RNN, LSTM, GRU...) I was under the impression that the next video in the playlist is about building GPT from scratch.

  • @michaelmuller136
    @michaelmuller136 10 months ago

    That was a really great playlist, easy to understand and very helpful, thank you very much!!

  • @daniellu8104
    @daniellu8104 8 months ago +1

    Haven't watched this video (yet) but I'm wondering if Andrej discussed WaveNet vs. transformers. I know that the WaveNet paper came out around the same time as Attention Is All You Need. It seems like both WaveNet and transformers can do sequence prediction/generation, but transformers have taken off. Is that because of transformers' better performance in most problem domains? Does WaveNet still outperform transformers in certain situations?

  • @adsuabeakufea
    @adsuabeakufea 10 months ago

    Great video, been learning a ton from you recently. Thank you Andrej!

  • @Joker1531993
    @Joker1531993 1 year ago +1

    I am subscribing, Andrej, just to support someone from our country, Slovakia. Even though I don't understand anything from the video >D

  • @vivekpandit7417
    @vivekpandit7417 1 year ago +1

    Been waiting for a while. Thank youuu!!

  • @mynameisZhenyaArt_
    @mynameisZhenyaArt_ 10 months ago +1

    Hi Andrej. Is there going to be an RNN, LSTM, GRU video? Or maybe even a part 2 on the topic of WaveNet with the residual connections?

  • @Erosis
    @Erosis 1 year ago +2

    Numpy / torch / tf tensor reshaping always feels like hand-wavy magic.

  • @8eck
    @8eck 1 year ago

    Finally finished all the lectures, and I realized that I have a poor understanding of the math and of dimensionality and the operations over it. Anyway, thank you for helping out with the rest of the concepts and practices; I now understand better how backprop works, what it is doing, and what for.

    • @Ali-lm7uw
      @Ali-lm7uw 1 year ago

      Jon Krohn has a full playlist on algebra and calculus to work through before starting machine learning.

  • @BlockDesignz
    @BlockDesignz 1 year ago +1

    Please keep these coming!

  • @creatureOfnature1
    @creatureOfnature1 1 year ago

    Much appreciated, Andrej. Your tutorials are a gem!

  • @davidherdoizamorales7832
    @davidherdoizamorales7832 1 year ago +3

    Bro, please release part 6. I know you are busy with ChatGPT but the world needs you.

  • @SupratimSamanta
    @SupratimSamanta 2 months ago

    Andrej is literally the bridge between worried senior engineers and the world of gen AI.

  • @kaenovama
    @kaenovama 1 year ago

    Thank you! Love the series! Helped me a lot with my learning experience with PyTorch

  • @4mb127
    @4mb127 1 year ago

    Thanks for continuing this fantastic series.

  • @davidespinosa1910
    @davidespinosa1910 1 month ago

    At 38:00, it sounds like we compared two architectures, both with 22k parameters and an 8-character window:
    * 1 layer, full connectivity
    * 3 layers, tree-like connectivity
    In a single layer, full connectivity outperforms partial connectivity.
    But partial connectivity uses fewer parameters, so we can afford to build more layers.

  • @EsdrasSoutoCosta
    @EsdrasSoutoCosta 1 year ago

    Awesome! Well explained and clear what's being done. Please keep making these fantastic videos!!!

  •  4 months ago

    Thanks. Very helpful and intuitive.

  • @fajarsuharyanto8871
    @fajarsuharyanto8871 1 year ago

    I rarely finish an entire episode. Hey Andrej 👌

  • @Anfera236
    @Anfera236 1 year ago +1

    Great content, Andrej! Keep them coming!

  • @aga1nstall0dds
    @aga1nstall0dds 3 months ago

    Thanks for the masterclass!!! By the way, I found you through an interview of geohot with Lex... I heard you like to teach, and they are right about that statement :)

  • @wholenutsanddonuts5741
    @wholenutsanddonuts5741 1 year ago

    Can't wait for this next step in the process!

  • @Leo-sy4vu
    @Leo-sy4vu 1 year ago

    Thank you so much for the series. I recently started it and it's the best thing on all of YouTube. Keep it up!

  • @yuchaozhang4881
    @yuchaozhang4881 1 year ago +1

    Can we use WaveNet for LLM training? Interested in its performance.

  • @venkateshmunagala205
    @venkateshmunagala205 1 year ago

    The AI devil is back. Thanks for the video, @Andrej Karpathy.

  • @utkarshsingh1663
    @utkarshsingh1663 1 year ago

    Thanks Andrej, this course is awesome for base building.

  • @joekharris
    @joekharris 1 year ago

    I'm learning so much. I really appreciate the lucidity and simplicity of your approach. I do have a question: why not initialize running_mean and running_var to None and then set them on the first batch? That would seem to be a better approach than starting them at zero, and would be consistent with making them exponentially weighted moving averages, which they are except for the initialization at 0.0. (A small sketch of this alternative is below.)
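
    A minimal sketch of the alternative described above, modeled loosely on the series' custom BatchNorm1d (not the lecture's actual code): the running statistics start as None, are seeded from the first batch, and are then updated as an exponential moving average.

        import torch

        class BatchNorm1dLazyInit:
            # like the lecture's BatchNorm1d, but the running stats start as None
            # and are copied from the first batch instead of starting at fixed values
            def __init__(self, dim, eps=1e-5, momentum=0.001):
                self.eps, self.momentum, self.training = eps, momentum, True
                self.gamma, self.beta = torch.ones(dim), torch.zeros(dim)
                self.running_mean, self.running_var = None, None
            def __call__(self, x):
                if self.training:
                    xmean = x.mean(0, keepdim=True)
                    xvar = x.var(0, keepdim=True)
                    with torch.no_grad():
                        if self.running_mean is None:   # first batch: copy the stats
                            self.running_mean, self.running_var = xmean.clone(), xvar.clone()
                        else:                           # afterwards: the usual EMA update
                            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * xmean
                            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * xvar
                else:
                    xmean, xvar = self.running_mean, self.running_var
                return self.gamma * (x - xmean) / torch.sqrt(xvar + self.eps) + self.beta
            def parameters(self):
                return [self.gamma, self.beta]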

  • @mobkiller111
    @mobkiller111 1 year ago

    Thanks for the content & explanations Andrej and have a great time in Kyoto :)

  • @arielfayol7198
    @arielfayol7198 1 year ago

    Please don't stop the series😢

  • @mohammadhomsee8640
    @mohammadhomsee8640 9 months ago

    I have a challenging question (for me :) ). I made a very simple network which takes x as an input and produces y as an output. The network looks like y = sin(ax + b), where a and b are learnable variables. The training data is built from sin(3x + 4312). The loss function is the quadratic mean (MSE). Using the usual approaches I couldn't make it work! What do you think the problem is? (A rough reconstruction of the setup is sketched below.)
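
    A rough reconstruction of the setup described above (a minimal sketch; the data range, learning rate, and seed are assumptions, not the commenter's actual code). One thing worth noting in this setting is that the MSE loss is highly non-convex and oscillatory as a function of the frequency parameter a, so plain gradient descent from a random start tends to settle in a local minimum far from a = 3:

        import torch

        torch.manual_seed(42)
        x = torch.linspace(-3, 3, 1000)
        y = torch.sin(3 * x + 4312)              # training data, as in the comment

        a = torch.randn((), requires_grad=True)  # learnable frequency
        b = torch.randn((), requires_grad=True)  # learnable phase

        for step in range(5000):
            pred = torch.sin(a * x + b)
            loss = ((pred - y) ** 2).mean()      # quadratic (MSE) loss
            a.grad = b.grad = None
            loss.backward()
            with torch.no_grad():
                a -= 0.01 * a.grad
                b -= 0.01 * b.grad

        # loss typically plateaus well above zero unless a happens to start near 3
        print(a.item(), b.item(), loss.item())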

  • @lotfullahandishmand4973
    @lotfullahandishmand4973 1 year ago

    Dear Andrej, your work is amazing; we are here to share and have a beautiful world all together, and you are doing that.
    If you could make a video about convolutional NNs, top ImageNet architectures, or anything deep related to vision, that would be great.
    Thank you!

  • @philipwoods6720
    @philipwoods6720 1 year ago

    SO EXCITED TO SEE THIS POSTED LEEEEETS GOOOOOOOO