The Absolutely Simplest Neural Network Backpropagation Example

  • Published: 4 Jun 2024
  • I'm (finally after all this time) thinking of new videos. If I get attention in the donate button area, I will proceed:
    www.paypal.com/donate/?busine...
    Sorry, there is a typo: @3:33 dC/dw should be 4.5w - 2.4, not 4.5w - 1.5
    NEW IMPROVED VERSION AVAILABLE: The Absolut...
    The absolutely simplest gradient descent example with only two layers and single weight. Comment below and click like!
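
    A minimal Python sketch of the setup described above, assuming input i = 1.5, target y = 0.8 (per the corrected description), a starting weight of 0.8, and learning rate 0.1:

    i, y, lr = 1.5, 0.8, 0.1
    w = 0.8                          # assumed starting weight

    for step in range(5):
        a = i * w                    # forward pass: one weight, no bias, no activation
        C = (a - y) ** 2             # squared-error cost
        dC_dw = 2 * (a - y) * i      # chain rule; equals 4.5*w - 2.4 here
        w = w - lr * dC_dw           # gradient-descent step
        print(step, round(w, 4), round(C, 4))
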
  • Science

Comments • 185

  • @markneumann381
    @markneumann381 1 month ago

    Really nice work. Thank you so much for your help.

  • @SuperYtc1
    @SuperYtc1 16 days ago

    4:03 Shouldn't 3(a - y) be 3(1.5*w - 0.8) = 4.5w - 2.4? Where have you got -1.5 from?

  • @GustavoMeschino
    @GustavoMeschino 1 month ago

    GREAT, it was a perfect inspiration for me to explain this critical subject in a class. Thank you!

  • @TruthOfZ0
    @TruthOfZ0 20 days ago

    If we take the derivative dC/dw directly from C=(a-y)^2, it's the same thing, right? Do we really have to split it into da/dw and dC/da individually???

  • @srnetdamon
    @srnetdamon 3 months ago +1

    man 4:08 I don't understand how you get the value 4.5 in the expression 4.5*w - 1.5

  • @animatedzombie64
    @animatedzombie64 1 month ago

    Best video ever about backpropagation on the internet 🛜

  • @javiersanchezgrinan919
    @javiersanchezgrinan919 1 month ago

    Great video. Just one question: this is for a 1x1 input and a batch size of 1, right? If we have, let's say, a batch size of 2, do we just add (b-y)^2 to the loss function (C = (a-y)^2 + (b-y)^2), with b = w * j and j the input of the second sample? Then you just perform backpropagation with partial derivatives. Is that correct?
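
    A minimal sketch of the batch-of-2 case asked about above (all values assumed), where the gradients of the two terms of C = (a-y)^2 + (b-y)^2 with respect to the shared weight simply add up:

    i, j = 1.5, 2.0              # assumed inputs of the two samples
    y = 0.8                      # assumed target (could also differ per sample)
    w, lr = 0.8, 0.1             # assumed starting weight and learning rate

    a = w * i                    # prediction for sample 1
    b = w * j                    # prediction for sample 2
    C = (a - y) ** 2 + (b - y) ** 2

    dC_dw = 2 * (a - y) * i + 2 * (b - y) * j   # per-sample gradients summed
    w = w - lr * dC_dw
    print(round(C, 4), round(w, 4))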

  • @hegerwalter
    @hegerwalter 1 month ago

    Where and how did you get the learning rate?

  • @bhlooli
    @bhlooli 1 year ago

    Thanks very helpful.

  • @lazarus8011
    @lazarus8011 1 month ago

    Unreal explanation

  • @rachidbenabdelmalek3098
    @rachidbenabdelmalek3098 1 year ago

    Thanks you

  • @sameersahu3987
    @sameersahu3987 1 year ago

    Thanks

  • @btmg4828
    @btmg4828 1 month ago +1

    I don't get it: you write 1.5*2(a-y) = 4.5w - 1.5.
    But why? It should be 4.5w - 2.4,
    because 2*0.8*(-1.5) = -2.4.
    Where am I wrong?

  • @Vicente75480
    @Vicente75480 5 years ago +184

    Dude, this was just what I needed to finally understand the basics of Back Propagation

    • @webgpu
      @webgpu 1 month ago

      if you _Really_ liked his video, just click the first link he put in the description 👍

  • @AjitSingh147
    @AjitSingh147 1 year ago

    GOD BLESS YOU DUDE! SUBSCRIBED!!!!

  • @SamuelBachorik-Mrtapo8-ApeX
    @SamuelBachorik-Mrtapo8-ApeX 2 years ago +13

    Hi, I have a question for you: at 3:42 you have 1.5*2(a-y) = 4.5*w - 1.5. How did you get this result?

    • @nickpelov
      @nickpelov 1 year ago +16

      ... in case someone missed it like me - it's in the description (it's a typo). y=0.8; a=i*w = 1.5*w, so 1.5*2(a-y) =3*(1.5*w - 0.8) = 4.5*w - 3*0.8 = 4.5*w - 2.4 is the correct formula.

  • @whywhatwherein
    @whywhatwherein 1 month ago

    finally, a proper explanation.

  • @fredfred9847
    @fredfred9847 2 years ago

    Great video

  • @anirudhputrevu3878
    @anirudhputrevu3878 2 years ago

    Thanks for making this

  • @ExplorerSpace
    @ExplorerSpace 1 year ago

    @Mikael Laine even though you say that @3:33 has a typo, I can't see the typo. 1.5 is correct because y is the actual desired output and it is 0.5, so 3.0 * 0.5 = 1.5

  • @giorgosmaragkopoulos9110
    @giorgosmaragkopoulos9110 2 months ago

    So what is the clever part of backprop? Why does it have a special name instead of just being called "gradient estimation"? How does it save time? It looks like it just calculates all the derivatives one by one.

  • @formulaetor8686
    @formulaetor8686 1 year ago

    Thats sick bro I just implemented it

  • @starkarabil9260
    @starkarabil9260 2 years ago

    1:11 where does y = 0.8 come from?

  • @Nova-Rift
    @Nova-Rift 3 years ago

    hmm, if y = .8 then shouldn't dC/dw = 4.5w - 2.4? Because .8 * 3 = 2.4, not 1.5. What am I missing?

  • @LunaMarlowe327
    @LunaMarlowe327 2 years ago

    very clear

  • @polybender
    @polybender 24 days ago

    best on internet.

  • @riccardo700
    @riccardo700 3 months ago

    My maaaaaaaannnnn TYYYY

  • @drummin2dabeat
    @drummin2dabeat 3 months ago

    What a breakthrough, thanks to you. BTW, not to nitpick, but you are missing a close paren on f(g(x), which should be f(g(x)).

  • @demetriusdemarcusbartholom8063

    ECE 449 UofA

  • @justinwhite2725
    @justinwhite2725 3 years ago +18

    @8:06 this was super useful. That's a fantastic shorthand. That's exactly the kind of thing I was looking for, something quick I can iterate over all the weights and find the most significant one for each step.

  • @alexandrmelnikov5126
    @alexandrmelnikov5126 7 months ago

    man, thanks!

  • @bedeamadi9317
    @bedeamadi9317 3 years ago +6

    My long search ends here, you simplified this a great deal. Thanks!

  • @sabinbaral4132
    @sabinbaral4132 1 year ago

    Good content sir keep making these i subscribe

  • @mateoacostarojas6031
    @mateoacostarojas6031 5 years ago +7

    just perfect, simple and with this we can extrapolate easier when in each layer there are more than one neuron! thaaaaankksss!!

  • @adoughnut12345
    @adoughnut12345 3 years ago +16

    This was great. Removing non-linearity and including basic numbers as context helped drive this material home.

    • @gerrypaolone6786
      @gerrypaolone6786 2 years ago

      If you use ReLU there is nothing more than that

  • @saral123
    @saral123 3 years ago +3

    Fantastic. This is the most simple and lucid way to explain backprop. Hats off

  • @gautamdawar5067
    @gautamdawar5067 3 years ago +3

    After a long frantic search, I stumbled upon this gold. Thank you so much!

  • @svtrilogywestsail3278
    @svtrilogywestsail3278 2 years ago

    this was kicking my a$$ until i watched this video. thanks

  • @EthanHofton
    @EthanHofton 3 years ago +6

    Very clearly explained and easy to understand. Thank you!

  • @ilya5782
    @ilya5782 6 months ago +5

    To understand mathematics, I need to see an example.
    And this video, from start to end, is awesome, with a quality presentation.
    Thank you so much.

  • @tellmebaby183
    @tellmebaby183 1 year ago

    Perfect

  • @arashnozarinejad9915
    @arashnozarinejad9915 4 years ago +5

    I had to write a comment and thank you for your very precise yet simple explanation, just what I needed. Thank you sir.

  • @sparkartsdistinctions1257
    @sparkartsdistinctions1257 3 years ago +6

    I watched almost every video on backpropagation, even Stanford's, but never got such a clear idea until I saw this one ☝️.
    The best and cleanest explanation.
    My first 👍🏼, which I rarely give.

    • @webgpu
      @webgpu 1 month ago

      a 👍 is very good, but if you click on the first link in the description, it would be even better 👍

    • @sparkartsdistinctions1257
      @sparkartsdistinctions1257 1 month ago

      @@webgpu 🆗

  • @RaselAhmed-ix5ee
    @RaselAhmed-ix5ee 3 years ago

    what is a?

  • @Freethinker33
    @Freethinker33 2 years ago +3

    I was just looking for this explanation to align derivatives with gradient descent. Now it is crystal clear. Thanks Mikael

  • @SureshBabu-tb7vh
    @SureshBabu-tb7vh 5 years ago +3

    You made this concept very simple. Thank you

  • @praneethaluru2601
    @praneethaluru2601 3 years ago +2

    The best short video explanation of the concept on RUclips till now...

  • @OviGomy
    @OviGomy 5 months ago

    I think there is a mistake. 4.5w -1.5 is correct.
    On the first slide you said 0.5 is the expected output.
    So "a" is the computed output and "y" is the expected output. 0.5 * 1.5 * 2 = 1.5 is correct.
    You need to correct the "y" next to the output neuron to 0.5.

  • @xflory26x
    @xflory26x 1 month ago

    Not kidding. This is the best explanation of backpropagation on the internet. The way you're able to simplify this "complex" concept is *chef's kiss* 👌

  • @TrungNguyen-ib9mz
    @TrungNguyen-ib9mz 3 years ago +9

    Thank you for your video. But I'm a bit confused about 1.5*2(a-y) = 4.5*w - 1.5. Might you please explain that? Thank you so much!

    • @user-gq7sv9tf1m
      @user-gq7sv9tf1m 3 years ago +9

      I think this is how he got there :
      1.5 * 2(a - y) = 1.5 * 2 (iw - 0.5) = 1.5 * 2 (1.5w - 0.5) = 1.5 * (3w - 1) = 4.5w - 1.5

    • @christiannicoletti9762
      @christiannicoletti9762 3 years ago +2

      @@user-gq7sv9tf1m dude thanks for that, I was really scratching my head over how he got there too

    • @Fantastics_Beats
      @Fantastics_Beats 1 year ago

      I am also confused by this error

    • @morpheus1586
      @morpheus1586 1 year ago +2

      @@user-gq7sv9tf1m y is 0.8 not 0.5

  • @santysayantan
    @santysayantan 2 years ago +2

    This makes more sense than anything I ever heard in the past! Thank you! 🥂

    • @brendawilliams8062
      @brendawilliams8062 9 months ago

      It beats the 1002165794 thing and 1001600474 jumping and calculating with 1000325836 and 1000564416. Much easier 😊

    • @jameshopkins3541
      @jameshopkins3541 9 months ago

      You are wrong: tell me, what is deltaW?

  • @TheRainHarvester
    @TheRainHarvester 1 year ago

    6:55 but it's NOT the same terms. Is that da0/dw1 term correct?

    • @TheRainHarvester
      @TheRainHarvester 1 year ago

      Looks like the w term turns into a at each backward step.

  • @riccardo700
    @riccardo700 3 months ago +1

    I have to say it. You have done the best video about backpropagation because you chose to explain the easiest example, no one did that out there!! Congrats prof 😊

    • @webgpu
      @webgpu 1 month ago

      did you _really_ like his video? Then I'd suggest you click the first link he put in the description 👍

  • @AAxRy
    @AAxRy 3 months ago

    THIS IS SOO FKING GOOD!!!!

  • @ronaldmercado4768
    @ronaldmercado4768 8 months ago

    Absolutely simple. A very useful illustration, not only to understand backpropagation but also to show gradient descent optimization. Thanks a lot.

  • @grimreaperplayz5774
    @grimreaperplayz5774 1 year ago

    This is absolutely awesome. Except..... Where did that 4.5 come from???

    • @delete7316
      @delete7316 10 months ago

      You’ve probably figured it out by now but just in case: i = 1.5, y=0.8, a = i•w. This means the expression for dC/dw = 1.5 • 2(1.5w - 0.8). Simplify this and you get 4.5w - 2.4. This is where the 4.5 comes from. Extra note: in the description it says -1.5 was a typo and the correct number is -2.4.

  • @outroutono4937
    @outroutono4937 1 year ago

    Thank you bro! It's so much easier to visualize when it's presented like that.

  • @sunilchoudhary8281
    @sunilchoudhary8281 2 months ago +1

    I am so happy that I can't even express myself right now

    • @webgpu
      @webgpu 1 month ago

      there's a way you can express your happiness AND express your gratitude: by clicking on the first link in the description 🙂

  • @RaselAhmed-ix5ee
    @RaselAhmed-ix5ee 3 years ago +1

    In the final eqn, why is it 4.5w - 1.5? It should be 4.5w - 2.4, since y = 0.8, so 3*0.8 = 2.4.

    • @kamilkaya5367
      @kamilkaya5367 2 years ago

      Yes you are right. I noticed too.

  • @mixhybrid
    @mixhybrid 4 years ago +1

    Thanks for the video! Awesome explanation

  • @studentgaming3107
    @studentgaming3107 1 year ago

    How do you get 1.5 for y?

  • @hamedmajidian4451
    @hamedmajidian4451 3 years ago

    Great illustrated, thanks

  • @adriannyamanga1580
    @adriannyamanga1580 4 years ago +3

    dude please make more videos. this is amazing

  • @Leon-cm4uk
    @Leon-cm4uk 7 months ago

    The error should be (1.2 - 0.5)^2 = 0.7^2 = 0.49. So y is 0.49 and not 0.8 as it is displayed after minute 01:08.

  • @giuliadipalma5042
    @giuliadipalma5042 2 years ago

    thank you, this is exactly what I was looking for, very useful!

  • @JAYSVC234
    @JAYSVC234 9 months ago

    Thank you. Here is a PyTorch implementation.
    import torch
    import torch.nn as nn

    class C(nn.Module):
        def __init__(self):
            super(C, self).__init__()
            r = torch.zeros(1)
            r[0] = 0.8
            self.r = nn.Parameter(r)
        def forward(self, i):
            return self.r * i

    class L(nn.Module):
        def __init__(self):
            super(L, self).__init__()
        def forward(self, p, t):
            loss = (p - t) * (p - t)
            return loss

    class Optim(torch.optim.Optimizer):
        def __init__(self, params, lr):
            defaults = {"lr": lr}
            super(Optim, self).__init__(params, defaults)
            self.state = {}
            for group in self.param_groups:
                for par in group["params"]:
                    # print("par: ", par)
                    self.state[par] = {"mom": torch.zeros_like(par.data)}
        def step(self):
            for group in self.param_groups:
                for par in group["params"]:
                    grad = par.grad.data
                    # print("grad: ", grad)
                    mom = self.state[par]["mom"]
                    # print("mom: ", mom)
                    mom = mom - group["lr"] * grad
                    # print("mom update: ", mom)
                    par.data = par.data + mom
                    print("Weight: ", round(par.data.item(), 4))

    # r = torch.ones(1)
    x = torch.zeros(1)
    x[0] = 1.5
    y = torch.zeros(1)
    y[0] = 0.5
    c = C()
    o = Optim(c.parameters(), lr=0.1)
    l = L()
    print("x:", x.item(), "y:", y.item())
    for j in range(5):
        print("_____Iter ", str(j), " _______")
        o.zero_grad()
        p = c(x)
        loss = l(p, y).mean()
        print("prediction: ", round(p.item(), 4), "loss: ", round(loss.item(), 4))
        loss.backward()
        o.step()
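
    If run, this should mirror the video's behaviour: with x = 1.5, y = 0.5 and lr = 0.1, the printed weight moves from 0.8 toward y/x = 1/3 (about 0.59 after the first step). Note that this custom Optim never writes the updated "mom" back into self.state, so it reduces to plain gradient descent rather than momentum.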

  • @mysteriousaussie3900
    @mysteriousaussie3900 3 years ago +4

    Are you able to briefly describe how the calculation at 8:20 works for a network with multiple neurons per layer?

  • @Controlvers
    @Controlvers 3 years ago +1

    Thank you for sharing this video!

  • @rdprojects2954
    @rdprojects2954 3 years ago +1

    Excellent, please continue; we need this kind of simplicity in NN

  • @zemariagp
    @zemariagp 9 months ago

    Why do we ever need to consider multiple levels? Why not just think about getting the right weight given the output "in front" of it?

  • @aorusaki
    @aorusaki 4 years ago

    Very helpful tutorial. Thanks!

  • @Janeilliams
    @Janeilliams 2 years ago

    Okay!! It was simple and clear,
    BUT things get complex when I add two inputs or hidden layers. How do I do the partial derivatives? If anyone has an appropriate and simple video covering more than one input or hidden layer, please throw it in the reply box, thanks!

  • @jks234
    @jks234 3 months ago

    I see.
    As previously mentioned, there are a few typos. For anyone watching, please note there are a few places where 0.8 and 0.5 are swapped for each other.
    That being said, this explanation has opened my eyes to the fully intuitive explanation of what is going on...
    Put simply, we can view each weight as an "input knob" and we want to know how each one creates the overall Cost/Loss.
    In order to do this, we link (chain) each component's local influence together until we have created a function that describes weight to overall cost.
    Once we have found that, we can adjust that knob with the aim of lowering total loss a small amount based on what we call "learning rate".
    Put even more succinctly, we are converting each weight's "local frame of reference" to the "global loss" frame of reference and then adjusting each weight with that knowledge.
    We would only need to find these functions once for a network.
    Once we know how every knob influences the cost, we can tweak them based on the next training input using this knowledge.
    The only difference between each training set will just be the model's actual output, which is then used to adjust the weights and lower the total loss.
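
    A minimal numeric sketch of the "knob to global cost" chaining described above, using the video's single-weight setup (values assumed: i = 1.5, y = 0.8, w = 0.8):

    i, y, w = 1.5, 0.8, 0.8

    a = i * w                        # forward: the knob's local effect on the activation
    dC_da = 2 * (a - y)              # local derivative: cost w.r.t. activation
    da_dw = i                        # local derivative: activation w.r.t. the knob
    dC_dw = dC_da * da_dw            # chained: how this knob moves the global cost

    learning_rate = 0.1
    w = w - learning_rate * dC_dw    # nudge the knob to lower the cost slightly
    print(round(dC_dw, 4), round(w, 4))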

  • @zh4842
    @zh4842 4 years ago

    excellent video, simple & clear many thanks

  • @satishsolanki9766
    @satishsolanki9766 3 years ago

    Awesome dude. Much appreciate your effort.

  • @RohitKumar-fg1qv
    @RohitKumar-fg1qv 5 years ago +3

    Exactly what i needed

  • @zeljkotodor
    @zeljkotodor 2 years ago

    Nice and clean. Helped me a lot!

  • @user-mc9rt9eq5s
    @user-mc9rt9eq5s 3 years ago +15

    Thanks! This is awesome. I have a question: if we make the NN a little bit more complicated (adding an activation function for each layer), what will be the difference?

  • @paurodriguez5364
    @paurodriguez5364 1 year ago

    best explanation i had ever seen, thanks.

  • @DaSticks
    @DaSticks 5 months ago

    Great video, going to spend some time working out how it looks for multiple neurons, but a demonstration of that would be awesome

  • @lhyd7hak
    @lhyd7hak 2 years ago

    Thanks for a very explanatory video.

  • @mahfuzurrahman4517
    @mahfuzurrahman4517 7 months ago

    Bro this is awesome, I was struggling to understand chain rule, now it is clear

  • @jakubpiekut1446
    @jakubpiekut1446 2 years ago

    Absolutely amazing 🏆

  • @ApplepieFTW
    @ApplepieFTW 1 year ago

    It clicked after just 3 minutes. Thanks a lot!!

  • @shirish3008
    @shirish3008 3 years ago

    This is the best tutorial on back prop👏

  • @bettercalldelta
    @bettercalldelta 2 years ago +1

    I'm currently programming a neural network from scratch, and I am trying to understand how to train it, and your video somewhat helped (didn't fully help cuz I'm dumb)

  • @popionlyone
    @popionlyone 5 years ago +24

    You made it easy to understand. Really appreciated it. You also earned my first RUclips comment.

  • @muthukumars7730
    @muthukumars7730 3 years ago

    Hi, the eqn a = wx usually passes through 0, right? But in the picture it is drawn away from the center. Can I take it that this is just for illustration?

  • @banpridev
    @banpridev 1 month ago

    Wow, you did not lie in the title.

  • @fildaCZSK
    @fildaCZSK 3 years ago

    How in the hell did you get 4.5*w - 1.5 from 1.5*2(a-y)?

  • @souravmajumder4630
    @souravmajumder4630 10 months ago

    what is r here?

  • @elgs1980
    @elgs1980 3 years ago

    Thank you so much!

  • @Chess_Enthusiast
    @Chess_Enthusiast 6 months ago

    I still don't understand how this is useful for more complicated neural networks. How can we generalize this when the weights are matrices that might be incompatible? Here you multiply the weights, but when the weights are matrices they might be incompatible. I am not criticizing; I am a novice and I am trying to learn.

    • @mikaellaine9490
      @mikaellaine9490  6 months ago

      Good questions. Matrices are just tables by which we organize numbers. Their sizes must match, such that you have the correct input-, weights- and output matrices.
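
      A minimal sketch of the size-matching point above (shapes assumed for illustration): with matrices, the layer-to-layer products still work as long as the inner dimensions agree.

      import torch

      x  = torch.randn(1, 2)     # 1 sample, 2 input features
      W1 = torch.randn(2, 3)     # maps 2 features to 3 hidden units
      W2 = torch.randn(3, 1)     # maps 3 hidden units to 1 output
      h = x @ W1                 # shape (1, 3)
      out = h @ W2               # shape (1, 1)
      print(h.shape, out.shape)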

  • @dcrespin
    @dcrespin 1 year ago

    The video shows what is perhaps the simplest case of a feedforward network, with all the advantages and limitations that extreme simplicity can have. From here to full generalization several steps are involved.
    1.- More general processing units.
    Any continuously differentiable function of inputs and weights will do; these inputs and weights can belong not only to Euclidean spaces but to any Hilbert spaces as well. Derivatives are linear transformations and the derivative of a unit is the direct sum of the partial derivatives with respect to the inputs and with respect to the weights.
    2.- Layers with any number of units.
    Single unit layers can create a bottleneck that renders the whole network useless. Putting together several units in a layer is equivalent to taking their product (as functions, in the set theoretical sense). Layers are functions of the totality of inputs and weights of the various units. The derivative of a layer is then the product of the derivatives of the units. This is a product of linear transformations.
    3.- Networks with any number of layers.
    A network is the composition (as functions, and in the set theoretical sense) of its layers. By the chain rule the derivative of the network is the composition of the derivatives of the layers. Here we have a composition of linear transformations.
    4.- Quadratic error of a function.
    ---
    This comment is becoming too long. But a general viewpoint clarifies many aspects of BPP.
    If you are interested in the full story and have some familiarity with Hilbert spaces please Google for papers dealing with backpropagation in Hilbert spaces.
    Daniel Crespin
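
    A minimal numeric sketch of points 2 and 3 above (layers as linear maps, the network derivative as the composition of per-layer derivatives), with matrices made up for illustration:

    import torch

    W1 = torch.tensor([[0.2, -0.5], [1.0, 0.3]])   # layer 1: 2 inputs -> 2 units
    W2 = torch.tensor([[0.7, 0.1]])                # layer 2: 2 units -> 1 output
    x = torch.tensor([1.5, -2.0])

    J_network = W2 @ W1        # chain rule: compose the per-layer derivatives
    J_auto = torch.autograd.functional.jacobian(lambda v: W2 @ (W1 @ v), x)
    print(J_network)
    print(J_auto)              # matches J_network for this purely linear network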

  • @malinyamato2291
    @malinyamato2291 1 year ago

    thanks a lot... a great start for me to learn NNs :)

  • @meanderthalensis
    @meanderthalensis 2 years ago

    Helped me so much!

  • @capilache
    @capilache 4 years ago

    When calculating the new value for w3 (at the end of the video), do you use the other original weights or the updated weights?

    • @mikaellaine9490
      @mikaellaine9490  4 years ago +2

      For a backward pass, you use the old weights.

  • @Dan-uf2vh
    @Dan-uf2vh 4 years ago

    I have a problem and can't find a solution.. how do you express this in matrices? How do you backpropagate an error vector along the weights and preceding vectors?

  • @evanparshall1323
    @evanparshall1323 3 years ago +1

    This video is very well done. Just need to understand implementation when there is more than one node per layer

    • @mikaellaine9490
      @mikaellaine9490  3 years ago

      Have you looked at my other videos? I have a two-dimensional case in this video: ruclips.net/video/Bdrm-bOC5Ek/видео.html

  • @Fantastics_Beats
      @Fantastics_Beats 1 year ago

    Please correct me?
    2 neurons, w0 and w1
    (i)___i*w0__(a0=i*w0)___a0*w1___[Goal]
    #Feed forward
    a0=i*w0
    a1=a0*w1
    # back jump adjust weight
    W1(new)=w1(old)- R*a0*2*(a1-Goal)
    W0(new)=w0(old)- R*i*2*(a0-Goal)
    Please correct my back-jump weight adjustment
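
    A minimal sketch of the chain-rule version of the two-weight update asked about above, assuming the cost is C = (a1 - Goal)^2 so that both weight gradients flow through dC/da1 (the values of i, Goal, R, w0, w1 are assumed):

    i, goal, R = 1.5, 0.8, 0.1
    w0, w1 = 0.6, 0.9

    # Feed forward
    a0 = i * w0
    a1 = a0 * w1

    # Back jump (chain rule): both weights share the factor dC/da1 = 2*(a1 - goal)
    dC_da1 = 2 * (a1 - goal)
    dC_dw1 = dC_da1 * a0         # da1/dw1 = a0
    dC_dw0 = dC_da1 * w1 * i     # da1/da0 = w1, da0/dw0 = i

    w1 = w1 - R * dC_dw1
    w0 = w0 - R * dC_dw0
    print(round(w0, 4), round(w1, 4))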