Backpropagation in Convolutional Neural Networks (CNNs)

  • Published: 25 Jan 2025

Comments • 116

  • @louissimon2463
    @louissimon2463 1 year ago +13

    Great video, but I don't understand how we can find the value of the dL/dzi terms. At 7:20 you make it seem like dL/dzi = zi; is that correct?

    • @far1din
      @far1din  1 year ago +5

      No, they come from the loss function. I explain this at 4:17.
      It might be a bit unclear, so I'd highly recommend you watch the video from 3blue1brown: ruclips.net/video/tIeHLnjs5U8/видео.htmlsi=Z6asTm87XWcW1bVn 😃

    • @rtpubtube
      @rtpubtube 1 year ago +5

      I'm with @louissimon2463: you show how dL/dw1 is related to dz1/dw1 + ... (etc.), but you never show/explain where dL/dz1 (etc.) comes from. Poof, a miracle occurs here. Having a numerical example would help a lot. This "theory/symbology"-only post is therefore incomplete/useless from a learning/understanding standpoint.

    • @mandy11254
      @mandy11254 8 months ago +2

      @rtpubtube It's quite literally what he wrote. He hasn't defined a loss function, so that's just what it is from the chain rule. If you're asking how the actual value of dL/dz1 is computed: the last layer has its own set of weights besides the ones shown in the video, in addition to an activation function. You use those and a defined loss function to compute dL/dzi. It's similar to what you see in standard NNs. If you've studied neural networks, you should know this. This is a video about CNNs, not an intro to NNs. Go study that before this. It's not his job to point out every little thing.
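To make the reply above concrete, here is a small numeric sketch of where the dL/dzi values come from. The dense-layer weights `W`, the MSE loss, and all shapes are illustrative assumptions, not taken from the video:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(4)        # flattened 2x2 conv output
W = rng.standard_normal((1, 4))   # dense-layer weights (hypothetical, not shown in the video)
y_true = np.array([1.0])          # target

y_hat = W @ z                     # forward pass through the dense layer
loss = 0.5 * np.sum((y_hat - y_true) ** 2)  # MSE loss (one possible choice)

dL_dyhat = y_hat - y_true         # derivative of the MSE loss
dL_dz = W.T @ dL_dyhat            # chain rule: these are the dL/dz_i terms the video uses
```

With a real network the loss and last-layer activation differ, but the shape of the computation is the same: backpropagate the loss derivative through the layers above the conv output to obtain dL/dzi.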

  • @khayyamnaeem5601
    @khayyamnaeem5601 2 years ago +19

    Why is this channel so underrated? You deserve more subscribers and views.

    • @eneadriancatalin
      @eneadriancatalin 1 year ago +1

      Perhaps developers use ad blockers, and as a result YouTube needs to ensure revenue by not promoting these types of videos (that's my opinion).

  • @noohayub2188
    @noohayub2188 3 months ago +3

    What an excellent demonstration of backpropagation in a CNN; you won my heart. Literally no one on the internet explains it as clearly as you did. But please try to make a sequel to this video where you also use biases and a more complex example.

  • @nizamuddinkhan9443
    @nizamuddinkhan9443 1 year ago +5

    Very well explained. I searched many videos, but nobody explained the change in the filter's weights. Thank you so much for this animated, simple explanation.

  • @abhimanyugupta532
    @abhimanyugupta532 8 months ago +2

    Been trying to understand backpropagation in CNNs for years, until today! Thanks a ton, mate!

    • @yosukesharp
      @yosukesharp 8 months ago

      it was an obviously primitive algo, dude... people like you are called "data scientists" now, which is really sad...

    • @blubaylon
      @blubaylon 3 months ago

      @yosukesharp Get off your high horse.

  • @haideralix
    @haideralix 1 year ago +2

    I have seen a few videos before; this one is by far the best. It breaks down each concept and answers all the questions that come to mind. The progression and the explanation are the best.

  • @JessieJussMessy
    @JessieJussMessy 1 year ago +2

    This channel is a hidden gem. Thank you for your content

  • @rubytejackson
    @rubytejackson 4 months ago

    This is an exceptional explanation, and I can't thank you enough... you have to keep going, you enlighten many students on the planet! That's the best thing a human can do!

    • @far1din
      @far1din  4 months ago

      Thank you brother, very much appreciate it! 🔥

  • @markuskofler2553
    @markuskofler2553 1 year ago +5

    Couldn’t explain it better myself … absolutely amazing and comprehensible presentation!

  • @srinathchembolu7691
    @srinathchembolu7691 5 months ago

    This is gold. Watching this after reading Michael Nielsen makes the concept crystal clear

  • @jayeshkurdekar126
    @jayeshkurdekar126 1 year ago +2

    You are a great example of fluidity of thought and words... great explanation.

    • @far1din
      @far1din  1 year ago +1

      Thank you my friend. Hope you got some value! :)

    • @jayeshkurdekar126
      @jayeshkurdekar126 1 year ago

      @@far1din sure did

  • @zemariamm
    @zemariamm 1 year ago +3

    Fantastic explanation!! Very clear and detailed, thumbs up!

  • @farrugiamarc0
    @farrugiamarc0 10 months ago +1

    This is a topic which is rarely explained online, but it was very clearly explained here. Well done.

  • @Joker-ez2fm
    @Joker-ez2fm 1 year ago

    Please do not stop making these videos!!!

    • @far1din
      @far1din  1 year ago +1

      I won’t let you down Joker 🔥🤝

  • @heyman620
    @heyman620 1 year ago +2

    What a masterpiece.

  • @boramin3077
    @boramin3077 6 months ago

    Best video for understanding what is going on under the hood of a CNN.

  • @giacomorotta6356
    @giacomorotta6356 1 year ago

    Great video, underrated channel, please keep it up with the CNN videos!

  • @lifedoesntworkthisway
    @lifedoesntworkthisway 13 days ago

    Thanks man! Helped me a lot!

  • @saikoushik4064
    @saikoushik4064 11 months ago

    Great explanation, helped me understand the workings behind it.

  • @pedroviniciuspereirajunho7244
    @pedroviniciuspereirajunho7244 1 year ago

    Amazing!
    I was looking for some material like this a long time ago and only found it here, beautiful :D

    • @far1din
      @far1din  1 year ago +1

      Thank you my brother 🔥

  • @Peterpeter-hr8gg
    @Peterpeter-hr8gg 1 year ago +1

    What I was looking for. Well explained.

  • @paedrufernando2351
    @paedrufernando2351 1 year ago

    Your channel is a hidden gem. My suggestion is to start a Discord and get some crowdfunding and one-on-ones for people who want to learn from you. You are gifted in teaching.

  • @DVSS77
    @DVSS77 1 year ago

    really clear explanation and good pacing. I felt I understood the math behind back propagation for the first time after watching this video!

  • @msaeid_999
    @msaeid_999 1 month ago

    Bruh! This content is so underrated given the video's impressions.
    I was reading the Computer Vision chapter from "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" and I was having a hard time understanding it.
    After watching this video I can explain CNNs to a layman.

  • @shivanisrivarshini180
    @shivanisrivarshini180 3 months ago

    Great explanation. Thank you sir

  • @sourabhverma9034
    @sourabhverma9034 8 months ago

    Really intuitive and great animations.

  • @mahmoudhassayoun9475
    @mahmoudhassayoun9475 4 months ago

    Good job, the explanation is superb. I hope you don't stop making videos of this calibre. Did you use manim to make this video, or another video editor?

  • @DSLDataScienceLearn
    @DSLDataScienceLearn 1 year ago

    Great explanation; clear, direct, and understandable. Sub!

  • @ramazanyel5979
    @ramazanyel5979 8 months ago

    Excellent. The exact video I was looking for.

  • @SolathPrime
    @SolathPrime 2 years ago +7

    Well explained. Now I need to code it myself.

    • @far1din
      @far1din  2 years ago +4

      Haha, that’s the hard part

    • @SolathPrime
      @SolathPrime 2 years ago +6

      @@far1din I think I came up with a solution. Here:

      from scipy.signal import correlate2d, convolve2d
      import numpy as np

      def backward(self, output_gradient, learning_rate):
          kernels_gradient = np.zeros(self.kernels_shape)
          input_gradient = np.zeros(self.input_shape)
          for i in range(self.depth):
              for j in range(self.input_depth):
                  # gradient w.r.t. each kernel: "valid" cross-correlation of the input with the output gradient
                  kernels_gradient[i, j] = correlate2d(self.input[j], output_gradient[i], "valid")
                  # gradient w.r.t. the input: "full" convolution restores the input's shape
                  input_gradient[j] += convolve2d(output_gradient[i], self.kernels[i, j], "full")
          self.kernels -= learning_rate * kernels_gradient
          self.biases -= learning_rate * output_gradient
          return input_gradient

      First I initialized the kernel gradient as an array of zeros with the kernel shape,
      then I iterated through the depth of the kernels and the depth of the input, computing each gradient with respect to the kernel.
      I did the same to compute the input gradients.
      Your vid helped me understand the backward method better,
      so I have to say thank you sooo much for it.

    • @SolathPrime
      @SolathPrime 2 years ago

      @@far1din I'll document the solution and put it here. When I do, please pin the comment.

    • @far1din
      @far1din  2 years ago +1

      @@SolathPrime That’s great my friend. Will pin 💯
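For anyone wanting to exercise a backward method like the one in this thread, here is a self-contained sketch. The class, attribute names, and shapes are assumptions modeled on the snippet above, not the video's code; the kernel gradient uses a "valid" cross-correlation and the input gradient a "full" convolution so the shapes line up:

```python
import numpy as np
from scipy.signal import correlate2d, convolve2d

class Conv:
    """Minimal conv layer carrying the attributes the backward pass needs.

    Attribute names (input_shape, kernels_shape, depth, input_depth,
    kernels, biases) mirror the snippet in the thread; the class itself
    is illustrative.
    """
    def __init__(self, input_shape, kernel_size, depth, seed=0):
        input_depth, h, w = input_shape
        self.input_shape = input_shape
        self.input_depth = input_depth
        self.depth = depth
        self.kernels_shape = (depth, input_depth, kernel_size, kernel_size)
        rng = np.random.default_rng(seed)
        self.kernels = rng.standard_normal(self.kernels_shape)
        self.biases = np.zeros((depth, h - kernel_size + 1, w - kernel_size + 1))

    def forward(self, x):
        # Stride-1 "valid" cross-correlation, stored for the backward pass.
        self.input = x
        out = np.copy(self.biases)
        for i in range(self.depth):
            for j in range(self.input_depth):
                out[i] += correlate2d(x[j], self.kernels[i, j], "valid")
        return out

    def backward(self, output_gradient, learning_rate):
        kernels_gradient = np.zeros(self.kernels_shape)
        input_gradient = np.zeros(self.input_shape)
        for i in range(self.depth):
            for j in range(self.input_depth):
                kernels_gradient[i, j] = correlate2d(self.input[j], output_gradient[i], "valid")
                input_gradient[j] += convolve2d(output_gradient[i], self.kernels[i, j], "full")
        self.kernels -= learning_rate * kernels_gradient
        self.biases -= learning_rate * output_gradient
        return input_gradient

layer = Conv((1, 5, 5), 3, 1)
x = np.arange(25.0).reshape(1, 5, 5)
out = layer.forward(x)                             # shape (1, 3, 3)
grad_in = layer.backward(np.ones_like(out), 0.1)   # shape (1, 5, 5)
```

The "full" mode matters: a "same" convolution would return a 3x3 array that cannot be added into the 5x5 input gradient.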

  • @shazzadhasan4067
    @shazzadhasan4067 1 year ago

    Great explanation with cool visual. Thanks a lot.

    • @far1din
      @far1din  1 year ago

      Thank you my friend 😃

  • @guoguowg1443
    @guoguowg1443 9 months ago

    great stuff man, crystal clear!

  • @chatgpt-nv5ck
    @chatgpt-nv5ck 3 months ago

    Beautiful🙌

    • @far1din
      @far1din  3 months ago

      Thank you 🙌

  • @RAHUL1181995
    @RAHUL1181995 1 year ago

    This was really helpful... thank you so much for the visualization. Keep up the good work; looking forward to your future uploads.

  • @bamboooooooooooo
    @bamboooooooooooo 9 months ago

    great job. this explanation is really intuitive

  • @PlabonTheSadEngineer
    @PlabonTheSadEngineer 1 year ago +1

    please continue your videos !!

  • @Coca244
    @Coca244 9 days ago

    amazing video

  • @elgs1980
    @elgs1980 2 years ago +1

    Thank you so much!!! This video is so so so well done!

    • @far1din
      @far1din  2 years ago

      Thank you. Hope you got some value out of this! 💯

  • @aikenkazin4096
    @aikenkazin4096 1 year ago

    Great explanation and visualization

    • @far1din
      @far1din  1 year ago

      Thank you my friend 🔥🚀

  • @MarcosDanteGellar
    @MarcosDanteGellar 1 year ago

    the animations were super useful, thanks!

  • @gregorioosorio16687
    @gregorioosorio16687 1 year ago +1

    Thanks for sharing!

  • @AlbertoOrtiz-we2jc
    @AlbertoOrtiz-we2jc 3 months ago

    Excellent explanation, thanks.

    • @far1din
      @far1din  3 months ago

      Glad it was helpful!

  • @rodrigoroman4886
    @rodrigoroman4886 1 year ago +2

    Great video!! Your explanation is the best I have found.
    Could you please tell me what software you use for the animations?

    • @far1din
      @far1din  1 year ago

      I use manim 😃
      www.manim.community

  • @bug8628
    @bug8628 3 months ago

    Amazing video!! :D

    • @far1din
      @far1din  3 months ago

      Thanks! 😄

  • @osamamohamedos2033
    @osamamohamedos2033 9 months ago

    Masterpiece 💕💕

  • @arektllama3767
    @arektllama3767 2 years ago +2

    1:15 why do you iterate in steps of 2? If you iterated by 1 then you could generate a 3x3 layer image. Is that just to save on computation time/complexity, or is there some other reason for it?

    • @far1din
      @far1din  2 years ago +3

      The reason why I used a stride of two (iterations in steps of two) in this video is partially random and partially because I wanted to highlight that the stride when performing backpropagation should be the same as when performing the forward propagation. In most learning materials I have seen, they usually use a stride of one, hence a stride of one for the backpropagation. This could lead to confusion when operating with larger strides.
      The stride could technically be whatever you like (as long as you keep it within the dimensions of the image/matrix). I could have chosen another number for the stride as you suggested. In that case, with a stride of one, the output would be a 3 x 3 matrix/image. Some will say that a shorter stride will encapsulate more information than a larger one, but this becomes “less true” as the size of the kernel increases. As far as I know there are no “rules” for when to use larger strides and not. Please let me know if this notion has changed as everything changes so quickly in this field! 🙂

    • @arektllama3767
      @arektllama3767 2 years ago +3

      @@far1din I never considered how stride length could change depending on kernel size. I guess that makes sense: the larger kernel could cover the same data as a small kernel, just in fewer steps/iterations. I also figured you intentionally generated a 2x2 image since that's a lot simpler than a 3x3 and this is an educational video. Thanks for the feedback, that was really insightful!
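The stride discussion in this thread is easy to verify in code. Below is an illustrative plain-NumPy "valid" cross-correlation with a configurable stride (no padding assumed; names are mine, not from the video):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid cross-correlation of a square image with a square kernel, no padding."""
    n, k = image.shape[0], kernel.shape[0]
    out = (n - k) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            # slide the kernel in steps of `stride` and sum the elementwise products
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            result[i, j] = np.sum(patch * kernel)
    return result

img = np.arange(25.0).reshape(5, 5)
kern = np.ones((3, 3))
s1 = conv2d(img, kern, stride=1)  # 3x3 output
s2 = conv2d(img, kern, stride=2)  # 2x2 output, as in the video
```

Changing only the stride argument reproduces both output sizes debated in this thread.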

  • @UtkalSinha-s8j
    @UtkalSinha-s8j 1 year ago

    Nicely put, thank you so much.

  • @AsilKhalifa
    @AsilKhalifa 6 months ago

    Thanks a lot!

  • @LeoMarchyok-od5by
    @LeoMarchyok-od5by 9 months ago

    Best explanation

  • @aliewayz
    @aliewayz 8 months ago

    really beautiful, thanks.

  • @dhudach
    @dhudach 4 months ago

    I'm new to machine learning and neural networks. Your video is very helpful. I have built a small Python script using just numpy and I can train on numerous samples. So this is a big-picture question. Let's say I've trained my program on thousands of inputs and I'm satisfied. Now I want to see if it can recognize a new input, one not used in training. What weight and bias values do I use from the training sessions? After I'm finished with training, how do I modify the script to 'guess'? It would seem to me that backpropagation isn't used, because I don't actually have a 'desired' value, so I'm not going to calculate loss. There are dozens of videos and tutorials on training, but I think the missing piece is what to do with the training program to turn it into the 'trained' program, the one that guesses new inputs without backpropagation.

  • @harshitbhandi5005
    @harshitbhandi5005 1 year ago

    great explanation

  • @ItIsJan
    @ItIsJan 1 year ago

    5:24
    Does this just mean we divide z1 by w1, multiply by L divided by z1, and do that for all z's to get the partial derivative of L with respect to w1?

    • @far1din
      @far1din  1 year ago

      It's not that simple. Doing the actual calculations is a bit more tricky. Given no activation function, z1 = w1*pixel1 + w2*pixel2 + w3*pixel3…; you now have to take the derivative of this with respect to w1. Then y = z1*w21 + z2*w22…; take the derivative of y with respect to z1, etc. The calculus can be a bit too heavy for a comment like this.
      I'd highly recommend you watch the video by 3blue1brown: ruclips.net/video/tIeHLnjs5U8/видео.htmlsi=Z6asTm87XWcW1bVn 😃
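A toy numeric version of the chain described in this reply may help. All values here are made up for illustration, with z1 = w1*p1 + w2*p2, y = z1*w21 + z2*w22, and a squared-error loss:

```python
# Toy numbers (illustrative only, not from the video)
p1, p2 = 0.5, -1.0            # input pixels
w1, w2 = 0.2, 0.3             # convolution weights
z2, w21, w22 = 0.7, 0.4, 0.1  # second conv output and next-layer weights
t = 1.0                       # target

z1 = w1 * p1 + w2 * p2        # forward: conv output
y = z1 * w21 + z2 * w22       # forward: next layer
L = 0.5 * (y - t) ** 2        # squared-error loss

dL_dy = y - t                 # derivative of the loss w.r.t. y
dL_dz1 = dL_dy * w21          # this is the dL/dz1 term from the video
dL_dw1 = dL_dz1 * p1          # chain rule: dz1/dw1 = p1
```

Note that dz1/dw1 is the pixel value p1, not a division of z1 by w1; the "divide" notation in dL/dz1 is just Leibniz notation for a derivative.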

  • @objectobjectobject4707
    @objectobjectobject4707 1 year ago

    Great example thanks a lot

  • @manfredbogner9799
    @manfredbogner9799 3 months ago

    Very good

  • @akshchaudhary5444
    @akshchaudhary5444 1 year ago

    amazing video thanks!

  • @ziligao7594
    @ziligao7594 8 months ago

    Amazing

  • @govindnair5407
    @govindnair5407 10 months ago

    What is the loss function here, and how are the values in the flattened z matrix used to compute yhat?

  • @samiswilf
    @samiswilf 1 year ago

    Well done.

  • @r0cketRacoon
    @r0cketRacoon 6 months ago

    Thank you very much for this video, but it would probably be more helpful if you also added a max-pooling layer.

  • @SiddhantSharma181
    @SiddhantSharma181 7 months ago

    Is the stride only along the rows, and not along the columns? Is that common, or just simplified?

  • @piyushkumar-wg8cv
    @piyushkumar-wg8cv 1 year ago +1

    Great explanation. Can you please tell me which tool you use for making these videos?

    • @far1din
      @far1din  1 year ago

      Thank you my friend! I use manim 😃
      www.manim.community

  • @yuqianglin4514
    @yuqianglin4514 1 year ago

    Fab video! Helped me a lot.

    • @far1din
      @far1din  1 year ago

      Glad to hear that you got some value out of this video! :D

  • @MohamedBENELMOSTAPHA-l4v
    @MohamedBENELMOSTAPHA-l4v 10 months ago

    I've had no trouble learning about 'vanilla' neural networks. Although your videos are great, I can't seem to find resources that delve a little deeper into how CNNs work. Are there any resources you would recommend?

  • @ManishKumar-pb9gu
    @ManishKumar-pb9gu 1 year ago

    Thank you so much for this.

  • @lordcasper3357
    @lordcasper3357 1 month ago

    thanks boss

  • @bnnbrabnn9142
    @bnnbrabnn9142 10 months ago

    What about the weights of the fully connected layer?

    • @mandy11254
      @mandy11254 8 months ago

      No point in adding it to this video since that's something you should know from neural networks. That's why he just leaves it as dL/dzi.

  • @vishvadoshi976
    @vishvadoshi976 4 months ago

    “Beautiful, isn’t it?”

  • @im-Anarchy
    @im-Anarchy 1 year ago +2

    Perfect. One suggestion: make the videos a little longer; 20-30 minutes is a good number.

    • @far1din
      @far1din  1 year ago +1

      Haha, most people don't like these kinds of videos too long. Average watch time for this video is about 3 minutes :P

    • @im-Anarchy
      @im-Anarchy 1 year ago +1

      @@far1din Oh shii! 3 minutes, that was very unexpected; maybe it's because people revisit the video to revise a specific topic.

    • @far1din
      @far1din  1 year ago

      Must be 💯

  • @OmidDavoudnia
    @OmidDavoudnia 9 months ago

    Thanks.

  • @simbol5638
    @simbol5638 1 year ago

    +1 sub, excellent video

  • @MoeQ_
    @MoeQ_ 1 year ago +2

    dL/dzi = ??

    • @far1din
      @far1din  1 year ago +1

      I explain the term at 4:17.
      It might be a bit unclear, so I'd highly recommend you watch the video from 3blue1brown: ruclips.net/video/tIeHLnjs5U8/видео.htmlsi=Z6asTm87XWcW1bVn 😃

  • @PeakyBlinder-lz2gh
    @PeakyBlinder-lz2gh 1 year ago

    thx

  • @burerabiya7866
    @burerabiya7866 1 year ago

    Hello, well explained. I need your presentation.

    • @far1din
      @far1din  1 year ago

      Just download it 😂

  • @Тима-щ2ю
    @Тима-щ2ю 1 year ago

    You have nice videos that helped me better understand the concept of CNNs. But from this video it is not really obvious that the matrix dL/dw is a convolution of the image matrix and the dL/dz matrix, as shown here: ruclips.net/video/Pn7RK7tofPg/видео.html. The stride of two is also a little bit confusing.

    • @far1din
      @far1din  1 year ago

      Thank you for the comment! I believe he is doing the exact same thing (?)
      I chose a stride of two in order to highlight that the stride should be the same as the stride used during the forward propagation. Most examples stick with a stride of one. I now realize it might have caused some confusion :p

  • @minhnguyenvu9479
    @minhnguyenvu9479 2 months ago

    The original is a 5x5 matrix and the kernel is 3x3, so the output must be (5-3+1) x (5-3+1), or 3x3, not 2x2 as in your video.

    • @far1din
      @far1din  2 months ago

      The stride used in the example in this video is 2, hence the 2x2 output. You would have been correct if the stride were 1 😄
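Both replies follow from the standard output-size formula. A one-function sketch (function name and the no-padding default are my assumptions):

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Side length of a conv output: floor((n + 2*padding - k) / stride) + 1."""
    return (n + 2 * padding - k) // stride + 1

size_s1 = conv_output_size(5, 3, stride=1)  # 3, so a 3x3 output
size_s2 = conv_output_size(5, 3, stride=2)  # 2, the 2x2 output in the video
```

With stride 1 the familiar (n - k + 1) rule from the comment above falls out as a special case.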

  • @CorruptMem
    @CorruptMem 1 year ago

    I think it's spelled "Convolution"

    • @far1din
      @far1din  1 year ago +1

      Haha thank you! 🚀

  • @int16_t
    @int16_t 1 year ago

    w^* is an abuse of math notation, but it's convenient.