This is exactly what you need when you study backpropagation: a fundamental understanding of how it works. I hope the YouTube algorithm pushes this video to more people.
Thanks!
Yes. For the algorithm.
This is by far the best video on backpropagation on YouTube with regard to the math that is actually happening, not just the "idea" of backpropagation. Absolutely underrated.
This video is the best explanation anyone can get of what backpropagation actually is. I wish I had had this video 3 years earlier lol
thanks!
Hi, I just want you to know that you are one of the best teachers on YouTube: you clearly explain these hard materials and make them simple.
My god, that explanation of the chain rule blew my mind. You have such a gift being able to explain seemingly complex topics so intuitively, in all of your videos. You deserve many more subscribers.
Man, this channel is going to be big someday. Keep it up, man!
I appreciate that!
Man, you are so good. I love that you start from first principles and define things mathematically instead of using analogies. Thanks a lot, man, and may God bless you and your hustle.
This is like the 100th video I'm watching, and I can tell he's actually trying to make us understand.
I wish I could talk to you and ask about what's not clear to me. Thanks for the attempt!! I still don't understand!
It is only the beginning of my road in Data Science and Time Series Forecasting, but your videos saved my life throughout uni! These are the easiest to understand and clearest videos I have seen, no doubt! Please keep up your work; it is extremely necessary for people like us, and we are very grateful and appreciate it.
Wow, finally I understood backpropagation... I think everyone can understand it if they have the right teacher. It takes two to tango.
It may be worth noting that, instead of partial derivatives, one can work with derivatives as the linear transformations they really are.
Also, looking at networks in a more structured manner makes clear that the basic ideas of backpropagation (BPP) apply to very general types of neural networks. Several steps are involved.
1.- More general processing units.
Any continuously differentiable function of inputs and weights will do; these inputs and weights can belong, beyond Euclidean spaces, to any Hilbert space. Derivatives are linear transformations and the derivative of a neural processing unit is the direct sum of its partial derivatives with respect to the inputs and with respect to the weights. This is a linear transformation expressed as the sum of its restrictions to a pair of complementary linear subspaces.
2.- More general layers (any number of units).
Single-unit layers can create a bottleneck that renders the whole network useless. Putting together several units in a single layer is equivalent to taking their product (as functions, in the sense of set theory). The layers are functions of the inputs and of the weights of all the units. The derivative of a layer is then the product of the derivatives of the units; this is a product of linear transformations.
3.- Networks with any number of layers.
A network is the composition (as functions, and in the set theoretical sense) of its layers. By the chain rule the derivative of the network is the composition of the derivatives of the layers; this is a composition of linear transformations.
4.- Quadratic error of a function.
...
---
With the additional text below, this would become excessively long, so I will stop the itemized comments here.
The point is that a sufficiently general, precise and manageable foundation for NNs clarifies many aspects of BPP.
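A sketch of items 1 and 3 in finite-dimensional notation. The symbols $F_k$ (layer $k$), $x_k$ (its output), $u$ (a single unit), $\xi, \omega$ (input and weight increments) and the scalar error $E$ are introduced here for illustration and are not from the comment; in the first three lines the weights are held fixed. Backpropagation amounts to evaluating the third line from right to left, starting at $\nabla E(x_L)$:

```latex
\begin{aligned}
N &= F_L \circ F_{L-1} \circ \cdots \circ F_1, \qquad x_k = F_k(x_{k-1}), \\
DN(x_0) &= DF_L(x_{L-1}) \circ DF_{L-1}(x_{L-2}) \circ \cdots \circ DF_1(x_0), \\
\nabla (E \circ N)(x_0) &= DF_1(x_0)^{*} \circ \cdots \circ DF_L(x_{L-1})^{*} \, \nabla E(x_L), \\
Du(x, w)(\xi, \omega) &= D_x u(x, w)\,\xi + D_w u(x, w)\,\omega .
\end{aligned}
```

Here $^{*}$ denotes the adjoint; in $\mathbb{R}^n$ it is the transpose of the Jacobian matrix.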
If you are interested in the full story and have some familiarity with Hilbert spaces, please google our paper on Backpropagation in Hilbert spaces. A related article with matrix formulas for backpropagation on semilinear networks is also available.
We have developed a completely new deep learning algorithm called Neural Network Builder (NNB) which is orders of magnitude more efficient, controllable, precise and faster than BPP.
The NNB algorithm assumes the following guiding principle:
The neural networks that recognize given data, that is, the “solution networks”, should depend only on the training data vectors.
Optionally, the solution network may also depend on parameters that specify the distances of the training vectors to the decision boundaries, as chosen by the user and up to the theoretically possible maximum. The parameters specify the width of chosen strips that enclose the decision boundaries, strips from which the data vectors must stay away.
When using traditional BPP, the solution network depends, besides on the training vectors, on guessing a more or less arbitrary initial network architecture and initial weights. Such is not the case with the NNB algorithm.
With the NNB algorithm, the network architecture and the initial (same as the final) weights of the solution network depend only on the data vectors and on the decision parameters. No modification of weights, whether incremental or otherwise, needs to be done.
For a glimpse into the NNB algorithm, search this platform for our video:
NNB Deep Learning Without Backpropagation.
Links to free demo software can be found in the video description.
The new algorithm is based on the following very general and powerful result (google it): Polyhedrons and Perceptrons Are Functionally Equivalent.
For the conceptual basis of general NNs, see our article Neural Network Formalism.
Regards,
Daniel Crespin
Everything you explained is so simple and intuitive! I suffered through machine learning this semester, and you really are the SAVIOR of this course! Thank you sooooo much!
You and 3Blue1Brown: the best math YouTubers out there.
This is simply amazing. The chain rule explanation unlocked the understanding. Thank you sir!
This is the most helpful intro video on back propagation I've seen.
Best video on BP I've watched. Thanks for putting it together so nicely. Cheers
You are exceptionally good at explaining difficult/intricate subjects clearly! Thank you for doing this!
You're very welcome!
I have an ML exam in two days, and I'm enjoying these videos a lot. Very clear explanations, thank you!!!!!
Great addition to your channel ... thanks for uploading
Glad you enjoy it!
Great video. May “The Algorithm” present this to many more people!
I am currently doing the MIT Statistics and Data Science MicroMasters, and a few times already I have relied on your videos for a clearer, more high-level explanation of certain concepts. Even when I already understand the concepts, as is the case with backpropagation, I often find it useful to watch them simply to strengthen and reinforce my intuition. So, thank you. You do good work!
All I can say is... you deserve a big hug. Fantastic teaching.
Now this is an explanation! Thank you, sir, for not just teaching the math operations like everyone else.
This video is the best one I've seen so far, and I've fully understood the whole process in a very intuitive way that actually makes sense. Thanks for making this video!
Wow, thanks!
This is the best video on backpropagation that I have watched!
Articulate to the core; you're gifted, man...
Thanks I appreciate it!
Definitely better than my Coursera course. Worth watching ads for the quality video!
I really like to go through everything mathematically,
and I found you doing exactly that.
It's great.
Thank you. I finally understand back propagation
Glad!
The chain rule explanation is just... wow. Amazing.
thanks!
Your videos are immensely helpful. I love watching you explain complex concepts in the simplest manner. Thank you so much.
This is my first time actually understanding this! Thank you!
Thank goodness I did calc a few years ago . . .
These are great videos. I save time by watching these to understand basic concepts, rather than scratching around the internet to find what I need. Thanks, ritvikmath
I really like the way you explain it. Simple and easy to understand.
The explanation on caching was really helpful.
good to hear!
Best video I have seen about backpropagation, thank you very much
Thank you so so much for your explanation 🙏🏿 I think that I finally understand how backpropagation works. God bless you 🙏🏿
What a great explanation!!! Such a clear explanation!!! Thank you for teaching so beautifully, Ritvik. I feel fortunate to have come across this video.
Very good explanation! The right balance between building intuition and formulas! As a minor improvement, I think it could help to show the final derivatives for each weight, so one can see how the terms repeat backwards through the formulas.
Great explanation; no surprise it's ranked #3 in a YouTube search for "backpropagation", just after the 3Blue1Brown and StatQuest videos on the subject. Nice work 👍
Hands down the best explanation of backpropagation. Thanks for making these videos! Do you have a Patreon or something to support you?
Big Big Big like!!! You really are a very good teacher. I hope I can do the same in my language so students will benefit and understand the concept of backpropagation!
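For the 2-2-1 network discussed in the comments, the per-weight derivatives can be written out as below. This is a sketch under assumed notation that may differ from the video's: inputs $x_1, x_2$; hidden weights $w_{ij}$ (input $i$ to hidden unit $j$) and biases $b_j$; output weights $v_j$ and bias $c$; a sigmoid $\sigma$ on every unit; squared error $E$. The factor $\delta_o$ reappears in every formula, and $\delta_j$ in every hidden-layer formula, which is the backward repetition of terms the comment describes:

```latex
\begin{aligned}
z_j &= w_{1j} x_1 + w_{2j} x_2 + b_j, \quad h_j = \sigma(z_j), \quad j = 1, 2, \\
z_o &= v_1 h_1 + v_2 h_2 + c, \quad \hat{y} = \sigma(z_o), \quad E = \tfrac{1}{2}(\hat{y} - y)^2, \\
\delta_o &= \frac{\partial E}{\partial z_o} = (\hat{y} - y)\,\sigma'(z_o), \qquad
\frac{\partial E}{\partial v_j} = \delta_o h_j, \qquad \frac{\partial E}{\partial c} = \delta_o, \\
\delta_j &= \frac{\partial E}{\partial z_j} = \delta_o\, v_j\, \sigma'(z_j), \qquad
\frac{\partial E}{\partial w_{ij}} = \delta_j x_i, \qquad \frac{\partial E}{\partial b_j} = \delta_j .
\end{aligned}
```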
Useful video, thanks. It would be helpful to also explain the practical aspects of the training algorithms: forward prop vs. back prop, epochs, batch vs. incremental modes, etc. Probably more of a DS-code than a DS-concepts topic.
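A toy sketch of the batch vs. incremental distinction, assuming a one-weight linear model $\hat{y} = w x$ with squared error in place of a full network (in a real network the per-example gradient would come from backpropagation). An epoch is one pass over the training data: batch mode makes one update per epoch from the averaged gradient, incremental (online) mode makes one update per example. The data, learning rate and epoch count are made up for illustration:

```python
def batch_gd(data, w=0.0, lr=0.1, epochs=200):
    """Batch mode: one update per epoch, using the gradient averaged over all examples."""
    for _ in range(epochs):
        # d/dw of 0.5 * (w*x - y)^2 is (w*x - y) * x
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def incremental_gd(data, w=0.0, lr=0.1, epochs=200):
    """Incremental (online) mode: one update after every single example."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * (w * x - y) * x
    return w


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data generated by y = 2x
    print(batch_gd(data), incremental_gd(data))  # both should end up close to 2.0
```

On this consistent toy data both modes reach the same weight; with noisy data, incremental updates keep fluctuating unless the learning rate is decreased over time.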
Hey great suggestion thanks!
I never comment on any YouTube video, but I have to say your clear communication in describing the core concepts is simply amazing. I especially appreciate you walking through every little piece of information and focusing on the intuition, which helps the formulas look less daunting. I will definitely forward your explanations to anyone I know learning these topics. Thank you, and keep up the amazing videos.
Beautiful explanation. I have always believed the people whose own concepts are as clear as crystal also happen to be the best explainers. Your channel proves my intuition right. Kudos.
Heh, bless you, mate! Thank you for using simple language with no ‘fancy’ words. You and ‘3Blue1Brown’ give a way better understanding of ML than my program at uni. You are making the world a better place!
Very well explained. Much appreciated!
Guys, hit the like button! We need teachers like him, and the like button will help them stay and create more videos!!!!
Greatly helpful and informative video, thank you very much!
The beauty of this video
Every lesson you teach inspires me. You are the best professor I have ever experienced. Thank you!
excellent class! the best one for me. very intuitive!
You sir, are a legend. Thanks !
God Level Explanation
Glad you think so!
Wonderful explanation! Thank you.
Crystal-clear explanation. Thanks a lot!!
Best explanation everrr!!
excellent explanation, thank you very much!
Excellent video once again! For real applications, perhaps you can go over how to deal with imbalanced data (e.g., undersampling, oversampling, SMOTE).
I feel like it's not done. So we understand the idea, but exactly how does it work? You already have that simple NN, so why not go through the algorithm steps to see the effect of what you explained? I.e., what do we do with the derivatives? How do they help improve the weights (and biases)?
Mister, I cannot find any video you made about partial derivatives, and they show up all the time in deep learning! Can you make one, please? I really appreciate what you do.
Appreciate the way you present things, man.
So is it fair to say that gradient descent is just the parameter-optimization method we're using? I took an optimization course in college and remember learning about things like Newton's method, Lagrangians, etc. It would be nice to explain the connection between the two topics.
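A minimal sketch of what happens to the derivatives, assuming a 2-2-1 network with a sigmoid on every unit and squared error; the parameter names `w`, `b`, `v`, `c`, the learning rate `lr=0.5` and the training example are hypothetical choices made here for illustration. Every gradient is computed from one forward pass, then each weight and bias is moved a small step against its gradient (plain gradient descent):

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(p, x):
    """Forward pass; returns hidden activations and output (cached for the backward pass)."""
    h = [sigmoid(p["w"][0][j] * x[0] + p["w"][1][j] * x[1] + p["b"][j]) for j in range(2)]
    y_hat = sigmoid(p["v"][0] * h[0] + p["v"][1] * h[1] + p["c"])
    return h, y_hat


def loss(p, x, y):
    return 0.5 * (forward(p, x)[1] - y) ** 2


def gradients(p, x, y):
    """Chain rule, output layer first; sigma'(z) is computed as a * (1 - a)."""
    h, y_hat = forward(p, x)
    d_o = (y_hat - y) * y_hat * (1 - y_hat)                         # dE/dz_o
    d_h = [d_o * p["v"][j] * h[j] * (1 - h[j]) for j in range(2)]  # dE/dz_j
    return {
        "w": [[d_h[j] * x[i] for j in range(2)] for i in range(2)],  # w[i][j]: input i -> hidden j
        "b": d_h,
        "v": [d_o * h[j] for j in range(2)],
        "c": d_o,
    }


def step(p, x, y, lr=0.5):
    """One gradient-descent update; all gradients come from the same forward pass."""
    g = gradients(p, x, y)
    return {
        "w": [[p["w"][i][j] - lr * g["w"][i][j] for j in range(2)] for i in range(2)],
        "b": [p["b"][j] - lr * g["b"][j] for j in range(2)],
        "v": [p["v"][j] - lr * g["v"][j] for j in range(2)],
        "c": p["c"] - lr * g["c"],
    }


if __name__ == "__main__":
    random.seed(0)
    params = {
        "w": [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)],
        "b": [0.0, 0.0],
        "v": [random.uniform(-1, 1) for _ in range(2)],
        "c": 0.0,
    }
    x, y = [1.0, 0.5], 1.0
    print("loss before:", loss(params, x, y))
    for _ in range(100):
        params = step(params, x, y)
    print("loss after: ", loss(params, x, y))
```

Note that `step` computes all gradients before changing any parameter: one forward pass, one full backward pass, one update of every weight and bias, then the next forward pass.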
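In the usual setup, backpropagation only computes the gradient $\nabla L(\theta)$ of the loss with respect to all parameters $\theta$; the optimizer decides how to use it. Two standard update rules, written with a generic learning rate $\eta$ (symbols introduced here for illustration):

```latex
\begin{aligned}
\text{gradient descent:}\quad \theta_{t+1} &= \theta_t - \eta\, \nabla L(\theta_t), \\
\text{Newton's method:}\quad \theta_{t+1} &= \theta_t - \left[\nabla^2 L(\theta_t)\right]^{-1} \nabla L(\theta_t).
\end{aligned}
```

Newton's method also needs the Hessian $\nabla^2 L$, with $n^2$ entries for $n$ parameters; that cost is one reason first-order methods are the usual choice for large networks.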
This was beautiful
How does the math change with multiple hidden layers? How do you compute the partial derivatives for the 3rd layer going into the 2nd layer?
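With more hidden layers the same chain rule is applied once more per layer: each layer's error term is the next layer's error term sent back through the transposed weights and multiplied by $\sigma'$. A minimal pure-Python sketch, assuming sigmoid units, squared error, and a hypothetical list-of-lists layout where `weights[k][j][i]` connects unit `i` of layer `k` to unit `j` of layer `k + 1`:

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, biases, x):
    """Return the activations of every layer; acts[0] is the input itself."""
    acts = [list(x)]
    for W, b in zip(weights, biases):
        prev = acts[-1]
        acts.append([
            sigmoid(sum(W[j][i] * prev[i] for i in range(len(prev))) + b[j])
            for j in range(len(W))
        ])
    return acts


def loss(weights, biases, x, y):
    """Squared error 0.5 * sum((output - target)^2)."""
    out = forward(weights, biases, x)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))


def backprop(weights, biases, x, y):
    """Gradients of the loss w.r.t. every weight and bias, last layer first."""
    acts = forward(weights, biases, x)  # cached activations, reused below
    # Output-layer error term: dE/dz = (a - y) * sigma'(z), with sigma'(z) = a * (1 - a)
    delta = [(a - t) * a * (1 - a) for a, t in zip(acts[-1], y)]
    grads_W = [None] * len(weights)
    grads_b = [None] * len(weights)
    for k in range(len(weights) - 1, -1, -1):
        prev = acts[k]
        grads_W[k] = [[delta[j] * prev[i] for i in range(len(prev))]
                      for j in range(len(delta))]
        grads_b[k] = list(delta)
        if k > 0:  # send the error term back to the previous hidden layer
            W = weights[k]
            delta = [
                sum(W[j][i] * delta[j] for j in range(len(delta)))
                * prev[i] * (1 - prev[i])
                for i in range(len(prev))
            ]
    return grads_W, grads_b


if __name__ == "__main__":
    random.seed(0)
    sizes = [2, 3, 3, 1]  # two hidden layers
    weights = [[[random.uniform(-1, 1) for _ in range(sizes[k])]
                for _ in range(sizes[k + 1])] for k in range(len(sizes) - 1)]
    biases = [[0.0] * sizes[k + 1] for k in range(len(sizes) - 1)]
    gW, gb = backprop(weights, biases, [0.3, -0.7], [0.9])
    print(gW[0])  # gradients for the input -> first hidden layer weights
```

Any single gradient can be sanity-checked against a central finite difference of `loss`.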
Great explanation!
Your explanation is awesome!
Can you make a video on the next step as well, i.e., gradient descent and finding the minimum error?
A gradient descent video is coming out soon! Stay tuned :)
Perfect! Thank you!
Glad it helped!
Little mistake: you count 9 weights at 1:51, but with 2 inputs, 2 hidden units and 1 output there are only 6 weights. You may have been confused by the (+1) you drew in a circle, which is neither an input nor a neuron: it's the bias. Or you counted the biases as weights, which would explain the count of 9.
Thank you for this video, it was really helpful for understanding backpropagation. Have you talked in another video about "direct propagation"? And I have a question: why do we prefer backpropagation over direct propagation?
During backpropagation, do you do a forward pass after stepping back through each layer to get a new error, OR do you go back through all the layers, then update all the weights, then do a new forward pass?
Can we assume that h2 is calculated the same way as h1? I mean, your problem statement should have included both h1 and h2.
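The count is easy to check: a fully connected layer from $a$ units to $b$ units has $a \cdot b$ weights and $b$ biases. A small sketch (the function name is hypothetical):

```python
def count_parameters(layer_sizes, include_biases=True):
    """Count weights (and optionally biases) in a fully connected feed-forward net."""
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:]) if include_biases else 0
    return weights + biases


print(count_parameters([2, 2, 1], include_biases=False))  # 6
print(count_parameters([2, 2, 1]))                        # 9
```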
Thank you for speaking clearly. I can’t understand all the Indians.
Fantastic job!
Thank you! Cheers!
Maybe it would be better if you also explained the whole process in the neural network in more detail at the end.
Million thanks!
You have great content!!
Why does the output of sigma point to h sub 2?
well done- thanks
thanks for sharing
Awesome❤
thank u 💕
You're welcome 😊
Waiting for you to drop that marker
haha!
🐐
WOW
.
PLEASE DON'T DO MORE VIDS. YOU GET YOUR CIRCUS CLOWN
It was the horribly worst way of explaining something ever. Bro, you explained nothing, you just put those notations into words.