How does skipping solve the vanishing gradient problem? What does it mean to skip a layer, exactly? Why can't we simply remove it and build a smaller network?
That's exactly what I asked myself while watching this!
Learn about backpropagation if you want to understand the vanishing gradient problem. A skip connection does not mean skipping a layer; it actually performs an identity operation, adding the previous layer's output to the current layer's output. And if you just built a smaller network instead, how would you extract higher-level features from images?
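To make that "identity operation" concrete, here is a minimal sketch of a basic residual block (PyTorch, assuming the input and output have the same number of channels; the class name is just illustrative): the block outputs F(x) + x, so the input always has a direct path forward and the gradient a direct path back.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x                           # the "skipped" input, kept as-is
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                   # the skip connection: F(x) + x
        return F.relu(out)

# usage: y = BasicResidualBlock(64)(torch.randn(1, 64, 32, 32))  # same shape out as in
```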
Here is how the skipping idea works: suppose you have a layer that produces outputs in the form of activations, and assume 4 streams of outputs flow out of that layer. With the dropout regularization method, we randomly restrict any 2 of those streams, and the restricted streams keep changing: on iteration (epoch) one, output streams 2 and 3 are restricted; on iteration two, streams 3 and 1 are restricted; and so on. This forces the model not to depend on any one particular stream of data to learn, makes the system more robust, and prevents overfitting; and because it prevents overfitting, the vanishing gradient problem is corrected as well. For an accurate explanation, you first need to understand backpropagation, because the vanishing gradient problem is directly tied to the derivatives of the activation function.
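What's described above is the dropout mechanism; a minimal NumPy sketch of that idea, assuming a toy layer with 4 output streams and a 50% drop probability (so roughly 2 of the 4 are zeroed each call), could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, drop_prob=0.5):
    # a fresh random mask every call, so different streams get zeroed on each iteration
    mask = rng.random(activations.shape) >= drop_prob
    # "inverted" dropout: rescale the surviving streams so the expected output is unchanged
    return activations * mask / (1.0 - drop_prob)

streams = np.array([0.8, 1.2, 0.5, 2.0])  # 4 output streams from one layer
print(dropout(streams))                   # roughly half of them come out as zero
```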
Great video! It's the best quick introduction to ResNets that I've found on YouTube. It's strange that it has only 5k views.
Some more details would have helped, e.g. how the sizes of the last layer and the one before it differ, and how they can still be added. This visualization was not too useful to me; the usual diagram on paper works just as well for me.
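On the size question: in ResNet, when the residual branch changes the number of channels or the spatial resolution, the skip path typically passes through a 1x1 convolution (a "projection shortcut") so the two tensors have matching shapes and can still be added. A minimal sketch of that case (PyTorch, illustrative class and parameter names) follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownsampleResidualBlock(nn.Module):
    """Residual block whose main branch halves the spatial size and changes the channel count."""
    def __init__(self, in_channels, out_channels, stride=2):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 projection on the skip path so its shape matches the main branch
        self.proj = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.proj(x))  # shapes now match, so the addition works

# e.g. DownsampleResidualBlock(64, 128)(torch.randn(1, 64, 32, 32)) -> shape (1, 128, 16, 16)
```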
Spot on!
Best introduction to ResNets!
Great video, exactly gave me the introduction I needed. Thanks.
This is incredible! Thanks for the explanation.
Please, more videos like this, it is amazing!
Why would people want reading suggestions in a video? A detailed explanation would be appreciated.
I felt this info was shallow... Why not do a more detailed video?
This video is very nice
Which is better, ResNet or EfficientNet?
If you want something lightweight, EfficientNet is better by a mile.
Unfortunately, this video does everything but explain the core concepts behind ResNet.
Good luck!
Poorly explained.
Why so few views for this amazing lecture? 🥲🥲🥲