Getting stuck at a local minimum has very little to do with gradient descent itself and much more to do with the shape of the loss function we choose to optimize. The loss function chosen in this video seems to be MSE, which is convex for this kind of model; a convex function has no local minima other than the global minimum, hence no points where the gradient descent (GD) algorithm gets stuck. If we iterate on a convex function for long enough with a suitable learning rate, there is no reason for GD to get stuck in a local minimum. The other optimization algorithms mainly speed up GD (improve its running time) rather than drastically solving the local-minima problem. "Hands-On Machine Learning with Scikit-Learn and TensorFlow" is a much better resource for understanding GD and backpropagation than this video.
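To illustrate the point, here is a minimal sketch (my own toy example, not from the video) of GD on the convex MSE loss of a 1-D linear regression; the data and learning rate are made up for illustration:

```python
import numpy as np

# Toy data: y = 2x + 1 plus a little noise (made-up illustration)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0   # start far from the optimum
lr = 0.1          # learning rate (assumed; tune as needed)

for step in range(500):
    y_hat = w * x + b
    err = y_hat - y
    # Gradients of MSE = mean((y_hat - y)^2) w.r.t. w and b
    grad_w = 2.0 * np.mean(err * x)
    grad_b = 2.0 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # lands near (2, 1): convex loss, one global minimum
```

Because the MSE surface of a linear model is a paraboloid, GD with a sensible learning rate converges to the unique minimum regardless of initialization.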
Thank you for the great lecture, madam. Watching it during coronavirus times :)
Great lecture, ma'am. Can we get the slides?
How do you pass a class label as input to the neural network in a classification problem? For example, suppose the classes are: car, scooter, bike.
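One common approach (a sketch of standard one-hot encoding, not something covered in the lecture) is to map each class to a binary vector with a single 1:

```python
import numpy as np

classes = ["car", "scooter", "bike"]
index = {c: i for i, c in enumerate(classes)}

def one_hot(label: str) -> np.ndarray:
    """Encode a class label as a one-hot vector of length len(classes)."""
    vec = np.zeros(len(classes))
    vec[index[label]] = 1.0
    return vec

print(one_hot("scooter"))  # [0. 1. 0.]
```

The network then has one output unit per class, and the one-hot vector serves as the training target rather than a raw string label.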
Ma'am, if there are more than two features, say n features, will the decision boundary be a hyperplane? Please clarify. Thanks in advance.
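For reference, with a linear classifier on $n$ features the decision boundary is indeed the set of points satisfying (standard textbook notation, not taken from the lecture):

$$\mathbf{w}^\top \mathbf{x} + b = 0, \qquad \mathbf{w}, \mathbf{x} \in \mathbb{R}^n,$$

which is a hyperplane of dimension $n-1$: a line for $n=2$, a plane for $n=3$, and so on.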
Doubt at 26:42: why does delta w(i,j) use the input value instead of the observed output value?
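For context, in the standard backpropagation update (generic textbook form; the lecture's notation may differ) the input appears because of the chain rule:

$$\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} = \eta\, \delta_j\, x_i, \qquad \text{since } net_j = \sum_i w_{ij} x_i \;\Rightarrow\; \frac{\partial\, net_j}{\partial w_{ij}} = x_i.$$

The observed output enters through the error term $\delta_j$, while the input $x_i$ appears as the direct multiplier on $w_{ij}$.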
Excellent.....
I watched the whole thing hoping for LSTM, but there is no mention of it. Sigh :-(
good job
Awesome