I just want to thank you so much that I can't even express it! You have done such a good job on explaining this, thank you very much. The problem is that I am in 7th grade and no one knows what a conjugate gradient is (at my school). I am studying deep reinforcement learning (more exactly TRPO), and you are my life saver. I couldn't understand it for 2 weeks, but finally I did, thanks to you!
You explain things so slow and step-by-step as it was being explained for chimps. Exactly what i needed. Thank you a lot.
I will give away a million dollars to you when I have, say, 10 million dollars, so you can continue doing good for the community (smiles). You're just amazing, Ahmad Bazzi; please stay alive and don't let these videos disappear from here.
God bless you, boss. I have really struggled today, for close to 24 hours, searching through garbage online until I found your video.
Whenever I get hung up on some hard-to-understand topic, I remember that Ahmad Bazzi is here waiting for you; so relax and enjoy learning something new with him, without any doubt.
Thanks so much for your feedback, really appreciated!
After all these years I finally understand the magic behind gradients! Thanks!
I swear this is the most clear and FANTASTIC explanation I've ever found
Amazing explanation! Greetings from Brazil!
Very well explained and easy to follow. Thank you very much Sir.
I was in pain because of the concept of gradient descent and its relationship with the line of best fit. I just can’t believe that someone can explain it so well. Keep it up!
One of the best lectures here on RUclips!
Best video on gradient descent I've ever found in the universe!! Thanks for saving my life.
God, I am so glad you labeled stuff like step size and direction; my teacher didn't, and all that did was waste like 30 minutes of my time.
Sir, your lectures are awesome! Thank you very much.
Superb explanation sir. Love from India.
What an amazing animation
The best one on the internet
You are the best teacher ever, Ahmad Bazzi... I'm going through Andrew Ng's ML course and didn't understand what the hell Gradient Descent was... so I came to RUclips and found your video... BAM... Thank you so much for doing these videos... Keep 'em coming... 💜
Simple and clear... yet it needs more detail!
I just want to thank you so much that I can't even express it! You have done such a good job of explaining this, thank you very much. The problem is that I am in 7th grade and no one knows what a conjugate gradient is (at my school). I am studying deep reinforcement learning (more precisely, TRPO), and you are my lifesaver. I couldn't understand it for 2 weeks, but finally I did, thanks to you!
Glad to hear it!
That was a great explanation of gradient descent, presented in an accessible way!! Thank you for the great video!
Hello from the USA 🇺🇸
Very nice explanation!!
God level Explanation... 😍😍😍😍😍😍😍😍😍😍😍😍😍😍😍😍
Great video that explains gradient descent perfectly
Thanks for the question. We choose to model our measured data, for example g, by H theta_QP, so by our own modelling definition this is true. In reality the model may be inaccurate, but the equality holds if H = A^T A.
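For anyone following along, here is a minimal sketch of where that identity typically comes from, assuming the quadratic program originates from a least-squares fit of a linear model (the symbols A and b are my own notation, not necessarily the video's):

```latex
\min_{\theta}\ \tfrac{1}{2}\,\lVert A\theta - b\rVert_2^2
\;\Longrightarrow\;
\nabla_{\theta}\,\tfrac{1}{2}\,\lVert A\theta - b\rVert_2^2
= A^{\top}A\,\theta - A^{\top}b = 0
\;\Longrightarrow\;
\underbrace{A^{\top}A}_{H}\,\theta_{\mathrm{QP}} = \underbrace{A^{\top}b}_{g}.
```

Under that assumption, g is the back-projected data A^T b rather than the raw measurements themselves.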
Thank you for the comprehensive lecture, Professor Ahmad Bazzi :)
Very good explanation.
Glad it was helpful!
Awesome explanation...
Great tutorial.
Each and every notation is explained serially, step by step, with its meaning and relation to the problem.
If you're wondering why, when we have least squares, we would want to use gradient descent... the answer is that least squares only works in specific situations, while gradient descent works in many more.
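To make that concrete, here is a minimal sketch (my own toy example, not from the video) showing that, for plain linear regression, both routes land on the same answer; gradient descent just keeps working when no closed form exists:

```python
import numpy as np

# Toy data: fit y = slope * x + intercept.
x = np.array([0.5, 2.3, 2.9])
y = np.array([1.4, 1.9, 3.2])
A = np.column_stack([x, np.ones_like(x)])  # design matrix: columns [x, 1]

# Closed-form least squares: solve the normal equations A^T A theta = A^T y.
theta_closed = np.linalg.solve(A.T @ A, A.T @ y)

# Gradient descent on the same loss 0.5 * ||A theta - y||^2.
theta = np.zeros(2)
learning_rate = 0.01
for _ in range(20_000):
    grad = A.T @ (A @ theta - y)   # gradient of the quadratic loss
    theta -= learning_rate * grad  # step in the negative gradient direction

print(theta_closed, theta)  # the two estimates should agree closely
```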
Great sir. ...👍👍💯
Interesting!!
Thanks Moustafa ! Glad you found it interesting !
Awesome lecture and tutorial, going into the details of Armijo and Newton.
Excellent.
Great...👌👌👌
Thank you Ahmad !
Good explanation, but it would have been better if you had elaborated on the update formula and why it is used to reach the next step. Why is the derivative multiplied by the learning rate, and why is the result then subtracted from the first point's value?
Greetings from Germany 🇩🇪
Very helpful for remembering easily, even at the last moment, without any confusion... thank you, sir! I just have one question: for the relaxation method, will the procedure be the same?
Damn, it felt like watching a statistics / data science / machine learning tutorial from a SpongeBob SquarePants episode! That was interesting and funny at the same time. Well done!
When the derivative is negative, we need to move to a larger parameter value, and the step size is also negative. Because we need to move to a larger new value, we subtract the negative step size. When the derivative is positive, we need to move to a smaller parameter value, and the step size is also positive. Because we need to move to a smaller new value, we subtract the positive step size.
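A tiny sketch of that sign logic (my own illustration, not code from the video):

```python
def gradient_descent_step(theta, derivative, learning_rate):
    """One update: new = old - learning_rate * derivative.

    derivative < 0  ->  step size < 0  ->  subtracting it moves theta up.
    derivative > 0  ->  step size > 0  ->  subtracting it moves theta down.
    """
    step_size = learning_rate * derivative
    return theta - step_size

print(gradient_descent_step(1.0, -0.5, 0.1))  # negative derivative: 1.0 -> 1.05
print(gradient_descent_step(1.0, +0.5, 0.1))  # positive derivative: 1.0 -> 0.95
```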
Thanks Ahmad !
In theory, the function could equal 0. However, in practice we run gradient descent on computers, and computers have rounding errors; due to those rounding errors and whatnot, the function never equals 0 exactly.
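Which is why, in practice, one stops when the derivative is merely small rather than exactly zero. A minimal sketch (my own, assuming a simple 1-D loss):

```python
def minimize_1d(df, theta, learning_rate=0.1, tol=1e-8, max_iter=10_000):
    """Gradient descent that stops when the derivative is merely 'small enough',
    since floating-point rounding means it rarely hits 0.0 exactly."""
    for _ in range(max_iter):
        g = df(theta)
        if abs(g) < tol:  # tolerance test instead of g == 0
            break
        theta -= learning_rate * g
    return theta

# Example: f(theta) = (theta - 3)^2, so df(theta) = 2 * (theta - 3); minimum at 3.
print(minimize_1d(lambda t: 2 * (t - 3), theta=0.0))
```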
Gut gemacht. Danke!
Hooray! :)
You are a Hero
very clear
You are a superhero
Thanks in advance!
Wow, thanks!
Thank you!
Thank you very much! :)
Thank you so much 😀
May you get all the resources you need to keep making these videos.
4:52 WOW! What on earth did I just see?
It will give the intercept value immediately.
Need more sir
Thanks!
@Ahmad Bazzi OK, thank you, I will watch it and let you know whether or not my confusion is resolved.
It is given: alpha lies between 0 and 0.5, and beta lies between 0 and 1.
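Those bounds are the standard ones for a backtracking (Armijo) line search. Here is a hedged sketch of how alpha and beta are used, assuming the video follows the usual backtracking scheme (the function names are mine):

```python
def backtracking_line_search(f, df, x, direction, alpha=0.3, beta=0.707):
    """Shrink the step t by beta until the Armijo condition holds.
    Requires 0 < alpha < 0.5 and 0 < beta < 1."""
    t = 1.0
    slope = df(x) * direction  # directional derivative at x (negative for descent)
    while f(x + t * direction) > f(x) + alpha * t * slope:
        t *= beta
    return t

# Example on f(x) = x^2, starting at x = 2 and descending along -f'(x):
f = lambda z: z * z
df = lambda z: 2 * z
x0 = 2.0
d = -df(x0)
t = backtracking_line_search(f, df, x0, d)
print(t, x0 + t * d)  # accepted step length and the resulting new point
```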
Ahmad Bazzi thanks a lot ❤️
Thanks! :)
This video really helped me clear it up, and your example was very simple and useful! I am confused as to why you made beta = 0.707... What made you choose this? Is it just a standard?
That's a known typo mentioned in a pinned comment.
Thanks for the great video! One question: at 1:03, where do you get or derive that while condition? Can you please provide some materials? Thanks!
Sir, can I ask: please do an explanation of quasi-Newton methods.
While considering the sum of the squares of the residuals as the loss function, why can't we just equate the slope of the loss function to zero to get the best intercept, instead of plugging in many intercepts and checking?
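For what it's worth, in plain linear regression you can indeed set the derivative to zero and solve. For example, differentiating the sum of squared residuals with respect to the intercept b (my notation) gives a closed form:

```latex
\frac{\partial}{\partial b}\sum_{i}\bigl(y_i - (m x_i + b)\bigr)^2
= -2\sum_{i}\bigl(y_i - (m x_i + b)\bigr) = 0
\;\Longrightarrow\;
b = \bar{y} - m\,\bar{x}.
```

Gradient descent earns its keep on models and losses where this kind of closed form does not exist, as noted in the least-squares comment above.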
I am assuming H = A^T A (where A is a forward model, such as a discrete Radon transform to a sinogram in an imaging example). But A, and so H, will depend on your model of the mean of the measured data. It can vary a lot according to the problem being solved, of course.
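One thing that holds regardless of which forward model A you pick: H = A^T A is always symmetric positive semidefinite, which is what keeps the quadratic program well behaved. A quick numerical check (my own illustration):

```python
import numpy as np

# Any forward model A yields a symmetric positive semidefinite H = A^T A.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))  # stand-in forward model (e.g. a discretized Radon matrix)
H = A.T @ A

print(np.allclose(H, H.T))                      # symmetric: True
print(np.all(np.linalg.eigvalsh(H) >= -1e-12))  # all eigenvalues >= 0: True
```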
BAM!!! Great explanation of gradient descent. I too have a doubt about this: do the ready-made packages of Python and R, like sklearn, use gradient descent for calculating the linear regression slope and intercept?
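To my knowledge, scikit-learn's LinearRegression solves the least-squares problem directly (no gradient descent), while SGDRegressor is the estimator that does use stochastic gradient descent. A hedged sketch; treat the solver details as an assumption worth verifying in the docs:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

X = np.array([[0.5], [2.3], [2.9]])
y = np.array([1.4, 1.9, 3.2])

# Direct least-squares solve under the hood (no gradient descent).
print(LinearRegression().fit(X, y).coef_)

# Stochastic-gradient-descent version of the same linear model;
# coefficients should come out roughly similar, not identical.
print(SGDRegressor(max_iter=10_000, tol=1e-8, random_state=0).fit(X, y).coef_)
```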
Didn't explain how to calculate the direction we are moving in (the minus sign), why the derivatives, etc.
Hey Ahmad Bazzi, @19:20 the last two elements of the derivative with respect to the slope shouldn't have a power of 2; they should be to the power of 1. Please respond if my understanding is correct. Thanks!
The math for gradient descent is pretty simple; all you really need to understand is the chain rule: ruclips.net/video/wl1myxrtQHQ/видео.html - and the chain rule comes up so much in machine learning that it really is worth learning. However, if you really don't want to learn it, you can get by just understanding the concepts.
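As a worked illustration of the chain rule in the sum-of-squared-residuals setting (my notation: m for slope, b for intercept):

```latex
\frac{\partial}{\partial b}\sum_i \bigl(y_i - (m x_i + b)\bigr)^2
= \sum_i 2\,\bigl(y_i - (m x_i + b)\bigr)\cdot(-1),
\qquad
\frac{\partial}{\partial m}\sum_i \bigl(y_i - (m x_i + b)\bigr)^2
= \sum_i 2\,\bigl(y_i - (m x_i + b)\bigr)\cdot(-x_i).
```

Note how the residual appears to the first power after differentiation, which is exactly the point raised about 19:20 above.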
At 9:21, why didn't you use the maxima/minima concept and equate the derivative to 0?
The minus sign means the direction has to be the opposite one, no?
I just have one question. You said that we don't know Theta_QP, but we know that H Theta_QP is the measured data vector g. Where do we know that from? It would be really amazing if you could answer this question.
@Ahmad Bazzi, do you have any technical report on gradient descent?
BAM !!!
YOU ARE GOD
How did you find the slope = 0.64?
Why is the slope value taken as 0.64?
Are you using the L2 norm? In the L2 norm, is it predicted - observed?
BAM!
BAM! :)
Before calculating the intercept, how did you assume the slope to be 0.64 (when differentiating with respect to the intercept), and vice versa? Please clarify.
Hi Ahmad Bazzi,
OK
Ahmad Bazzi
fucking amazing.
I found a diamond
An explanation for a non-mathematician.
No one explains why we are taking this bloody curve as an example... why the intercept and why the slope... why!!!
:)