Best explanation of the cost function; we learned it as masters students and the course couldn't explain it as well.. simply brilliant
I never understood what gradient descent and a cost function are until I watched this video 🙏🙏
I have seen many teachers explaining the same concept, but your explanations are next level. Best teacher.
For those who are confused:
The derivative in the convergence theorem will be dJ/dm.
What's J in this? The Y values? I'm super confused about this d/dm of m, because it would just be 1. And I think m is just the total number of values. Shouldn't the slope be d/dx of y?
@@tusharikajoshi8410 J will be the cost (loss)
new m = m - d(cost or loss)/dm * alpha (learning rate)
Super helpful
I don't think so, because that's actually Newton's method
Why am I not surprised by such a lucid and amazing explanation of the cost function, gradient descent, global minima, learning rate... maybe because watching you make complex things look easy and normal has become a habit of mine. Thank you SIR
I don't see a link on the top right corner for the implementation as you said in the end.
A small comment at 17:35. I guess it is the derivative of J(m) over m, in other words the rate of change of J(m) over a minute change in m. That gives us the slope at instantaneous points, especially for non-linear curves where the slope is not constant. At each point (m, J(m)), gradient descent travels in the opposite direction of the slope to find the global minimum, with a small learning rate. Please correct me if I am missing something.
Thanks for a wonderful video on this concept, @Krish. Your videos are very helpful for understanding the math intuition behind the concepts; I have benefited hugely from them. Huge respect!!
The video was really great. But I would like to point out that the derivative you took for the convergence theorem should be the derivative of the cost function with respect to m, instead of (dm/dm). Also, a little suggestion: at the end it would have been helpful if you had mentioned what m was, the total number of points or the slope of the best fit line. Apart from this the video helped me a lot; I hope you add a text note somewhere in this video to help others.
Really awesome video, so much better than many famous online portals charging huge amounts of money to teach things.
Hi Krish, Thanks for the video. Some queries/clarifications required:
1. We do not take gradient of m wrt m. That will always be 1. We take the gradient of J wrt m
2. If we have already calculated the cost function J at multiple values of m, then why do we need to do gradient descent because we already know the m where J is minimum
3. So we start with an m, calculate grad(J) at that point, update m with m' = m - grad(J) * learn_rate, and repeat till we reach some convergence criterion
Please let me know if my understanding is correct.
Yes this is correct
I think we have to train the model to reach that min. loss point while performing grad. descent in real life problems.
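The three-step procedure in the comment above can be sketched as a tiny Python loop. This is only an illustration, not the video's code: the data points and learning rate are made up, and the intercept is fixed at 0 so that only the slope m is learned.

```python
# Gradient descent on the slope m only (intercept fixed at 0), following
# the steps above: start with an m, compute grad(J) at that point,
# update m' = m - grad(J) * learn_rate, repeat until convergence.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]          # made-up points lying on y = 2x

def grad_J(m):
    # J(m) = 1/(2n) * sum((y - m*x)^2), so dJ/dm = -(1/n) * sum(x*(y - m*x))
    n = len(xs)
    return -sum(x * (y - m * x) for x, y in zip(xs, ys)) / n

m, learn_rate = 0.0, 0.05
for _ in range(1000):
    step = learn_rate * grad_J(m)
    m = m - step
    if abs(step) < 1e-9:           # convergence criterion: negligible update
        break

print(m)                           # converges toward the true slope, 2.0
```

With a small enough learning rate the updates shrink as the slope of J flattens near the minimum, which is exactly the convergence behavior discussed in the video.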
How do we find the best Y intercept?
I knew there would be an Indian who could make all this stuff easy !! Thanks Krish
Best video on YouTube to understand the intuition and math (surface level) behind linear regression.
Thank you for such great content
This math is the same as in the Coursera machine learning course
Thank you sir for this great content ..
At 14:56, how do we decide how many slope values to try? And how about selecting intercepts in a certain range?..
The slope trials continue until the cost function reaches the minimum point.... and for the intercept there are random initialization techniques through which a starting value is set....
How can I not say that you are amazing!! I was struggling to understand the importance of gradient descent and you explained it to me in the simplest way possible.. Thank you so much sir :)
This is the best stuff i ever came across on this topic !
It's hard to find an easy explanation of gradient descent on YouTube. This video is the exception.
Best explanation of Linear Regression🙏🙏🙏.Simply wow🔥🔥
You just made the whole concept clear with this video, you are a great teacher
Watched this video 3 times back to back. Now it's embedded in my mind forever. Thanks Krish, great explanation !!
Sir, I can't find the simple regression and multiple regression videos as you said, and some videos are a little jumbled, so it's getting difficult
to follow them. Please do explain the functionality of each and every keyword or built-in function when you're explaining the code... Of course you explain in a very good way, but I faced a little problem while following the practical implementation of univariate, multivariate, and bivariate analysis (there you used the FacetGrid function).. so will you please explain the exact use of FacetGrid...?
So beautifully explained... did not find this kind of clarity anywhere... keep up the good work....
Similar to Andrew Ng's Coursera course; kind of a revision for me 😊😊
Can you please suggest how to begin in order to learn machine learning?
@@ArpitDhamija do you have knowledge of machine learning?? If so, please suggest something; I looked at so many resources but wasn't able to settle on one.
@@Gayathri-jo4ho This playlist itself is a fantastic place to start, or you can enroll in the course "Machine Learning A-Z" by Kirill Eremenko on Udemy. The course will give you an intuitive understanding of the ML algorithms. Then it's up to you to research and study the math behind each concept.. Refs: KDnuggets, Medium, MachineLearningPlus, and lots more.
@@shhivram929 thank you
Exactly. This is the equivalent of Andrew Ng's description
At 22:50 sir said that when it reaches the global minimum the slope value will be 0, and the value of m will be considered for the best fit line; but aren't the slope and m the same thing? Please clear this doubt, @Krish Naik sir
Such a great explanation of gradient descent and convergence theorem.
Thanks for all the great, well-prepared videos. I think you meant d(J(m))/d(m) at 17:45, is that correct?
Really, thank you Krish.
You just cleared my doubts on the cost function and gradient descent. First I took Andrew Ng's class but had a few doubts; after seeing your video, now it's crystal clear..
Thank You...
Now I understand what GD means. Thanks always, Krish
Hi. Can you please do a video about the architecture of machine learning systems in the real world? How does it really work in real life? For example, how Hadoop (Pig, Hive), Spark, Flask, Cassandra, and Tableau are all integrated to create a machine learning architecture. Like an e2e.
What about the C (intercept) value? How does the algorithm select the C value?
Great! Fantastic! Fantabulous! tasting the satisfaction of learning completely - only in your videos!!!!!
Before watching this video I was struggling with the concepts exactly like you were struggling in plotting the gradient descent curve. ☺️Thanks for explaining this beautifully.
Thank you so much Krish. Nowhere else could I find such a detailed explanation
You made my Day!
The best I've come across on gradient descent and convergence theorem
I think in the Convergence theorem part, the derivative should be d(J(m))/d(m), as in a y-x graph, we take derivative of y wrt x. Here our Y is J(m) and X is m.
Yeah, I think the same.
The graph of the cost function is not gradient descent. Gradient descent is differentiating the cost function with respect to m and using that derivative to update m.
Please add the in-depth math intuition of other algorithms like logistic regression, random forest, support vector machines, and ANN.. Many thanks for the clear explanation of linear regression
Hi Krish, that was an awesome explanation of gradient descent with respect to finding the optimal slope.
But in linear regression both the slope and the intercept are tweakable parameters; how do we achieve the optimal intercept value in linear regression?
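On the intercept question discussed above: nothing in the convergence rule is special to m. A common approach is to give the intercept c its own partial derivative of J and update both parameters every iteration. A minimal sketch, with made-up data (points on y = 3x + 1) and an assumed learning rate:

```python
# Joint gradient descent on slope m and intercept c for
# J = 1/(2n) * sum((y - (m*x + c))^2)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 4.0, 7.0, 10.0]         # made-up points lying on y = 3x + 1
n = len(xs)

m, c, learn_rate = 0.0, 0.0, 0.05
for _ in range(5000):
    preds = [m * x + c for x in xs]
    # One partial derivative per tweakable parameter
    dJ_dm = -sum(x * (y - p) for x, y, p in zip(xs, ys, preds)) / n
    dJ_dc = -sum(y - p for y, p in zip(ys, preds)) / n
    m -= learn_rate * dJ_dm        # same convergence rule, applied to m
    c -= learn_rate * dJ_dc        # ...and to c

print(m, c)                        # approaches m = 3, c = 1
```

Starting c at 0 is just an initialization; gradient descent moves it off 0 as soon as dJ/dc is nonzero.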
the only video that made gradient descent so simple that even 2nd grade students would understand
Implementation part:
Multiple linear Regression - ruclips.net/video/5rvnlZWzox8/видео.html
Simple linear Regression - ruclips.net/video/E-xp-SjfOSY/видео.html
Awesome!! Cleared all doubts after seeing this video! Thanks a lot Mr. Krish for creating in-depth content on such a subject!
Every line you speak is so important to understanding this concept...... thank you
At 22:12, why will the slope be 0? At the global minimum the slope is 1 and the cost function (absolute squared error) is 0.... isn't it, sir? Excuse me if I'm wrong. 🙏
I’m also having the same doubt.
Great, but I am not able to find the link for how to implement this in Python. Awaiting your valuable reply.
Sir, what if our problem statement does not reach the global minimum when C=0 is considered?
And how will our algorithm come to know that the C=0 condition is not sufficient for the best fit line?
For different independent variables, we would have that many gradient descents. Individually, using the convergence theorem, we would get the global minimum, but how are we going to find the best fit combining them all???
I knew the concept of Linear Regression but didn't know the logic behind it.. the way the Line of Regression is chosen. Thanks for this!
Thankyou for this awesome explanation!
Thank you so much for all your efforts.... Knowledge, rate of speech, and the ability to make things easy are the nicest skills that you hold...
Why 2m in place of m in the cost function calculation... Please explain
You can write m also; authors prefer 2m because when you take the derivative the 2 gets cancelled
Hi Krish, why do we divide the cost function by 2 as well? In the MSE formula we just divide by the number of data points, i.e. m
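Following up on the reply above, the 1/2 is purely a convenience factor: it cancels the 2 that the power rule produces, and scaling J by a constant does not move the location of its minimum. A quick sketch of the cancellation (taking the intercept as 0 for simplicity, so the prediction is m x_i):

```latex
J(m) = \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - m x_i\right)^2
\qquad\Longrightarrow\qquad
\frac{dJ}{dm} = \frac{1}{2n}\sum_{i=1}^{n} 2\left(y_i - m x_i\right)(-x_i)
             = -\frac{1}{n}\sum_{i=1}^{n} x_i\left(y_i - m x_i\right)
```

The minimizing m is identical with or without the 1/2; only the gradient's scale changes, which can be absorbed into the learning rate.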
Thank you my friend, you are a great teacher!
At 17:34 it should have been d/dm (J(m))?
@Krish Naik
What will happen if it is a local minimum (for a different equation)?
What is the significance of (1/2n) in the cost function, where n is the number of data points?
I had so much difficulty in understanding gradient descent but after this video
It's perfectly clear
Bro, how do we update the slope?
I am working in a company in the BPM domain... I have no idea about programming but somehow I managed to develop an interest in ML... The best part is I just want to learn it to enhance my knowledge and I'm ready to work for free... If you can suggest something, it will help...
This is when you become genius
The value of this video is just beyond measure! Thanks a lot :)
Do we need to consider the intercept value as zero initially? If not, then how do we proceed further?
Why does considering the intercept C mean you have to draw 3D plots?
God bless you too sir, explained very well. The basics help to grow a high-level understanding
Why are we using the cost function and gradient, sir? What is the conclusion?
Can we apply this to multilinear and logistic regression as well?
Are you taking the derivative of the cost function w.r.t. m in the convergence theorem? Please reply!
Best video on the theory of linear regression! Thank you so much Krish!
Dear Krish: At 14:42' you mention that curve is called gradient descent. I believe this is not true. Gradient descent is not the name of that curve. Gradient descent is an optimization algorithm.
Your explanations are the clearest!!!
Great explanation. How do we figure out which direction to move in?
How can the slope be optimized for a given dataset? Can you make a video doing all the practical calculations, like making the slope smaller or bigger?
Yaar, you nailed it man. After watching so many videos I had some idea; by finishing your video now I am completely clear 😍😍😍😍
Right
And statistical regression analysis is different from machine learning (gradient descent) estimation, right?
This guy was born to teach
Really great, sir. Thank you very much for this clear explanation
Finally I understood gradient descent perfectly..
14:11 Gradient Descent
Oh my gosh, this is the best tutorial I have ever seen. God bless you sir🤩🤩
Initially, how would I decide the value of m???
Why multiply the cost function by 1/2? I mean, what's the need for the 1/2 in the cost function: 1/2m * sum(y_ - y)^2
Thank you for sharing this insightful video about linear regression. While I found it informative, I'm uncertain about how it addresses the challenge of avoiding local minima. I'd greatly appreciate it if you could provide some insights on this aspect as well.
In the convergence theorem eqn, m = m - (dm/dm) * alpha. How is dm/dm the slope of the J(m) vs m curve?? The slope should be dJ(m)/dm rather than dm/dm.
It is that thing only; he should probably write m' in the convergence theorem to avoid confusion with m......:D
17:33 Shouldn't it be d(costFunc(m)) / d(m) ?
Yes..
We would also recommend your videos to our students!
It would be great if you could suggest some best books for python programming?
Sir, in most cases C will not be zero; how will we find the value of C then?
Will we find the value of C using gradient descent?
As c and m are both changing, shouldn't the convergence theorem have the rate of change of c also?
Small correction: @22:25 instead of slope as 0, it should be Cost function as 0. Correct me if I’m wrong…
Thank you so much, Krish!
Hi Sir, I am from a cloud & DevOps background. Does it make sense to go and learn ML/AI? What path can I follow to become a DataOps engineer or a DevOps ML/AI engineer?
Never found a better explanation
Hi sir, great content; I'm a big fan of your work. Let me ask a doubt about the cost function: many books or blogs take the cost function as 1/N * SUM(Y - Y^)^2, but you used 1/2N * SUM(Y - Y^)^2, so I was a bit confused by that part. Thank you for the wonderful content, thank you so much sir
Excellent explanation sir. I have started following your videos for all the ML related topics; it's very interesting.
One doubt: in gradient descent, when the slope is zero, the m value will be considered as the slope of the best fit line. I do not understand this. Can you please explain here? Thanks.
Hi Krish, what if we have many local minima and then a global minimum? In that case how will the convergence theorem work?
Check my complete deep learning playlist
Great sir. Love this video
Thank You Sir, You have explained everything about gradient Descent in the best possible easiest way !!
Sir, thanks for the explanation. Shouldn't the coefficient be the derivative of the loss function w.r.t. m?
Yes, it should be d(J(m))/dm
When writing the convergence theorem it should be m = m - d(J(m))/dm * alpha
Krish, you say here that you discussed simple linear regression in detail in your previous videos, but the previous one is actually about PDF and CDF. Is the playlist sorted??
Hi Krish, how do we calculate the intercept value? In this we initialized it to 0 and did not calculate it at the end; we calculated only the slope of the best fit line.
Sir, there is no playlist of this series; where can I find that? About CDF, PDF...
17:04 Sir, why does every machine learning model look for the global minimum instead of local minima?
See, the global minimum is nothing but the lowest region. Suppose you are standing in a hilly area, where there are many ups and downs (consider the small dips as your local minima), but one point in that hilly area will be the lowest, much lower than all the other smaller ups and downs. That lowest region is known as the global minimum, and the main aim of your algorithm is to converge at the lowest point possible. Hence we consider the global minimum.
I hope that clears your doubt? 🙂
@@abhishek_maity Noted
As always Krish very well explained!!