Thanks for watching! Check out the description for the Medium article (published in Towards Data Science) that accompanies this video. Hopefully that should answer your questions. Also, please follow here and on Medium for fun updates like this!
Tell me, how do I use intuition vs. probability to predict the outcome of my 5-lottery deep training model? 😂
Could you please explain why we used mean and standard deviation when attempting to calculate the likelihood?
I can’t speak for every case. But in linear regression, we assume the distribution of the labels follows a normal distribution, and the normal distribution can be characterized by a mean and standard deviation. If you substitute this into the maximum likelihood estimation, the math will simplify to optimizing the residual sum of squares (which is proportional to the mean squared error) to compute the coefficients of the linear regression hypothesis.
I explain this in the probability and likelihood videos too, if that helps.
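(For anyone who wants to see that reduction concretely, here is a minimal sketch; the toy data, variable names, and fixed sigma are illustrative assumptions, not from the video:)

```python
import numpy as np

# Toy data: y = 2x + 1 plus Gaussian noise (illustrative assumption).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 * x + 1 + rng.normal(0, 1, size=100)

def neg_log_likelihood(w, b, sigma=1.0):
    """Gaussian negative log-likelihood of the labels given the line w*x + b."""
    resid = y - (w * x + b)
    n = len(y)
    return n * np.log(sigma * np.sqrt(2 * np.pi)) + np.sum(resid**2) / (2 * sigma**2)

def mse(w, b):
    """Mean squared error for the same line."""
    return np.mean((y - (w * x + b)) ** 2)

# For fixed sigma, the NLL is a constant plus a positive multiple of the
# residual sum of squares, so both objectives rank every (w, b) identically
# and are minimized by the same coefficients.
print(neg_log_likelihood(2.0, 1.0), mse(2.0, 1.0))
print(neg_log_likelihood(0.0, 0.0), mse(0.0, 0.0))
```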
Thank you so much. Such a good and simple explanation sir.
This was a subject I had been trying to clarify in my head for a month and couldn't, maybe because I'm a little detail-oriented. Thanks to you, brother, I finally understood it. Thanks to YouTube, you have a brother on the other side of the world. Thank you very much.
Thank you. I'm studying mathematics and statistics in college, and I really like this video. My professor told me that the most important thing in statistics is: "You have to understand the basic logic first, using a basic or daily-life example, and know what you want and what you need to do." The second most important thing is to remember the notation, read the books, and study on my own. I really like the first part of the video; that's the key and core idea of the likelihood function. Why did I watch this video? 😂 Because I wanted to refresh the idea. Doing harder problems with only notation and symbols, I get lost.
Actually the GOAT. What an amazing explanation, combining it with machine learning too. Exactly what I needed to help me understand my textbook.
I've been a sub since I was in undergrad several years back, and you popped into my feed again for whatever reason after years. I'm glad to see you're still putting out gold. Good for you, man.
Edit: I was halfway through the video when I commented, and now I've finished it. Those clarifications that pop up from the bottom of the screen are the best thing since sliced bread, wow.
Is this playlist enough for ML probability???
Finally, my 2-3 hours of searching through many videos on likelihood can rest. Thanks, man...
This is such a clear explanation. Great job my dude
This is great! However, it's really important not to confuse the probability density function p(x) with the probability of x. For one, p(x) can be larger than 1!
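(A quick sketch of that point; the specific mean and standard deviation below are just illustrative choices:)

```python
from scipy.stats import norm

# The peak of a normal density is 1 / (sigma * sqrt(2*pi)), which
# exceeds 1 whenever sigma < 1/sqrt(2*pi), i.e. roughly 0.399.
density_at_mean = norm.pdf(0.0, loc=0.0, scale=0.1)
print(density_at_mean)  # about 3.99: a density value, not a probability
```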
You also take the logarithm of both sides because that leads to nice properties when differentiating (because log is strictly increasing, it maintains the property that if x1 < x2, then log(x1) < log(x2), so the location of the maximum is preserved). Addressing arithmetic underflow is definitely a useful added benefit too.
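(To make the underflow point concrete, here's a tiny sketch with hypothetical per-observation densities:)

```python
import numpy as np

# 2,000 hypothetical per-observation densities, each around 1e-3.
probs = np.full(2000, 1e-3)

product = np.prod(probs)         # 1e-6000 underflows to exactly 0.0 in float64
log_sum = np.sum(np.log(probs))  # stays an ordinary finite number

print(product)  # 0.0
print(log_sum)  # about -13815.5
```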
It's great that you thought of making a video comparing probability and likelihood. However, I think in the initial graphs the y-axis does not represent probability values; it represents probability-density values at various x.
thank you for the clarity of the video!
one of the best explanations on youtube! well done sir!
Thanks a ton for watching
Thank you! My confusion goes away after watching this. Thumb up.
You are very welcome. Thanks for watching !
This is the best explanation of likelihood function. thank you so much for the video.
Seriously, one of the best explanations !
Thank you so much!! You made complicated concepts so easy to understand!!! Thanks again!
Super welcome and also very glad to hear :D
Thank god, I clicked the videoooo
Thanks, man. People out there really like to make easy things difficult. Ty, OG.
Thank you so much. This video solved so many things for me.
Very well done, clear and concise!
Thank you! My first time trying this style out. So I’m glad it turned out well :)
This is very well explained, thank you!
Thank you for watching!
THANK YOU SO MUCH , YOU ARE A LEGEND
Another 🔥video! This man has an insane brain
Thanks Shashank! I’m just happy it’s useful 🙂🙂
The mean values are not well selected. Most of the samples are distributed around 200k, so the means should also be around 200k.
Nicely explained! I got a better understanding of this. Could you also include some examples that give a feel for the calculations...
Very good explanation of MLE. Amazing
Awesome stuff! Just to clarify: logistic regression uses the binomial distribution; let's not confuse viewers with link functions and sigmoids.
Aren't sigmoids a whole family of functions that have certain properties?
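(For readers following this exchange: a minimal sketch of the Bernoulli/binomial log-likelihood that logistic regression maximizes; the labels and predicted probabilities below are made-up illustrative values:)

```python
import numpy as np

# Hypothetical binary labels and the model's predicted probabilities
# for the positive class (illustrative values only).
y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.6])

# Bernoulli log-likelihood: sum of y*log(p) + (1-y)*log(1-p).
# Maximizing this is equivalent to minimizing binary cross-entropy.
log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(log_lik)  # about -1.20
```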
Thanks. It is such a nice explanation of the topic. Everything is explained well
Thanks so much for the compliment! And I am glad you liked it :)
Nice introduction! Very clear and helpful, thanks. My only nitpick would be that, when you change to logarithms, maybe "L proportional to P" (i.e. "L = kP") should become "log L = log k + log P" - not a proportionality anymore, but a constant offset. The idea of monotonicity is still maintained.
Yep. Good catch. I think that's technically correct. Guess when making this type of video, teaching on the spot, sometimes details like this slip my mind.
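(Writing out that nitpick explicitly, as a sketch of the algebra:)

```latex
L(\theta) = k \, P(\theta)
\quad\Longrightarrow\quad
\log L(\theta) = \log k + \log P(\theta)
```

Since log k does not depend on theta, maximizing log L and maximizing log P pick out the same theta; the constant offset shifts the curve up or down but not the location of its peak.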
Thank you so much for this !
Hi sir, can you do a video on why we use Bayesian inference and how to use it?
Wonderful, thanks for your clear explanation. Pretty good!
You are very welcome
Great job man. Thanks so much!
Great explanation sir! Thx a lot!
Great job. Thank you
Thanks! Great explanation at the beginning (up to about minute 8, which is how far I have gotten). Aren't your example choices of mu and sigma off by a factor of more than 1000, though? Just want to make sure I am clear about it.
Nice video! New topic... 👍 Please make a video on ML binary classification for time-series forecasting using the likelihood equation. Waiting for the next video!
Observations y1, y2, ..., yn are a joint probability? I didn't get that part.
you are honestly #1
You are too kind :)
With X values in the six figures, how can mu be a double-digit number?
Nice explanation
IID? I thought it was independent but non-identically distributed, given the fact that our data may come from different parameter values.
Thank you so much that was really helpful
You were talking about sigma and the mean, and everything was clear until you started talking about theta. Where did the sigma and mean go? Are we training the model to make predictions on the model parameters or the distribution parameters?? Thanks, though.
Is it possible to find the true probability distribution? It looks like in the real world we only see the likelihood, because we can't observe the general population, can we?
Minute 8:11: I wonder if, for a better illustration of L(mu, sigma), mu should be around 200k and above, so that the mean matches the x-axis.
Great video
It should be probability density on the y-axis, not probability, since X is a continuous random variable.
Yep! Going to make some videos around probability theory soon to clear this up. Good catch!
@@CodeEmporium Yes please, more probability theory videos are what we need.
Simply amazing!
awesome video. thank you!
I got benefit and enjoyment from this, thank you.
Excellent! Thanks :)
Super explanation. Thanks
Welcome! Thanks for watching:)
Great video dude
The difference between probability and likelihood was done OK. However, the shift to an ML example was made without stating what the example was going to demonstrate, and it felt a bit aimless, I believe.
Thanks a lot!
I think you should include keywords like Maximum Likelihood and Log Likelihood Ratio in your title to reach a wider audience.
Yea. I’ll keep this in mind. Thanks for the tip. Maybe I’ll change this title soon
Probably a stupid question, but P(y1, y2, y3, ...) is written as P(y1)·P(y2)·P(y3)·... Isn't P(y1, y2, y3, ...) a function, while taking the product P(y1)·P(y2)·P(y3)·... gives me a number? And these two are the same thing?
P(y1, y2, y3) is the probability that the first random variable (RV) has the value y1 AND the 2nd RV has the value y2 AND the 3rd RV has the value y3. This is a number.
Now, if each of these RVs is independent of the others, then yes, you can write it out as the product P(y1)P(y2)P(y3). This too is a product of 3 numbers, which gives us a number. If they aren't independent RVs, you are going to have to use the chain rule of probability to write it out as a more complex expression, e.g. P(y1)P(y2|y1)P(y3|y1,y2). Ultimately, though, the outcome is still some real number.
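(A tiny sketch of the independent case; the observations and the choice of a standard normal are illustrative assumptions:)

```python
import numpy as np
from scipy.stats import norm

# Three hypothetical observations (illustrative values).
y = np.array([1.2, -0.3, 0.7])

# Under an IID normal assumption, the joint density is the product of
# the individual densities. Once mu, sigma, and the data are all fixed,
# the result is a single real number, not a function.
mu, sigma = 0.0, 1.0
joint = np.prod(norm.pdf(y, loc=mu, scale=sigma))
print(joint)  # about 0.023
```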
Thanks for the video
Why do we use a PDF with well-fitted parameters instead of a histogram?
Good stuff.
Very nice review. Thanks.
You are very welcome!
Bro I got 4 ads watching this video. I hope this guy is making bank off of these videos
Another great video
Thank youu
Thx, life saver
Great video.
Thanks a ton!
Writing in red and green on a black background is very hard to read for colour-blind people.
Yea, I didn't think it would look this dark. In future videos, I'll try to correct this. :)
Great explanation! Thanks, man. By the way, what Blackboard App are you using in this video?
Thank you! The app is called "Explain Everything".
Should have clarified that housing prices in practice are not independent. Perhaps use a better example.
7:52
I don't see any math explanation in this other than showing the equations, but it's a good theoretical explanation. Sorry to comment this, but I would appreciate seeing the actual math and its explanation. Thanks.
Thanks for commenting! This was my first time teaching this way, with a whiteboarding strategy. I have tried it more in later videos (hopefully they have turned out better).
Nice
Thank you!
Smarter version of Aziz Ansari!
Waiting
Not much longer now :)
At no point in this video did you ever state what likelihood actually is, only what it is proportional to. I recognize you're trying to educate but this is a very poor job, similar to the article you wrote on this subject.
Next Level Explanation , Subscriber+=1 :)
Welcome aboard! Thanks a ton!
Fake accent, nothing else.
God damn, you explain so much better than my college prof.
Thanks a ton ! Hope you enjoy the rest of these videos :)