Probability is the quantity most people are familiar with: it deals with predicting new data given a known model ("what is the probability of getting heads six times in a row flipping this 50:50 coin?"), while likelihood deals with fitting models given some known data ("what is the likelihood that this coin is/isn't rigged, given that I just flipped heads six times in a row?"). I wanted to add this perspective because, using the example in this clip, a likelihood such as 0.12 is not meaningful to the layman unless it is explained exactly what 0.12 means: a measure of the sample's support for the assumed model, i.e. low values mean either rare data or an incorrect model!
That's exactly right! Thanks for posting your comment.
Thanks for clarification
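To make the coin example concrete, here is a minimal Python sketch (scipy assumed; numbers follow the comment above). The same function is read two ways: probability fixes the model and varies the data, likelihood fixes the data and varies the model.

```python
from scipy.stats import binom

# Probability: model fixed (fair coin, p = 0.5), new data in question.
# P(6 heads in 6 flips | p = 0.5) = 0.5**6
print(binom.pmf(6, 6, 0.5))  # 0.015625

# Likelihood: data fixed (we saw 6 heads in 6 flips), model in question.
# L(p | 6 heads) is the same formula, viewed as a function of p.
for p in [0.5, 0.7, 0.9]:
    print(p, binom.pmf(6, 6, p))
# The likelihood grows as p approaches 1, which is why six heads in a
# row lends support to the "rigged coin" model over the fair one.
```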
That's what I was thinking. I've taught plenty of Probability courses, and I really wasn't sure what this "Likelihood" was supposed to be conceptually.
The issue being that Likelihood as described here isn't really a concept, it's just a computational tool used in the intermediate steps for fitting models.
In fact, if you go a step further and turn the set of possible models into a probability space itself, then Likelihood is itself a probability again.
Thank you for the easy explanation!!!
In addition..
The probability of any single exact value in a continuous distribution is 0... that is, 1/infinity. Therefore, we should compare the cumulative probability between ranges of events. But we can still compare the y values between events.
Definition of Likelihood in easy : y value of PDF.
Countable Events : Likelihood = Probability.
Continuous Events : Likelihood =/= Probability. Likelihood = PDF.
Have a nice day!
@@jamescollier3 When you say "10 percent" you are talking about probability (area under the curve). Likelihood is instead just an instantaneous point on the curve: 0.12 does not equal "12 percent"; it is just a number for one data point. You would compare it to another data point, which would have its own likelihood given the assumed model, e.g. the likelihood of a 32g mouse is 0.16, so given the model it is more likely that a randomly selected mouse weighs 32g than 34g. If you wanted to quantify things in layman's terms you would use probability instead, such as: given this model, 68% of the time a mouse should weigh 32g +/- 2.5g.
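Those numbers can be checked directly. A minimal sketch, assuming the model used in the video (mean 32 g, sd 2.5 g, which reproduces the quoted 0.12):

```python
from scipy.stats import norm

mean, sd = 32, 2.5  # assumed parameters from the video

# Likelihood: the y-value of the PDF at a single point.
print(norm.pdf(34, mean, sd))  # ~0.12
print(norm.pdf(32, mean, sd))  # ~0.16, so 32 g is more likely than 34 g

# Probability: the area under the PDF over a range (via the CDF).
print(norm.cdf(mean + sd, mean, sd) - norm.cdf(mean - sd, mean, sd))  # ~0.68
```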
I am a postdoc in neuroscience/psychiatry on a mission to brush up my stats knowledge. I can only repeat what others have already said: It's amazing how you break down and explain complicated topics in an understandable way. Thx so much for these videos.
Thanks!
When I was doing my neuroscience undergrad, I had to help the TAs in the Stats lab courses as I was the only person that understood math lol. Even still, I wish these videos existed when I was taking the course.
Right, so:
- Probability is for predicting an outcome within a known system (eg. rolling fair dice)
- Likelihood is for analysing an outcome when the system is not known for certain (eg. determining if particular dice actually are fair, or determining what curve is the best model for a dataset)
I just learned a thing - thank you. :-)
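To make the dice idea concrete, a sketch comparing the likelihood of two candidate models for the same observed rolls (all numbers invented for illustration):

```python
import numpy as np

rolls = [6, 6, 3, 6, 5, 6, 6, 1]  # hypothetical observed rolls

fair = np.full(6, 1 / 6)                     # model 1: fair die
loaded = np.array([.1, .1, .1, .1, .1, .5])  # model 2: favors sixes

def likelihood(probs, data):
    # Likelihood of a model = product over the fixed data of the
    # probability the model assigns to each observed face.
    return np.prod([probs[r - 1] for r in data])

print(likelihood(fair, rolls))    # ~6.0e-07
print(likelihood(loaded, rolls))  # ~3.1e-05: the data support "loaded"
```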
yep.
Seriously, please don't stop making these! You really nail the topics in the most perfect time, and always explain them in awesome ways :)
After spending 2 hours on RUclips learning about probability and anything related to it, I landed on this video and I want to say this: I LOVE YOU
BAM! :)
I never thought I would be saying I like stats. As you grow older, you see the importance of stats in the real world and you start to appreciate it more
Bam! I think what you say is true. The older I get, the more I appreciate the variation that I see in the world around me. And the more I appreciate variation, the more I appreciate statistics as a way to understand it.
This is good stuff. Although I have taken 3 graduate level statistics courses, I struggle to explain the rationale behind the concepts. Which means, of course, that I lack deep learning of statistics. Your straightforward and graphic explanations are marvelous; even better than my stats professors. Thanks and I’m a subscriber now.
Thank you very much! :)
this is the most understandable explanation out of all the videos I've seen so far about the differences between probability and likelihood... thank you so much
Glad it was helpful!
Your intro songs have really grown on me. You are like the Bob Ross of Statistics.
Very well put
Haha agreed!
Now I’m worried that if I subscribe I’m going to lose my taste in music.
Totally agree Bob Ross of Statistics haha
Bob Ross was a painter, not a musician...
Wow and gosh! I work a lot in risk management and have long argued for the appropriate use of the terms 'likelihood' and 'probability'. As far as most people are concerned, they are synonymous. People cannot understand the difference between the likelihood of a specific event and the probability of an event causing a specified outcome. Thanks for the simplification.
Glad it helped! :)
I was completely confused about the concepts; after referring to lots of videos with confusing intuition, I just came to this channel. The song you play at the start of this video just refreshes the learning journey. The content you give with clear intuition makes me really very very very very very happy...
Awesome! :)
I studied financial econometrics for the past two semesters, and now I am in my third semester, doing a special project/thesis. All the concepts I studied became linked after I watched your videos. I just want to say I really appreciate your explanations. I am pressing like for every video I watch going forward. Maybe after my project, I will learn machine learning. Thank you.
Thank you! :)
I just noticed something interesting. In my native language (Spanish), "likelihood" doesn't have a special word of its own. Unlike in English, where, as Josh says, there is a difference between likelihood and probability, if one checks the translation of likelihood into Spanish it equals "probability". So it appears that in Spanish the words just don't need to be differentiated, but in English they do!
This problem is actually the same in English. In normal conversation, "probability" and "likelihood" are equal and mean the same thing. Only in statistics do they mean different things. This is why these terms are so easily confused, even by native English speakers.
In Spanish, some professors use the word "hope" in addition to "probability." Years later, I find this video and suspect a lot happened that I didn't notice. 🤨
@@diegofabianledesmamotta5139 In Polish we used to call the expected value "mathematical hope". We don't differentiate probability and likelihood either.
I was trying to understand likelihood since yesterday but was unable to understand it properly; this is the best video, and it explained it to me in a very simple way.
Thanks
Thank you! :)
as a student who majors in statistics, all of your videos are so helpful! thank you❤
Duuuude, I've been doing data science for about 5 years now and stats for a lot longer, and this is the best explanation ever. I'm stealing this!
Hooray! :)
Crisp, concise and to the point! Instantly cleared my confusion!
Hooray! :)
wow!! that was the most clearly explained math video I have ever seen, thank you so much
Thank you! :)
came here after watching his logistic regression vids... the clarity of these topics is insane!! thanks a lot :)
Glad it was helpful!
Yeah, the clarity of this guy's explanations are off da chartZz!!
statquest makes me feel very very very happy indeed!
Hooray!!! :)
I do see your point, but to me likelihood is just a probability mass or probability density that's a function of the unknown parameters of the distribution, given a sample from that distribution.
This is the best explanation i've seen ever
Thanks!
StatQuest , you're the best when it comes to "making it Simple"
Thank you! :)
StatQuest makes me feel so happy
So very very very very very very very very very happy
:)
Actually makes things so much more intuitive, thank you!
Hooray! :)
Best explanation I have seen of probability and likelihood. Thank you so much..
Glad it was helpful!
Hi Josh, I really love how you simplify statistics concepts. After watching the above video, I have this question: does likelihood always have to be a single point and not a range? Can there be a question like, "What is the likelihood of a mouse weighing between 32 and 34 grams?" Will the likelihood also be a range in that case? Like between 0.12 and 0.13 or something like that?
That's a good question. The problem is that in normal conversation, "likelihood" and "probability" are interchangeable. So, in normal conversation, it makes sense to say "the likelihood of weighing something between 23 and 35 grams is X". So that's OK. However, the mathematical definition restricts it to a single point. So when we are deriving things we have to use the strict, mathematical definition.
p.s. Thank you so much for supporting StatQuest!!! BAM!
Great work done up to this point. The videos are great and reflect meticulous effort. However, it seems we are multiplying oranges and apples in the hope of getting strawberries. I have searched a lot of famous books but never seen anyone plugging numbers into the f(x) equation of a normal, exponential, or any continuous distribution in the hope of getting a single discrete value.
In fact, we always take the integral and put in limits to get the probability of a range of values of x, and that is F(x).
@@statquest single point is discrete
this guy is smart and he is great at explaining things. he is a true national treasure.
Thanks! :)
Maybe this seems overly simple for some, but this was awesome! Thank you!
my head was boiling coz of PCA and it's a blessing that I came across the StatQuest video... thank you for making these topics so clear and interesting... stay blessed
Thanks!
Oh man. I had thought you did these songs at the start just for the memes. But I checked out one or two songs from your discography. They're pretty good!
Thanks! :)
"Jibber Jabber clearly explained" is t-shirt worthy!!!
That's a great idea! BAM! :)
StatQuest makes me feel so happy... so very very very very very very very very..... Happy! StatQuest! I mean seriously, yeah!!!!!! love this channel from China!
Thank you very much! :)
Hello! I'm grinding for my categorical data analysis exam (which doesn't put me in the best mood) and have to say that the song at the beginning made me feel slightly better
bam! :)
Thank you for solving the questions that have stayed on my mind for years. This helps a lot!
A truly excellent and simple presentation of Bayes theorem and more importantly about how it can go horribly awry in practice. It is always important to remember as you so well describe that it is an estimation of probability conditioned on our state of apparent knowledge and not a statement about truth. For that we need the actual data from the world. It is all too easy to forget this and get into a recursive loop where this unintentionally instead becomes a tool to support confirmation bias.
noted
This was a very clean explanation of probability and likelihood.
Thanks!
Thank you! :)
This is so good. I’m looking to understand likelihood estimation function and just from this deep yet simple explanation I can guess what MLE is all about. Thanks!
bam! :) If you want to see some examples, see: ruclips.net/video/p3T-_LMrvBc/видео.html ruclips.net/video/Dn6b9fCIUpM/видео.html
BAM! Thank you Josh, you explain concepts with such clarity and ease!
I think it's worth mentioning that "likelihood" is otherwise described as "probability density". We call it a "density" as it is literally the probability per unit mass (in this context). This also means that the *unit* of likelihood is the reciprocal of the unit on the horizontal axis, in this case: probability per unit gram.
The choice of unit is arbitrary, and so the value of the likelihood is dependent on the unit of mass you care to choose. This also has a bearing on the numerical values for the mean and standard deviation.
Josh, I love your work! Just offering up my slice of constructive feedback / food for thought / follow-up ideas.
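That unit-dependence is easy to demonstrate numerically; a sketch assuming the gram-scale model from the video (mean 32 g, sd 2.5 g):

```python
from scipy.stats import norm

# Same mouse, same model, measured in grams vs. kilograms.
print(norm.pdf(34, 32, 2.5))           # ~0.12 (per gram)
print(norm.pdf(0.034, 0.032, 0.0025))  # ~116  (per kilogram, 1000x larger)
# The total area under each curve is still 1; only the density's
# unit (and hence its numerical value) changed.
```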
Thanks!
When he says "BAM", that's when I remember to press "Like"
BAM! :)
The likelihood of a specific data point (measurement) is the value of the pdf at that specific data point. (Values on the x axis and pdf/likelihood on the y axis)
The probability of any specific exact measurement point is 0, but the probability of a range is the area under the pdf over that range.
The probability is of data given a distribution, whereas the likelihood is of a distribution given data.
@Mostafa I agree with what you've written. Would you mind rephrasing that for discrete distributions?
I get confused with the terms when differentiating between discrete and continuous distributions. :-/
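For the discrete case the rephrasing is direct: with a probability mass function, the likelihood of the parameter given an observation equals the probability of that observation given the parameter; with a density it does not. A minimal sketch (scipy assumed):

```python
from scipy.stats import binom, norm

# Discrete: likelihood = probability (both read off the PMF).
# L(p = 0.5 | 3 heads in 10 flips) = P(3 heads in 10 flips | p = 0.5)
print(binom.pmf(3, 10, 0.5))  # ~0.117

# Continuous: the likelihood is the PDF's y-value, but the probability
# of any exact value is an area of zero width.
print(norm.pdf(34, 32, 2.5))                          # ~0.12 (likelihood)
print(norm.cdf(34, 32, 2.5) - norm.cdf(34, 32, 2.5))  # 0.0  (probability)
```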
Thanks for this video. I'm a little confused about what you have mentioned about probability. You mentioned that probability is the area under the curve of the distribution for a given interval. However, if the data is only a single point, the area is zero, and that would contradict the equation L(parameters; data) = P(data; parameters)
For continuous data, likelihoods do not equal probabilities. This is only the case for discrete data.
I must confess that I had a hard time understanding it too (not knowing English well is part of it; I've played the second video of the logistic regression series more than 10 times). I think probability is when you evaluate the integral of the function over a certain range, and likelihood is when you evaluate the function at a point.
StatQuest makes me feel soooooooooo happy 😊, thank you StatQuest team 💜
Hooray! BAM! :)
You get an automatic like just for the intro! Thanks for the video.
Hooray!!!! You're welcome. :)
Thanks for saving my computational statistics
BAM! :)
I am a little bit confused concerning the second example, because this is a probability density function, not a cumulative distribution function. A probability density function has probability density (not probability itself) on the vertical axis. So why did you conclude that the likelihood is 0.12? In my opinion, this is a probability density. In order to see the likelihood we would have to switch to the cumulative distribution function and look at what y-value corresponds to 34 grams.
The reason why we use a probability density function (PDF) is that a very large number, like 1,000,000 grams, should have a very low likelihood - and the PDF ensures that is the case. If we used the cumulative distribution function (CDF), then the likelihood would be very large for a crazy large number, and that doesn't make sense.
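A quick numerical check of that point, using the same assumed model (mean 32 g, sd 2.5 g):

```python
from scipy.stats import norm

x = 1_000_000  # an absurd mouse weight, in grams
print(norm.pdf(x, 32, 2.5))  # ~0: the PDF gives the absurd value no support
print(norm.cdf(x, 32, 2.5))  # 1.0: the CDF is at its maximum for any huge x
# So the PDF's y-value, not the CDF's, behaves like a sensible likelihood.
```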
In all likelihood you may not fully understand the difference between probability density functions and cumulative density functions :-/
That did actually confuse me quite a bit at first too, as I thought the same: why is he describing the PDF? However, normally you consider not just one sample but numerous samples, which form a product, and that is also where it becomes clear what the likelihood is. I would suggest maybe using multiple data points in the explanation, since that is also how you are going to encounter it in most textbooks.
@@quAdxify That's a good idea, and that's what I have in all of the videos that show "likelihood in action", for the exponential, binomial and normal distribution:
ruclips.net/video/p3T-_LMrvBc/видео.html
ruclips.net/video/4KKV9yZCoM4/видео.html
ruclips.net/video/Dn6b9fCIUpM/видео.html
Note that likelihood = 0.12 doesn't mean that, given the observed data, those parameters have a 12% chance of being the true parameters. The likelihood, in this case, has only a relative meaning. The degree to which the data support one parameter value versus another is measured by their likelihood ratio.
Proving why the PDF can be used as the "likelihood", and why maximizing likelihood is a good strategy for estimating distribution parameters, is a different story (e.g., we must prove that maximizing the "likelihood" is equivalent to maximizing the PDF, and that the parameter estimated by maximizing the likelihood converges exponentially to the true parameters as the number of observed data points increases).
In different foundations of statistics, likelihood can mean different things. In Bayesian inference, the likelihood has an absolute probability meaning: "given that the parameters are true, what is the probability of observing this data?"
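The "relative meaning" can be made concrete with a likelihood ratio; a sketch using the thread's 34 g observation and an sd fixed at 2.5:

```python
from scipy.stats import norm

x = 34  # the observed mouse weight

# Likelihoods of two candidate means given the same fixed data point:
L_mu32 = norm.pdf(x, 32, 2.5)  # ~0.12
L_mu34 = norm.pdf(x, 34, 2.5)  # ~0.16

print(L_mu34 / L_mu32)  # ~1.38: the data support mu = 34 over mu = 32
# Neither value is the probability of its parameter being true;
# only their ratio carries meaning.
```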
Pls keep on making such videos, you might as well prevent me from feeling stupid. Thanks a tonne.
Thanks! :)
Wish I had seen this before I made my Bayesian classifier. Thanks a lot for a great video!
Paul Burger why should someone create an NB classifier if there are plenty of them?
You are the best and easiest to follow in statistics
Thank you! :)
This is the best explanation I've ever seen! and I like your intro song as well!
Hooray! :)
I really like your whole channel. You make things very clear, and you give a charming and personable presentation of it all. Thank you!
Wow, thank you!
Holy sh*t! My head blown twice! Double BAM!!!
Hooray! :)
the clearest explanation of the connection between likelihood and probability. more videos!
Finally someone from the USA using metric measurements.
The way you speak is very similar to Baldi in Baldi's Basics in Education and Learning, which makes me terrified sometimes, but I'm generally more terrified by how rapidly I learn watching your videos. Thank you.
bam!
Amazing videos, can you do a playlist on bayesian statistics?
I got another BAMM for you.
Bam, subscribed!
That was a vivid and accessible illustration. I never have to fret over this distinction again.
Now I want to see the rest of these videos. Yay.
:)
Bumped into this video, and glad I did!
Please make a video using a discrete distribution. That'll really help differentiate between likelihood and probability. The problem with Y values being in a continuous domain is that, by definition, we'll always have to just say that the P(mouse weight = 32) = 0, and there ends any scope for comparing it to L(32) :-)
Very interesting and nice perspective on probability vs likelihood .. that is new !
Glad you liked it!
Couldn't help but type in Comment #300
I would have LOVED to be a Mathematician or a Statistician. But had to give up on that when I realized that I didn't have quite the "Talent Level" that my peers who were going in that direction had. Paul Erdos I'm not.
A friend told me (please understand that this was in the mid 70s), "You're going to need AT LEAST a Masters Degree before you can go out and beg for a job. And in all likelihood, not only will you need to write a Thesis to get that Degree, but you're going to have to pick a topic VERY Soon after you start your Masters Studies AT THE LATEST or you will be light-years behind your classmates. Most of them will have chosen their Thesis Topic during their Undergraduate days." The only topic I can recall having had Any interest in at that time was the Navier-Stokes Equations. But, I have seen where so-called "Authorities" argue about whether this is a physics or a math Equation. So much for that. Also, tried to read background material on the Hodge Conjecture and it might as well have been written in Sanskrit.
Also, I was told, "If you can't get a job as a Mathematician or a Statistician, you can always be a teacher." Yeah right... I can honestly say that I never had even ONE teacher that "inspired" me in any Method, Manner or Fashion. And the number of teachers I would have loved to hire the Columbine Boys to "take care of" is Easily in two digits. So I sure as heck wasn't going to become one of them.
Lastly, given my Life-long Hatred of school in general (me and Evariste Galois, among others), I saw that going into Accounting was a viable alternative, in that I was able to get a job with just the (then) four-year Bachelors Degree. The day I put all school in the rear-view mirror for Life, I remember saying, "This is how prisoners who just got paroled must feel."
All that aside, to this day Math and Statistics remain two of my favorite hobbies. Since I was Lucky and Blessed enough to have been able to retire five years ago at age 55, I have all day, every day, to do what I LIKE to do, rather than what I HAD to do to survive.
GREAT Video! Thank you, Sir! All Best!
Funny how things changed. Statisticians get hired straight out of school now, and no way in hell it's a teaching gig.
@@ReTr093 That's good to know and I am happy for each and every one of them. But I am curious. Do you know if the typical requirement for getting a job as a statistician is a Masters Degree? Or can they become employed with a Bachelors Degree?
Thank you for your response. All Best!
really happy after I found you, sir
bam! :)
Thank you man, THANKS A LOT
seriously. Finally I got it, omg
Hooray!!! That's awesome! :)
Thank you for keeping things very clear!❤
Any time!
The likelihood part confused me... what is shown here as likelihood sounds more like hypothesis testing
The best way to understand likelihood is to see it in action. Check out how likelihood is used to find the optimal exponential distribution given some data: ruclips.net/video/p3T-_LMrvBc/видео.html
It's best understood through conditional probability. The symbol P(x|y) means the "probability density of x conditioned on y", and it is at the same time the "likelihood of y".
Wish I had come across this channel earlier; subscribed after just one video. Keep up the good work, man. Awesome content and a nice lil song at the beginning.
Awesome, thank you!
So the likelihood is somewhat like the derivative of the probability?
no, it's the value of the density at that point, not the slope of the density (the density itself is the derivative of the CDF, which is why a probability over a range is the integral of the density)
It sure seems that the probability is the integral of the likelihood.
Thank you so much... I'm glad I got to learn something new today
Thanks!
great vid, but I almost feel that it would've been much clearer with a discrete distribution.
It's true - a discrete distribution would be a little easier to understand, but most people do t-tests, which are continuous, so I figured I'd start there. However, I may do another video that shows the discrete version.
thanks for another intuitive and concise explanation!
My pleasure!
Thanks Joshua! I love your video series on probability and likelihood :)
Thank god for existing.
:)
As you said, so clearly explained. Thank you very much.
You are welcome!!! I'm glad you like the video. :)
You stud you... the light shines in...
:)
Strange, almost every book I've read so far talk of P(data|distribution) as likelihood, and P(distribution|data) as posterior. Am I missing something here? 🤔
That is very interesting. The second part, P(distribution | data), is in fact a posterior, as you suspect. The "P" is the important part: it tells you to integrate the function. If it were "L", then we would have a likelihood, and that would tell you to evaluate the function at a specific point. The first part, however, P(data | distribution) being called a likelihood, is confusing. Even Bayesian folks tend to observe the "P" means "probability" and "L" means "likelihood" convention. That said, in Bayesian statistics you'll see likelihood functions written like this: L(data | distribution). However, in this case they are calculated just like L(distribution | data). The difference is that Bayesians let the parameters be random variables, and non-Bayesians only let the data be random variables.
P(distribution|data) = [P(data|distribution) * P(distribution)] / P(data) or in Bayesian Terms posterior = likelihood * prior. In the likelihood function we are given a fixed data set (the measurement) and we are postulating the distribution that data set came from. This is what Josh says in the video. Also, P(data) is a constant to make the posterior integrate to 1.
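A minimal numerical sketch of posterior = likelihood * prior / evidence, with a made-up 50:50 prior over two hypothetical candidate means (sd fixed at 2.5):

```python
from scipy.stats import norm

x = 34  # observed mouse weight in grams

means = {"mu=32": 32, "mu=35": 35}    # two candidate distributions
prior = {"mu=32": 0.5, "mu=35": 0.5}  # hypothetical 50:50 prior

likelihood = {m: norm.pdf(x, mu, 2.5) for m, mu in means.items()}
evidence = sum(likelihood[m] * prior[m] for m in means)  # P(data), a constant
posterior = {m: likelihood[m] * prior[m] / evidence for m in means}
print(posterior)  # sums to 1, unlike the raw likelihoods
```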
This is pure gold.
Thanks! :)
That still makes "likelihood" a probability though.
no
no
Not exactly. Probability is the integral from a to b of f(x)dx, likelihood is just f(x). If you remember calculus, any integral from a to a is zero, even if the interior function is not.
@@ronaldjensen2948 I've randomly stumbled on the same video again and yes, it says that likelihood is the PDF. I don't know how the me from ten months ago arrived at this conclusion.
Why do people down vote this? Really... Great job, Josh. Thank you very much. Kudos to you, and let Karma take care of the rest.
:)
Those are outliers, highly deviated from the mean.
Yes! I'm finding this funny, fascinating and awesome! But I am very high.
Got my problem solved... quite well explained, Josh
Awesome! :)
Shyeah, right! Try telling this to the editors of the Wikipedia article on Probability.
I really appreciate you saving my life..
bam! :)
The probability of a mouse weighing exactly 34g IS ZERO
Right! It seems he talks about the probability that the mouse has a weight between 33.5 and 34.4999 g. And then we are talking again about an area under the graph, as before. So the whole video is a fail.
Yeah you need to define conditional probabilities differently for that case.
Yes but he's not talking about the probability of the mouse weighing exactly 34g. The mouse weighs 34g. It's a given. A data point.
The question is how likely this is given a distribution. That's what the likelihood is for.
he never said probability, he said likelihood.
You are absolutely correct, but please refer to the title of the video: the likelihood of the distribution (with the stated parameters) given that you had a mouse of 34g.
Very simple description of probability and likelihood. Well done!
Thank you very much! :)
Nice explanation, but I'm afraid I still don't understand what likelihood is... can somebody please explain in plain words? Josh has explained it in terms of a formula
say you measure one mouse. ONLY one mouse. You get 34g. You know nothing about mice? According to your single experiment, it is probably best to put the mean of the mouse distribution around 34g, because that gives the maximum likelihood for your experiment (i.e. because the mean has the highest probability density). For all you know, the mean is 34g and that's it - but you only did 1 measurement.
Now you go ahead and do 2 weight measurements: one at 34g, and one happens to be at 30g. According to your original distribution, where the mean was at 34, this would not give maximum likelihood, because the mass at 30g is very poorly represented. You will have to shift your distribution to MAXIMIZE likelihood - to maximize the probability of all measurements.
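That shift is easy to verify in a few lines; a sketch assuming an sd of 2.5:

```python
import numpy as np
from scipy.stats import norm

data = [34, 30]  # the two measurements from the comment above

# Total likelihood of a candidate mean = product of each point's likelihood.
for mu in [30, 32, 34]:
    print(mu, np.prod(norm.pdf(data, mu, 2.5)))
# mu = 32 (the sample mean) gives the largest product, so the fitted
# distribution shifts from 34 toward 32 once the 30 g mouse is included.
```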
simple and clear explanation
Thanks!
I thought that the probability of a certain value in a continuous distribution is always 0. You don't have discrete weights like 32, 33, etc., but 32.00000001 etc. The probability of a mouse being exactly 32 is zero.
That is exactly right. However, the likelihood for a single point is not zero. The reason for this might be more obvious if you saw likelihoods in action. Here's a video that shows how likelihoods are used to estimate parameters for the exponential distribution: ruclips.net/video/p3T-_LMrvBc/видео.html
@@statquest OK, I meant it tends towards 0, so even 0.1 is a big value. But I believe you know the math better than me.
@@georgesmith3022 No, it's zero. For example, the area of a rectangle that has 0 width is 0. So the area under the curve of a continuous statistical distribution, like the normal distribution, with 0 width is 0.
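The limit is easy to see numerically; a sketch with the same assumed model (mean 32 g, sd 2.5 g):

```python
from scipy.stats import norm

# Probability of "exactly 32 g" as the limit of ever-narrower intervals:
for w in [1, 0.1, 0.01, 0.001]:
    print(w, norm.cdf(32 + w / 2, 32, 2.5) - norm.cdf(32 - w / 2, 32, 2.5))
# The area shrinks to 0 with the interval's width, while the
# likelihood (the PDF's y-value) stays put at ~0.16:
print(norm.pdf(32, 32, 2.5))
```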
Best video on likelihood!
Thank you! :)
God! It has been so hard to finally find an intelligent guy who knows probability. Hi Josh Starmer, can I talk to you in private? Can I make friends with you?
edenlove 828 hotmail.
I came here after watching a 3Blue1Brown video on probability distribution. This video really made me understand that one better. Thanks!
Hooray! :)
did Donald Trump write your lyrics? It was very very very good. Studying stats in Sydney. I come here because I don't understand my uni lectures :(
Hooray! I'm glad to hear that the video was helpful. :)
Great explanation. Please share more videos on grad-level statistics. Thanks!
Will do!
Anybody else think he was talking about tiny weights that you put on a scale when he said mouse weights? Lmao
I love it! :)
You are a great teacher!
Thank you! :)
The probability of a point on a continuous line is zero (the likelihood, though, is not).
Thank you so much Josh 😊😊
Thanks!
I am slightly confused. I have always been taught that the likelihood of a model is the probability of the data given the model, but the notation used is always in the form P(Data | Model). This is the opposite of how the video presents it.
Both the likelihood and probability use the same exact function, but they are used differently. When you calculate probability, you integrate the function, when you calculate likelihood, you evaluate it at specific points.
@@statquest I think what is confusing is that the textbook definition of the likelihood function L can be written as a conditional probability P such that P(data|model) == L(model|data).
@@RabidMortal1 If the distribution is discrete, like binomial, then P(data | model) == L(model | data). However, when the distribution is continuous, like a normal distribution, then P does not equal L. For more information, check out my videos where I show the likelihood function in action. First, here's how it works with a continuous distribution: ruclips.net/video/p3T-_LMrvBc/видео.html ...and here's how it works for a discrete distribution: ruclips.net/video/4KKV9yZCoM4/видео.html
sometimes people use 'likelihood' to mean 'probability'. In the case you mentioned, it actually means 'probability'.
I wish I'd known about you sooner; keep up the good work!
Thanks!
Yeah, but what's the mouse's name? And why do you have a mouse fetish?
Hahaha, he loves mice, but his stats videos are very nice.
Your every video makes us happy.... 👍
From India.
Thank you so much 😀