starts at 5:05
ur a hero
After 3 videos, I finally understood this n-1. Basically, when we take a sample from our population and calculate its mean, it may or may not be close to the overall population mean (which is the mean that matters), so to lower the chance of a highly distinct sample mean/variance we use n-1 to get at least near the population value...
What if you take the 3 highest values?
@@ordiv12345 In the sum it is squared, so it actually doesn't matter, because a negative^2 is positive and a positive^2 is positive. Take (1-5)^2 and (9-5)^2 for example: they give the same value.
@@baterwottle2400 Both positive and negative deviations become positive when squared, which ensures that variance measures the magnitude of deviations without cancellation due to signs.
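A quick check of the example above, as a minimal sketch. The variable names are made up for illustration; the numbers 1, 9 and 5 are the ones from the comment.

```python
center = 5          # the value the deviations are measured from
below, above = 1, 9

# Raw deviations have opposite signs and cancel each other out
raw_sum = (below - center) + (above - center)

# Squared deviations are both positive and equal in size
squared_below = (below - center) ** 2
squared_above = (above - center) ** 2

print(raw_sum, squared_below, squared_above)
```

Without the squaring, the two points would contribute nothing to the spread even though each sits 4 units from the center.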
Who's this man? He knows so much and explains so majestically. I wonder why he does not have a statue in the main square of my city? He deserves a few.
So why minus - 1? Why not - 2 ? Or minus 6,345 % ? This is still not an explanation of the n - 1 :-(.
The reason one divides by n-1 instead of n comes from the definition of biased and unbiased. An unbiased estimate has the property that the expected value (average) of the estimate is equal to the desired parameter.
Assuming that the population has a variance of q^2, take S^2 = (1/(n-1)) (Sum from i=1 to n of (X_i - X bar)^2) [unbiased variance]. Its expected value is E(S^2) = (1/(n-1)) (n q^2 - n(q^2/n)) = (1/(n-1)) (n-1) q^2 = q^2.
If it were not n-1, the math wouldn't work. ...that's all.
It's hard to explain without Latex, sorry.
Source: My textbook.
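Since the comment asks for LaTeX, here is one way the same argument could be typeset. This is a sketch under assumptions the comment does not spell out: the X_i are independent draws from the population, and \mu (a symbol not used in the comment) stands for the population mean. q^2 is the population variance, as above.

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2 \\
E\!\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  &= n q^2 - n \cdot \frac{q^2}{n} = (n-1)\, q^2 \\
E(S^2)
  &= \frac{1}{n-1}\,(n-1)\, q^2 = q^2
\end{align*}
```

The second line uses E[(X_i - \mu)^2] = q^2 for each term and Var(\bar{X}) = q^2/n. Dividing by anything other than n-1 would leave a factor different from 1 in the last line.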
Or maybe just think of partitioning things into groups. For example, there are n groups, so I need n-1 dividers.
... Not really a proof, but maybe it's more intuitive.
Michael Esplin I think you use n-1 because you lose one degree of freedom by estimating the mean (x-bar).
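One way to see the lost degree of freedom, as a rough sketch with a made-up sample: once x-bar is computed from the data, the deviations from it must sum to zero, so the last deviation is fully determined by the other n-1.

```python
from statistics import mean

sample = [2.0, 5.0, 8.0]              # hypothetical sample, n = 3
x_bar = mean(sample)
deviations = [x - x_bar for x in sample]

# Deviations from the sample mean always add up to zero
total = sum(deviations)

# So the last deviation can be recovered from the first n - 1 of them
last_from_others = -sum(deviations[:-1])

print(deviations, total, last_from_others)
```

Only n-1 of the deviations are free to vary, which matches the n-1 in the denominator.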
+f lotars Watch video D1hgiAla3KI
This explains why it's (n-1) ruclips.net/video/Cn0skMJ2F3c/видео.html
Thanks for this video Sal. Though intuitive and true, some viewers might find this approach (to dealing with the "bias" in the estimator) heuristic. For instance, one might argue "why not n-2 and so on...". If you decide to invest a bit more in this stats playlist, I hope you'll get to deeper concepts like degrees of freedom of estimators, which lie at the heart of the concept of this video. Please don't take this as criticism; the video is in the right direction :)
11 years later on 11/27/23, but thank you for introducing me to *heuristic.* Really neat definition. 😌💓🖋️
It has to do with the fact that on an interval with N points there are N-1 smallest subintervals. Consider for example the interval [1,4] on the natural number line, in which case N=4. You can subdivide it only into [1,2], [2,3], [3,4], which is 3, not 4, smallest subintervals.
Yeah, but why are we dividing by the number of subintervals instead of the number of data points? Dividing by the number of data points would make more sense.
By the way, for anyone curious, the "degrees of freedom" of some statistic, say, a sum across the x's, is n, because this number has n ways or parameters (the x's themselves) by which it can vary. Using this simple notion of "freedom", you can state the d.o.f.s of any statistic that is written in terms of some data points. As another example, the sum across the x's squared also has n d.o.f.s.
The most common question seems to be why n-1 and not n-2 or n-3424342 (any other number). The way I understand it, it comes from the definition of unbiased estimators (look it up on Wikipedia). In a nutshell, an unbiased estimator is one whose expected value equals the value it is estimating. n-1 is known as Bessel's correction (also on Wikipedia). There you can see that E[S^2]=sigma^2, hence it is unbiased. This makes sense: if you take enough samples, compute S^2 for each, and average the results, you get the true population value.
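The "take enough samples and average them" idea can be tried out directly. The following is a minimal simulation sketch, not a proof. It assumes a standard normal population (true variance 1), and the function name and parameters are invented for this example.

```python
import random

def average_variance_estimates(n, trials, seed=0):
    """Average the n-divisor and (n-1)-divisor variance estimates over many samples."""
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_bar = sum(sample) / n
        sum_sq = sum((x - x_bar) ** 2 for x in sample)
        total_div_n += sum_sq / n
        total_div_n_minus_1 += sum_sq / (n - 1)
    return total_div_n / trials, total_div_n_minus_1 / trials

biased, unbiased = average_variance_estimates(n=3, trials=50_000)
print(f"divide by n:   {biased:.3f}")
print(f"divide by n-1: {unbiased:.3f}")
```

With n = 3, the first average should land near 2/3 and the second near 1, the true variance of the assumed population.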
A very interesting and important discussion. I took a break in the middle and thought about it by myself. I have a rather short explanation: if the sample size n is very small, such as 3, the variance calculated for the sample has more chance of being very different from the actual variance. The smaller n is, the more effect this '-1' has on the result.
As for why we use '-1' and not some other value like '-2', I think it is just a tradition. For the smallest sample size of 2, this unbiased variance can still be calculated. However, it is not really purely 'unbiased', just relatively 'unbiased'.
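To put numbers on "the smaller n is, the more effect the '-1' has": the two formulas differ only by the factor n/(n-1). A small sketch (the helper name is made up) that prints this factor for a few sample sizes:

```python
def correction_factor(n):
    """Ratio of the (n-1)-divisor estimate to the n-divisor estimate."""
    return n / (n - 1)

for n in (2, 3, 10, 100, 10_000):
    print(f"n = {n:>6}   n/(n-1) = {correction_factor(n):.4f}")
```

For n = 2 the corrected estimate is twice the uncorrected one, while for n = 10,000 the two agree to about four decimal places.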
what happens if we pick samples from far right side in above example ? n + 1 ?
@@user-jk1gb7wm6z honestly i only understood this through mathematical proof
Being biased or unbiased can be formally proved. For the sample variance to be unbiased means that its mathematical expectation should equal sigma^2. If we divide just by n, this condition isn't met. If we divide by (n-1), the mathematical expectation of our sample variance is exactly equal to sigma^2 (the variance of the population).
For more details and for the mathematical proof look for Bessel's correction.
this does not give an explanation for why it is exactly n-1.
Yeah - why not n-2?
It never claimed to give mathematical proof. The title says it's a video to give you an intuitive understanding of n-1, so it's going to be superficial. If you want proof, you should see the video "Proof that the Sample Variance is an Unbiased Estimator of the Population Variance" on the channel of jbstatistics.
Anyway, it's true that sometimes n-2, or n-3 or n-4 etc. are better than n-1. However, ON AVERAGE, n-1 is the best estimator of the population variance. The following simulation is a good demonstration of this www.khanacademy.org/computer-programming/will-it-converge-towards-1/1167579097
The chi-squared distribution with n-1 degrees of freedom has mean n-1.
Much better than what my school teacher taught me
The proof from 8:11 onwards does not seem rigorous. Correct me if I'm wrong.
The sum of squares of the distances the 3 samples have to pop. mean, divided by 3, could potentially be an over-estimation of the pop. variance. Because you could have picked 3 samples that are all far from the pop. mean, whereas in the population there are points much closer to the pop. mean.
Which means: even if the sum of the squares of the distances the 3 samples have to the sample mean, divided by 3, is always smaller than the sum of squares of the distances the 3 samples have to the pop. mean, divided by 3, that is still not enough to show that the sum of the squares of the distances the 3 samples have to the sample mean, divided by 3, is always smaller than the pop. variance.
I had the intuition that overestimation and underestimation would compensate each other. Why is it not the case?
What if you take the 3 highest values?
6:10 Why we divide by n - 1 in variance
what if all the samples you took were greater than the mean? then you would be overestimating even more if you divide by n-1
(I had the same question, and I've copy-pasted this from someone else who answers it:) That's not true, because you "square" the difference between the sample mean and the variables to get the sample variance. For this reason your measurements are, on average, "always closer" to the sample mean than to the real mean. That's what he wants to explain: the numerator comes out too small, which is why the denominator is decreased to compensate.
(Also here's the empirical evidence for the why it's the "n - 1" that is in the denominator:)
ruclips.net/video/Cn0skMJ2F3c/видео.html
yes exactly....
Not really, that's the point of variance: since the differences are squared, it is the same whether they are negative or positive.
So instead of the sample lying somewhere much lower than the true population mean, what if it's lying much higher? Would it be correct to use n+1 instead of n-1 in order to deliberately make the sample variance smaller?
No, because the population mean is still outside of your sample (doesn't matter whether it's too big outside or too small outside), so your sample variance would still be smaller than your population variance, so you'd still use n-1
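The point that the direction does not matter can be checked numerically. This is a sketch with made-up numbers: a hypothetical population mean of 10 and two samples of three points, one clustered below it and one above. It compares the sum of squared distances to the sample mean with the sum of squared distances to the population mean.

```python
def sum_sq_about(values, center):
    """Sum of squared distances from each value to a chosen center."""
    return sum((x - center) ** 2 for x in values)

population_mean = 10.0              # hypothetical population mean
low_sample = [4.0, 5.0, 6.0]        # cluster below the population mean
high_sample = [14.0, 15.0, 16.0]    # cluster above the population mean

for sample in (low_sample, high_sample):
    x_bar = sum(sample) / len(sample)
    about_sample_mean = sum_sq_about(sample, x_bar)
    about_pop_mean = sum_sq_about(sample, population_mean)
    print(sample, about_sample_mean, about_pop_mean)
```

In both cases the sum about the sample mean is the smaller one, because the sample mean is the center that minimizes the sum of squares for that sample. Note that this only compares the two sums for a given sample; that the n-divisor estimate falls short of the population variance is a statement about the average over many samples, not about every single one.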
9:08 - You are just as likely to be overestimating, you just chose to pick the bottom points rather than the top ones. This offers literally NO explanation, let alone an intuitive one, as to why I should expect there to be a downwards bias.
I think we've all had teaching experiences where we've dropped the ball. It's just strange to see Khan Academy put it to video lol.
Whether it's the top set or bottom set, the VARIANCE will be an underestimate, not the MEAN. I realise this may not be perfectly clear in the video, but in order to get a true (unbiased) representation of the variance, you would need values that are very far apart, and even then it wouldn't be an OVERestimate of the variance.
Variance is a measurement of the distance between each point and the mean. It does not matter whether the mean of the sample is above or below the mean of the population. Still, I do not understand why it is n-1 rather than n-2 or n-4 and so on.
Why do we not use |Xi - x̄ | instead of (Xi - x̄ )² ?
8:40 I think you should not represent the true variance and the sample variance on the same number line you drew for the population points. Also the consequence of your putting them together is you're visualizing the distance between the sample variance and the population variance on the same number line, resulting in your conclusion that because the sample points are far from the population mean, the variance is far too. Ponder over it, you'll realize.
Love your lectures BTW 😃
Good point. I had the same thought. You put it into words better.
Because of the upper and lower boundaries, samples are biased to be less spread, compared to the population mean, which is typically more centralized.
This is essential reading. A book of similar stripe became a cornerstone in my personal growth. "Game Theory and the Pursuit of Algorithmic Fairness" by Jack Frostwell
Awesome video! Thank you!
So this means that the n-1 of the sample variance equation was just an arbitrarily chosen value because it's empirically closer to the actual population variance? Or is there any equation or a logical path in deriving the n-1? I kinda see that it's the former but kinda feel that there might be a theory that could explain why n-1 is the most appropriate and not any other value and that it's just a natural consequence of our math. Anyone who does have one, please tell me!
Thank you for the video Khan Academy! It was very informative!
There is a reason why exactly n-1 gives the unbiased estimate, you can get it if you take a more theoretic look at the expectation value of the sample variance. If you do that, you mathematically can solve why it has to be n-1.
Knowledge is the only hope for world peace. We must have save trench town. As an actual real world issues that can be mathematically save the world from this little island. If you you can do it!
Say there's a population with a known population mean, and you take N random values from it, is there a way to calculate a probability density of deviance of the sample mean from the population mean?
I hope that was a coherent question.
What if the sample mean is far greater than the population mean, then would you not divide by n+1 in order that your sample mean is not an overestimate?
It would still be n-1, since the difference is always positive when squared.
We all hold the key to our part save the world by using and combining knowledge to promote peace throughout the world. I'm starting it off as an inventor and entrepreneur
Thank you very much for your video, it was very, very good at explaining. But I have one more question: if descriptive statistics do not try to generalize to a population (since there is no uncertainty in descriptive statistics), then why does the sample standard deviation try to best estimate the population standard deviation? Yet it is still considered a descriptive statistic.
I would like to know why we use the square of the difference between x and xbar, and not the absolute value of the difference?
still dont get it. yes you would be underestimating it if u take the sample cluster below the mean. but if the cluster is above the mean? you would be overestimating it! seems arbitrary to me.
+john smith That's not true, because you square the difference between the sample mean and the variables to get the sample variance. For this reason your measurements are, on average, always closer to the sample mean than to the real mean. That's what he wants to explain: the numerator comes out too small, which is why the denominator is decreased to compensate. The only thing I still don't really get is why it's (n-1)...
I don't know either, but I bet if you examine all possible sets of samples, for a given sample size and population, and compare their variance(s) against the population's variance, then the 'n-1' idea has a higher chance of being closer to the true variance than the plain 'n' idea. Further the "n-2" idea would probably have even a smaller chance than the "n-1". There's probably a cut off point too, where, for example, if your sample size has 99% of your population then n is better....
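The "examine all possible sets of samples" experiment is small enough to run exactly. The sketch below rests on assumptions that are not in the comment: a made-up population of four values, samples of size 3 drawn with replacement, and "closeness" measured by the average of the estimate over all samples (which is what unbiasedness refers to) rather than by the chance of being closer.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 4, 7]                     # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N

n = 3                                         # sample size

def mean_estimate(divisor):
    """Average of sum-of-squares/divisor over every ordered sample of size n."""
    total = Fraction(0)
    count = 0
    for sample in product(population, repeat=n):
        x_bar = Fraction(sum(sample), n)
        sum_sq = sum((x - x_bar) ** 2 for x in sample)
        total += sum_sq / divisor
        count += 1
    return total / count

print("population variance:", pop_var)
for divisor in (n + 1, n, n - 1, n - 2):
    print(f"divide by {divisor}: average estimate = {mean_estimate(divisor)}")
```

Under these assumptions, only the n-1 divisor averages out to exactly the population variance; n and n+1 come out too low and n-2 too high. Whether a given divisor is "closer more often" is a different question that this sketch does not answer.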
How's life?
Let's say a report comes out that mentions standard deviation. How are we supposed to know which formula was used to calculate that standard deviation.
I don't get it... What if your sample sits all the way to the right? The unbiased variance would then be even further off...
I GET IT! I had to work out the proof and think about it really hard, but I get it! I have an intuition for why n-1 makes sense! Message me with your questions, because I don't think I can explain it easily in the comment boxes.
Why should I expect the random selection of points to be an underestimate, when it's just as likely to be an overestimate?
thank you sal :4)
Starts at 5.00
Hi
How is this S2 variance of sample different from the sigma squared /n formula ( population variance /n) which is also the sample variance
thanks
You said that the biased variance was an underestimate, so is it possible to overestimate?
n-2
Why isn't this video on the statistics playlist?
But the same can be there for the other end where we would overestimate it?
Is n-1 mathematically derived?
Could we justify doing something else, e.g. using "0.85n" to build in conservativeness even for large n?
I get the math.... What I don't get is how you're able to write with the drawing/annotation feature so freakin' nicely?!?!? Either you missed your calling as a steady-handed microsurgeon or there is some sort of stabilization assistance with the program you're using.
I dont get it.
Yes the error will be smaller, but why we dont divide by n-2, or n-3 or n-4 , etc...
I love you, fuck the rest of explanations on internet, this made me understand
Doesn't explain the point sal, sample could've been among the higher than mu values only; in that case this would be completely opposite, we should've divided by n+1 then
As n approaches N, s_n approaches sigma, but s_n-1 approaches something that is not sigma. So what gives?
Hey, help me resolve world economics. Bringing knowledge
we cant
But we divide by n-1 even if we have 10000 samples, what difference n-1 will make?
This is terrible. Still no explanation of why it is unbiased if using n-1.
Perhaps that tends to be overdoing it?
So I guess the biased variance is better if your sample is still close to the entire population
The analogy you’re using is probably not very convincing/intuitive enough. Because there’s also a likelihood that the sample is over-estimating the population mean, so why don’t we divide it by n+1?
by this logic it can be n+1 also ig
Didn't say anything about n-1, misleading title.
N-1 is "better", but it is still very flawed
n-1 D:
NOT one of Khan Academy's shining moments. Your other video (thanks Dhiraj Budhrani) is MUCH better (with the simulation & a mathematical explanation!).
FIRST!
thank you so fucking much for this!!!
But you haven't explained it ! LOL
That was unclear.
What bogus logic... Khan Academy is a jack of all trades, master of none.
...
This is not explained at all.