Why do we divide by n-1 to estimate the variance? A visual tour through Bessel correction

  • Published: Sep 9, 2024

Comments • 56

  • @AJoe-ze6go
    @AJoe-ze6go 3 months ago +3

    Now you have me questioning whether I really understand Bessel's correction! I always heard the argument from degrees of freedom, and to me that meant that the first sample point contributes nothing to the variance. For example, if your first sample yields a value of 1.5, then the average value is just 1.5/1, which is still 1.5, so the difference is zero. It's not until your second sample (and subsequent samples) that the difference becomes meaningful (i.e., there is a value that can be different from the true mean). That has always been my understanding of why you divide by n-1: no matter how many samples you take, the variance can only be a function of n-1 of them, because a single point can contribute no difference.
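
A tiny numeric check of that first point, as a sketch (Python; the single value 1.5 is just the example from the comment above): with one observation the deviation from the sample mean is always zero, so dividing by n reports zero spread, while dividing by n-1 gives 0/0, i.e. no information at all.

```python
import numpy as np

x = np.array([1.5])              # a single observation, as in the comment above
xbar = x.mean()                  # the sample mean is the observation itself
ss = ((x - xbar) ** 2).sum()     # sum of squared deviations: exactly 0.0

print(ss / len(x))               # divide by n:   0.0, a misleading "zero variance"
print(len(x) - 1)                # divide by n-1: the denominator is 0, so the estimate is undefined
```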

  • @ekarpekin
    @ekarpekin 3 months ago +4

    Thank you Luis for the video.
    I also had a long-time obsession with why the heck it is (n-1) instead of n. Well, having watched your video, I now explain it to myself as follows:
    1) When we calculate the mean value 'mu' out of, say, 10 numbers making up a sample, those 10 numbers enter the mean calculation independently. But once we know 'mu', we can no longer say that all 10 numbers are independent: if we know 'mu' and any 9 of the numbers, we can compute the remaining one.
    2) Now, when we come to the variance, we take the difference between 'mu' and each of the 10 numbers, so we have 10 deltas. Yet out of these 10 deltas only 9 are independent, because the remaining delta can be calculated once we know the other 9 and 'mu'. Hence, for the variance we divide the total sum of (squared) deltas by (n-1), the count of independent deltas (or differences)...
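
A quick numerical sketch of that dependence (Python; the 10 values are randomly generated for illustration): the deviations from the sample mean always sum to zero, so the last one is determined by the other nine.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10, scale=3, size=10)   # 10 made-up observations
mu_hat = x.mean()
deltas = x - mu_hat                        # the 10 "deltas" from the comment above

print(deltas.sum())                        # ~0 (up to floating-point error): the deltas are constrained
print(deltas[-1], -deltas[:-1].sum())      # the 10th delta is fully determined by the other 9
```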

    • @tamojitmaiti
      @tamojitmaiti 3 months ago +1

      This exact same reasoning seamlessly transitions into ANOVA calculations as well. I personally think the widely accepted proof of unbiasedness of the estimator is intuitive enough. Math doesn't always have to cater to the logical faculties of what a 5-year-old can comprehend. I'm a big fan of Luis' content, but this video came off as a bit weak in the math-intuition part, not to mention super tedious.

    • @wiwaxiasilver827
      @wiwaxiasilver827 3 months ago

      Ah yes, the degrees of freedom explanation

  • @wiwaxiasilver827
    @wiwaxiasilver827 3 months ago +3

    The degrees-of-freedom explanation is similar. It actually also comes up in regression, the chi-squared distribution and the F-distribution. Because the variance is the average of differences from the average, the sample mean itself counts as an entity, and so it takes away a degree of freedom. Technically, it’s always n - 1, but the distribution is assumed to have infinite sample size when we cover the whole population, at which point n - 1 is equivalent to n. Still, it was very nice to learn about Bar(x) and how it comes out as 2*variance :) It’s also interesting how it seems to look a bit similar to covariance, hence why the n - 1 comes up again in regression :)

  • @robharwood3538
    @robharwood3538 3 months ago +6

    A while back I came across an explanation of the (n-1) correction term from a Bayesian perspective. (It might have been E. T. Jaynes' book _Probability Theory: The Logic of Science,_ but I can't recall for certain.)
    I was hoping you might go over it in this video, but I guess you didn't come across it in your search for the answer.
    One thing that is relevant -- and illuminated by the Bayesian perspective -- is that the Bessel correction for *_estimated_* variance implicitly assumes a particular sampling method from the population. In particular, I believe it assumes you are performing sampling _with replacement_ (or that the population is so large that sampling without replacement is nearly identical to with replacement).
    But in some non-trivial cases, that may not actually be the case, and so the Bessel correction may not be the appropriate estimator in such cases. For example, if the entire population is a small number, like 10 or 20 or so, then if you sample *_without replacement_* the distribution would behave differently. In the same way that a hypergeometric distribution is sometimes better than a binomial distribution, for example.
    As an extreme manifestation of this, suppose you sample (without replacement) all 10 items from a population of just 10 items. Then using the Bessel correction would obviously give the wrong 'estimate' of the true variance, which should be divided by n, not (n-1).
    A Bayesian approach (supposing that the population size, N=10 is a 'given' assumption) would correctly adjust the 'posterior variance' estimate to the *real* best estimate for sample sizes all the way up to 10, at which point it would be equivalent to the true variance.
    Unfortunately, I don't remember how to derive the Bayesian estimate of the variance. But maybe if you found it, it might shed even more light on your ultimate question of 'why (n-1)?' and perhaps you could do a follow-up video? Just an idea!
    Cheers!
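
The Bayesian derivation isn't reconstructed here, but the sampling-model point above can be checked numerically. A rough sketch (Python; the population of N = 10 values is invented, and the sample size is set equal to the population size to mirror the extreme case in the comment):

```python
import numpy as np

rng = np.random.default_rng(1)
population = rng.normal(size=10)          # a small finite population, N = 10
true_var = population.var()               # the true variance (divide by N)

n, trials = 10, 20_000                    # sample all 10 items each time
with_repl = np.empty(trials)
without_repl = np.empty(trials)
for t in range(trials):
    s1 = rng.choice(population, size=n, replace=True)
    s2 = rng.choice(population, size=n, replace=False)
    with_repl[t] = s1.var(ddof=1)         # Bessel-corrected estimate
    without_repl[t] = s2.var(ddof=1)

print(true_var)                           # the target
print(with_repl.mean())                   # ≈ true_var: n-1 is right for i.i.d. (with-replacement) draws
print(without_repl.mean())                # = true_var * n/(n-1): too large once the whole population is seen
```

In the without-replacement case with n = N, every sample is the whole population, so dividing by n (not n-1) would recover the true variance exactly, which is the point made above.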

    • @SerranoAcademy
      @SerranoAcademy  3 months ago

      Thanks! Ah, this is interesting. I think in a way I'm looking at sampling without replacement, but in a different light. I like your Bayesian argument, I need to take a closer look and get back.

  • @tandavme
    @tandavme 3 months ago +3

    Thank you, your videos are always deep and easy to follow!

  • @Fractured_Scholar
    @Fractured_Scholar 3 months ago +1

    The degrees-of-freedom / n-1 thing has to do with the properties of linear equations. Take a standard-form linear equation ax+by=c, where {a,b,c} are known constants and {x,y} are variables. The *instant* you choose a specific x, there is only one possible y that satisfies that equation. This same property is true for linear equations with n variables. The instant you choose specific values for the first n-1 variables, the nth variable is "forced" -- it is no longer a "free choice."

  • @juancarlosrivera1151
    @juancarlosrivera1151 3 months ago +4

    I would use x_bar instead of mu in the right-hand-side equation (around minutes 9 or 10)

  • @Meine_Rede
    @Meine_Rede 2 months ago

    Very intuitive examples and well explained. Thanks a lot.

  • @michaelzumpano7318
    @michaelzumpano7318 3 months ago

    I wasn’t too sure after the first two minutes, but you started winning me over. I thought, ok, I’ll watch five minutes. I watched to the end. Good job on this video! Subscribed!

  • @dragolov
    @dragolov 3 months ago

    Deep respect, Luis Serrano!

  • @profenevarez
    @profenevarez 3 months ago

    Thank you for this breakdown Luis!

  • @jbtechcon7434
    @jbtechcon7434 3 months ago +7

    I once got in a shouting match at work over this. I was right.

  • @cc-qp4th
    @cc-qp4th 3 months ago +6

    The reason for dividing by n-1 is that by doing so the sample variance is an unbiased estimator of the population variance.
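
For readers who prefer to see that claim numerically, a quick Monte Carlo sketch (Python; the normal distribution, the true variance of 4, the sample size of 5 and the trial count are all arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2, n, trials = 4.0, 5, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))
div_by_n   = samples.var(axis=1, ddof=0).mean()   # average of the divide-by-n estimates
div_by_nm1 = samples.var(axis=1, ddof=1).mean()   # average of the divide-by-(n-1) estimates

print(div_by_n)     # ≈ 3.2 = sigma2 * (n-1)/n: biased low
print(div_by_nm1)   # ≈ 4.0 = sigma2: unbiased
```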

  • @bobtivnan
    @bobtivnan 1 month ago

    Here's how I think about it. Keep in mind that the sample variance is an estimator of the population variance, which allows for small manipulations since we don't expect equality. But x_bar should be pretty close to mu for large enough sample sizes. Find the closest sample observation to x_bar and replace it with x_bar. Since we are estimating, I can justify this by noting that its distance to x_bar was small to begin with, so it doesn't contribute much to the variation. Then take all of the squared differences from x_bar except for the one that became x_bar. This leaves n-1 squared differences with which to calculate the variance. Now you might say: didn't I just make the sample variance larger by dividing by n-1? Yes, but keep in mind that x_bar is actually the x value that minimizes the sum of the squared distances in the sample. You can check this with algebra (quadratic: vertex of a parabola) or calculus (derivative: relative minimum). So mu, the population mean, would actually give a slightly greater sum of squared differences if it were used instead of x_bar, justifying the n-1 manipulation.
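
The minimization claim in the middle of that argument is easy to verify. A small sketch (Python; the data, the true mean of 50 and the sample size of 8 are invented): the sum of squared deviations about x_bar is never larger than the one about mu, and the gap is exactly n*(x_bar - mu)^2, whose average value is one sigma^2, the amount the switch from n to n-1 compensates for.

```python
import numpy as np

rng = np.random.default_rng(7)
mu = 50.0                                    # true population mean (invented)
x = rng.normal(loc=mu, scale=10.0, size=8)   # one small sample
xbar = x.mean()

ssq_about_xbar = ((x - xbar) ** 2).sum()
ssq_about_mu   = ((x - mu) ** 2).sum()

print(ssq_about_xbar <= ssq_about_mu)        # always True: xbar minimizes the sum of squares
# the gap equals n * (xbar - mu)^2, a standard identity (agreement up to floating-point error)
print(ssq_about_mu - ssq_about_xbar, len(x) * (xbar - mu) ** 2)
```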

  • @TheStrings-83639
    @TheStrings-83639 3 months ago

    I think I can get the main idea of why we do this for the variance, but how does it work for various types of statistical tests? Like, I can get why we'd subtract more than one when doing a t-test, because each coefficient would be like a mean value with a sample variance of its own, but how would one derive this fact using this bariance method?

  • @epepchuy
    @epepchuy 2 months ago

    Excellent explanation!!!

  • @uroy8665
    @uroy8665 3 months ago

    Thank you for the detailed explanation; I learned about the new BAR function. About (n-1), at first I thought of it this way: say we have 100 people and we want to find the variance of their heights. Suppose one person has exactly the mean height; then the mean is correct when dividing by 100, but the variance is not, because one term will be zero and that will lower the variance. If nobody has exactly the mean height, then dividing by 100 seems fine for the variance. But then I thought: if 2, 3, 4, etc. people have the mean height, that argument doesn't work. Anyway, after watching this video my thinking changed for the better, as I am not from a stats background.

    • @SerranoAcademy
      @SerranoAcademy  3 months ago

      Thanks, that’s a great argument! Yeah at some point I was thinking about it in a similar way, or considering having an extra person with the mean height. I couldn’t finish the argument but I believe that’s an alternate way to obtain the n-1.

  • @levi-civita1360
    @levi-civita1360 3 months ago +2

    I have read a statistics book, "Introduction to Probability and Statistics for Engineers and Scientists" by Sheldon M. Ross, and there he uses the following definition: let d = d(X) be an estimator of the parameter θ. Then b_θ(d) = E[d(X)] − θ is called the bias of d as an estimator of θ. If b_θ(d) = 0 for all θ, then we say that d is an unbiased estimator of θ.
    He then proves that if we use the sample-variance formula with (n-1) we get an unbiased estimator, and otherwise we do not.
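
The computation behind that claim is short. A sketch of the standard argument (not quoted from Ross), using the identity sum_i (X_i - X̄)² = sum_i (X_i - μ)² − n(X̄ − μ)²:

```latex
\mathbb{E}\left[\sum_{i=1}^{n}(X_i-\bar X)^2\right]
  = \mathbb{E}\left[\sum_{i=1}^{n}(X_i-\mu)^2\right] - n\,\mathbb{E}\left[(\bar X-\mu)^2\right]
  = n\sigma^2 - n\cdot\frac{\sigma^2}{n}
  = (n-1)\,\sigma^2
```

So dividing the sum by n-1 (rather than n) makes its expected value exactly σ², i.e. the bias in Ross's sense is zero.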

  • @TruthOfZ0
    @TruthOfZ0 3 months ago +1

    The variance of the world is divided by n-1 to exclude the observer who calculates it xD

  • @sahhaf1234
    @sahhaf1234 3 months ago

    This is a superb explanation.

  • @weisanpang7173
    @weisanpang7173 3 months ago +2

    The algebraic explanation of bariance vs variance was somewhat sloppy.

    • @numeroVLAD
      @numeroVLAD 3 months ago

      Yeah, it’s missing the formula that relates the mean and the variance. But I think the author's intended audience is one that already knows that formula well.

  • @archangecamilien1879
    @archangecamilien1879 3 months ago +1

    I know someone who was obsessed with knowing that, lol, back in the day...didn't manage to find a good explanation...there are other things he tried to understand the reasons for, lol...he wasn't sure that many others cared...well, lol, eventually he didn't care much himself, but at a time he did...

    • @archangecamilien1879
      @archangecamilien1879 3 months ago

      Lol...2:35...I hadn't reached that part before I made that comment...so, lol, the person in question wasn't the only one who would obsess over things like that in math...they often just feed you something without explanation, lol...even if you are a math major, I suppose they are thinking "They'll get it later"...they just tell you "You do this", lol...they also just fed the Jacobian to the students in the person's class, without explanation...well, lol, I suppose the student in question didn't have a textbook, but he doubts they explained where the Jacobian comes from in the textbook...

    • @archangecamilien1879
      @archangecamilien1879 3 months ago

      ...he would search online, lol...I don't think there were many math videos back then, or perhaps there were and he didn't notice them...

    • @archangecamilien1879
      @archangecamilien1879 3 months ago

      Lol...the person in question didn't even really understand what was meant by "degrees of freedom"...I mean, lol...they would just throw the term around..."if you have a sample of 6 elements, there are 5 degrees of freedom", I think it could get more complicated than that, forgot, like the product of two sample spaces or something?...Not sure, lol...they would then do some other gymnastics...but they would just throw in the word "degrees of freedom" like it was a characteristic, like height, eye color, hair color, etc, lol...like, there would be tables, and they would inform you how many degrees of freedom there were, and that's the only times I would see the term ever appear...and everyone else seemed fine with it, lol, or maybe the student in question was just an idiot and everyone else had an intuitive sense of what was going on (he says he doubts it, lol)...

    • @SerranoAcademy
      @SerranoAcademy  3 months ago +1

      lol! Your friend sounds a lot like me 😊

  • @MMarcuzzo
    @MMarcuzzo 1 month ago

    N-1 gives an unbiased estimator, while N does not. Both are consistent, though.

  • @PT-dz3iv
    @PT-dz3iv 3 months ago +1

    By the picture at 15:11, I think you conflated two different sampling models: 1) a sample of 2 i.i.d. randomly chosen elements, which should allow repetitions; 2) a sample without replacement. Your calculation here eliminates the repetitions, so you are in fact dealing with the 2nd model at this point, while the whole video is meant to explain the i.i.d. case. That is not consistent. In fact, in the i.i.d. case the original definition of variance (7.5 in your example) is correct, but you need to use the Bessel correction when you estimate. The estimates corresponding to your picture at 15:11 would then be 0.5, 8, 24.5, 4.5, 18, 4.5 for the distinct pairs (and 0 for each of the four repeated pairs). The average over all 16 equally likely ordered samples is 120/16 = 7.5, which is the real variance.
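
The exact population values aren't stated in the comment, but a hypothetical population like {0, 1, 4, 7} reproduces the six quoted estimates and has variance 7.5, so the averaging can be checked by brute force (Python):

```python
from itertools import product
from statistics import pvariance, variance

population = [0, 1, 4, 7]            # hypothetical values consistent with the estimates quoted above
print(pvariance(population))         # 7.5: the true (divide-by-N) population variance

# all 4*4 = 16 equally likely ordered i.i.d. samples of size 2, repetitions included
estimates = [variance([a, b]) for a, b in product(population, repeat=2)]
print(sum(estimates) / len(estimates))   # 120/16 = 7.5: the n-1 estimator is unbiased under i.i.d. sampling
```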

  • @f1cti
    @f1cti 3 months ago

    Great video!! Thanks so much, Luis! One question, though: when calculating the BARIANCE, why is the distance between two points calculated twice? Say we have only two points A and B... I haven't been able to wrap my head around why we need to calculate A-B and B-A. Thanks!

    • @SerranoAcademy
      @SerranoAcademy  3 months ago

      Thank you! Great question! If we're taking distinct points, it's exactly the same to do:
      A-B and divide by 1
      as to do
      A-B and B-A and divide by 2.
      This is because when you take pairs of distinct points, taking them ordered or unordered differs only by a factor of 2.
      However, things change when we allow for repetition. If we allow A-A and B-B, then we have to consider all the possibilities for the first point and for the second. So we need to throw in A-B and B-A as well, and then divide by 4. If we were to take only A-B and not B-A, and still take the two differences that are 0, then we would not be counting all the cases.
      It's a subtle point, but please let me know if something is not clear or if you have any further questions!

    • @f1cti
      @f1cti 3 months ago

      @@SerranoAcademy I greatly appreciate your response Luis!! Your explanation does clear things up a bit more, but I still wonder why we allow A-A and B-B in the first place: by definition a point has no distance from itself, so why allow it?

    • @SerranoAcademy
      @SerranoAcademy  3 months ago +1

      @@f1cti Thank you! Yes, exactly, it's a bad idea to pick A-A and B-B. So here's the thing:
      If you pick A-A and B-B, you get a bad estimate. In this estimate, you divide by n*n.
      If you don't pick A-A and B-B, you get a good estimate. In this estimate, you divide by n*(n-1) (because you're removing the n self-pairs).
      So you changed an n into an n-1 in the denominator to go from a bad estimate to a good one. And that's exactly what Bessel's correction says!
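
A small sketch of this counting argument, using the video's "bariance" idea (the average squared difference over pairs of sample points); the sample values below are invented (Python):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=6)                       # a made-up sample
n = len(x)
diffs_sq = (x[:, None] - x[None, :]) ** 2    # all n*n squared differences (zeros on the diagonal)

all_pairs      = diffs_sq.sum() / (n * n)          # keep the n self-pairs: divide by n^2
distinct_pairs = diffs_sq.sum() / (n * (n - 1))    # drop them from the count: divide by n*(n-1)

print(all_pairs / 2, x.var(ddof=0))          # half of it matches the divide-by-n variance
print(distinct_pairs / 2, x.var(ddof=1))     # half of it matches the divide-by-(n-1) variance
```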

  • @DiegoSilva-dv9uf
    @DiegoSilva-dv9uf 2 months ago

    Valeu! (Thanks!)

    • @SerranoAcademy
      @SerranoAcademy  1 month ago

      Thank you again, Diego, what a kind contribution! If you have any suggestions for videos you'd like to see, please let me know. Have a great day!

  • @AbhimanyuKumar-ke3qd
    @AbhimanyuKumar-ke3qd 3 months ago

    5:54 Can you please explain why we square in order to remove negative values... we could have taken absolute values as well, i.e., |x1 - u| + |x2 - u| + ...
    I have the same doubt in the case of linear regression / least squares...

    • @SerranoAcademy
      @SerranoAcademy  3 months ago

      Great question! We can square or take absolute values; the same goes for regression. When you use absolute values for regression it's called L1, and when you use squares it's called L2.
      I think the reason squares are more common is that a sum of squares is easier to differentiate. The derivative of an absolute value has a discontinuity at zero because the function y = |x| has a corner, while the function y = x^2 is smooth.
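
A tiny sketch of the difference between the two penalties (Python; the data, including one outlier, are invented): the squared-error objective is smooth and is minimized by the mean, while the absolute-error objective has a kink at every data point and is minimized by the median.

```python
import numpy as np

x = np.array([1.0, 2.0, 2.5, 7.0, 20.0])    # made-up data with one outlier
cs = np.linspace(0, 25, 2501)               # candidate constants c on a fine grid

l2 = ((x[:, None] - cs[None, :]) ** 2).sum(axis=0)   # sum of squares: smooth in c
l1 = np.abs(x[:, None] - cs[None, :]).sum(axis=0)    # sum of absolute values: kinked at each x_i

print(cs[l2.argmin()], x.mean())       # L2 minimizer: the mean (6.5)
print(cs[l1.argmin()], np.median(x))   # L1 minimizer: the median (2.5)
```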

    • @AbhimanyuKumar-ke3qd
      @AbhimanyuKumar-ke3qd 3 months ago

      @@SerranoAcademy wow! Never thought about it in terms of differentiability...
      Thank you so much!
      If you can make a video on it, it would be very helpful

    • @vikraal6974
      @vikraal6974 3 months ago +3

      Gauss proved that, under certain assumptions, least squares gives the best linear unbiased estimator for regression analysis (the Gauss-Markov theorem). We could artificially create other estimators, such as ones based on (x-mu)^4, which behave like the squared variant but give worse estimates. Least squares also connects many different areas of mathematics, such as Linear Algebra, Functional Analysis, Measure Theory and Statistics.

    • @AbhimanyuKumar-ke3qd
      @AbhimanyuKumar-ke3qd 3 months ago

      @@vikraal6974 Thanks ✨

  • @santiagocamacho264
    @santiagocamacho264 3 months ago

    @ 8:20 you say "Calculating the correct mean". Do you perhaps mean (no pun intended) "calculating (estimating) the correct variance"?

    • @SerranoAcademy
      @SerranoAcademy  3 months ago +1

      Ah good catch! Yes that’s what I **mean**t, 🤣 gracias Santi! 😊

  • @ASdASd-kr1ft
    @ASdASd-kr1ft 3 months ago

    I don't fully understand why Bar(x) = 2*Var(x)

    • @SerranoAcademy
      @SerranoAcademy  3 months ago

      Yes good point. I don't fully understand it either. I can see why it's bigger, since you're looking at distances from two different points, instead of a point in the middle. As for why it's twice, I have the mathematical proof which happens to work out if you expand it, but I'm still looking for a good intuitive visual explanation. If anything comes to mind, please let me know!
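
For readers following along, here is the expansion mentioned in the reply, written out (a standard manipulation, with x̄ denoting the sample mean):

```latex
\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i-x_j)^2
  = \frac{1}{n^2}\sum_{i,j}\bigl[(x_i-\bar x)-(x_j-\bar x)\bigr]^2
  = \frac{1}{n^2}\left[ n\sum_{i}(x_i-\bar x)^2 + n\sum_{j}(x_j-\bar x)^2
      - 2\sum_{i}(x_i-\bar x)\sum_{j}(x_j-\bar x) \right]
  = \frac{2}{n}\sum_{i}(x_i-\bar x)^2
  = 2\,\mathrm{Var}(x)
```

The cross term vanishes because the deviations from the mean sum to zero, and the factor of 2 comes from the two identical single-sum terms.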

  • @Tom-sp3gy
    @Tom-sp3gy 3 months ago

    Me too! I was always mystified by it

  • @KumR
    @KumR 3 months ago

    Whoa.....

  • @layt01
    @layt01 3 months ago

    Fun fact: in Spanish "V" and "B" are pronounced the same (namely as "B"). Bery good bideo, one of the vest eber!

  • @mcan543
    @mcan543 3 months ago +4

    **[0:00] Introduction and Bessel's Correction**
    - Introducing Bessel's Correction and why we divide by \( n-1 \) instead of \( n \) to estimate variance.
    **[0:12] Introduction to Variance Calculation**
    - Explaining the premise of calculating variance and introducing the concept of estimating variance using a sample instead of the entire population.
    **[1:01] Definition of Variance**
    - Defining variance as a measure of how much values deviate from the mean and outlining the basic steps of variance calculation.
    **[1:52] Introduction to Bessel's Correction**
    - Discussing why we divide by \( n-1 \) when calculating variance and introducing Bessel's Correction.
    **[2:35] Challenges of Bessel's Correction**
    - Sharing personal challenges in understanding the rationale behind Bessel's Correction and discussing my research process on the topic.
    **[3:20] Alternative Definition of Variance**
    - Presenting an alternative definition of variance to aid in understanding Bessel's Correction and expressing curiosity about its presence in the literature.
    **[4:45] Quick Recap of Mean and Variance**
    - Briefly revisiting the concepts of mean and variance, demonstrating how they are calculated with examples, and explaining how variance reflects different distributions.
    **[7:05] Sample Mean and Variance Estimation**
    - Explaining the challenges of estimating the mean and variance of a distribution using a sample and discussing why the naive sample variance is not a good estimate.
    **[8:49] Bessel's Correction and Why \( n-1 \) is Used**
    - Explaining how Bessel's Correction provides a better estimate of variance and why we divide by \( n-1 \) instead of \( n \). Emphasizing the importance of making a correct variance estimate.
    **[10:51] Why Better Estimation Matters**
    - Discussing why the original estimate is poor and why making a better estimate is crucial. Explaining the significance of the sample mean as a good estimate.
    **[13:02] Issues with Variance Estimation**
    - Illustrating the problems with variance estimation and demonstrating with examples why using the correct mean is essential for accurate estimates. Explaining the accuracy of estimates made using \( n-1 \).
    **[15:04] Introduction to Correcting the Estimate**
    - Discussing the underestimated variance and the need for correction in estimation.
    **[15:57] Adjusting the Variance Formula**
    - Explaining the adjustment in the variance formula by changing the denominator from \( n \) to \( n-1 \).
    **[16:22] Calculation Illustration**
    - Demonstrating the calculation process of variance with the adjusted formula using examples.
    **[16:57] Better Estimate with Bessel's Correction**
    - Discussing how the corrected estimate provides a more accurate variance estimation.
    **[18:24] New Method for Variance Calculation**
    - Introducing a new method for calculating variance without explicitly calculating the mean.
    **[20:06] Understanding the Relation between Bariance and Variance**
    - Explaining the relationship between bariance and variance, and how they are related mathematically.
    **[21:52] Demonstrating a Bad Calculation**
    - Illustrating a flawed method for calculating variance and explaining the need for correction.
    **[23:37] The Role of Bessel's Correction**
    - Explaining why removing unnecessary zeros in the variance calculation leads to better estimates, equivalent to Bessel's Correction.
    **[25:08] Summary of Estimation Methods**
    - Summarizing the difference between the flawed and corrected estimation methods for variance.
    **[26:02] Importance of Bessel's Correction**
    - Emphasizing the significance of Bessel's Correction for accurate variance estimation, especially with smaller sample sizes.
    **[30:19] Mathematical Proof of the Bariance-Variance Relationship**
    - Providing two proofs of the relationship between bariance and variance, highlighting their equivalence.
    **[35:24] Acknowledgments and Conclusion**

    • @SerranoAcademy
      @SerranoAcademy  3 months ago +1

      Thank you so much! @mcan543

    • @SerranoAcademy
      @SerranoAcademy  3 months ago +2

      I pasted it into the comments, it's a really good breakdown. :)