Unbiased Estimators (Why n-1 ???) : Data Science Basics

  • Published: 8 Sep 2024

Comments • 88

  • @davidszmul2141
    @davidszmul2141 3 years ago +28

    To be even more practical, I would simply say that:
    - Mean: you only need 1 value to estimate it (the mean is the value itself).
    - Variance: you need at least 2 values to estimate it. The variance estimates the spread between values (the more variance, the more spread around the mean). It is impossible to get any sense of spread from only one value.
    For me that is sufficient to explain practically why it is n for the mean and n-1 for the variance.
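
A minimal Python sketch of this point (my own illustration, not from the comment), using the standard library's `statistics` module:

```python
import statistics

# One value pins down a mean estimate (it's the value itself)...
one = [4.2]
print(statistics.mean(one))  # 4.2

# ...but it carries no information about spread: statistics.variance
# (the n-1 sample variance) refuses to run on fewer than 2 points.
try:
    statistics.variance(one)
except statistics.StatisticsError:
    print("variance needs at least two data points")

# With two values, spread first becomes measurable.
print(statistics.variance([4.0, 6.0]))  # 2.0 (sum of squared deviations / (n-1))
```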

    • @chonky_ollie
      @chonky_ollie 2 years ago

      Best and shortest example I’ve ever seen. What a gigachad

  • @YusufRaul
    @YusufRaul 3 years ago +41

    Great video, now I understand why I failed that test years ago 😅

  • @jamiewalker329
    @jamiewalker329 3 years ago +18

    How I think about it: suppose you have n data points: x1, x2, x3, x4, ..., xn. We don't really know the population mean, so let's just pick the data point on our list which is closest to the sample mean, and use it to approximate the population mean. Say this is xi.
    We can then code the data by subtracting xi from each element - this doesn't affect any measure of spread (including the variance). After coding we will have a list x1', x2', ..., xn', but the i-th position will be 0. Then only the other n-1 data points contribute to the spread around the mean, so we should take the average of those n-1 squared deviations.

    • @gfmsantos
      @gfmsantos 3 years ago +1

      I guess only the other n-1 data points will contribute to the spread around zero, not the mean... I got lost.

    • @jamiewalker329
      @jamiewalker329 3 years ago +1

      @@gfmsantos 0 is the mean of the coded data.

    • @gfmsantos
      @gfmsantos 3 years ago +1

      @@jamiewalker329 Yes, but you didn't know the mean before you chose the point. As far as I understood, you've just picked a point that might be close to the sample mean, haven't you?

    • @jamiewalker329
      @jamiewalker329 3 years ago +2

      @@gfmsantos Yes, the sample mean. It's not supposed to be rigorous, just a way of thinking that, given any data point as a reference point, there are n-1 independent deviations from that point. One data point gives zero indication of spread. With 2 data points, only the 1 distance between them gives an indication of spread, and so on...

    • @gfmsantos
      @gfmsantos 3 years ago +1

      @@jamiewalker329 I see. Good. Thanks

  • @Physicsnerd1
    @Physicsnerd1 3 years ago +7

    Best explanation I've seen on YouTube. Excellent!

  • @abderrahmaneisntthatenough6905
    @abderrahmaneisntthatenough6905 3 years ago +18

    I wish you would cover all the math related to ML and data science.

  • @Matthew-ez4ze
    @Matthew-ez4ze 11 months ago +1

    I am reading a book on Jim Simons, who ran the Medallion fund. I’ve gone down the rabbit hole of Markov chains and this is an excellent tutorial. Thank you.

  • @699ashi
    @699ashi 3 years ago +2

    I believe this is the best channel I have discovered in a long time. Thanks man.

  • @stelun56
    @stelun56 3 years ago

    The lucidity of this explanation is commendable.

  • @DistortedV12
    @DistortedV12 3 years ago +3

    I watch all your vids in my free time. Thanks for sharing!

  • @Ni999
    @Ni999 3 years ago +2

    That last blue equation looks more straightforward to me as -
    = [n/(n-1)] [σ² - σ²/n]
    = [σ²n/(n-1)] [1 - 1/n]
    = [σ²n/(n-1)] [(n-1)/n] = σ²
    ... but that's entirely my problem. :D
    Anyway, great video, well done, many thanks!
    PS - On the job we used to say that σ² came from the whole population, n, but s² comes from n-1 because we lost a degree of freedom when we sampled it. Not accurate but a good way to socialize the explanation.
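
A quick simulation (my own sketch, not from the video or comment) confirms the algebra: averaging the squared deviations with n in the denominator lands near σ²(n-1)/n, while n-1 recovers σ². The parameters here are arbitrary choices for illustration.

```python
import random

random.seed(0)
mu, sigma = 0.0, 2.0      # true population parameters, so sigma^2 = 4
n, trials = 5, 200_000    # small samples, many repetitions

biased_sum, unbiased_sum = 0.0, 0.0
for _ in range(trials):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    biased_sum += ss / n          # divide by n
    unbiased_sum += ss / (n - 1)  # divide by n-1

print(biased_sum / trials)    # close to sigma^2 * (n-1)/n = 3.2
print(unbiased_sum / trials)  # close to sigma^2 = 4.0
```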

  • @junechu9701
    @junechu9701 1 year ago

    Thanks!! I love the way of saying "boost the variance."

  • @cadence_is_a_penguin
    @cadence_is_a_penguin 1 year ago

    been trying to understand this for weeks now, this video cleared it all up. THANK YOU :))

  • @neelabhchoudhary2063
    @neelabhchoudhary2063 10 months ago

    dude. this is amazingly clear

  • @vvalk2vvalk
    @vvalk2vvalk 3 years ago +4

    What about n-2 or n-p? How come the more estimators we have, the more we adjust? How exactly does that transfer into the calculation, and what is the logic behind it?

  • @ChakravarthyDSK
    @ChakravarthyDSK 2 years ago

    Please do one lesson on the concept of ESTIMATORs. It would be good if the basics of these ESTIMATORs were understood before getting into the concept of being BIASED or not. Anyway, you are doing extremely well, and your way of explaining is simply superb. clap.. clap ..

  • @tyronefrielinghaus3467
    @tyronefrielinghaus3467 11 months ago

    Good intuitive explanation... thanks

  • @subhankarghosh1233
    @subhankarghosh1233 7 months ago

    Marvelous... Loved it...❤

    • @ritvikmath
      @ritvikmath 7 months ago +1

      Thanks a lot 😊

  • @user-mg2me7tg6v
    @user-mg2me7tg6v 5 months ago

    The last section is so helpful, thank you!

    • @ritvikmath
      @ritvikmath 5 months ago

      Glad it was helpful!

  • @AbrarAhmed-ox2fd
    @AbrarAhmed-ox2fd 3 years ago

    Exactly what I have been looking for.

  • @yassine20909
    @yassine20909 2 years ago

    Now it makes total sense. Thank you 👏👍

  • @kvs123100
    @kvs123100 3 years ago +2

    Thanks for the great explanation! But one question: why minus 1? Why not 2? I know the DoF concept comes in here, but all the explanations I have gone through fix the value of the mean so as to make the last sample not independent.
    But in reality, as we take samples, the mean is not fixed! It is itself dependent on the values of the samples! Then the DoF would be the number of samples itself!

  • @musevanced
    @musevanced 3 years ago +15

    Great video. But anyone else feel unsatisfied with the intuitive explanation? I've read a better one.
    When calculating the variance, the values we are using are x_i from 1 to n and x_bar. Supposedly, each of these values represents some important information that we want to include in our calculations. But suppose we forget about the value x_n and consider JUST the values x_i from 1 to (n-1) and x_bar. It turns out we actually haven't lost any information!
    This is because we know that x_bar is the average of x_i from 1 to n. We know all the data points except one, and we know the average of ALL of the data points, so we can easily recalculate the value of the lost data point. This logic applies not just to x_n. You can "forget" any individual data point and recalculate it if you know the average. Note that if you forget more than one data point, you can no longer recalculate them and you have indeed lost information. The takeaway is that when you have some values x_i from 1 to n and their average x_bar, exactly one of those values (whether it's x_1 or x_50 or x_n or x_bar) is redundant.
    The point of dividing by (n-1) is that instead of averaging over every data point, we want to average over every piece of new information.
    And finally, what if we were somehow aware of the true population mean, μ, and decided to use μ instead of x_bar in our calculations? In that case, we would divide by n instead of (n-1), as there would be no redundancy in our values.
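
The redundancy argument can be checked in a few lines of Python (my own illustration; the data values are made up):

```python
# Knowing x_bar and all but one data point determines the missing point:
# sum(x) = n * x_bar, so the forgotten value is n * x_bar minus the rest.
x = [3.0, 7.0, 1.0, 9.0]
n = len(x)
xbar = sum(x) / n  # 5.0

forgotten = x[2]                   # pretend we lost this one
known = x[:2] + x[3:]              # the other n-1 points
recovered = n * xbar - sum(known)  # reconstruct it from the mean

print(recovered == forgotten)  # True
```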

    • @cuchulainkailen
      @cuchulainkailen 3 years ago +2

      Right. The phraseology is this: the system has only n-1 degrees of freedom when you use xbar. ...Xbar has "taken it away".

  • @richardchabu4254
    @richardchabu4254 3 years ago +1

    Well explained, very clear to understand.

  • @GauravSharma-ui4yd
    @GauravSharma-ui4yd 3 years ago +2

    Amazing...

  • @martinw.9786
    @martinw.9786 2 years ago

    Great explanation! Love your videos.

  • @missghani8646
    @missghani8646 2 years ago +2

    This is how we can understand stats, not by just throwing numbers at students.

  • @DonLeKouT
    @DonLeKouT 3 years ago +1

    Try explaining the above ideas using the degrees of freedom.

    • @cuchulainkailen
      @cuchulainkailen 3 years ago

      correct.

  • @jeffbezos4474
    @jeffbezos4474 2 years ago

    you're hired!

  • @braineater351
    @braineater351 3 years ago

    I wanted to ask a question. For E(x bar), x bar is calculated using a sample of size n, so is E(x bar) the average value of x bar over all samples of size n? Other than that, I think this has been one of the more informative videos on this topic. Additionally, many times people tie the concept of degrees of freedom into this, but usually they show why you have n-1 degrees of freedom and then just say "that's why we divide by n-1". I understand why it's n-1 degrees of freedom, but not how that justifies dividing by n-1. I was wondering if you had any input on this?
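
On the first question: yes, E(x bar) is the expectation of x bar over repeated samples of size n. A small simulation (my own sketch; the population and sizes are invented for illustration) makes this concrete:

```python
import random

random.seed(2)
# A fixed finite population with a known mean.
population = [random.gauss(10.0, 3.0) for _ in range(1000)]
mu = sum(population) / len(population)

# Draw many samples of size n and average the resulting x_bar values.
n, trials = 8, 100_000
total = 0.0
for _ in range(trials):
    sample = random.sample(population, n)
    total += sum(sample) / n

# The average of x_bar over repeated samples approaches the population mean.
print(abs(total / trials - mu) < 0.05)  # True
```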

  • @nelsonk1341
    @nelsonk1341 1 year ago

    you are GREAT

  • @soumikdey1456
    @soumikdey1456 2 years ago

    just wow!

  • @chonky_ollie
    @chonky_ollie 2 years ago

    Great video, thanks!

  • @nguyenkimquang0201
    @nguyenkimquang0201 1 year ago

    Thank you for great content!!!❤❤❤

  • @chinmaybhalerao5062
    @chinmaybhalerao5062 2 years ago

    I guess the second approach to the n-1 explanation is only right when both the population and the sample follow the same distribution, which is a very rare case.

  • @EkShunya
    @EkShunya 1 year ago

    good one

  • @mm_ww_2
    @mm_ww_2 3 years ago

    Thanks, great explanation.

  • @Set_Get
    @Set_Get 3 years ago

    Thank you. Could you please do a clip on expected value, its rules, and how to derive some results?

  • @AmineChM21
    @AmineChM21 3 years ago

    Quality video, keep it up!

  • @alexandersmith6140
    @alexandersmith6140 10 months ago

    Hi @ritvikmath, I want to understand those derivations in the red brackets. Do you have a good set of sources that will explain to me why those three expected values return their respective formulae?

  • @yitongchen75
    @yitongchen75 3 years ago +1

    Is that because we lose 1 degree of freedom when we use the estimated mean to calculate the estimated variance?

    • @cuchulainkailen
      @cuchulainkailen 3 years ago

      Correct. It's NOT, as the author states, that the variance is boosted.

  • @jingsixu4665
    @jingsixu4665 2 years ago +1

    Thanks for the explanation from this perspective. Can you talk more about why 'n-1'? I remember it has something to do with degrees of freedom, but I never fully understood that when I was learning it.

    • @samtan6304
      @samtan6304 2 years ago +3

      I also had this confusion when I first learned it. Say you have a sample with values 1, 2, 3, and you calculate the sample variance. The numerator will be [(1 - 2)² + (2 - 2)² + (3 - 2)²]. Notice that in this calculation you are implicitly saying the sample mean must be 2, because you are subtracting every value by 2. Using this implicit information, you will realize that one term in the numerator cannot vary given the other two.
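
The constraint in this example is easy to see in code (my own sketch): deviations from the sample mean always sum to zero, so once two of them are known, the third is forced.

```python
x = [1, 2, 3]
xbar = sum(x) / len(x)          # 2.0, the sample mean
devs = [xi - xbar for xi in x]  # [-1.0, 0.0, 1.0]

# The deviations sum to zero by construction, so only n-1 of them
# are free to vary; the last one is determined by the others.
print(sum(devs))             # 0.0
print(-(devs[0] + devs[1]))  # 1.0, i.e. devs[2] is forced
```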

  • @plttji2615
    @plttji2615 2 years ago

    Thank you for the video. Can you help me prove whether the estimate in this question is unbiased? Question: compare the average height of employees at Google with the average height in the United States - do you think it is an unbiased estimate? If not, how do you prove it is not?

  • @BigHotCrispyFry
    @BigHotCrispyFry 3 years ago

    good stuff!

  • @pranavjain9799
    @pranavjain9799 1 year ago

    You are awesome

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago

    Great video, but I'm still not convinced by the intuition. How do you know that the adjustment compensates for the missing tail in sampling? And if so, why not n-2, etc.? I guess if data were missing anywhere, it would be in the tail.

    • @yezenbraick6598
      @yezenbraick6598 2 years ago

      Yes, why not n-2? Jamie Walker's comment explains it in another way - check that out.

  • @prof.g5140
    @prof.g5140 2 years ago +1

    Incorrect intuition.
    This is more accurate: ideally the sample mean equals the population mean, but the actual sample mean is rarely ideal and there's an error amount. If the sample is concentrated on lower values, then the sample mean will be lower than the population mean; since the sample is concentrated on lower values and the sample mean is also lower, the differences between the samples and the sample mean will mostly be smaller than the differences between the samples and the population mean, thus lowering the sample variance. If the sample is instead concentrated on higher values, then the sample mean will be higher than the population mean; since the samples are concentrated on higher values and the sample mean is also higher, the differences between the samples and the sample mean will again mostly be smaller than the differences between the samples and the population mean, again lowering the sample variance. Whether the sample is concentrated on lower or higher values (not being concentrated is unlikely for small sample sizes), the sample variance (using n as the denominator) will probably be lower than the population variance. Therefore, we need to add a correction factor.

  • @mohammadreza9910
    @mohammadreza9910 7 months ago

    useful

  • @asifshikari
    @asifshikari 1 year ago

    Why n-1... we could adjust even better by doing n-2.

  • @Titurel
    @Titurel 8 months ago

    4:38 You really should give links to the derivation, otherwise it still feels hand-wavy.

  • @jtm1283
    @jtm1283 7 months ago

    Two criticisms (of an otherwise very nice video): 1. all the real work in the proof is done by the formulae in black on the right, for which you provided no explanation; and 2. talking about the sample SD without mentioning degrees of freedom seems incomplete. WRT the latter, just look inside the summation and ask "how many of these are there?" For the mean, there are n different things (the x-sub-i values), so you divide by n. For the sample SD there are n things (the x-sub-i values) minus 1 thing (x-bar), so it's n-1.

  • @yepitsodex
    @yepitsodex 10 months ago

    The "we need it to be slightly smaller to make up for it being a sample and not the population" argument isn't needed or realistic. Having n-1 regardless of the size of the sample suggests the 1 is completely arbitrary, just to tweak it by the smallest amount. In reality, when you go from the population space to the sample space, you lose exactly one degree of freedom. That's why it's n-1 and not n-2 or something else. If you had all of the sample values except one, the value of the last one would be fixed, because the values have to average out to the sample mean. Since it can't be just anything, that is a loss of one degree of freedom, which justifies the use of n-1.

  • @gianlucalepiscopia3123
    @gianlucalepiscopia3123 3 years ago

    Never understood why "data science" and not "statistics"

  • @thomaskim5394
    @thomaskim5394 3 years ago +1

    You are still not clear, intuitively, on why we use n-1 instead of n in the sample variance.

    • @jamiewalker329
      @jamiewalker329 3 years ago

      See my comment.

    • @thomaskim5394
      @thomaskim5394 3 years ago

      @@jamiewalker329 I have already seen a similar argument to yours.

    • @cuchulainkailen
      @cuchulainkailen 3 years ago

      @@jamiewalker329 It's convoluted. The answer is what I posted: the number of degrees of freedom is reduced to n-1 by the use of xbar.

  • @tooirrational
    @tooirrational 3 years ago

    Bias is not the factor used to decide the best estimator... it's mean squared error... n-1 is used because the error is low, not because it's unbiased.

  • @rhke6789
    @rhke6789 10 months ago

    Ah. Learning is in the details. You just skipped over the "not interesting" steps that permit the logic to flow. Not good. Even mentioning the names of the formulas you used without explaining them would be helpful... the variance decomposition formula or the deviation-square formula.