Dividing By n-1 Explained

  • Published: 26 Feb 2023
  • In this video I answer the common question of why we divide by n-1 when calculating variance from a sample, known as Bessel's Correction. I focus on conceptual understanding of why this adjustment is needed and why n-1 is the appropriate adjustment on average, rather than making up a population and possible samples to illustrate this. I show why x-bar (the mean of the sample) tends to underestimate the squared deviations, then provide two arguments for why n-1 adjusts for this: one based on degrees of freedom, and the other based on trying to estimate the average amount of bias of the sample variance. (See the simulation sketch below.)
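
A quick way to see the effect the description mentions: the minimal sketch below (an illustration, not from the video) repeatedly samples from an assumed normal population and compares the average variance estimate when dividing by n versus n-1. MU, SIGMA, N, and TRIALS are all assumed values chosen for the demo.

```python
import random

MU, SIGMA = 100, 15     # assumed population parameters (IQ-style)
N, TRIALS = 5, 100_000  # small samples, repeated many times

biased_total = 0.0      # running sum of SS/n estimates
unbiased_total = 0.0    # running sum of SS/(n-1) estimates

for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    ss = sum((x - xbar) ** 2 for x in sample)  # squared deviations from x-bar
    biased_total += ss / N
    unbiased_total += ss / (N - 1)

print("true variance:       ", SIGMA ** 2)               # 225
print("average of SS/n:     ", biased_total / TRIALS)    # ~180, i.e. 225 * (n-1)/n
print("average of SS/(n-1): ", unbiased_total / TRIALS)  # ~225, on target
```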

Comments • 69

  • @shushantgambhir2002 · 5 months ago +6

    Haven't seen someone nail it better.
    Thanks a lot.

  • @augierakow · 9 months ago +4

    Best explanation I’ve found so far. Thank you sir.

  • @GabriEla-eb1sn · 7 months ago +4

    Your explanation is very clear! Thank you so much for your effort and willingness to share your knowledge with us!💚🙏

  • @brazhell · 23 days ago

    I want to thank you for that information, clear and precise.

  • @NotEasyDude · 5 months ago +3

    Hands down, the best explanation

  • @samircanoiriarte7646 · 7 months ago +2

    Finally!!! This is the best explanation on the topic. Thank you so much

  • @statjackson · 25 days ago

    Great proof. Thank you for the perspective.

  • @stuck-in-a-catch-22 · 6 months ago +2

    Great explanation, concise and easy to follow - fills in the missing gaps in an intuitive manner (helps significantly if you're a curious/inquisitive individual lol).

  • @robomatt1600 · 7 months ago

    Thanks! This is one of the best explanations I have found so far for Bessel's Correction.

  • @sharan9993 · 1 month ago

    The underestimating explanation was the best intuitive guide to why we need to adjust. Thank you very much. Hope I can learn other statistics lessons from your channel.

    • @PsychExamReview · 1 month ago

      You're welcome, glad to hear it was helpful!

  • @CliffSedge-nu5fv · 1 month ago +1

    This did better in 14 minutes what other videos try to do in 30 or more minutes.

  • @aviroopmitra5353 · 6 months ago

    very clear explanation! Got the logic, thank you very much

  • @garykeyvan662 · 3 months ago

    Almost magical. Thanks a lot, you saved my day, good Sire.

  • @qkloh6804 · 1 month ago

    This should be the standard explanation in all classes.

  • @forniteguruji9409 · 3 months ago

    Thank you so much!!!! You are to the point!

  • @sksahil4997 · 3 months ago

    Great Intuitive Explanation. Thanks.

  • @ct8veylm3kzj68 · 15 days ago

    Perfectly clear. Thanks. Thank you.

  • @preetiprajapat536 · 4 months ago

    Wow, that is a clear answer to this. Amazing 💯

  • @jai_radhekrishna · 2 months ago

    Best explanation 🙌🏻🙌🏻

  • @faresmhaya · 5 months ago +2

    I don't know what to say. I've been looking for an intuitive understanding of this, and all the data science channels can do is just say "degrees of freedom" without elaborating any further. I didn't expect a Psychology channel of all things to give me a better answer, but here we are!
    I don't know if I understand everything perfectly, bias = sigma²/2 wasn't fully clear to me, but I read your long-ass answer in a comment below and it made it a bit clearer, and I have nothing but respect for you for actually taking the time to write that answer.
    I have to ask though, why do we assume that n would go to infinity (taking the bias to zero) if we know that any population would have to have a finite number of elements?
    Edit: I guess some populations would be represented by continuous functions, so technically they can have an infinite number of elements, but that's not always the case.
    Secondly, if you can make another video, I'm curious to know why x-bar is an unbiased estimator even though x-bar will never be exactly equal to mu.

    • @PsychExamReview · 5 months ago +2

      Glad it was helpful and that I was able to explain it more clearly than some other places. I'm also glad that taking the time to write that long-ass explanation of the average bias was helpful to another person 😅
      For n going to infinity, it probably would have been better for me to say as the sample n gets as large as possible; you're right that often the population would be finite. Once we have the full population N then the bias would be zero and we would no longer need to make Bessel's correction because in that case x-bar would be equal to mu.
      I do plan to make a video about the sampling distribution of the mean, which relates to your question. X-bar is an unbiased estimator for mu because the distribution of x-bar is centered around mu. So even though each individual sample x-bar may not equal mu, the average of all possible x-bars from all samples will be equal to mu. Hope this makes sense!
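
A minimal sketch of the sampling-distribution point in this reply (an illustration with assumed population values, not from the video): any single x-bar typically misses mu, but the average of x-bars over many samples lands on mu.

```python
import random

MU, SIGMA, N = 100, 15, 10   # assumed population parameters and sample size
TRIALS = 100_000

xbars = []
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbars.append(sum(sample) / N)   # the mean of this one sample

print("a single x-bar:    ", xbars[0])             # typically not exactly 100
print("average of x-bars: ", sum(xbars) / TRIALS)  # ~100, centered on mu
```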

    • @faresmhaya · 5 months ago

      @@PsychExamReview Ah, that's referring to the notion of the "expected value", I assume. Yeah, I can see how that works. Although if I'm being honest, I'm not sure I intuitively understand why the same reasoning doesn't work for all other estimators (variance included). Why doesn't the average of all possible estimated variances from all samples equal sigma²? I mean, don't get me wrong, I get it with your explanation, but I don't know how to debunk the expected value reasoning without referring to your explanation, and it feels like I should. Is it because estimated variances are inherently built on x-bar, which is already an estimator, and that plagues them with an error that the average of all estimated variances cannot mitigate? If my assumption is correct, can we conclude that any estimator that is built on top of other estimators is inherently biased?

    • @PsychExamReview · 5 months ago +1

      @@faresmhaya That's exactly right, it's an unbiased estimator because the expected value is always equal to mu, even though individual samples may not be, and this is true even if the population the samples are drawn from is not normally distributed, due to the central limit theorem.
      But the distribution of the sample variance doesn't follow the CLT; it isn't normal and it depends on the population, so we can't really know the expected value without making assumptions (such as assuming the population is normally distributed).
      Basically the problem is that the amount of bias in our estimate of the variance depends on how much our x-bar differs from mu, but we don't know mu, and so our estimate of the bias due to x-bar also involves using x-bar. So we make some assumptions and use n-1, to try to get our expected value for the sample variance closer to the population variance, but it's not guaranteed that it will.
      I'll try to cover this in more detail in future videos, but generally you're right that whenever we have to use an estimated value in a calculation in place of a parameter (like using x-bar for mu here) this introduces bias or uncertainty into the result that we will need to correct for, usually by reducing degrees of freedom (just as we did here, replacing n with n-1). Hope this helps!
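
For anyone who wants the algebra behind this reply, here is the standard decomposition (not worked in the video; it assumes n independent draws with mean mu and variance sigma²):

$$\sum_{i=1}^{n}(x_i-\bar{x})^2=\sum_{i=1}^{n}(x_i-\mu)^2-n(\bar{x}-\mu)^2$$

Taking expected values, $E\big[\sum(x_i-\mu)^2\big]=n\sigma^2$ and $E\big[(\bar{x}-\mu)^2\big]=\operatorname{Var}(\bar{x})=\sigma^2/n$, so

$$E\Big[\sum_{i=1}^{n}(x_i-\bar{x})^2\Big]=n\sigma^2-\sigma^2=(n-1)\sigma^2,$$

which is why dividing the sum of squared deviations by n-1 gives an estimator whose expected value is sigma². Note that this expectation argument needs no normality assumption; normality only matters for finer claims about the distribution of the sample variance.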

  • @PauloPereira-wl7qf · 6 months ago

    Dang, that was really helpful! Thank you very much!!!!!

  • @sarthak-salunke · 3 months ago +1

    Brother, I'm a beginner, please help me understand the concept. Here's what I know and what I'm thinking:
    an underestimate is a value less than the actual value, and an overestimate is a value above it.
    In the video we underestimated and overestimated the deviations from the sample mean, so values close to the sample mean give us a lower variance than the actual one. And when we look at the bell curve, a specific sample will most likely fall in the underestimate category. Is this what you want to convey? Does "actual value" refer to the population mean?

    • @PsychExamReview · 3 months ago +2

      Yes, we want to know the population variance, but we can't calculate this value directly because we don't know the true population mean and we usually can't measure the entire population in order to find it. So we can only use x-bar, or the mean of our sample. Using x-bar tends to give us an underestimate of the population variance so we increase this estimate slightly by dividing by n-1 instead of n. This doesn't guarantee a better estimate for all possible samples, but if we have a population that follows a normal distribution using n-1 will tend to give us a better estimate on average.

  • @alaminpriyodu · 17 days ago

    Love you MAN!!!❤

  • @Titurel · 5 months ago +1

    Even though the equations are very clear, I don't understand intuitively why taking two samples on average cuts the bias in half, or why taking 3 cuts it to a third and 10 cuts it to a tenth. Please clarify.

    • @PsychExamReview · 5 months ago +3

      Sure, I'll try. We can think of this bias relating to how bad our estimate of mu (and thus our estimate of deviations) can possibly be.
      If we think about using 1 score to estimate mu, it could be anywhere on the full range of x in the population. Now we can ask how much adding a 2nd score would improve our estimate of mu and our deviations. If we assume a normal distribution, the probability of any one score being above or below mu is 50%, but the probability of selecting 2 scores on the same side is only 25%.
      And the lower the first score is, the greater the probability the second score will be above that score and improve the estimate (though it could over-correct in the other direction).
      If our first score happened to be the lowest possible value of x, then any 2nd score could only move our estimate of x-bar closer to the true population mean or keep it the same. We can't be any more wrong in our estimate of mu; getting the same extreme low value again wouldn't change x-bar, and even if the 2nd score were the highest possible value of x, this would just bring x-bar to the population mean because the distribution is symmetrical. So a 2nd score could improve our estimate up to being exactly correct, with 0% chance of over-estimating mu.
      But as the first score falls closer to the true population mean, the probability that a 2nd score will improve the estimate of x-bar decreases, and the possibility of worsening the estimate starts to increase (because it's more and more likely to get values below the first score, and high values could now over-correct, giving us estimates that are farther above mu than the first score was below it). But the maximum amount it could possibly worsen the estimate is cut in half because the 2nd score will only move x-bar half its distance from the 1st score.
      So if the first score happened to be the true population mean (which we wouldn't actually know), a second score could only keep this the same or worsen this a maximum of 50% in either direction compared to how bad the estimate could possibly be with only one score by itself.
      So then if we imagine all the different possible combinations of 2 scores and the probabilities of their pairings, having a second score will improve some estimates and worsen others (depending on how close the first score was to mu) but cut the occurrence of extremes in half. The extreme estimates are still possible (2 extremely low or high scores together) but these are half as likely to occur as they would be if we had only picked one score to estimate mu.
      This continues as n increases. If the current x-bar differs from mu, then the probability an additional score will shift it in the correct direction is always greater than 50% (because more than half of scores in the population will be above or below that estimate) while the amount it might over-correct gets smaller and smaller (the 10th score can only pull the mean a maximum of 10% of its distance from the estimate using 9 scores, etc.).
      To give a concrete example; if I had 2 people, one with an IQ of 100 and one with 140, my estimate of mu (assume it's 100 in the population) could be off by 40 using just the 2nd score, but only off by 20 using both scores. If I had an average of 100 from 9 people and added a 10th at 140, my estimate would only move to 104, compared to possibly being off by 40 if I had used the single 10th score by itself.
      And as we get to higher sample sizes, the possibility of drawing more extreme scores is very low because there just aren't that many scores there. If we have a sample size of 10,000 we won't get all extreme high/low values on one side not just because it's unlikely but because there just aren't 10,000 scores there in the population, so very extreme values for mu start to become impossible to select.
      Hope this helps!
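
A compact check on this reply (an illustration assuming the same IQ-style population as the example above, mu = 100 and sigma = 15): the mean squared error of x-bar falls like sigma²/n, so 2 scores cut it in half, 3 cut it to a third, and 10 cut it to a tenth.

```python
import random

MU, SIGMA = 100, 15   # assumed IQ-style population from the example above
TRIALS = 200_000

for n in (1, 2, 3, 10):
    total_sq_err = 0.0
    for _ in range(TRIALS):
        sample = [random.gauss(MU, SIGMA) for _ in range(n)]
        xbar = sum(sample) / n
        total_sq_err += (xbar - MU) ** 2
    # Expected: sigma^2 / n = 225, 112.5, 75, 22.5
    print(f"n={n:2d}  MSE of x-bar ~ {total_sq_err / TRIALS:.1f}")
```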

    • @Titurel · 5 months ago

      @@PsychExamReview Thank you for your wonderfully clear and thorough explanation. I was thinking it had to do with probability density but was having a hard time intuiting how each additional N would change sigma^2. You saved me a sleepless night. Ten points to Ravenclaw!

    • @PsychExamReview · 5 months ago +2

      @@Titurel Yeah, it's hard to intuit or imagine all possible cases, so I find thinking of the extreme cases and recognizing that the probability will fall between these on average can make it a little easier. Glad I could help, sleep is important 😂

  • @guillermootuama2616 · 6 months ago

    Great video, subscribing!😁

  • @adityapandey8319 · 4 days ago

    Why did we cut our bias in half when considering a sample size of 2?

    • @PsychExamReview · 3 days ago +1

      @adityapandey8319 Great question, I didn't fully explain this in detail here as it has to do with probability. Essentially, the lower the first score, the greater the probability the second score will fall above that point, leading to a less extreme average. By taking the average of 2 scores, extreme scores are still possible but half as likely to occur. For a more detailed explanation and examples, you can see some of my other replies to comments here. Hope this helps!

  • @skynet_cyberdyne_systems · 5 months ago

    Excellent

  • @hannahminerva9439 · 6 months ago

    I could not understand why the sample variance equals the population variance minus the bias. Is it because the sample mean underestimates the variance, so we have to add the bias to the sample variance in order to reach the population variance?

    • @PsychExamReview · 6 months ago

      Yes, since the deviations tend to be underestimates when using the sample mean instead of mu, the formula for the biased sample variance will likely be too low. So we could say the biased sample variance + the bias = the population variance; I've just started with a rearranged version of that equation.
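
In symbols (notation added here, not in the original reply): writing the biased sample variance as $s_n^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$, the relationship holds on average:

$$E[s_n^2]=\sigma^2-\underbrace{\frac{\sigma^2}{n}}_{\text{average bias}}\qquad\Longleftrightarrow\qquad E[s_n^2]+\frac{\sigma^2}{n}=\sigma^2.$$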

    • @hannahminerva9439 · 6 months ago

      @@PsychExamReview thank you very much!! This video and your reply helped me a lot :))

  • @eloy618 · 2 months ago

    Could someone explain why, when there are 2 sample scores, you divide the bias by 2?

    • @PsychExamReview · 2 months ago

      I've written a more detailed explanation and examples in a reply to Titurel's comment below, but the basic idea is that sampling 2 scores reduces the probability of extreme deviations from mu because both scores have to be extreme on the same side; otherwise they balance out and their average is pulled toward mu. The probability of one score being below mu is .5, but the probability of both scores being below mu is only .25. The more extreme one score is, the greater the probability the second score will move their average closer to mu, reducing bias in the estimate, and this continues as n increases because each single score's ability to pull the average away from mu is reduced. Hope this helps!
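
A tiny sketch (an illustration assuming a standard normal population, not from the video) that checks the probabilities quoted in this reply:

```python
import random

TRIALS = 100_000
both_below = 0

for _ in range(TRIALS):
    a, b = random.gauss(0, 1), random.gauss(0, 1)  # two draws; mu = 0
    if a < 0 and b < 0:   # both scores on the same (low) side of mu
        both_below += 1

# One score falls below mu with probability .5; both do only .25 of the time.
print("P(both below mu) ~", both_below / TRIALS)   # ~0.25
```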

  • @foyzulhaque6773 · 2 months ago

    Just wow

  • @benlee3545 · 1 month ago

    Hi Sir, at 2:27 I wonder if you can elaborate on the overestimates and underestimates when x-bar is below or even higher than the population mean. I am not an expert in this, so what you said may carry some implicit knowledge of this issue, as you breeze through without much explanation.

    • @PsychExamReview · 1 month ago +1

      I probably should have given a concrete example to help make this clear.
      Let's imagine that mu, our population mean, is 10 (even though we can't really know this) and x-bar for our sample is 9. In this case, when we compare each score to x-bar to calculate the variance, any scores that are below 9.5 will be underestimates compared to the population mean, because they will be closer to 9 than 10. So a score of 8 is only 1 point away from x-bar, but if we were able to compare it to the population mean (10) it would be 2 points away. So the deviation from x-bar is smaller than from mu, meaning we have an underestimate.
      Scores above 9.5 would cause overestimates, because they are farther from 9 than from 10. But because x-bar is 9, more of the scores in that sample must be below 9.5, so we'll have more underestimates than overestimates when we add up all the deviations.
      The same situation occurs in the opposite direction if x-bar is greater than mu. If mu were 10 and our x-bar is 11, now scores above 10.5 would underestimate the deviation (closer to 11 than 10) and scores below 10.5 would overestimate (farther from 11 than 10). But again, for x-bar to equal 11 there must be more scores above 10.5, meaning more underestimates.
      The only time we won't have underestimates is when x-bar is exactly equal to mu, but since we can't actually determine mu we won't know when this has happened. It's rare, so we generally assume it hasn't happened and that we probably have an underestimate of the deviations.
      Hope this helps, thanks for commenting!
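
The numbers in this reply can be checked directly. In fact, for any sample the sum of squared deviations is smallest when measured from x-bar itself, so measuring from x-bar instead of mu can only understate it. A small sketch with hypothetical scores chosen to match the mu = 10, x-bar = 9 example:

```python
sample = [7, 8, 9, 10, 11]   # hypothetical sample; its mean is 9
MU = 10                      # assumed true population mean (normally unknown)

xbar = sum(sample) / len(sample)
ss_xbar = sum((x - xbar) ** 2 for x in sample)  # deviations measured from x-bar
ss_mu = sum((x - MU) ** 2 for x in sample)      # deviations measured from mu

print(xbar)     # 9.0
print(ss_xbar)  # 10.0 -- the minimum over any reference point
print(ss_mu)    # 15.0 -- larger, so using x-bar underestimates
```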

    • @benlee3545 · 1 month ago

      @@PsychExamReview Thank you, Sir. I think I get what you mean, but I'm not sure whether you can advise. Since in some cases we will not know the population mean, when should we use n-1? Based on what you said, it seems that for sampling, the standard deviation calculated from the sample will likely be smaller than the population deviation, hence n-1 is used to adjust. But when we do not know the population mean, when shall we use n-1?

    • @PsychExamReview · 1 month ago +1

      @@benlee3545 Yes, in almost all cases we can't measure the entire population to know the population mean and we can only measure a sample, so that means that we will almost always use n-1 when calculating the variance.

    • @benlee3545 · 1 month ago

      @@PsychExamReview Thank you Sir

  • @xcalibur070 · 1 year ago

    nicee

  • @xcalibur070 · 1 year ago

    Can you do a video on probability distributions?

  • @user-zx7vh6mp8n · 2 months ago

    I am sorry, I didn't understand what mu and x-bar are. I understand x-bar is the mean. What is mu?

    • @PsychExamReview · 2 months ago +1

      Mu is the population mean, or the true average of everyone in the population. But since we'd have to measure the entire population to know this, we usually don't know it and we only have an estimate from a sample of the population, which is x-bar.

  • @amanmishra9891 · 1 month ago

    I think you completely swapped the sample and population variance and proved the wrong equation, i.e. you proved that the population variance is divided by n-1, which is not the case.
    Can you explain it clearly?

    • @PsychExamReview · 1 month ago

      The equation shows that the sample variance (which is probably an underestimate) + the bias (the average amount of underestimate) is equal to the population variance. I probably should have started with this form of the equation instead of starting with sample variance = population variance - the bias; it's the same equation but perhaps not as clear a starting point. So it's showing that the biased sample variance * (n/(n-1)) = the population variance, which simplifies to the sum of squared deviations / (n-1). This means that dividing the sample deviations from x-bar by n-1 will tend to equal the population variance, though we have to remember that this is based on estimating the average bias as sigma squared / n, which it may not actually be for a given sample, so it's only true on average, not in every case. Hopefully this makes things clearer.
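
The simplification in this reply, written out (same notation as the blocks above, added for clarity):

$$s_n^2\cdot\frac{n}{n-1}=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\cdot\frac{n}{n-1}=\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1},$$

and since $E[s_n^2]=\frac{n-1}{n}\sigma^2$, multiplying by $\frac{n}{n-1}$ gives an expected value of exactly $\sigma^2$: correct on average, though not for every individual sample.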