Why Sample Variance is Divided by n-1

  • Published: Oct 27, 2024

Comments • 87

  • @abhinandandey_ricky
    @abhinandandey_ricky 2 years ago +41

    n-1 is actually the degrees of freedom.
    Why is it used for the sample s.d. or variance?
    The sample mean is supposed to be equal or closest to the population mean. But you have the freedom to choose the samples randomly.
    So, to keep the sample mean equal to the population mean, you can change only (n-1) of the sample values.
    Suppose there are 5 sample values: you can change only (5-1) = 4 of them, because the one value you don't change is what keeps the sample mean equal (or closest) to the population mean.

    • @kumudsharma007
      @kumudsharma007 2 years ago +7

      This is the correct explanation. Krish has explained it very wrongly. I think he just copied this from someone else without understanding it himself.

    • @tushartiwari7929
      @tushartiwari7929 1 year ago +1

      Can someone pin this to the top?
      This is the correct explanation.

    • @paulaugustus4671
      @paulaugustus4671 1 year ago +1

      This is the correct explanation; it's because of degrees of freedom.

    • @vigneshStack
      @vigneshStack 8 months ago

      Bro, if you don't mind, can you explain it to me?

    • @wyburp7970
      @wyburp7970 6 months ago +1

      @vigneshStack This question haunted me for the past few weeks. I haven't found one contemporary who actually explained it well (and, therefore, by my standards, who understands it). Basically, Fisher used it 100 years ago and Walker (1940) explained it rather clearly (search "walker degree of freedom"; there's a guy who retranscribed it. His version is scuffed, but otherwise you have to pay for her article). Fisher wanted to estimate the population mean and variance precisely from a sample. To do so, he imagined an infinite population. In brief, Fisher used N-dimensional geometry for the calculations: he allocated one dimension to each sample value's possibilities (1 sample = 1 axis), for a total of N dimensions. Each sample value could therefore take any value along its own axis. However, once you fix the mean, the values get stuck moving together around the value of the mean. One value in the sample then depends on the others, which reduces their degrees of freedom by 1 (N-1). In the same way, a train exists on a straight track: you can know its position with only one coordinate, even though it also exists in a 2D system (excluding height and time). See Walker's 1940 article. There's also J. L. Rodgers's 2019 article (degrees of freedom at the start of the second 100 years), which gives great insight into the manipulation of degrees of freedom and complements Walker.
      Otherwise, there's the concept that a sample is always smaller than the population. Therefore, you can never be sure of the population mean estimated from a sample without a) collecting every single member of the population or b) using mathematics to calculate the value the samples tend toward (the notion of limits). Those formulas explain where the 1/(n-1) comes from.
      TL;DR: the reason there is a 1/(n-1) in the variance formula has basically been forgotten by most of the people who should know it.
      www.studocu.com/row/document/jamaa%D8%A9-alkahr%D8%A9/faculty-of-graduate-studies-for-statistical-research/degrees-of-freedom/43598306?fbclid=IwAR0ujzGHqcqm6DL-eRKbZs1LbcP_X8XdO5KR_8eOxoY8JEKlp7fqJV4xWdg_aem_AZwtXgCL3q6XBDsNS4z_aOCNDQTeGcaIpA76yrROzeoq9KlY7EpBL_R5ZtOcoPVl8GoR6JrgtEd1xQgvHjvUzOfc
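A minimal Python sketch of the constraint this thread describes, using a made-up five-value sample: deviations from the sample mean always sum to zero, so once any four deviations are known, the fifth is fixed. That lost "free" value is the usual informal reading of n-1 degrees of freedom. The helper name `deviations_from_mean` and the sample values are illustrative assumptions, not taken from the video.

```python
from fractions import Fraction

def deviations_from_mean(xs):
    """Return each value's deviation from the sample mean, using exact fractions."""
    values = [Fraction(x) for x in xs]
    mean = sum(values) / len(values)
    return [x - mean for x in values]

sample = [2, 4, 4, 5, 10]          # hypothetical sample, mean = 5
devs = deviations_from_mean(sample)

# The deviations always sum to zero...
total = sum(devs)

# ...so the last deviation is fully determined by the first n - 1.
last_from_others = -sum(devs[:-1])

print([int(d) for d in devs])      # [-3, -1, -1, 0, 5]
print(total, last_from_others)     # 0 5
```

Exact fractions are used so the "sums to zero" check is not blurred by floating-point rounding.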

  • @bankimdas9517
    @bankimdas9517 3 years ago +40

    Dividing by 'n' underestimates the true variance so to correct the bias we use (n-1) in denominator.
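A rough simulation of this claim, as a sketch under assumed settings: samples of size 5 are drawn from a normal population with variance 4 (sigma = 2), and the two estimators are averaged over many samples. In expectation, dividing by n gives (n-1)/n times the true variance (3.2 here), while dividing by n-1 gives the true variance (4.0). The function name `var_divided_by`, the seed, and the population are illustrative choices.

```python
import random

def var_divided_by(xs, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / divisor

rng = random.Random(0)        # fixed seed so the run is repeatable
true_variance = 4.0           # assumed population: normal with sigma = 2
n, trials = 5, 100_000

sum_n = sum_n_minus_1 = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    sum_n += var_divided_by(sample, n)
    sum_n_minus_1 += var_divided_by(sample, n - 1)

avg_n = sum_n / trials                  # expectation: (n-1)/n * 4 = 3.2
avg_n_minus_1 = sum_n_minus_1 / trials  # expectation: 4.0
print(f"divide by n:   {avg_n:.3f}")
print(f"divide by n-1: {avg_n_minus_1:.3f}")
```

The averages will wobble slightly from run to run if you change the seed; the gap between roughly 3.2 and roughly 4.0 is the bias being discussed.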

  • @Zaheer-r4k
    @Zaheer-r4k 1 year ago +16

    Answer:
    The calculations for both the sample standard deviation and the sample variance both contain a little bias (that’s the statistics way of saying “error”).
    Bessel’s correction (i.e. subtracting 1 from your sample size) corrects this bias.
    In other words, you’ll usually get a more accurate answer if you use n-1 instead of n.

    • @Callmeflamee
      @Callmeflamee 10 months ago +2

      If both contain errors, then why do we only subtract from the sample and not the population?

    • @darkclaw12
      @darkclaw12 5 months ago

      @Callmeflamee "Both" refers to the sample SD and the sample variance.

    • @DarkPrincess_M
      @DarkPrincess_M 4 months ago

      Doesn't n-1 overestimate the variance?

    • @calebsteinmetz9471
      @calebsteinmetz9471 2 months ago +1

      @Callmeflamee Because the mean could be over- or underestimated, but given how variance is calculated, it will always be underestimated.

    • @calebsteinmetz9471
      @calebsteinmetz9471 8 days ago

      Actually, it isn't always underestimated, but in general it is.
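A small sketch of the "not always, but in general" point, under an assumed setup (normal population with variance 4, samples of size 5): it counts how often the divide-by-n estimate falls below the true variance. Individual samples can land above or below; the bias is about the long-run average. For this normal setup the theoretical share below works out to about 0.71. The seed and population are illustrative choices.

```python
import random

rng = random.Random(1)
sigma2 = 4.0                 # true variance of the assumed normal population
n, trials = 5, 20_000

below = 0
for _ in range(trials):
    xs = [rng.gauss(0.0, 2.0) for _ in range(n)]
    m = sum(xs) / n
    biased = sum((x - m) ** 2 for x in xs) / n   # divide-by-n estimate
    if biased < sigma2:
        below += 1

frac_below = below / trials
print(f"share of samples where divide-by-n undershoots: {frac_below:.2f}")
```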

  • @binarystar4947
    @binarystar4947 2 years ago +35

    I came here searching for the answer to this question from your live Day 2 basic-to-intermediate statistics video, and got my answer ✨

    • @talkswithRishabh
      @talkswithRishabh 2 years ago

      ++

    • @user-jk1gb7wm6z
      @user-jk1gb7wm6z 2 years ago +1

      What if we pick all the sample values from the right side? Do we divide by n+1???

    • @ashishvinod2193
      @ashishvinod2193 1 year ago

      @user-jk1gb7wm6z It gives the same result, because the sample values you chose from the right side don't give a mean approximately equal to the population mean; there is a big difference. That's why.
      If n-1 is small, the sample variance is large; that's why.

  • @bharratkhanna6096
    @bharratkhanna6096 1 year ago +8

    The exact reason is the bias introduced by the sample mean, since it is an estimate of the population mean. Because the deviations from the sample mean must sum to 0, this constraint restricts the freedom of the data points, and to counter both the bias and the constraint we use n-1. Remember that the corrected sample variance is unbiased, but the sample standard deviation is still biased: the square root is concave, which introduces a negative bias, and the n-1 correction, being linear, can't fix the standard deviation as well as it fixes the variance. This is not an experimental finding; it is mathematically proved. And Bessel's correction (n-1) is not used when the population mean is known.

    • @Touristtt4028
      @Touristtt4028 1 month ago

      I have some doubts regarding this. How can I reach out to you? Could you provide your LinkedIn profile link if that's not a problem?
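A simulation sketch of the point above about the standard deviation, assuming a normal population with sigma = 2 and samples of size 5: the n-1 variance averages out close to sigma², but its square root averages noticeably below sigma. For normal data the expected sample SD is c4 times sigma, where c4 = sqrt(2/(n-1)) * Γ(n/2) / Γ((n-1)/2); that constant is standard for the normal case only. The seed and setup are illustrative choices.

```python
import math
import random

rng = random.Random(2)
sigma, n, trials = 2.0, 5, 100_000

sum_s2 = sum_s = 0.0
for _ in range(trials):
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)   # Bessel-corrected variance
    sum_s2 += s2
    sum_s += math.sqrt(s2)                          # sample standard deviation

avg_s2 = sum_s2 / trials   # close to sigma**2 = 4.0
avg_s = sum_s / trials     # noticeably below sigma = 2.0

# For normal data, E[s] = c4 * sigma:
c4 = math.sqrt(2 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)
print(f"mean of s^2: {avg_s2:.3f}, mean of s: {avg_s:.3f}, c4*sigma: {c4 * sigma:.3f}")
```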

  • @pratikmanghwani7417
    @pratikmanghwani7417 3 years ago +8

    With all due respect you should explain why it is unbiased with the math behind it. Thanks
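Since several replies ask for the math, here is a sketch of the standard textbook argument, assuming x_1, ..., x_n are independent draws from a population with mean μ and variance σ², and x̄ is the sample mean. It is the usual derivation, not necessarily the reasoning used in the video. The first line is an algebraic identity (add and subtract x̄; the cross term is 2(x̄ − μ) times the sum of deviations, which is zero), and the rest takes expectations.

```latex
\begin{aligned}
\sum_{i=1}^{n}(x_i-\mu)^2 &= \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu)^2 \\
\mathbb{E}\Big[\sum_{i=1}^{n}(x_i-\mu)^2\Big] &= n\sigma^2,
  \qquad \mathbb{E}\big[(\bar{x}-\mu)^2\big] = \operatorname{Var}(\bar{x}) = \frac{\sigma^2}{n} \\
\mathbb{E}\Big[\sum_{i=1}^{n}(x_i-\bar{x})^2\Big] &= n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\,\sigma^2 \\
\Rightarrow\quad \mathbb{E}\Big[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\Big] &= \sigma^2
\end{aligned}
```

Dividing the same sum by n instead gives an expectation of (n-1)σ²/n, which is the underestimate mentioned in other comments.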

  • @arnabghosh2818
    @arnabghosh2818 3 years ago +5

    Theoretically, in core statistics there is another justification, related to degrees of freedom.

    • @Sean-oh3ph
      @Sean-oh3ph 3 years ago +5

      this is the correct answer

    • @Kumbutranjaami
      @Kumbutranjaami 2 years ago +2

      Right. There is math behind why it's n-1; it's not just the trial-and-error thing explained in this video.

    • @jonpit4342
      @jonpit4342 2 years ago +1

      And the core question is why we divide by the df. No one has explained that.

    • @user-jk1gb7wm6z
      @user-jk1gb7wm6z 2 years ago +1

      What happens if we pick values from the right side? We will get only higher values... then? n+1??

    • @Kumbutranjaami
      @Kumbutranjaami 2 years ago

      @user-jk1gb7wm6z It's not about right- or left-side numbers, because the metric is calculated by subtracting the mean from each value. 2 - 1 = 1 and 10001 - 10000 = 1, right? Are you able to understand this math?
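A quick check of the 2 - 1 versus 10001 - 10000 point, as a minimal sketch: shifting every value by the same amount shifts the sample mean by that amount too, so the deviations, and therefore the variance, stay the same. The function name `sample_variance` and the data are illustrative.

```python
def sample_variance(xs):
    """Bessel-corrected sample variance (divide by n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

left = [1, 2, 3]
right = [10001, 10002, 10003]   # same spread, shifted far to the right

print(sample_variance(left), sample_variance(right))  # 1.0 1.0
```

Where the values sit on the number line does not matter; only their spread around their own mean does.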

  • @abcefg7045
    @abcefg7045 1 year ago +10

    If we take right-skewed data, will dividing by n - 1 create more of a difference?

    • @uditkumar370
      @uditkumar370 1 month ago

      I have the same question; did you find the answer yet?
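One way to probe the right-skew question is a simulation, sketched here under an assumed setup: samples of size 5 from an exponential distribution with rate 1 (strongly right-skewed, mean 1, variance 1). The n-1 estimator's unbiasedness only requires independent draws with finite variance, not symmetry, so its average should still land near 1, while divide-by-n lands near 0.8. The seed and distribution are illustrative choices; skewness does make individual estimates more variable.

```python
import random

rng = random.Random(3)
n, trials = 5, 100_000
true_variance = 1.0      # Exponential(rate=1) has mean 1 and variance 1

sum_n = sum_n_minus_1 = 0.0
for _ in range(trials):
    xs = [rng.expovariate(1.0) for _ in range(n)]   # right-skewed draws
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    sum_n += ss / n
    sum_n_minus_1 += ss / (n - 1)

avg_n = sum_n / trials                  # expectation: (n-1)/n = 0.8
avg_n_minus_1 = sum_n_minus_1 / trials  # expectation: 1.0
print(f"divide by n:   {avg_n:.3f}")
print(f"divide by n-1: {avg_n_minus_1:.3f}")
```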

  • @snehalhon
    @snehalhon 2 years ago +1

    After watching each of your video lectures, sir, my concepts are getting clearer... no words for you... you're great.

  • @sumangupta871
    @sumangupta871 2 years ago +12

    What happens if someone chooses 5 samples from the right? Why don't we divide by n+1 in that case?

    • @priyanshujain5286
      @priyanshujain5286 2 years ago +6

      It should still be divided by (n-1); you can try it with an example. The reason is that once you choose 5 samples from the right, their mean also shifts toward the right, so the differences between the data points and the mean become small.

    • @himanshumaurya4737
      @himanshumaurya4737 1 year ago

      @priyanshujain5286 exactly

  • @anivesh2225
    @anivesh2225 1 year ago

    Thanks for this, Krish. So, to remove the bias of the sample variance with respect to the population variance, we divide by n-1; it is a value tested by statisticians, and it is also called Bessel's correction.

  • @modemnaveen6240
    @modemnaveen6240 2 years ago +4

    According to the logic explained in this video, the sample mean and population mean can differ depending on which samples we pick, right? So why are we only saying the variance is underestimated? Why not the mean? Can someone please explain?

    • @APaleDot
      @APaleDot 2 years ago +4

      The sample mean might be higher or lower than the population mean, depending on what samples you happen to pick. So, in the long run, if you average the sample means, you get closer to the population mean. However, when calculating the sample variance, you use the sample mean (because you don't know the population mean), and the samples will always be at least as close, in total squared distance, to the sample mean as to the population mean, because the sample mean is precisely the value that minimizes the sum of squared deviations for those particular samples. Therefore, the sample variance computed this way tends to be lower than the population variance.

    • @HyperDangerousThing
      @HyperDangerousThing 1 year ago

      @APaleDot *And because the sample variance is way lower than the population variance when the sample mean is used to compute it (say, with just "n" in the denominator, NOT "n-1"), people found that putting n-1 in the denominator brings the sample variance closer to the population variance, since the resulting value of the variance gets bigger (because you're dividing by a smaller number). But I still don't get the bigger picture: why is this slight adjustment so important in the long run for statistics? Why on earth does it matter so much for interpreting the result later that the small increase in sample variance (through n-1) brings it closer to the population variance?
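A worked check of the explanation above, as a minimal sketch with made-up numbers: the sum of squared deviations about the sample mean is never larger than about any other center, including the population mean. The exact relation is Σ(x - μ)² = Σ(x - x̄)² + n(x̄ - μ)². The sample and the "known" population mean of 6 are hypothetical, and `sum_sq_about` is an illustrative helper.

```python
def sum_sq_about(xs, center):
    """Sum of squared deviations of xs about an arbitrary center."""
    return sum((x - center) ** 2 for x in xs)

sample = [2, 4, 4, 5, 10]     # hypothetical sample
mu = 6                        # hypothetical population mean, assumed known
n = len(sample)
xbar = sum(sample) / n        # 5.0

about_xbar = sum_sq_about(sample, xbar)   # 36.0
about_mu = sum_sq_about(sample, mu)       # 41

# Decomposition: sum (x - mu)^2 = sum (x - xbar)^2 + n * (xbar - mu)^2
print(about_xbar, about_mu, about_xbar + n * (xbar - mu) ** 2)  # 36.0 41 41.0
```

The extra n(x̄ - μ)² term is exactly what gets lost when the sample mean stands in for the population mean, and on average it equals σ², one "n-th" of the total.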

  • @iwatchtvwithportal5367
    @iwatchtvwithportal5367 1 year ago +1

    But you failed to explain where that n-1 comes from.

  • @mahammadodj
    @mahammadodj 2 years ago +3

    starts at 2:30

  • @andrew.schaeffer4032
    @andrew.schaeffer4032 1 year ago

    Great explanation, thanks. I wish my statistics book talked about this.

  • @thevoiceofdarkness7655
    @thevoiceofdarkness7655 1 year ago +2

    This may be a silly question, but why do we assume it is more likely for my sample to be skewed below the mean than above?

  • @snehaagarwal7640
    @snehaagarwal7640 1 year ago +1

    But what if we take the data on the right side of the population variance and divide it by n-1? That will be inaccurate then.

  • @me_debankan4178
    @me_debankan4178 2 years ago

    Let's assume a city has a population in which more than 50% are 80 years old, but when sampling or surveying we may get biased data whose sample mean is around 40-50 years. That can cause a problem when analyzing the data, because our biased data shows a population mean age of around 50 years... and the variance is larger because of the squaring... but we can eliminate this problem by dividing the variance by n-1.

    • @me_debankan4178
      @me_debankan4178 2 years ago

      This usually happens when you are surveying a few thousand out of millions of people.

    • @mahiraj8522
      @mahiraj8522 2 years ago

      Nice, man... thank you.

  • @sudiptasen634
    @sudiptasen634 2 years ago +3

    Hi Krish, thank you for helping us understand the concept. I have one question.
    When we calculate the population variance or population standard deviation, it is always smaller than the sample variance or sample standard deviation (I tried with Excel formulas). Is it because of Bessel's correction that the sample std. dev. is always greater than the population std. dev.?
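A short sketch of what is going on with the two Excel results, using Python's standard `statistics` module on made-up data: `statistics.variance` divides by n - 1 (like Excel's VAR.S) and `statistics.pvariance` divides by n (like VAR.P). On the same data, both share the same sum of squared deviations, so the population version is always exactly (n-1)/n times the sample version: smaller whenever the values aren't all identical. The data values are illustrative.

```python
import statistics

data = [2.0, 4.0, 4.0, 5.0, 10.0]        # hypothetical data
n = len(data)

sample_var = statistics.variance(data)    # divides by n - 1 (like Excel VAR.S)
pop_var = statistics.pvariance(data)      # divides by n     (like Excel VAR.P)

print(sample_var, pop_var)
print(pop_var / sample_var, (n - 1) / n)  # the same ratio, up to float rounding
```

So yes: the gap comes entirely from the divisor, which is Bessel's correction; the standard deviations differ by the square root of the same ratio.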

  • @annamelody5724
    @annamelody5724 9 months ago

    Hi, I have a set of data consisting of these numbers: {4, 3, 5, 6, 4, 5, 7, 6, 5, 4}, with mean = 4.9 and variance = 1.4333. My question is: is this variance considered high or low?
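A sketch that reproduces the numbers in this question with Python's `statistics` module: the 1.4333 is the n-1 (sample) variance; dividing by n gives 1.29. Because variance is in squared units, "high or low" is often easier to judge from the standard deviation or from the coefficient of variation (SD divided by the mean); using the CV here is an assumption about what comparison is useful, and whether about 0.24 counts as high depends on the context of the data.

```python
import math
import statistics

data = [4, 3, 5, 6, 4, 5, 7, 6, 5, 4]     # the values from the comment

mean = statistics.mean(data)              # 4.9
s2 = statistics.variance(data)            # sample variance: 9 (= n - 1) in the denominator
sigma2 = statistics.pvariance(data)       # divides by n = 10 instead
s = math.sqrt(s2)                         # sample standard deviation, in the data's units
cv = s / mean                             # coefficient of variation: spread relative to the mean

print(f"mean={mean}, sample var={s2:.4f}, population var={sigma2:.4f}")
print(f"sd={s:.3f}, cv={cv:.2f}")
```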

  • @-Neutron-Star
    @-Neutron-Star 9 months ago

    So they use "n-1" just based on empirical research and not on some hardcore mathematics? Why not use "n+1"?
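The n-1 result is an exact expectation rather than an empirical fit, and a tiny case can be checked by brute force. This sketch assumes a hypothetical population {1, 2, 3} and samples of size 2 drawn with replacement (independent draws, the setting where the result holds exactly; sampling without replacement from a finite population changes the expectation). It enumerates all 9 equally likely samples and averages the sum of squared deviations divided by n-1, n, and n+1, using exact fractions.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N    # population variance = 2/3

n = 2   # sample size; samples drawn WITH replacement, i.e. independent draws
samples = list(product(population, repeat=n))          # all 9 equally likely samples

def mean_estimate(divisor):
    """Exact average, over all samples, of sum((x - xbar)^2) / divisor."""
    total = Fraction(0)
    for s in samples:
        xbar = Fraction(sum(s), n)
        total += sum((x - xbar) ** 2 for x in s) / divisor
    return total / len(samples)

for label, d in [("n-1", n - 1), ("n", n), ("n+1", n + 1)]:
    print(label, mean_estimate(d), "vs population variance", sigma2)
```

Only the n-1 divisor reproduces 2/3 exactly; dividing by n averages 1/3 and dividing by n+1 averages 2/9, so both undershoot.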

  • @keyyyla
    @keyyyla 3 years ago +1

    Simple answer is: because dividing by n-1 makes the estimator unbiased.

    • @Kumbutranjaami
      @Kumbutranjaami 2 years ago

      It's not that easy.

    • @prithvidhyani2002
      @prithvidhyani2002 1 year ago

      This doesn't explain anything. What's the bias? That's what people are asking. And neither the video nor your comment addressed that.

  • @Dilaram123
    @Dilaram123 6 months ago

    Brother, sorry, but you haven't mentioned degrees of freedom, which is the actual statistical reason behind dividing by n-1.

  • @karthikeyanr1804
    @karthikeyanr1804 2 years ago

    # in simple words:
    The sample standard deviation and the sample variance both have a slight bias, so we use n-1 to correct it.

  • @Ezio-ft8zm
    @Ezio-ft8zm 9 months ago

    I got irritated with Poonam Kumari, so I landed here before my stats exam! It feels so good to learn a concept I didn't understand even after listening to it twice in a 45-minute class!

  • @harikrishna220
    @harikrishna220 2 years ago +1

    Why can't we use n-1 for the mean and everything else?

  • @file4318
    @file4318 1 year ago

    Thank you very much for your video; it was very, very good at explaining. But I have one more question. If descriptive statistics don't try to generalize to a population (since there is no uncertainty in descriptive statistics), then why does the sample standard deviation try to best estimate the population mean? Yet it is still considered a descriptive statistic.

  • @anilkumarsharma8901
    @anilkumarsharma8901 2 years ago

    Does a bell curve have a fixed height????
    Please make an app with all the statistics formulas, so that we get a database of all the formulas???

  • @abhi-zc8ub
    @abhi-zc8ub 3 years ago +4

    Since this morning I was trying to understand this, but I couldn't find a clear explanation 😅 finally a video from Krish sir 😌 thank you so much sir 😇

  • @ViolinCineMusic
    @ViolinCineMusic 8 months ago

    Super explanation, loved it.

  • @-isotope_k
    @-isotope_k 2 years ago +1

    Thanks !!!

  • @priyaljain5274
    @priyaljain5274 1 year ago

    This is really helpful Mr Krish 👍😊

  • @KartikKuri-qe6ye
    @KartikKuri-qe6ye 1 year ago

    thank you

  • @rubayetalam8759
    @rubayetalam8759 1 year ago

    thanks

  • @nithink94
    @nithink94 3 years ago

    Thank you for this video.

  • @raj-nq8ke
    @raj-nq8ke 3 years ago

    Thanks.

  • @mrrishiraj88
    @mrrishiraj88 3 years ago +1

    Hello Krish

  • @vankadavathrohith1589
    @vankadavathrohith1589 8 months ago

    Thank you so much :)

  • @KisaanTuber
    @KisaanTuber 3 years ago +1

    well explained

  • @gunjanagrawal8626
    @gunjanagrawal8626 2 years ago +5

    This could be understood well from the simulation video used by the Khan Academy.

  • @Rana-yc6yt
    @Rana-yc6yt 3 years ago

    I appreciate what you're doing for free, but I would really love a better view of your presentation. Honestly, your handwriting really messes things up when watching the videos.

  • @aashishmalhotra
    @aashishmalhotra 2 years ago

    got it thanks

  • @adityams1659
    @adityams1659 3 years ago +2

    WHICH SOFTWARE IS THAT !??

  • @DoingMyIkigai
    @DoingMyIkigai 1 year ago +2

    Wasted 10 minutes of my life: the exact reason isn't given.

  • @josephravi7722
    @josephravi7722 1 year ago

    Additional information on how it can be seen from the perspective of degrees of freedom: ruclips.net/video/9ONRMymR2Eg/видео.html

  • @dipayanbhadra8332
    @dipayanbhadra8332 1 year ago

    "Researchers experimented and saw n-1 gives good estimates": I am not satisfied with this explanation. If this were an interview question, would the interviewer be happy with this answer? Poor explanation.

  • @tanmaygupta8288
    @tanmaygupta8288 1 month ago

    Sir, you're just going around in circles and saying whatever in this video 😂

  • @kumudsharma007
    @kumudsharma007 2 years ago

    Krish, please remove this video as soon as possible if you want to save your reputation. It is teaching students a completely wrong concept.

  • @MeghaDeySarkar
    @MeghaDeySarkar 3 months ago

    Best explanation - ruclips.net/video/ke8nSbXUJjQ/видео.html