Simulation showing bias in sample variance | Probability and Statistics | Khan Academy

  • Published: 30 Sep 2024
  • Courses on Khan Academy are always 100% free. Start practicing (and saving your progress) now: www.khanacadem...
    Simulation by Peter Collingridge giving us a better understanding of why we divide by (n-1) when calculating the unbiased sample variance. Simulation available at: www.khanacademy...
    Practice this lesson yourself on KhanAcademy.org right now:
    www.khanacadem...
    Watch the next lesson: www.khanacadem...
    Missed the previous lesson?
    www.khanacadem...
    Probability and statistics on Khan Academy: We dare you to go through a day in which you never consider or use probability. Did you check the weather forecast? Busted! Did you decide to go through the drive-through lane vs. walk in? Busted again! We are constantly creating hypotheses, making predictions, testing, and analyzing. Our lives are full of probabilities! Statistics is related to probability because much of the data we use when determining probable outcomes comes from our understanding of statistics. In these tutorials, we will cover a range of topics, some of which include: independent events, dependent probability, combinatorics, hypothesis testing, descriptive statistics, random variables, probability distributions, regression, and inferential statistics. So buckle up and hop on for a wild ride. We bet you're going to be challenged AND love it!
    About Khan Academy: Khan Academy offers practice exercises, instructional videos, and a personalized learning dashboard that empower learners to study at their own pace in and outside of the classroom. We tackle math, science, computer programming, history, art history, economics, and more. Our math missions guide learners from kindergarten to calculus using state-of-the-art, adaptive technology that identifies strengths and learning gaps. We've also partnered with institutions like NASA, The Museum of Modern Art, The California Academy of Sciences, and MIT to offer specialized content.
    For free. For everyone. Forever. #YouCanLearnAnything
    Subscribe to KhanAcademy’s Probability and Statistics channel:
    / @khanacademyprobabilit...
    Subscribe to KhanAcademy: www.youtube.co...

Comments • 54

  • @PeterCollingridge 12 years ago +61

    Wow, I'm famous!

  • @Skandalos 11 years ago +30

    Damn, I really wanted to see how the graphs would look with the unbiased formula.

  • @MrUndergrounddweller 12 years ago +14

    How the hell can you know all this? It seems unreal how much knowledge you have on so many mathematical subjects... Are you an alien? Serious question.

  • @user-gz5mx2nd5p 4 years ago +3

    Unfortunately I am very confused now. Can you relate it to a real-life example?

  • @MrVpassenheim 6 years ago +8

    I like this video because it actually shows you, not just conceptually but with mathematical logic, the reason for the "correction" done with df. Thanks! Yes, Peter, you're famous.

  • @fl45hman 8 years ago +3

    This, right here, is why Khan Academy is the best resource for learning maths. Fuck all these overly convoluted academic textbooks

    • @philandthai 7 years ago +1

      Yes, this is very compelling, illustrative and memorable. Unfortunately, it is not rigorous and it doesn't really explain WHY.

  • @brunosimoes751 3 years ago +2

    Great visual explanation! Is there a video showing the analytical reason for n-1?

    • @sisyphus645 2 years ago +1

      I looked everywhere for it. Couldn't find a lead. Is this the best the internet could offer? D:

  • @SupeHero00 6 years ago +4

    So it seems that the n-1 technique was found empirically, right? I mean, there's no real mathematical proof that says if you do it this way it will be less biased but not overcorrected.

    • @Cashman9111 6 years ago

      Just search for "proof of unbiased variance estimation".

    • @SupeHero00 6 years ago +2

      I found it already, but I hoped that Khan Academy would have a video proving this. They have like 1000 videos about n-1 and the intuition behind it, without any proofs.
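
A sketch of the standard argument these replies point to (Bessel's correction), for n independent draws x_1, ..., x_n with mean μ and variance σ², using the identities E[x_i²] = σ² + μ² and E[x̄²] = σ²/n + μ²:

    \[
    \mathbb{E}\!\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]
      = \mathbb{E}\!\left[\sum_{i=1}^{n}x_i^2 \;-\; n\bar{x}^2\right]
      = n(\sigma^2+\mu^2) \;-\; n\!\left(\frac{\sigma^2}{n}+\mu^2\right)
      = (n-1)\,\sigma^2 .
    \]

So dividing the sum of squared deviations by n gives an estimator whose expected value is only (n-1)/n of σ² (the shortfall the simulation in the video displays), while dividing by n-1 gives exactly σ² in expectation.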

  • @MrRohammers 4 years ago +1

    GALAXY BRAIN SIMULATION

  • @gmvsea 12 years ago +2

    Thank you; this is beautiful and very well done. I've usually heard the n-1 for the sample standard deviation/variance (s or s^2) explained in terms of degrees of freedom: you have n degrees of freedom for a sample of size n, but you also have one estimated parameter (the average, x-bar), so you subtract one degree of freedom from n, giving n-1 degrees of freedom. It makes sense when you consider more complex statistics, like the chi-square and F-distribution tests, ...
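
A small piece of algebra behind the degrees-of-freedom argument in the comment above: the n deviations from the sample mean always satisfy one linear constraint,

    \[
    \sum_{i=1}^{n}(x_i-\bar{x}) \;=\; \sum_{i=1}^{n}x_i \;-\; n\bar{x} \;=\; n\bar{x}-n\bar{x} \;=\; 0 ,
    \]

so once any n-1 of the deviations are known, the last one is determined; only n-1 of them are free to vary.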

  • @AuroraClair 5 years ago +2

    I'm always so grateful for your videos. Thank you!

  • @ZoD1ACBeA5T 12 years ago +1

    13 minutes later and I'm one of the first comments? More people need to SUBSCRIBE!

  • @marvinlear5848 1 month ago

    I was about to post that I have a suspicion that if he'd simulated using a strongly normal distribution instead of a random distribution, the effect he shows where sample size n=2 trends towards causing a multiplier of 0.5 that needs to be compensated for (by using "n-1" instead of just "n" in the denominator) would go away, and maybe even overestimate the parent/true population STDEV instead of underestimating it. And then I noticed on the original Khan Academy page that this video came from that someone had already done a "spin off" simulation of this experiment (user Grant Webster) that was exactly what I was proposing. That person was able to use the code for this simulation because the code/simulation is posted at the Khan Academy site. So that person used a normal distribution, and guess what?! Just as I predicted, the differences between sample sizes completely went away and there was no underestimated STDEV, either. This means that everything in this video about the need to use "n-1" in the denominator is applicable only for non-normal distributions. Just now, to check, I've run it for myself on a somewhat normal distribution. I'm now up to 3000 simulations run, and the n=2 and n=4 sample sizes are actually slightly _overpredicting_ STDEV and are doing so more than the other sample sizes (which tend to average almost exactly 100% of the parent population STDEV, actually).
    I'm not convinced by the degrees of freedom explanation many people bring up, which you'll notice this video's narrator never mentions. The way I reasoned it out (i.e., before I found the "spin off" simulation to confirm my suspicion) is that it's not surprising the lower sample sizes in this video would show lower STDEV, because the parent population STDEV is so wide here, at ~±6.1, which is over half of the entire 0-20 range. First I'll talk about the scenario where the distribution is so random that the source population is quite flat across the entire 1-20 range and its STDEV is ±5, taking up half the range. Then I'll talk about the scenario where the parent population is normally distributed and so the STDEV is only ±2.
    First, consider small sample sizes on a nearly evenly distributed parent population. The only reliable way for smaller sample sizes, like n=2, to very significantly overestimate the true STDEV is if your first sample is out on the edge and your second sample is out on the opposite edge (i.e., sample A occurs at value=1 & sample B occurs at value=20). But of course, getting values at 1 & 1, which underestimates STDEV, is just as likely to occur as at 1 & 20, so the overall effect is canceled out. But if your first sample point, sample A, is near a middle value=10 or 11, you can only underestimate the STDEV, because the highest STDEV you can achieve is if sample B is out on the edge as far away from sample A as possible (e.g., you have values 10 & 20), giving you a STDEV=√((10-15)^2)/√2=3.53, which is less than the STDEV of the parent population. So even in that best-case scenario, you'll still have underestimated the STDEV! And that's why the distribution shown in this video generally underestimates STDEV, with the effect being greatest for small sample sizes. Large sample sizes in this video do it too, but they do it relatively less because larger sample sizes tend to have a sort of 'averaging' effect. In other words, with every additional sample you add, it's less likely your samples will be able to maintain an extreme distribution pattern like 1 & 1 or 1 & 20 that produces a STDEV much different from the true parent population STDEV.
    But when your parent distribution is normal rather than random and evenly distributed, its STDEV will be narrower (let's say ±2) because of the effect of having so many samples grouped close together near the middle of the entire range. In such a scenario, the case where the two samples occur out at the very edges isn't that important, because that won't happen very often (by definition, they are at the tails of the probability distribution). At the center, where you're seeing most of your action (let's assume the mean is at exactly 10 to make the math easier), if your first sample is at 10, sample B will have to be at 12 (or 8) to correctly predict the STDEV of ±2. Within that range, toward the left, the average value will be maybe 8.75 and toward the right it will be 11.25 (instead of the more intuitive 9 and 11, because the shape of the normal distribution pushes the average values in that range out toward the tails of the normal curve just a bit), so you'll be underestimating the true STDEV by 0.75 across that 40% of the entire range (i.e., because 8.75-8=0.75 and 12-11.25=0.75), and, keep in mind, that 40% of the range contains 68.3% of the samples because it's ±1 STDEV. Outside of that range, toward the tails, your average values will be maybe 6.4 & 13.6 (instead of the more intuitive 4 and 16, because the shape of the normal distribution will push the averages quite a bit closer toward the center of the entire range than where they'd be otherwise), so you'll overestimate the STDEV by 1.6 over that remaining 60% of the range that contains the remaining 31.7% of the samples (because 8-6.4=1.6 and 13.6-12=1.6). Well, 31.7%*1.6 + 68.3%*(-0.75) = -0.005, which is basically just zero. Therefore, no underestimation of the STDEV, even without the extra "minus 1" in the denominator, even for a sample size as low as n=2.
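
For anyone who wants to run the kind of experiment this comment describes, here is a minimal Python/NumPy sketch (the function name, distributions, and sample sizes are illustrative choices of mine, not the code posted on the Khan Academy page). It repeatedly samples from a roughly flat and a roughly normal parent population, computes the divide-by-n ("biased") variance of each sample, and reports the average as a fraction of the true population variance; for independent draws the textbook expectation of that ratio is (n-1)/n whatever the parent distribution looks like, so the printout makes the claim easy to check.

    import numpy as np

    rng = np.random.default_rng(0)

    def mean_biased_variance_ratio(population, sample_size, trials=20_000):
        """Average divide-by-n sample variance, as a fraction of the population variance."""
        pop_var = population.var()            # true population variance (divide by N)
        total = 0.0
        for _ in range(trials):
            sample = rng.choice(population, size=sample_size, replace=True)
            total += sample.var()             # np.var defaults to ddof=0, i.e. divide by n
        return total / trials / pop_var

    # Two illustrative parent populations on roughly the 1-20 range
    flat_pop = rng.integers(1, 21, size=100_000).astype(float)        # nearly uniform
    normal_pop = np.clip(rng.normal(10.5, 2.0, size=100_000), 1, 20)  # roughly normal

    for name, pop in [("flat", flat_pop), ("normal", normal_pop)]:
        for n in (2, 4, 8, 16):
            ratio = mean_biased_variance_ratio(pop, n)
            print(f"{name:>6} population, n={n:2d}: "
                  f"mean biased variance = {ratio:.2f} x population variance "
                  f"(theory: {(n - 1) / n:.2f})")

Swapping sample.var() for sample.var(ddof=1) switches to the n-1 formula, whose average should come out close to the full population variance for every sample size.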

  • @santoshsahu-oy5vo 7 months ago

    I have never seen this kind of teacher for statistics and math...
    It's amazing...
    Dear Khan Academy, thank you so much for your dedication and approach to this subject.
    When I started watching the videos, there was no way to stop and go to the comments... because the flow of the subject's deep treatment and the smooth path to the solution is amazing...
    Now that I am commenting on this tutorial: it is a really amazing and genuine tutorial for those learners who did not do math after the 10th class.
    Thank you so much again...👍👍

  • @ajitesh_thakur 10 months ago

    All the investors poured their money into greedy edtech startups, while Khan Academy did it better for free. We need to make Khan Academy popular again.

  • @scienceblossom6197 5 years ago

    I wonder whether it was also discovered empirically by statisticians? (I mean, without mathematical proofs...)

  • @shukriisse6917 8 years ago

    Does anyone know how to check a simulation, you know, to make sure you're getting the right probability? _Is_ there any one right answer to a simulation?

  • @Titurel 3 years ago

    Wow. Seen many different takes on this. This is one of the best!

  • @analog1170 2 years ago

    In Korean, "population" here means 모집단 (not 인구), and "variance" means 분산 (not 변화량).

  • @ulfy01 12 years ago +1

    Really good software!

  • @valtih1978 11 years ago

    Can you explain how this correction is related to the standard error correction? stats.stackexchange⁄questions⁄72281

  • @Corrup7ioN 12 years ago

    I realise this probably gets asked on every video, but does anyone know what drawing package he's using?

  • @Corrup7ioN 12 years ago

    Ah I thought he was using something fancy that reacted to pen pressure/angle =( Thanks.

  • @Corrup7ioN 12 years ago

    Not sure whether you're serious, but it doesn't look like paint to me...

  • @ProofDetectives 7 months ago

    Thank you.

  • @VGF80 4 years ago

    Let's say a report comes out that mentions standard deviation. How are we supposed to know which formula was used to calculate that standard deviation?

    • @mysteryperson706 3 years ago +1

      I'm sorry if this is late, but if it's not stated you can assume it's likely the unbiased formula. It would be academic dishonesty to use the biased formula without expressly mentioning that you did.

    • @VGF80 3 years ago

      @@mysteryperson706 I guess it makes sense now.

  • @voodooman08 12 years ago

    There is a formal proof. Look for Bessel's correction.

  • @nm800 5 years ago

    I would just like to know why (xᵢ − x̄) is squared.

    • @redangrybird7564 5 years ago +1

      Because if you work with unsquared deviations, you will always end up with a sum of zero.
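
A quick worked example backs this reply up; with the made-up data 2, 4, and 9 (mean 5), the raw deviations cancel while the squared deviations do not:

    \[
    (2-5)+(4-5)+(9-5) = -3-1+4 = 0,
    \qquad
    (2-5)^2+(4-5)^2+(9-5)^2 = 9+1+16 = 26 .
    \]

The deviations from the mean sum to zero for any data set, so some transformation such as squaring is needed before averaging them into a measure of spread.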

  • @unaliveeveryonenow 12 years ago

    At least you are on topic.

  • @wadhwank 12 years ago

    Hmm. That makes sense.

  • @RudeCabbage 12 years ago

    Oooooo Pretty ColOuRs !

  • @manikantansrinivasan5261 4 years ago

    thanks a lot!!

  • @ClickLikeAndSubscribe 4 years ago +1

    Why are we "approaching n-1 over n times population variance" (4:57)? Isn't the pink formula on the left the population variance?

  • @verbalswordsman 12 years ago

    Haha, Schwing!

  • @GellyBoy93 12 years ago

    FIRST

  • @dashingstag 12 years ago

    thanks!

  • @BPMorais1 11 years ago

    haha nice!