5 statistics questions you should really know

  • Published: 13 Jan 2025

Comments • 54

  • @tystovall6574
    @tystovall6574 1 month ago +105

    Everybody knows about Type 1 Errors and Type 2 Errors, but few know about Type 3 errors: confusing Type 1 Errors with Type 2 Errors

    • @bp56789
      @bp56789 1 month ago +4

      People should take a course in naming things before they establish new terms. Worst names ever. Names should be few syllables and somewhat self-explanatory. E.g. good hit, bad hit, good miss, bad miss.

    • @tystovall6574
      @tystovall6574 1 month ago +3

      @bp56789 for real. I have no idea how "Type 1" and "Type 2" sounded like good, memorable, or intuitive names for these.

    • @pipertripp
      @pipertripp 28 days ago +1

      Oh, I would say type 3 is by far the most common error. 😅

  • @jamesdavis3851
    @jamesdavis3851 1 month ago +25

    It's also important to emphasize p-values require defining *a priori* exactly what a "more extreme" result is. If it can't be defined, or you define it afterward, you haven't actually generated a p-value.

  • @GabrielAPPer
    @GabrielAPPer 1 month ago +17

    Great video idea! I do think, however, that since these concepts are all quite simple for someone with wide experience in statistics, it would be cool to see more versions of this video covering deeper concepts. As someone who deals with a huge amount of statistics, but, being in Econometrics, is mostly limited to Causal Inference, I'd love to see what I'm missing out on in the other subfields. Keep up the great work!

    • @very-normal
      @very-normal  1 month ago +9

      Yeah that’s the tricky thing about videos like this one. On one hand, my audience is full of people who do have deep stats experience, so it’s more of a quick check. But on the other, these are also ideas that I regularly have to teach to researchers during consults. I appreciate the input, I’ll try to think of ways to strike a balance here

    • @GabrielAPPer
      @GabrielAPPer 1 month ago +3

      Love that you care mate. It's always hard to balance complexity with educational themes, just know you make great videos!

    • @lexinwonderland5741
      @lexinwonderland5741 1 month ago +2

      @@very-normal I would love it if, instead, you made another one of these videos but "for advanced viewers", because I thought this was too perfect of an introduction to miss out on! Keep up the great work dude!!

    • @blueberrypanda931
      @blueberrypanda931 1 month ago

      @@very-normal just wanted to chime in and say the depth of this video was perfect for a stats beginner like me!

  • @RyJones
    @RyJones 1 month ago +8

    There are two types of statisticians: those who understand power, those that don’t, and those that aren’t sure

  • @im_83n
    @im_83n 1 month ago

    I finally figured out how to remember the difference after watching this video. "False positive" is pretty common language, so that one is Type 1, as opposed to a false negative, which at least I didn't hear as often prior to taking stats.
    Thank you again, love the content, and the channel name.

  • @falconarea
    @falconarea 1 month ago +3

    Wow, great video. I really engaged with the idea of first checking whether I know the concept and only then watching the explanation. It's a much better format for feeling like I'm actually improving, because it first establishes that I don't fully understand the topic.

    • @falconarea
      @falconarea 1 month ago

      Reading the other comments: at least for me, I learned a lot. I had a couple of in-depth statistics courses, but I don't use them in my day to day (I'm a computer engineer), so the subjects were not new, but it definitely showed me I don't remember anything about them.

    • @falconarea
      @falconarea 1 month ago

      Lastly, I think the best way for me to ingrain this type of knowledge is through practical examples (hopefully outside the medical trial ones; they are all over the place and overused).

  • @kononivskipolya3950
    @kononivskipolya3950 1 month ago

    Really interesting to see some other ways of explaining the same concepts I'm taught at uni. Even though I think I got all the questions right, I found it helpful to hear the ideas paraphrased and visualized. Good way to enhance intuition.

  • @the_multus
    @the_multus 1 month ago +10

    10:35 that's obviously wrong!
    Everybody knows the word »frequentist« comes from the word »freaqy« ( ͡° ͜ʖ ͡°)

    • @very-normal
      @very-normal  1 month ago +4

      oops fell for the classic pitfall

  • @parthosen5942
    @parthosen5942 1 month ago +3

    Hey mate, great work! Would love some videos on the difference between doing causal inference on observational vs experimental data, the pitfalls of linear regression, etc.; econometrics topics that aren't technically rigorous but form the foundations of model-based inference.

    • @very-normal
      @very-normal  1 month ago +1

      Yeah I think some causal questions would be good! They come up a lot in Biostat as well

    • @parthosen5942
      @parthosen5942 1 month ago

      @@very-normal Haha just noticed I had typed casual instead of causal; the exact opposite of what to do XD

  • @yusufspahi1693
    @yusufspahi1693 1 month ago +2

    more of these please

  • @RyeCA
    @RyeCA 1 month ago

    I wonder how I have never heard of the long-run idea...
    Anyway, great video, I was able to answer questions 1-4! Looking forward to more question videos :)

  • @A93758SGasdhhfgsaiey
    @A93758SGasdhhfgsaiey 1 month ago +1

    8:50 that's so unhinged for you lol

  • @pfizerpflanze
    @pfizerpflanze 1 month ago

    I think a more general formula for the p-value should be
    2*min(P(T≥t|H0), P(T≤t|H0)), because, for example, when testing
    H0: σ² = σ²_0 vs H1: σ² ≠ σ²_0 with a normal simple random sample, the usual test statistic has an asymmetric null distribution (it's chi-squared with n-1 df).
    It's not very common though, because most test statistics are either chi-squared, with bigger values the farther you are from the null, or normal/Student's t distributed under the null.
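    A minimal sketch of this 2*min rule, assuming the one-sample variance test mentioned above (the function name and simulated data are illustrative, not from the video):

    ```python
    # Two-sided p-value via 2*min(P(T<=t|H0), P(T>=t|H0)) for an asymmetric null distribution.
    # Here T = (n-1)*s^2 / sigma0_sq, which is chi-squared with n-1 df under H0: sigma^2 = sigma0_sq.
    import numpy as np
    from scipy import stats

    def variance_test_pvalue(x, sigma0_sq):
        n = len(x)
        t = (n - 1) * np.var(x, ddof=1) / sigma0_sq   # observed test statistic
        lower = stats.chi2.cdf(t, df=n - 1)           # P(T <= t | H0)
        upper = 1 - lower                             # P(T >= t | H0)
        return min(1.0, 2 * min(lower, upper))        # doubled one-sided tail, capped at 1

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0, scale=1.5, size=30)         # true sigma = 1.5, so H0: sigma^2 = 1 should look unlikely
    print(variance_test_pvalue(x, sigma0_sq=1.0))
    ```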

  • @giovannimantovani795
    @giovannimantovani795 1 month ago

    Super idea, go on please!

  • @tofonofo4606
    @tofonofo4606 1 month ago +1

    Very Nice 👏

  • @Inexorablehorror
    @Inexorablehorror 1 month ago

    Thank you for the video. Just a small remark: at 4:30, your text says "... the two have a similar effect." I am not a native speaker, but doesn't that imply that they are not equal, but have a small difference, i.e. a small effect size? In that case, the drug would indeed be different from placebo, the null hypothesis WOULD be wrong, and you did NOT commit a Type 1 error (which is a valid criticism of frequentism and NHST: in the real world, the null is never exactly true...).

  • @the_multus
    @the_multus 1 month ago

    Simple questions really help me up my terminology game in English! So thx, I guess.

  • @nabibunbillah1839
    @nabibunbillah1839 1 month ago

    It is useful. Thanks 😊

  • @user-hl6xe8dz9x
    @user-hl6xe8dz9x 1 month ago +1

    One request: please make a more elaborate video on Type 1/2/3 errors (or something similar, if it exists), and also on power, with a real use case in biology, since you are a biostatistician.

  • @d_b_
    @d_b_ 24 days ago

    It'd be nice if the terminology were more descriptive of what it's measuring, especially Type 1/2 error. Like when programming, you'd want good variable names. Instead of Type 1, maybe "false alarm rate"/"cry wolf rate"; Type 2: "overlooked rate"/"failed rejection rate"; power: "detection rate"/"bullseye rate".

    • @very-normal
      @very-normal  24 days ago +2

      lol yeah I agree. Unfortunately, statisticians are the worst at naming things. Don’t even get me started on stuff like “sufficiency”, “completeness”, or “almost sure convergence”.
      But to be fair, statistics can be used in so many different contexts that it almost has to suffer from needing to use very vague, general terms

  • @mesplin3
    @mesplin3 1 month ago

    With question 5, would it be okay to say that there is a number L such that, for any arbitrarily small distance, the probability that the sample proportion lies within that distance of L approaches one over many trials?

    • @very-normal
      @very-normal  1 month ago +2

      Yeahh, that’s about right. At that part, I made a vague reference to the Law of Large Numbers, which is similar to what you’re describing. Convergence in probability (or almost surely, depending on which law is used)
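      A minimal simulation sketch of that long-run idea (illustrative only; the true proportion and sample sizes are made up): the running sample proportion of Bernoulli trials settles toward the underlying probability as the number of trials grows.

      ```python
      # Weak Law of Large Numbers, illustrated: the running sample proportion
      # of Bernoulli(p) trials drifts toward p as the number of trials grows.
      import numpy as np

      rng = np.random.default_rng(42)
      p_true = 0.3                                  # the "long-run" value the proportion converges to
      flips = rng.random(100_000) < p_true          # Bernoulli(p_true) trials
      running_prop = np.cumsum(flips) / np.arange(1, flips.size + 1)

      for n in (10, 100, 1_000, 10_000, 100_000):
          print(f"n={n:>6}: sample proportion = {running_prop[n - 1]:.4f}")
      ```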

  • @tomasroosguerra8338
    @tomasroosguerra8338 1 month ago

    This way of teaching is really good - questions leading into a full narrative. I think you're on to something. Thank you.

  • @braineaterzombie3981
    @braineaterzombie3981 1 month ago

    Nice vid

  • @SwissPGO
    @SwissPGO 1 month ago +1

    Could you make a video where you quiz ChatGPT's models on statistical data analysis and show where they succeed or fail?

  • @LoganHolmes-og5jm
    @LoganHolmes-og5jm 1 month ago

    Baba Is You music in a statistics video 🤯

  • @nicolasrobertovitordemorae9396
    @nicolasrobertovitordemorae9396 1 month ago +1

    Nice

  • @mertaliyigit3288
    @mertaliyigit3288 1 month ago

    Omg baba is you music!!!

  • @GrahamBornholt
    @GrahamBornholt 22 days ago

    Technically, the standard p-value is not a conditional probability.

    • @very-normal
      @very-normal  21 days ago

      how would you describe it tho lol

  • @simonpedley9729
    @simonpedley9729 1 month ago +1

    Let's say I have a Bayesian prediction system. I test it 1,000,000 times. I find that the 90% quantile is exceeded 20% of the time. So it's a fail, obviously. This illustrates that even if you are a Bayesian, you also have to be a frequentist, because not being a frequentist breaks the definition of probability, and probability doesn't make sense any more. B and F are not opposites. They refer to different things. They can work hand in hand.

    • @the_multus
      @the_multus 1 month ago +2

      Not really. We can assign a probability to the explosion of a bomb, but once it happens it is no longer replicable. Sure, you can argue that the bomb could be rebuilt, but let's say it's unique for the sake of the argument. In this case there is no frequency.
      You could also think of a locally probabilistic process which evolves over time: e.g. reproduction of organisms in a sequestered ecosystem (they do breed, but they go extinct after some, likely long, time).

    • @simonpedley9729
      @simonpedley9729 1 month ago

      @@the_multus Wrt the bomb, the explosion can't be repeated. But the model that you use to model the explosion can be tested as many times as you want, using random data based on the assumptions in the model. So let's say you test it, and you find that the 90% quantile is exceeded 20% of the time. That means that the model is inconsistent with its own assumptions. Folks are not going to accept a model like that once that's been pointed out. The point is... even Bayesian predictions need to have good frequentist properties in this way (the modelling methodology has to have good frequentist properties), or the probabilities that come out are not plausible. Over your career as a bomb disposal expert, you would try to defuse 1000s of bombs. You'd hope that the models you use would have the property that the 90% quantile is exceeded 10% of the time over the course of your career.
      I'm not sure I quite understand the second point. Is the point that it's non-stationary? If so, then it's just like weather forecasts. And weather forecasts need to have the property that the daily 90% quantile is exceeded 10% of the time (over many repeats), otherwise they are not plausible (whether they come from a Bayesian analysis or something else).
      There was a famous paper on this topic by Philip Dawid back in the 80s, which is a good starting point. And I just wrote a paper about it, but it's not published just yet...
      The difference between what you are saying and what I am saying is very subtle, and I'm not sure I 100% understand it. I am making a point about predictions: that all predictions need to have good frequentist properties. I don't think anyone would really disagree with my point. With your bomb, you are making a point about the definition of probability for real events, and I don't think anyone would disagree with your point either. I think the resolution is that we're not really disagreeing, because we are talking about different things.

    • @the_multus
      @the_multus 1 month ago

      @@simonpedley9729 Oh, I see now. We were just talking about different things: predictions in a model environment and in a real environment. I do agree that a model should stand up to frequentist analysis. I just provided two examples of objects which don't demonstrate a cyclical (in a sense) behaviour but could still be reasonably described by probabilistic methods.
      That's all. Good point.
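      A minimal sketch of the calibration check discussed in this thread (the data-generating process and both predictive models are made up for illustration): a well-calibrated predictive 90% quantile should be exceeded roughly 10% of the time over many repeats, while an overconfident model is exceeded far more often.

      ```python
      # Frequentist calibration check of a predictive 90% quantile.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      n_trials = 1_000_000

      # Hypothetical truth: outcomes are Normal(0, 1).
      outcomes = rng.normal(0.0, 1.0, size=n_trials)

      # Model A: correctly specified predictive distribution, Normal(0, 1).
      q90_good = stats.norm.ppf(0.90, loc=0.0, scale=1.0)
      # Model B: overconfident predictive distribution, Normal(0, 0.5).
      q90_bad = stats.norm.ppf(0.90, loc=0.0, scale=0.5)

      print("calibrated model exceedance rate:   ", np.mean(outcomes > q90_good))  # ~0.10
      print("overconfident model exceedance rate:", np.mean(outcomes > q90_bad))   # ~0.26
      ```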

  • @Blahcub
    @Blahcub 1 month ago +2

    These were too easy and basic.

    • @very-normal
      @very-normal  1 month ago +1

      You know your stuff!! If you have another question you get tripped up by, I’m game to try to help out

    • @yoeri7004
      @yoeri7004 1 month ago +4

      @@very-normal I would love to see more of these types of videos, but for more advanced topics.
      E.g. pitfalls when using MCMC or pitfalls when doing logit, to name a few topics you've covered earlier.

    • @ratpackenterprises1607
      @ratpackenterprises1607 1 month ago

      @@yoeri7004 Truuu