Testing normality is pointless. Do this instead

  • Published: 15 Jan 2025

Comments • 78

  • @Saynotoclipontiescch
    @Saynotoclipontiescch 8 months ago +19

    In twenty-five years as a psychologist, I have never tested the assumption of normality. Now I know I was right not to.

  • @killamaniac08
    @killamaniac08 8 months ago +2

    Hey man, this was a really interesting video. In my master's, forever ago, they never explicitly mentioned this idea but rather implied it through the language used to evaluate models. Namely, using tests at the introductory stages, then later saying how robust a model is to deviations from the underlying assumptions.
    Also, I'm a huge fan of your emphasis on diagnostics. The first few times in industry I encountered some bespoke model my company had been paying for, I was greeted with shrugs from management and customer service at the model providers when I asked for model diagnostics to be included. Drove me nuts.

  • @pipertripp
    @pipertripp 8 months ago +9

    And if Kolmogorov-Smirnov says your residuals are not normally distributed, it's big trouble for moose and squirrel!

    • @QuantPsych
      @QuantPsych  8 months ago +2

      ?

    • @pipertripp
      @pipertripp 8 months ago +3

      @@QuantPsych Boris and Natasha from Rocky and Bullwinkle. Good old-fashioned Cold War stuff.

  • @billyboy1997
    @billyboy1997 7 months ago +4

    I found a new hidden gem channel! Nice video.

  • @karimkhader2526
    @karimkhader2526 2 months ago

    According to one of my stats professors, a professor in another department wanted some help, and apparently the question they were interested in was finding out whether their data (consisting of only 2 numbers) came from a normal distribution.

  • @yulia6354
    @yulia6354 7 months ago +3

    As a Russian person I think you nailed the Russian accent! Well done :D And thanks for your videos! As a medical doctor and a big fan of statistics, I really love your way of teaching people complicated stuff :)

    • @QuantPsych
      @QuantPsych  6 months ago +2

      High praise from a native :)

  • @lelgazelle
    @lelgazelle 2 months ago +1

    Yours and Very Normal are my new favourite channels.

  • @perfectmoments3876
    @perfectmoments3876 5 months ago +4

    I've taken to telling my students to run hist(rnorm(n, 0, 1)) a few times, with n being their sample size, to get a feel for what perfectly fine samples from a normal distribution can look like. If their residuals (lm) or samples (t.test) look like they would fit in there, they're good. What do you think of this approach? (See the sketch after this thread.)

    • @QuantPsych
      @QuantPsych  5 months ago +1

      I'm actually developing a package to do basically what you're suggesting.
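
      A minimal sketch of this calibration idea in R; the 3x3 layout and the loop are illustrative choices, not from the video:

        # Build intuition for how much "non-normality" perfectly normal
        # samples of your own size n can show.
        n <- 30                                # use your own sample size here
        par(mfrow = c(3, 3))                   # gallery of nine histograms
        for (i in 1:9) hist(rnorm(n, 0, 1), main = paste("Normal sample", i))
        # Then ask: would a histogram of your residuals, hist(resid(fit)),
        # look out of place in this gallery?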

  • @Eloss69
    @Eloss69 8 months ago +3

    Off topic, but the video makes me think of it: why do we use the Pearson correlation when modeling data? Why not Kendall's measure or, even better, copulas?
    Using Pearson looks to me like you know nothing about how your variables interact but you want to measure their linear association anyway … you will obtain something, but is it useful information? (An illustration follows this thread.)

    • @galenseilis5971
      @galenseilis5971 8 months ago

      Whether some piece of math is useful depends on what you want/need to know combined with what constraints you are working under.
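
      A small illustration of the contrast raised in this thread; the monotone-but-curved relationship is made up for the example:

        # Pearson measures linear association only; Kendall's tau measures
        # any monotone association. A curved monotone link splits them:
        set.seed(1)
        x <- runif(200)
        y <- x^10                              # perfectly monotone, far from linear
        cor(x, y, method = "pearson")          # roughly 0.66
        cor(x, y, method = "kendall")          # exactly 1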

  • @mohammedyounes98
    @mohammedyounes98 5 months ago +1

    I read in a book that p-values such as 0.25, 0.5, 0.75, and 0.95 lack a rigorous scientific foundation and are largely arbitrary, and we're using them just because some guy said so over 50 years ago.

  • @TheHeadincharge
    @TheHeadincharge 8 months ago +3

    I’ve always wondered why we don’t look at effect size when running these tests at least to make them slightly more useful. Although, I would argue that is true for all parametric tests. Turkey’s quote about parametric tests has always been my favorite to help me understand this interpretation properly. Great video though, normality testing is truly the most misunderstood concept by most psychologists in my experience.

    • @galenseilis5971
      @galenseilis5971 8 months ago

      In principle you should be thinking about effective sample sizes if you are ever performing a null hypothesis significance test (NHST). In practice people doing NHST often don't know to do it or don't care to do it or don't know how to do it.

    • @galenseilis5971
      @galenseilis5971 8 months ago +1

      I'm sure Tukey is rolling in his grave knowing he is now referred to as "Turkey". 😉

  • @RichmondDarko-qo2me
    @RichmondDarko-qo2me 7 months ago +1

    Thank you very much for such informative videos. I spent several years in class and didn't understand all these concepts, but watching this video has made them much easier to grasp.
    I have a few questions I would like to ask:
    When performing a statistical test, we use a parametric test if the data or variable in question is normally distributed, and a non-parametric alternative if the data or variable is not normally distributed.
    My question is: when does the central limit theorem come into play here?
    Also, a colleague of mine told me to always use parametric tests even if the data is not normally distributed. His explanation was that parametric tests are more powerful than non-parametric tests.
    So, should I straightforwardly use the non-parametric alternative when I observe that my data is not normally distributed, or should I take the CLT into consideration and use the parametric test?

    • @QuantPsych
      @QuantPsych  6 months ago +1

      The central limit theorem makes linear models very robust to violations of normality. That means your inferences will probably be sound (i.e., p-values and confidence intervals will be fairly accurate). But inference is just *one* thing I'm trying to do with stats; I also want to accurately model the data. If the distribution isn't normal, I shouldn't assume a normal distribution. I instead use generalized linear models (not non-parametric tests).
      Your colleague is wrong: parametric tests are only more powerful if you meet the assumptions. But your colleague is also right--use parametric models (though the parametric model may be a negative binomial regression rather than a typical regression).
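
      A minimal sketch of this modeling route in R, using simulated overdispersed counts (the groups and effect sizes are made up):

        library(MASS)                          # provides glm.nb()

        set.seed(1)
        group <- rep(c("a", "b"), each = 100)
        count <- rnbinom(200, mu = ifelse(group == "a", 2, 4), size = 1.5)

        fit_lm <- lm(count ~ group)            # forces a normal-error model
        fit_nb <- glm.nb(count ~ group)        # negative binomial GLM instead
        summary(fit_nb)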

  • @deyvismejia7529
    @deyvismejia7529 8 months ago +2

    Why do I feel personally attacked, lol. I like to test assumptions, but great video!!

  • @OnLyhereAlone
    @OnLyhereAlone 2 months ago +1

    I have 2 questions:
    1. Although I think this applies to the Poisson family of models, would you say checking for overdispersion falls under such needless tests too? (A quick check is sketched after this comment.)
    2. More importantly and generally, I believe your statistical doctrines, as I've learned a lot from you about generalized linear models (I've been using mixed models too), but the problem is that I can't cite a YouTube video, especially when a reviewer who doesn't belong to the same school of thought challenges me as your statistical disciple. Could you please include pivotal references that could be cited for these concepts, and if none are available, are you doing anything to make your school of thought represented in the literature?
    I hope you see and respond to these questions.
    Good stuff and presentation as always. Thanks.
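
    For reference, one common quick overdispersion check for a Poisson fit; the simulated data here is deliberately overdispersed for illustration:

      # Pearson chi-square divided by residual df; values well above 1
      # suggest overdispersion (variance > mean), pointing toward a
      # quasi-Poisson or negative binomial model instead.
      set.seed(1)
      group <- rep(c("a", "b"), each = 100)
      count <- rnbinom(200, mu = 3, size = 1)
      fit <- glm(count ~ group, family = poisson)
      sum(residuals(fit, type = "pearson")^2) / df.residual(fit)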

  • @Pedritox0953
    @Pedritox0953 2 months ago +1

    Great video!

  • @Tascioni49
    @Tascioni49 8 months ago +2

    Super useful, especially in ecology, because I rarely get normal data from my field experiments. And when I do, it's usually because something went wrong 😆

  • @igorbione4796
    @igorbione4796 7 months ago +1

    Oh my, this video would have saved me a lot of work if I had seen it earlier! Thanks!

  • @idodlek
    @idodlek 7 months ago

    Hello Mr. Fife 😀
    Does, for example, running a general linear model as a t-test versus a Mann-Whitney U test and comparing their results count as a sensitivity analysis? Or would only transformations, bootstrapping, and trimming count as sensitivity analyses?

    • @QuantPsych
      @QuantPsych  6 months ago

      Yes, that could count as a sensitivity analysis. I do wonder, though, if you might run into a situation where MW and t-tests agree but modern robust methods would disagree.
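
      A minimal sketch of that kind of sensitivity analysis in R; the skewed two-group data is simulated for illustration:

        # Run the same comparison several ways; if the conclusions agree,
        # the result is not sensitive to the normality assumption.
        set.seed(42)
        y1 <- rexp(40); y2 <- rexp(40) + 0.5   # skewed groups
        t.test(y1, y2)                         # parametric
        wilcox.test(y1, y2)                    # Mann-Whitney U
        # A percentile bootstrap of the mean difference as a third check:
        boot_diff <- replicate(2000, mean(sample(y1, replace = TRUE)) -
                                     mean(sample(y2, replace = TRUE)))
        quantile(boot_diff, c(0.025, 0.975))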

  • @AC-go1tp
    @AC-go1tp 8 months ago +6

    I learned something today. Thank you. But there's too much comedy, to the point that it's distracting.

  • @galenseilis5971
    @galenseilis5971 8 months ago +1

    I don't see a link in the description to the data set. 🐕

    • @QuantPsych
      @QuantPsych  8 months ago +1

      Ah! Thanks for the reminder. It's there now.

  • @samj.vizcaino-vickers8512
    @samj.vizcaino-vickers8512 8 months ago

    @Quant Psych Where's the paper? :c

    • @QuantPsych
      @QuantPsych  8 months ago

      Yes, thanks for the reminder. It's there now.

  • @galenseilis5971
    @galenseilis5971 8 months ago

    If you want your model to be as correct as possible, then you should aim for your model to do a good job of predicting the data distribution. Predicting the conditional expectation is a pretty rough approximation, especially with data sets like this where it is apparent that most of what is going on is not compressed well by a line.

  • @ndrmkhn6559
    @ndrmkhn6559 8 months ago +2

    As a Russian, I may say your "Russian" pronunciation seems adapted from Snatch or spy series of similar quality.

    • @QuantPsych
      @QuantPsych  8 months ago +1

      So you're saying it's perfect? ;)

  • @TheJucuska10
    @TheJucuska10 8 months ago +1

    Thanks for the video, it was great! You could also do one about independence, because I had problems with it in my last rejected manuscript ;)

    • @QuantPsych
      @QuantPsych  8 months ago

      You can see my videos on mixed models. My introduction to mixed models video talks about it.

    • @TheJucuska10
      @TheJucuska10 8 months ago +1

      @@QuantPsych thank you, I'll check it!

  • @naftalibendavid
    @naftalibendavid 8 months ago +1

    The more power you have, the more power you have to show that your data aren't normal. GREAT! (But maybe a non-parametric...) What is a "meaningful" departure from normality? I don't know... Is it big enough to make my real Type I error rate larger than my nominal alpha? Is it so far from normality that my power takes a beating?

    • @QuantPsych
      @QuantPsych  8 months ago +3

      Yes, these are great questions! None of them can be answered by a statistical test.
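
      For context, questions like these can be probed by simulation rather than by a test; a minimal sketch, with the skewed distribution chosen arbitrarily:

        # Real Type I error of a two-sample t-test when both populations
        # are skewed but identical (so the null hypothesis is true):
        set.seed(1)
        pvals <- replicate(5000, {
          y1 <- rexp(30); y2 <- rexp(30)
          t.test(y1, y2)$p.value
        })
        mean(pvals < 0.05)                     # close to 0.05 here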

  • @tomasroosguerra8338
    @tomasroosguerra8338 2 months ago

    Great stuff! Not used to all the jokes, but hey. I enjoyed and learned from the part where you commented on how to interpret the graphs that were presented.

  • @Nyonyokki
    @Nyonyokki 8 months ago

    I'd love to follow your steps in R but flexplot is not compatible with my R version 4.3.2. Which version do you use?

    • @QuantPsych
      @QuantPsych  8 months ago

      It should be compatible. Are you installing from GitHub? (A typical install command is sketched after this thread.)

    • @Nyonyokki
      @Nyonyokki 7 months ago

      @@QuantPsych Ahh, thanks for the hint! And also thanks for sharing your absolutely enjoyable humor!
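
      A sketch of the GitHub install route suggested above; the repository path is assumed to be the channel's usual one for flexplot:

        # Install the development version of flexplot from GitHub
        # (assumed repository: dustinfife/flexplot).
        install.packages("remotes")           # if not already installed
        remotes::install_github("dustinfife/flexplot")
        library(flexplot)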

  • @jishanzaman3421
    @jishanzaman3421 8 months ago

    I'd always imagined that one day you'd make a video on this topic... and now here it is. Thank you so much ❤

  • @dimitrioskioroglou4316
    @dimitrioskioroglou4316 8 months ago +3

    You're actually telling me to overlook the p-values and use my brain to... think? Come on!

    • @QuantPsych
      @QuantPsych  8 months ago +1

      Weird, eh?

    • @dimitrioskioroglou4316
      @dimitrioskioroglou4316 8 months ago +2

      @@QuantPsych Well, I cannot tell! The H0 says that it's not weird, so I need to test against it.

  • @zimmejoc
    @zimmejoc 8 months ago

    So is all this proving that our model is robust to violations of the normality assumption? That class was back in 1995 and my professor said we assume normality, independence, and one other thing, but that if we violated one of those assumptions it wasn’t a big deal because our test was robust to those violations.

    • @QuantPsych
      @QuantPsych  8 months ago +1

      There are two different issues: 1. robustness and 2. informativeness of tests. This video is about #2. Tests of assumptions are not informative.
      Robustness, on the other hand, is a different issue. Most models are very robust to normality violations, fairly robust to homoscedasticity violations, and not at all robust to violations of independence or linearity.

    • @idodlek
      @idodlek 8 months ago

      @@QuantPsych Could you please say which models are most robust to normality violations and which are fairly robust to heteroskedasticity?

    • @naftalibendavid
      @naftalibendavid 8 months ago

      @@idodlek It would depend upon which assumption you violate and in what direction and how severely, but robust alternatives (permutation, winsorized means, M-estimators, percentage-bend correlations) are your best friend. Check out Rand Wilcox's work. Like everything in Stats, it depends...

    • @0x7f16
      @0x7f16 2 months ago

      @@idodlek In this case, they're talking about the linear model (the OLS estimator, to be exact). The OLS estimator is (a) consistent, (b) unbiased, and (c) asymptotically normal under very weak assumptions ("moment conditions": E(e) = E(Xe) = 0). So for large datasets, normality is never a problem (the OLS estimator is going to be asymptotically normal regardless of the error distribution), and heteroskedasticity is also not a problem; just use robust standard errors (e.g., Huber-White, HC2, etc.). Autocorrelation is also not a problem: you can always cluster the standard errors to allow for autocorrelation within each cluster (this is really useful in panel data).
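
      A minimal sketch of the robust-standard-error route described above, using the sandwich and lmtest packages on simulated heteroskedastic data:

        library(sandwich)                      # HC covariance estimators
        library(lmtest)                        # coeftest() with custom vcov

        set.seed(1)
        x <- runif(200)
        y <- 1 + 2 * x + rnorm(200, sd = 0.2 + x)  # error variance grows with x
        fit <- lm(y ~ x)

        coeftest(fit, vcov = vcovHC(fit, type = "HC2"))  # HC2 robust SEs
        # For clustered/panel data, vcovCL(fit, cluster = ~ id) plays the
        # same role (the id variable here is hypothetical).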

  • @nizogos
    @nizogos 4 months ago

    Doesn't normality imply a t distribution for the model's coefficients? Which in turn is necessary for their confidence intervals and everything else? If we're using the model merely for its output when we plug in the x values, then yeah, normality doesn't play a huge role, but that's not how models are used, to my understanding; at least, that's not a scientific way to report on data.

    • @QuantPsych
      @QuantPsych  3 months ago

      Even if we're using confidence intervals, having non-normal data doesn't usually screw up our estimates of the confidence intervals.

    • @nizogos
      @nizogos 3 months ago

      @@QuantPsych Is there a mathematical proof of that, or is it a heuristic from years of experience?

    • @0x7f16
      @0x7f16 2 months ago

      @@nizogos The OLS estimator, under "moment conditions", is asymptotically normal regardless of the error distribution. That's why we can compute p-values using percentiles of a standard normal distribution (for large datasets, of course).
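
      There is a formal argument (the central limit theorem applied to the OLS moment conditions), and the claim is also easy to see by simulation; a minimal sketch with an arbitrarily chosen skewed error distribution:

        # Sampling distribution of an OLS slope when errors are skewed:
        set.seed(1)
        slopes <- replicate(2000, {
          x <- runif(200)
          y <- 1 + 2 * x + (rexp(200) - 1)     # centered exponential errors
          coef(lm(y ~ x))[2]
        })
        hist(slopes)                           # bell-shaped, centered near 2
        qqnorm(slopes); qqline(slopes)         # nearly straight Q-Q line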

  • @caviper1
    @caviper1 3 months ago

    Heteroskedasticity kills normality (in importance...)

  • @hamidjess
    @hamidjess 8 months ago +6

    Dude, with all my respect for the depth of the content, could you please do accents more frequently?

    • @QuantPsych
      @QuantPsych  8 months ago +1

      Ha! I've had two entitled Karens tell me I need to change my approach to videos and stop doing accents. But I side with your preferences :)

  • @NikolaiKaverin
    @NikolaiKaverin 8 months ago

    I have to say that your Russian accent is pretty good

    • @QuantPsych
      @QuantPsych  8 months ago

      Many thanks, comrade.

  • @danhallatt4954
    @danhallatt4954 8 months ago +1

    Second (but, more accurately, third ;) )

  • @djangoworldwide7925
    @djangoworldwide7925 8 months ago +3

    I want a version of your videos without the stupid comments. Instead of a 5-minute video it became 20.

    • @QuantPsych
      @QuantPsych  8 months ago +10

      I shall change my entire approach and structure to making videos to accommodate your preferences.

    • @batesthommie2660
      @batesthommie2660 8 months ago

      Hahahahaha Good One

    • @galenseilis5971
      @galenseilis5971 8 months ago +1

      If you want cut-and-dried technical descriptions, then I recommend you read mathematical stats papers. You will find the concision and lack of humour you are searching for there.

  • @bmebri1
    @bmebri1 8 months ago +2

    First

    • @QuantPsych
      @QuantPsych  8 months ago +3

      Technically, I saw it before you did ;)

  • @SkepticalMate
    @SkepticalMate 8 months ago

    Please, please don't do silly voices or other clown stuff. It's terribly annoying, and one reason I unsubscribed.

    • @QuantPsych
      @QuantPsych  8 months ago +3

      That's probably for the best. I am who I am, I do what I do.

    • @galenseilis5971
      @galenseilis5971 8 months ago

      YouTube tries to match creators with audiences, but it doesn't always find good matches. I hope you find something else you enjoy watching. I'm partly here for the silly voices.

  • @excelfanboy_
    @excelfanboy_ 8 months ago +1

    This dude is just yapping, don't waste your time

    • @QuantPsych
      @QuantPsych  8 months ago +3

      Seriously. This guy's an idiot.

    • @galenseilis5971
      @galenseilis5971 8 months ago

      I see it very differently. Fife has identified substantial problems with how statistical analysis is conducted and he has dedicated a lot of time, attention, and energy into helping address those problems. For all my commentary disagreeing with him on this channel, he and I are mostly on the same team: statistical practice must get better.