Misunderstanding Regression to the Mean

  • Published: 27 Jan 2025

Comments • 64

  • @wei-ching_lin
    @wei-ching_lin 4 years ago +14

    This is the most accurate and rigorous explanation on YouTube, after I've skipped through the top ten crap results in search.

  • @daniellemayall4463
    @daniellemayall4463 4 years ago +3

    Thank you, Joel. This was wonderful! I appreciate you taking the time to make this!

  • @kaushikdr
    @kaushikdr 5 years ago +2

    Truly a wonderful video - you speak with the mindset of someone who didn't understand it before and it gives me confidence that I too may one day be able to understand it!

  • @kangre63
    @kangre63 2 years ago +1

    Your explanation is excellent. I just subscribed. Plus your voice is so easy to listen to. Can’t wait to check out your other videos. Thank you!

  • @gregs6403
    @gregs6403 3 years ago +2

    Fantastic explanation. I was looking for a video to help me understand how the gambler’s fallacy and regression to mean can coexist (I was sure I was misunderstanding one), and this video did the trick.
    I’m not good at math but your explanations and phrasing were so succinct that I think I get it now:
    The mean is dictated *by* the regression pattern. It isn’t an external force, but simply a pattern that exists inherently.

  • @aifan6148
    @aifan6148 4 years ago +2

    Thank you so much for the detailed explanation!!! I've learned so much!!!

  • @TL-lc9mb
    @TL-lc9mb 3 years ago +1

    A very very good explanation; thanks Joel

  • @stugoldberg3592
    @stugoldberg3592 5 years ago +2

    This was a great explanation of regression to the mean. I now understand how it is a mathematical certainty! The extension to composite IQs was not as clear. But overall, a great explanation of what is often a misrepresented concept. Thanks!

  • @SkodaUFOInternational
    @SkodaUFOInternational 1 year ago +1

    Thanks for this. You are amazing.

  • @orlevitas2944
    @orlevitas2944 4 years ago

    Great video!
    Could you please explain:
    1. 19:15 - why Zx = Pxy * Zy (how to get to this formula)?
    2. 14:40 - How did you get the expression for b1?
    Appreciate your answers!
    Thanks!

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 4 years ago +1

      The z-score formula is z = (x - mean of x) / (sd of x). It is just a way to convert any variable so that it has standard deviation units. The regression formula implies the relationship you are asking about.
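
One way to see how the regression formula implies that relationship is the short derivation below. It is a sketch that assumes simple linear regression of y on x with the usual least-squares slope (the b1 expression asked about in the second question); the symbols \bar{x}, \bar{y}, s_x, s_y, and \rho_{xy} stand for the means, standard deviations, and correlation, and are notation chosen here rather than taken from the video.

```latex
\begin{align*}
z_x &= \frac{x - \bar{x}}{s_x}, \qquad z_y = \frac{y - \bar{y}}{s_y} \\
\hat{y} &= b_0 + b_1 x, \qquad b_1 = \rho_{xy}\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1 \bar{x} \\
\hat{y} - \bar{y} &= \rho_{xy}\,\frac{s_y}{s_x}\,(x - \bar{x}) \\
\frac{\hat{y} - \bar{y}}{s_y} &= \rho_{xy}\,\frac{x - \bar{x}}{s_x}
\quad\Longrightarrow\quad \hat{z}_y = \rho_{xy}\, z_x
\end{align*}
```

Swapping the roles of the two variables (predicting x from y) gives the same form with the subscripts exchanged, which matches the expression quoted in the question.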

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 4 years ago +1

      The second formula can be found here: www.wikiwand.com/en/Simple_linear_regression#/Fitting_the_regression_line

    • @orlevitas2944
      @orlevitas2944 4 years ago

      @@JoelSchneiderPsy Thank you!

  • @geraldramos3961
    @geraldramos3961 1 year ago

    Thank you for the video. You say that regression to the mean (RTM) happens only at group level. Can you comment on the statement to the contrary in the paper by Barnett et al. (2005; doi: 10.1093/ije/dyh299) wherein they state that RTM at the subject level occurs because of the repeated measurements of random errors? Would you consider their position different from yours in this regard (i.e., incorrect)? Thanks in advance!

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 1 year ago +1

      Barnett et al. are correct. My apologies for the imprecise wording of my statement. When I said it, I was thinking of a situation in which variable x predicts variable y and the person is the level 1 variable (i.e., 1 row per person). In that context, RTM is a group-level phenomenon. However, with repeated measurement, the data are clustered by person (i.e., the person is the level 2 variable). In that context, there are multiple rows per person, and each person is associated with a "group" of scores. If we predict a level 1 variable with another level 1 variable, regression to the mean will occur within each group (i.e., person). Thus, RTM is still a group-level phenomenon, even though the "group" refers to repeated measurements clustered by individuals.

  • @jameskloberdance9338
    @jameskloberdance9338 5 years ago +1

    Great presentation. What book would you recommend for learning more about this topic?

  • @Daft_Sage
    @Daft_Sage 8 years ago +4

    Thanks for ruining my day. I now know less about regression than I did before watching this.

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 8 years ago

      Which part was confusing?

    • @Daft_Sage
      @Daft_Sage 8 years ago

      At the end of the video, I still get the impression that regression to the mean should reduce variability.

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 8 years ago +2

      When prediction is uncertain, we play it safe by making guesses about an outcome that are closer to the mean than the predictor variable. For example, if a person's high school GPA is 2 standard deviations above the national high school GPA mean, it would be a mistake to predict that that person's college GPA will be 2 standard deviations above the national college GPA mean. Maybe it will be that high in this particular person's case, but that is not the safe bet. If the correlation between those two variables is 0.5, the safe bet is to predict that the person's college GPA will be 1 standard deviation above the national college GPA mean (2 * 0.5 = 1).
      Notice that it is our predictions that have reduced variability, not the variable we are trying to predict. College GPA has the same variability as before, no matter what predictions we make. In other words, the data do not regress to the mean, but our predictions do. If we used an irrelevant predictor (e.g., shoe size), our predictions of college GPA would regress all the way to the mean, but the variability of college GPA stays the same.
      If regression to the mean caused college GPA to have reduced variability, it would lead to absurdities. For example, the variability of college GPA would change from moment to moment depending on which variable we used to predict it (e.g., ACT, SAT, parental income, and so forth).
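
The GPA example can be checked with a small simulation. This is a minimal sketch, assuming standardized scores (mean 0, SD 1), a correlation of 0.5 as in the example, and normally distributed data; the variable names, sample size, and seed are made up for illustration.

```python
import random
import statistics

random.seed(1)

r = 0.5          # correlation between high school and college GPA (from the example)
n = 50_000       # hypothetical sample size

# Standardized high school GPA, and a college GPA correlated r with it.
high_school = [random.gauss(0, 1) for _ in range(n)]
college = [r * x + (1 - r ** 2) ** 0.5 * random.gauss(0, 1) for x in high_school]

# The "safe bet" prediction regresses toward the mean: predicted z = r * observed z.
predicted = [r * x for x in high_school]
predicted_for_2sd = r * 2  # someone 2 SDs above the mean in high school

sd_college = statistics.pstdev(college)      # stays near 1
sd_predicted = statistics.pstdev(predicted)  # shrinks to about r

print(f"prediction for +2 SD student: {predicted_for_2sd:+.1f} SD")
print(f"SD of actual college GPA:     {sd_college:.2f}")
print(f"SD of predicted college GPA:  {sd_predicted:.2f}")
```

Only the predictions are squeezed toward the mean; the simulated college GPA keeps essentially the same spread as the high school GPA.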

    • @Daft_Sage
      @Daft_Sage 8 years ago +2

      Joel Schneider it sounds very cool reading it, but I am having trouble wrapping my head around it right now. Maybe I will rewatch the video tomorrow (when I have gotten more sleep) and try to come up with questions if I still don't get it.

    • @kaushikdr
      @kaushikdr 5 years ago

      @@JoelSchneiderPsy I am a bit confused because we are not assuming or saying that the SD of high school GPAs should be the same as the SD of college GPAs, right? And I do agree that it would be crazy to assume that variability changes - in the case of the father and son, if we are looking at the correlation from son to father, the variability would reduce as well as if we are looking from the father to son! Finally, if we see a correlation of 0.8 for father vs. son, shouldn't we see a correlation above 1 for son vs. father?

  • @yogesh_ganesh_chawat
    @yogesh_ganesh_chawat 5 years ago +1

    Thanks
    Nice explanation
    You have a soothing voice also.

  • @CheburashkaGenovna
    @CheburashkaGenovna 1 year ago

    A tutor I wish I had! 👏

  • @adawang9147
    @adawang9147 4 years ago +2

    "17.10" Boom, a whole new world.

  • @crwa111
    @crwa111 3 years ago

    At 31:00, I’m assuming this is where you make the case that Composite Score is a better reflection of General Ability than Average Score because .97>.95. But how did you get .97 to begin with?

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 3 years ago

      Sorry, I should have explained that, but the video was already long and technical. The solution can be obtained from my R package simstandard (wjschne.github.io/simstandard/index.html). This R code would do the trick:
      # Model: IQ1 and IQ2 are indicators of latent IQ
      model

  • @naftalibendavid
    @naftalibendavid 10 years ago +3

    Nicely done!

  • @minz3638
    @minz3638 8 years ago +1

    Thank you Joel for a great explanation. I was wondering if this statement from Wikipedia about clinical trials is true: "due to the body's natural healing ability and statistical effects such as regression to the mean, many patients will get better even when given no treatment at all." I suppose the correlation coefficient between the treatment and no treatment is not perfect. How do you determine the r in this case?

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 8 years ago +4

      +Elaine Zoe Good point. The Wikipedia statement is not wrong but it might be easily misunderstood. Regression to the mean has no healing properties. However, even with chronic conditions, people's health and symptoms fluctuate over time. People are much more likely to seek treatment when they are at their worst than when they are at their personal baseline level of health. A spike in symptoms often prompts the person to seek additional help. Even if the additional help is ineffective, a natural return to the person's (untreated) baseline might be attributed to the intervention instead of to the natural ups and downs of the condition.
      Thus, if we were to select people at random to receive a sham intervention, the group's subsequent long-term health would not be better on average. However, if people were selected such that many of them were feeling worse than they usually feel, the group's subsequent health would appear to improve because each person's condition would tend to regress to each individual's (untreated) personal mean. Including a randomly assigned control group into a study's design allows us to partial out effects like this.
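
The contrast between random selection and selection at a symptom spike can be sketched with a simulation. Everything numeric here is an assumption for illustration: symptom severity on an arbitrary scale, a personal baseline per person, independent day-to-day fluctuation, and a sham intervention with no effect at all.

```python
import random
import statistics

random.seed(2)

n = 50_000
# Each person's untreated baseline severity, plus independent fluctuations at two visits.
baseline = [random.gauss(50, 10) for _ in range(n)]
visit1 = [b + random.gauss(0, 10) for b in baseline]
visit2 = [b + random.gauss(0, 10) for b in baseline]  # sham treatment: no effect added

# Group 1: people picked at random for the sham intervention.
random_group = random.sample(range(n), 5_000)
change_random = statistics.mean([visit2[i] - visit1[i] for i in random_group])

# Group 2: people who enroll because they feel much worse than their own usual level.
seekers = [i for i in range(n) if visit1[i] - baseline[i] > 10]
change_seekers = statistics.mean([visit2[i] - visit1[i] for i in seekers])

print(f"average change, random group:   {change_random:+.1f}")
print(f"average change, symptom spike:  {change_seekers:+.1f}")
```

The randomly selected group shows essentially no change, while the group selected at a spike appears to improve even though nothing was done, because each person drifts back toward their own baseline. A randomized control group would show the same apparent improvement, which is how it lets a study separate this effect from a real treatment effect.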

  • @Mr.Caring
    @Mr.Caring 3 years ago +1

    Well done 👍

  • @anindyab
    @anindyab 4 years ago

    Can we please have some material on the composite score calculation? Intuition and maths both please. Couldn't grasp it at all.

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 4 years ago

      What would you like to know, specifically?

    • @anindyab
      @anindyab 4 years ago

      @@JoelSchneiderPsy I can't grasp the intuition that the composite will be more extreme. Also, it'll help if you can point me to the mathematical derivation of the formula.

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 4 years ago +1

      @@anindyab See if this paper helps: www.hmhco.com/~/media/sites/home/hmh-assessments/clinical/woodcock-johnson/pdf/wjiv/wjiv_asb_7.pdf

    • @anindyab
      @anindyab 4 years ago

      @@JoelSchneiderPsy Thanks for your prompt replies, I will check this out.

  • @OsRaunio
    @OsRaunio 8 years ago +2

    Simple: test subjects have some random variance in testing. There are more average subjects than high ones, so more average subjects get a high score by chance than high subjects get an average score by chance. In retesting, the average subjects get their average score and the high subjects get their high score. The average ones outnumber the high ones and produce a regression to the mean.

    • @Garrettthief
      @Garrettthief 8 years ago

      I would really like @Joel Schneider to respond to this one. It's quite a common explanation for regression to the mean which, from what I see, is different from what this video proposes.

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 8 years ago +1

      If most people were at the extremes and no one was at the average, there would still be regression to the mean. It is the random variance in the test score (resulting in imperfect correlation over time) that causes regression to the mean, not the type of distribution.

    • @OsRaunio
      @OsRaunio 7 years ago

      So if there were a population of 200, of whom 100 had IQ 130 and 100 had IQ 70, and we tested someone who got IQ 70 and someone who got IQ 130 on the first test, would the second test likely give something like IQ 75 and IQ 125 because of regression to the mean (IQ 100)?

    • @Garrettthief
      @Garrettthief 7 years ago

      Not sure if you can describe the regression like that for an individual subject.

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 7 years ago +1

      @Garrettthief is right. Regression to the mean is a group-level concept.
      Let's say that 70 and 130 are the only possible scores. If on the second testing at least some people's IQ changed from 130 to 70 and some people's IQ changed from 70 to 130, then the test-retest correlation is less than perfect.
      For the people who scored 130 on the first testing, their average score is less than 130 on the second (because at least some of them have an IQ of 70 instead of 130). Note that none of them have anything but a 70 or a 130. Yet the mean score is between 70 and 130.
      Likewise, for the people who scored 70 on the first testing, their average score on the second testing is higher than 70 (because at least some of them have an IQ of 130 instead of 70).
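
The two-score example can be simulated directly. This is a rough sketch under made-up assumptions: 70 and 130 are the only possible scores, and each person's score flips to the other value on retest with a hypothetical probability of 0.1, which makes the test-retest correlation less than perfect.

```python
import random
import statistics

random.seed(3)

n = 20_000
flip_probability = 0.1  # hypothetical chance that a score switches on retest

# First testing: only 70 or 130 is possible; nobody is at the average.
first = [random.choice([70, 130]) for _ in range(n)]
# Second testing: the score flips (70 <-> 130) with the given probability.
second = [(200 - s) if random.random() < flip_probability else s for s in first]

# Group people by their FIRST score and average their SECOND score.
mean_high = statistics.mean([s2 for s1, s2 in zip(first, second) if s1 == 130])
mean_low = statistics.mean([s2 for s1, s2 in zip(first, second) if s1 == 70])

print(f"possible retest scores: {sorted(set(second))}")
print(f"retest mean for first-test 130 group: {mean_high:.1f}")
print(f"retest mean for first-test 70 group:  {mean_low:.1f}")
```

No individual ever scores anything but 70 or 130, yet each group's retest mean moves toward 100. The regression shows up in the group averages, not in any single person's score.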

  • @ZuckThat
    @ZuckThat 8 years ago

    Just made a video about Regression toward the Mean and the Sports Illustrated Cover Curse!
    Be sure to check it out. New videos will be coming every week :)
    ruclips.net/video/DSdAQL7tIqI/видео.html

  • @Mister.Psychology
    @Mister.Psychology 9 years ago

    Okay, I didn't get why 75 in IQ equals 66 in IQ. Weird.

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 9 years ago +1

      +Jurij Fedorov Yes, it is weird. I'll have a short paper coming out soon that explains this phenomenon in more detail. The short answer is that it is unusual to have one low score (e.g., 75). It is even more unusual to have multiple low scores (e.g., all 75). A composite score that is a summary of multiple scores of 75 must be lower than 75 to reflect how unusual it is to have that many low scores.

    • @Mister.Psychology
      @Mister.Psychology 9 years ago

      Joel Schneider Thanks for replying. I am currently watching all videos on IQ I can find.
      So if my IQ is 66 then I would probably get 75 IQ average on the 10 factor tests?
      And if you know something about IQ are you willing to do an AMA on reddit?

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 9 years ago +2

      +Jurij Fedorov Yes, that is the correct interpretation. A person with an IQ of 66 will, on average, score higher on any particular part of the test.
      I am not excited to do an AMA on reddit. However, I am happy to answer any questions you might have. Email me at wjschne@ilstu.edu
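
The 75-versus-66 pattern discussed in this thread can be reproduced with the standard formula for the standard score of a sum of correlated scores. This is only a sketch: the 10 subtests come from the question above, but the average intercorrelation of 0.5 is an assumed value chosen for illustration, not a figure stated in the thread, and the function names are hypothetical.

```python
import math


def composite_iq(subtest_iq, n_tests, avg_r, mean=100.0, sd=15.0):
    """Composite score when every one of n_tests subtests equals subtest_iq."""
    z = (subtest_iq - mean) / sd
    # SD of a sum of n standardized scores with average intercorrelation avg_r.
    sd_of_sum = math.sqrt(n_tests + n_tests * (n_tests - 1) * avg_r)
    return mean + sd * (n_tests * z / sd_of_sum)


def expected_subtest_iq(composite, n_tests, avg_r, mean=100.0, sd=15.0):
    """Expected score on any single subtest given the composite (regressed estimate)."""
    sd_of_sum = math.sqrt(n_tests + n_tests * (n_tests - 1) * avg_r)
    # Correlation between one subtest and the sum of all subtests.
    r_subtest_composite = (1 + (n_tests - 1) * avg_r) / sd_of_sum
    return mean + r_subtest_composite * (composite - mean)


composite = composite_iq(75, n_tests=10, avg_r=0.5)
back = expected_subtest_iq(composite, n_tests=10, avg_r=0.5)
print(round(composite), round(back))  # prints: 66 75
```

Under these assumptions, ten scores of 75 go with a composite near 66, and a composite near 66 predicts about 75 on any one subtest. If the subtests were perfectly correlated (avg_r=1.0), the composite would simply equal 75, so the extra extremity comes from the subtests being imperfectly correlated.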

  • @yehudalimony1899
    @yehudalimony1899 9 years ago

    An excellent explanation!

  • @pandorasmind4735
    @pandorasmind4735 4 years ago

    I am not a math geek but I am not an idiot either, this is one hell of a hard video to follow
    I need a simplified version my smol brain hrts

    • @pandorasmind4735
      @pandorasmind4735 4 years ago

      I AM FUCKING CRYING :(((
      I lost it when he went into the graph with slanted Y axis

    • @pandorasmind4735
      @pandorasmind4735 4 years ago

      I am lost honestly, how does anything relate to regression to the mean.... I am soo depressed rn, I watched 42 minutes and understood less than when I was going in...
      You made a hella effort but it's really hard to understand

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 4 years ago +2

      @@pandorasmind4735 So sorry my video made you feel this way. The math part is not really essential to understanding the main idea conveyed in the first few minutes. One of the reasons I included so much mathematical detail was that I made the video for a team of lawyers defending a man with an intellectual disability on death row. I posted the video in case anyone else found it useful. If I were making the video for the general public, I would have simplified everything considerably.
      Sidenote: It is unconstitutional in the U.S. to execute someone with an intellectual disability. However, this case was controversial because some of his IQ test results were above the diagnostic threshold for intellectual disability, and his legal team needed to understand why this can happen even though the diagnosis was correct. The man's case went all the way to the U.S. Supreme Court, and the lawyers had to understand in great detail how regression to the mean produces surprising patterns when IQ tests are given repeatedly. In the end, the legal team was successful in reducing the sentence from capital punishment to life in prison. My contribution was just a small part of a large effort by many people.

    • @pandorasmind4735
      @pandorasmind4735 4 years ago

      @@JoelSchneiderPsy I never expected you to answer tbh, and thank you for clarifying
      I did get the main idea, but I am confused about one thing: if variability doesn't decrease, then what is regression to the mean? (I understand it's an explanation or theory, but not a cause of the pattern that occurs.)

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 4 years ago +1

      @@pandorasmind4735 Yep, it is confusing. Perhaps this will help: Suppose we measure the same variable twice, and the Time 1 and Time 2 scores are positively correlated. We use the Time 1 scores to predict the Time 2 scores. Regression to the mean puts no constraints on how dispersed the Time 2 scores will be compared to the Time 1 scores---anything can happen. However, when the correlation between the two variables is imperfect, regression to the mean necessarily implies that the predicted Time 2 scores are less variable than the actual Time 2 scores. So, regression to the mean is more about the variability of predictions than it is about the variability of observed scores. This makes sense because if we need to make a guess, we need to be at least a little bit cautious. Thus, our guesses will fall in a more narrow range than the real scores, particularly when the correlation between Time 1 and Time 2 scores is low.
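
The claim that predicted Time 2 scores are less variable than actual Time 2 scores follows from the regression equation. The lines below are a sketch that assumes the predictions come from simple linear regression of Time 2 on Time 1; the symbols y_1, y_2, s_1, s_2, and r (scores, standard deviations, and their correlation) are notation introduced here.

```latex
\begin{align*}
\hat{y}_2 &= \bar{y}_2 + r\,\frac{s_2}{s_1}\,(y_1 - \bar{y}_1) \\
\operatorname{SD}(\hat{y}_2) &= |r|\,\frac{s_2}{s_1}\, s_1 = |r|\, s_2 \le s_2 \\
s_2^2 &= \underbrace{r^2 s_2^2}_{\text{predictions}} + \underbrace{(1 - r^2)\, s_2^2}_{\text{prediction errors}}
\end{align*}
```

Nothing here ties s_2 to s_1, so the Time 2 scores can be more or less spread out than the Time 1 scores. The only constraint is that the predictions have |r| times the spread of the actual Time 2 scores, which is strictly narrower whenever the correlation is imperfect.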

  • @ispinozist7941
    @ispinozist7941 7 years ago +1

    One of the most confusing explanations of a not so confusing thing. Seriously.

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 7 years ago +2

      For my benefit, which part was confusing?

    • @ispinozist7941
      @ispinozist7941 7 years ago +1

      Joel Schneider for those who don't know the subject there is too much complicated detail, too fast, with little elaboration before proceeding. Also, it helps if you frame this topic in terms of probability rather than just time series data.

    • @JoelSchneiderPsy
      @JoelSchneiderPsy 7 years ago +2

      Thanks for the feedback.

  • @woowooNeedsFaith
    @woowooNeedsFaith 5 years ago

    I never took a statistics course, but I think you pretty much showed the composite to be a rubbish concept. As for capital punishment, it is an idiotic concept.