The Unbelievable Reality of Simpson's Paradox

  • Published: 18 Dec 2024

Comments • 24

  • @MindLaboratory
    @MindLaboratory 1 year ago +3

    The best part of the name is that it appears to be a paradox at first, but then isn't a paradox at all when you examine it more closely

  • @hempelsraven4893
    @hempelsraven4893 1 year ago +1

    Thank you, this is the first video I've found that's explained this in a way I can easily understand

  • @epictetus__
    @epictetus__ 9 months ago +1

    What a great explanation. Instantly subscribed ❤

  • @softerseltzer
    @softerseltzer 1 year ago +4

    I'd certainly like a series on data ethics.

  • @sharks1349
    @sharks1349 1 year ago +5

    Are there any things you should think about when analyzing data to avoid falling into the pitfall of Simpson's paradox?

    • @robharwood3538
      @robharwood3538 1 year ago +2

      Yes, the key concept to understand is called 'conditional probability'. YouTube comments are too short to give a good intro, but if you search for videos on it, you should find a decent one.
      Also, a related concept is called Bayes' Theorem, which is one of the most important tools for calculating conditional probabilities, based on other conditional probabilities.
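
To make the conditional-probability point concrete, here is a minimal sketch. The counts are hypothetical, invented purely for illustration (they do not come from the video or from any study), and the names `data`, `rate`, and `pooled_rate` are made up for this example. It compares the success rate conditioned on severity, P(success | arm, severity), with the pooled rate, P(success | arm).

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts, invented for illustration only.
data = {
    "treatment": {"mild": (90, 100), "severe": (300, 500)},
    "control": {"mild": (400, 500), "severe": (50, 100)},
}


def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)


def pooled_rate(groups):
    """Success rate after merging all subgroups: P(success | arm)."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(n for _, n in groups.values())
    return rate(successes, trials)


# Conditional rates: P(success | arm, severity)
for severity in ("mild", "severe"):
    t = rate(*data["treatment"][severity])
    c = rate(*data["control"][severity])
    print(f"{severity}: treatment {float(t):.2f} vs control {float(c):.2f}")

# Pooled rates: P(success | arm), ignoring severity
t_all = pooled_rate(data["treatment"])
c_all = pooled_rate(data["control"])
print(f"pooled: treatment {float(t_all):.2f} vs control {float(c_all):.2f}")
```

In these made-up numbers the treatment wins within each severity level yet loses once the groups are merged, because most treated cases are severe and most control cases are mild. Checking how the subgroups are distributed across the arms is one way to spot that the pooled comparison is not like-for-like.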

    • @puma7171
      @puma7171 1 year ago

      In my experience working with data, it is of paramount importance to always look at the data graphically, to get a feel for what's going on. Relying purely on statistical methods, including standard ones, is not enough. Judgement about why distributions look a certain way, and about the existence of clusters, is key.

  • @iSJ9y217
    @iSJ9y217 10 months ago

    Thank you for such good examples and explanation

  • @georgesmith4768
    @georgesmith4768 1 year ago +1

    I think the case without discrete groups is also relevant. Take the last example, but with a continuous variable (say, age as a proxy for amount of education) and a test that does not create two discrete groups: instead of consisting entirely of early graduate content, its content is selected with equal probability from all relevant difficulty levels (so presumably all college-level material). Suppose also that students with more education study less for it, decreasing evenly with age. (That depends on why the graduate students studied less as a group. It could be a discrete cutoff caused by qualitative differences between grad and undergrad students in, say, free time to study for an unimportant test. But it could also be a response to how hard they expect the test to be, in which case the change to the test should work perfectly.) You should now get a thick, downward-sloping band when you plot test score against hours studied. Instead of some weird subgroup effect, it just looks like you got screwed over by the confounding variable of age, when really more hours studied causes higher test scores.
    I thought it was good to mention because it helps me remember that Simpson's paradox really isn't anything different from normal confounding-variable problems, so you don't actually need any special reasoning; it's just discrete instead of continuous. In terms of bad behavior it has the same problems: not reporting a subgroup effect can cause the wrong conclusion to be pushed, but how do you know that a subgroup is relevant? Collecting a ton of subgroup information basically p-hacks your new "real" results. The real problem is that the results are taken far too seriously for a non-causal study of data that is too small to rigorously detect the presence of confounding variables. Though in the 2nd example you could probably detect the clustering easily, even with a dozen other unrelated group classifiers thrown in.
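
The continuous version described above can be sketched with a small simulation. Everything here is an assumption made for illustration: the age range, the coefficients, the noise levels, and the helper names `ols` and `residuals` are all hypothetical. Hours studied raise the score by construction (true effect 2 points per hour), but older students study less and score higher for other reasons, so the pooled slope comes out negative.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def mean(xs):
    return sum(xs) / len(xs)


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """Part of y that a straight-line fit on x does not explain."""
    a, b = ols(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


n = 2000
# Assumed data-generating process (hypothetical numbers):
age = [random.uniform(18, 30) for _ in range(n)]
# Older students study less, decreasing evenly with age.
hours = [20 - (a - 18) + random.gauss(0, 1) for a in age]
# Studying helps (+2 per hour), and so does age/education (+5 per year).
score = [2 * h + 5 * (a - 18) + random.gauss(0, 2) for h, a in zip(hours, age)]

# Naive fit: score against hours, ignoring age.
_, pooled_slope = ols(hours, score)

# Adjusted fit: remove the linear effect of age from both variables first,
# then regress what is left of score on what is left of hours.
_, adjusted_slope = ols(residuals(age, hours), residuals(age, score))

print(f"pooled slope:   {pooled_slope:.2f}")
print(f"adjusted slope: {adjusted_slope:.2f}")
```

Regressing the residuals on each other gives the same hours coefficient as a regression of score on hours and age together, so the sign flip between the two slopes is ordinary confounding, with no discrete subgroups involved.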

  • @ResilientFighter
    @ResilientFighter 1 year ago +1

    thanks for your videos as always

  • @Set_Get
    @Set_Get 1 year ago

    I actually faced both of the situations you described during research on data from some soils.

  • @junkbingo4482
    @junkbingo4482 1 year ago +1

    Nowadays we have data science and data scientists (who are engineers)... The principle? Program a class in Python for a deep ANN or a random forest, fill it up with data thanks to Hadoop and other data lakes, and you have a perfect result...
    BUT when I was young, we studied statistics and econometrics, and our teachers gave us another mentality! They said, 'play with your data'.
    This means: 'before you create a model, you have to analyse the problem deeply, and this will take time and patience'.
    Such a paradox was not a problem then, but it is with a deep ANN.
    Cheers

  • @Yaara_1
    @Yaara_1 1 year ago +4

    I've never been this early to a video before

    • @rewanthnayak2972
      @rewanthnayak2972 1 year ago

      me too

    • @Yaara_1
      @Yaara_1 1 year ago

      @rewanthnayak2972 Did you have JEE Advanced yesterday too?

    • @rewanthnayak2972
      @rewanthnayak2972 1 year ago

      @Yaara_1 😆😆 No bro, I'm already in BTech. How did the exam go?

    • @Yaara_1
      @Yaara_1 1 year ago

      @rewanthnayak2972 Badly. The result comes out in 3 days

  • @Septumsempra8818
    @Septumsempra8818 1 year ago +1

    Jay-Z: "Numbers don't lie, check the scoreboard."
    Statistician: "umm...Simpson's Paradox"

  • @softerseltzer
    @softerseltzer 1 year ago

    Very good topic!