1.2 - Motivating Example: Simpson’s Paradox

Brady Neal - Causal Inference

37K views

Published: Dec 22, 2024

Comments • 40

@chongsun7872 1 month ago
I remember reading many, many materials to try to understand Simpson's paradox and when to combine groups. This is THE clearest one I have ever listened to!
@ukn01642 4 years ago ⁺²¹
Brady, thanks for taking the time to put together a course on causal inference. I have watched the first week's lecture and appreciate the clarity of your explanations and the recommended reading material. I look forward to being part of this course in the coming weeks and months! Once again, Thank you!!!
@BradyNealCausalInference 4 years ago ⁺²
Welcome to the course! Thanks for the support, Uday!
@farbodsafe 4 years ago ⁺⁹
Thank you for this amazing course. I am a little confused about how the patient condition data in the COVID-27 table is recorded.
In the textbook you state: "You have data on the percentage of people who die from COVID-27,
given the treatment they were assigned and given their condition *at the time treatment was decided* ."
However my understanding from 8:33 is that when treatment is a cause for the condition, a patient who is assigned treatment B needs to wait for a long period to be administered treatment B and as a result many patients who had a mild condition at the time treatment B was decided for them transitioned into a severe condition. Therefore the data in the table for patients assigned treatment B record a high number of severe cases. In other words the 500 out of 550 patients that were both assigned treatment B and also had a sever condition consist of patients who had a severe condition at the time their treatment was decided plus those who transitioned from mild to severe potentially *long after the treatment was decided* .
Therefore the exact definition of C=Mild/Severe is not clear to me. Is it
1.the condition at the time treatment was decided or
2. the condition either at the time treatment was administered or right before the patient died before treatment could be administered but after it was decided
@failfection 4 years ago ⁺³
This is my question as well. If anything, wouldn't we then have two conditions, one prior to treatment and one at the time of treatment? It feels oversimplified to "assume treatment B takes a long time" without defining that upfront, and that would demand more data as well. Otherwise we could insert any counterfactual at that point. What if waiting for treatment B actually made things better? It's like we're just picking reasons to justify the numbers, which makes it more confusing for me. Or is the overall point that the reasons don't necessarily matter: if we treat the treatment as a cause of the condition, we need to use the total percentage, and why the numbers look like that is less important?
@escargot8854 8 months ago
I found the same discrepancy in the example. Researchers are not changing the condition mid-experiment. And if the mild B users worsen we would see higher mortality rates in that subgroup. The data does not match the causal explanation for the T -> C case
@jiaxinyuan3408 2 days ago
I have the same confusion here. In this scenario, don't we introduce a new variable, waiting time? In that case, shouldn't the causal graph incorporate the new variable as a mediator, like T->WT->C->O?
@rchou17 2 years ago
Thank you for the presentation! As a layperson, I could grasp from the video the very first concepts of causal analysis.
@ivankrokhalyov8459 4 years ago
Great, clearly explained! I didn't understand this in the TED video, but now, with the causal graph, it's clear!
@BradyNealCausalInference 4 years ago
Glad to hear it! What TED video are you talking about?
@ivankrokhalyov8459 4 years ago
@@BradyNealCausalInference ruclips.net/video/sxYrzzy3cq8/видео.html&ab_channel=TED-Ed It is about Simpson's paradox too, but it's better explained with a causal graph. We need to understand whether the severe state causes the treatment or the treatment causes the severe state.
@toddkelman3967 4 years ago
Many thanks for all the time & effort putting this course together and making it publicly available! During the viewing of the motivating example, I was wondering if it would be worth taking a minute to introduce (only) the very basic aspects of a causal graph for those unfamiliar with the concept. I suspect there will be many in that boat, since the only prerequisite for the course is basic probability...? I think the motivating example will resonate even better with that tiny bit of background provided to the viewer, as they won't be wondering how to make sense of the graph while you're explaining its ramifications on deciding treatment A vs B. Thanks so much again for your efforts!
@BradyNealCausalInference 4 years ago
All I mean with A --> B is that A is a cause of B (changing A can result in changes in B). I'll define causal graphs more completely in week 3 :)
@LoneXeaglE 2 years ago
Very well explained. One friendly piece of advice, though: edit out the mouth-click and breathing sounds. This is easy to do with Adobe Audition, or with the free software Audacity!
Thank you very much!!
@kangchenghou5027 4 years ago ⁺³
Thank you, Brady, for teaching this great course! Looking forward to following it and learning more about causal inference.
Regarding the two causal diagrams, is it possible to represent a mixture of causal explanations? For example, treatment B does have better effects, which is why doctors tend to assign patients with severe conditions to treatment B. At the same time, due to a shortage of treatment B, patients tend to wait longer and their condition tends to get worse. In this situation, if we are choosing a treatment for a particular patient, we need to consider both factors and summarize the total effect of treatment B. Does this make sense? It seems hard to me to represent this situation using something like a causal diagram. Does causal inference have tools to deal with this situation?
@BradyNealCausalInference 4 years ago ⁺¹
Great question. If I understand you correctly, it sounds like you're saying the DIRECT effect of treatment B (effect once it's taken) is better than the direct effect of treatment A, but the TOTAL effect of treatment A is better than the total effect of treatment B (because of waiting). In fact, the graph in scenario 2 describes this. The direct effect is represented by the causation flowing along the arrow from T to Y (B is better). The total effect is represented by the causation flowing along both T --> C --> Y and T --> Y together (A is better).
We will see this "flow of causation" more in depth in week 3. We will see direct effects and indirect effects (in contrast to total effects) much later in the course, when we get to mediation.
@kangchenghou5027 4 years ago
@@BradyNealCausalInference Very clear explanation, thanks!
@ebiiseo 3 years ago ⁺¹
Hi Brady, thanks for the video. From your two causal graph scenarios, there seem to be additional variables: scarcity and treatment duration. Should these variables have been included in the graphs? Also, how do we evaluate which causal graph is the more suitable one? Thanks
@haojiezhou4708 3 years ago
Good question!
@caralee800 3 years ago
Thank you so much! - from Korea
@JTedam 5 months ago
Neal, I would like to propose an alternative explanation to yours, using a scientific realism perspective on causal inference. You said:
In scenario 1, treatment has an effect on the condition, which has an effect on the probability of the outcome. And in scenario 2, the condition has an effect on the treatment, which has an effect on the probability of the outcome.
The treatment is the mechanism, and the condition is the context or circumstances. I would argue the mechanism is the cause and the condition is the trigger. In scenario 1, the outcome is considered on the basis of the mechanism alone, and in scenario 2, the outcome is deduced on the basis of the trigger condition of the mechanism. Mechanisms are always triggered in context to create outcomes. Mechanisms are causes, but their outcomes are always shaped by context. I would also add that some mechanisms may be hidden, meaning there are other unobservable mechanisms, which under experimental conditions (closed systems) can be controlled. In open systems, the outcome may not be easily predictable because of these unobservable mechanisms.
In your example, these mechanisms could be environmental: co-morbidities, contra-indications, emotional state, age-related factors, etc.
So I would be cautious about relying on data-driven inference alone.
@Gabriel-pt3ci 3 years ago
Thank you for the lecture. I have a question: How do we know which is the categorization of the population that is relevant to detect causality in this case? I mean, this same effect of Simpson's paradox could have happened with the two genders, a division by age, race, you name it... Is there a universal way of detecting that there are categories correlating differently from the whole data?
@anonymousdragon8734 4 years ago ⁺¹
I have a small question: in scenario #1, if the condition C is not a cause of the treatment T, i.e., there is no arrow pointing from C to T, the conclusion for scenario #1 still holds, correct? Meaning that, in scenario #1, Treatment B would still be preferred?
@BradyNealCausalInference 4 years ago ⁺²
Unfortunately, it isn't that simple. It depends what the source of the (Simpson's paradox) flipping is. The data in that example wouldn't actually be compatible with a causal graph where there is no "unblocked" path between T and Y that goes through C (to account for the flipping). Unfortunately, the definition of an "unblocked path" won't come until week 3 / Chapter 3.
@anonymousdragon8734 4 years ago
@@BradyNealCausalInference I see, thanks! I should have realized that the observational data is only compatible with (can be described by) certain causal graphs. Looking forward to week 3 :))
@anonymousdragon8734 4 years ago
@@BradyNealCausalInference I have a quick followup question... I just watched 1.5 and was wondering what happens if the causal graph is the one in scenario #2. I think in this case, the causal effect E [ Y | do(t) ] would be the same as the conditional expectation E [ Y | t ] ? Because the do(t) in this case does not break the dependence of C on T, so you will still weigh E [ Y | T , C ] with p ( C | T ) instead of p(C) ?
@BradyNealCausalInference 4 years ago ⁺¹
@@anonymousdragon8734 You are exactly correct, and it sounds like you understand the correct thing from two different perspectives :)
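The point of this exchange can be written as a pair of formulas. This is only a sketch: it assumes the graph contains just T, C, and Y with no unobserved confounders, and it takes scenario 1 to mean C --> T and scenario 2 to mean T --> C, as the comments in this thread do.

```latex
% Scenario 1 (C -> T, C -> Y, T -> Y): C is a common cause, so weight by p(c).
\mathbb{E}[Y \mid do(t)] = \sum_{c} \mathbb{E}[Y \mid t, c] \, p(c)

% Scenario 2 (T -> C, C -> Y, T -> Y): C lies on a causal path, so weight by p(c | t).
\mathbb{E}[Y \mid do(t)] = \sum_{c} \mathbb{E}[Y \mid t, c] \, p(c \mid t) = \mathbb{E}[Y \mid t]
```

The last equality in the second line is the law of total expectation, which is why the causal quantity and the plain conditional expectation coincide in scenario 2.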
@Ming_Qiu 4 years ago
Thank you for the course! Could you tell me whether, in the situation where C -> T, T is the "prescription of the treatment" or the "subsequent reception of the treatment"? And for the two different definitions of T in this situation, is the preference always A?
@BradyNealCausalInference 4 years ago
It's probably best to make T the same in both scenarios, so "prescription of the treatment."
"And for the two different definitions of T in this situation, the preference is always A?"
Yes, just because the details of scenario 1 differ from those of scenario 2 (e.g. wait time could be short in scenario 1).
@williamdu9200 4 years ago
Thank you for your course! I have a question: are scenario 1 and scenario 2 based on the maximum number of people, regardless of the number of deaths?
@BradyNealCausalInference 4 years ago
I don't quite understand your question. Feel free to re-ask. The number of people that fall in each group is important. It has to be this sort of diagonal imbalance to see Simpson's Paradox.
@williamdu9200 4 years ago
@@BradyNealCausalInference Thanks for your reply! Sorry for my poor English. My question is: when we consider which treatment to choose, does neither scenario 1 nor scenario 2 focus on the number of deaths, but only on which causal element has more people? In scenario 1, C is the cause of T, so C only considers which T has the highest number of people in it. In scenario 2, T is the cause of C, so T only focuses on which C has the highest number of people in it.
@williamdu9200 4 years ago
@@BradyNealCausalInference So Simpson's paradox illustrates that, because the two sets of data are distributed differently, we cannot compare them directly.
@thomasjoseph8634 3 years ago
Hi Brady, thank you for this amazing course.
I have a follow-up question on scenario 2, where the treatment is the cause of the condition (T --> C). It was discussed that in such cases treatment A is better than treatment B. Would the equations for that case be like the following?
A: P(Y|M,A) * P(M|A) + P(Y|S,A) * P(S|A) = 0.15 * 1400/1500 + 0.30 * 100/1500 = 16%
B: P(Y|M,B) * P(M|B) + P(Y|S,B) * P(S|B) = 0.10 * 50/550 + 0.20 * 500/550 ≈ 19%
Here S = severe condition and M = mild condition.
Is this representation correct?
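The arithmetic in this comment can be checked with a short script. The sketch below uses only the group sizes and death rates quoted in the comment; the function names are made up for illustration, and the second weighting, by the overall condition mix P(c), is added only for contrast with the P(c|t) weighting the comment asks about.

```python
# Figures quoted in the comment above: (patients, death rate) per treatment and condition.
data = {
    "A": {"mild": (1400, 0.15), "severe": (100, 0.30)},
    "B": {"mild": (50, 0.10), "severe": (500, 0.20)},
}


def rate_weighted_by_own_mix(treatment):
    """Sum over c of P(Y | c, t) * P(c | t): each treatment's own condition mix."""
    groups = data[treatment]
    n_t = sum(n for n, _ in groups.values())
    return sum(rate * n / n_t for n, rate in groups.values())


def rate_weighted_by_population_mix(treatment):
    """Sum over c of P(Y | c, t) * P(c): the condition mix of all patients."""
    n_all = sum(n for groups in data.values() for n, _ in groups.values())
    total = 0.0
    for condition, (_, rate) in data[treatment].items():
        n_c = sum(data[t][condition][0] for t in data)  # patients with this condition
        total += rate * n_c / n_all
    return total


for t in data:
    print(
        t,
        f"P(c|t) weights: {rate_weighted_by_own_mix(t):.1%}",
        f"P(c) weights: {rate_weighted_by_population_mix(t):.1%}",
    )
```

With these figures, the P(c|t) weighting gives about 16% for A and 19% for B, matching the numbers in the comment, while the P(c) weighting gives about 19% for A and 13% for B, so the preferred treatment flips with the choice of weights.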
@joelcurtis562 3 years ago ⁺¹
Why would a doctor be more likely to prescribe treatment B to a patient with severe symptoms, if not because they already think treatment B is more effective? In which case they're begging the question.
@yuliiayarmolenko9008 3 months ago
Because a "more effective" treatment is usually associated with more side effects. So why risk them if a "less effective" treatment might help in the first place?
@English1108 11 months ago
It seems like the wording on this was reversed, right? scenario 1 - condition causes the treatment. scenario 2 - the treatment causes the condition.
@yubai6549 4 years ago
Thank you!
@frankl1 10 months ago
Covid-27 got me 😂😂😂
