I remember having read many, many materials trying to understand Simpson's paradox and when to combine groups. This is THE clearest one I have ever listened to!
Brady, thanks for taking the time to put together a course on causal inference. I have watched the first week's lecture and appreciate the clarity of your explanations and the recommended reading material. I look forward to being part of this course in the coming weeks and months! Once again, Thank you!!!
Welcome to the course! Thanks for the support, Uday!
Thank you for this amazing course. I am a little confused about how the patient condition data in the COVID-27 table is recorded.
In the textbook you state: "You have data on the percentage of people who die from COVID-27,
given the treatment they were assigned and given their condition *at the time treatment was decided* ."
However, my understanding from 8:33 is that when treatment is a cause of the condition, a patient who is assigned treatment B needs to wait a long period before treatment B is administered, and as a result many patients who had a mild condition at the time treatment B was decided for them transitioned into a severe condition. Therefore the data in the table for patients assigned treatment B record a high number of severe cases. In other words, the 500 out of 550 patients who were both assigned treatment B and had a severe condition consist of patients who had a severe condition at the time their treatment was decided, plus those who transitioned from mild to severe, potentially *long after the treatment was decided* .
Therefore the exact definition of C = Mild/Severe is not clear to me. Is it
1. the condition at the time treatment was decided, or
2. the condition either at the time treatment was administered, or right before the patient died (after treatment was decided but before it could be administered)?
This is my question as well. If anything, wouldn't we then have two conditions: one prior to treatment and one at the time of treatment? It feels oversimplified to "assume treatment B takes a long time" without defining that upfront, and that would also demand more data. Otherwise we could insert any counterfactual at that point. What if waiting for treatment B actually made things better? It's like we're just picking reasons to justify the numbers, which makes it more confusing for me. Or is the overall point that the reasons don't necessarily matter: if we take treatment as a cause of the condition, we need to use the total percentage, and why the numbers look like that is less important?
I found the same discrepancy in the example. Researchers are not changing the condition mid-experiment, and if the mild B patients worsened, we would see higher mortality rates in that subgroup. The data does not match the causal explanation for the T -> C case.
I have the same confusion here. I'm thinking that in this scenario, don't we introduce a new variable, waiting time? In that case, the causal graph should incorporate the new variable as a mediator, no? Something like T -> WT -> C -> O.
Thank you for the presentation! As a lay person, I can capture from the movie the very first concept of causation analysis.
Great, clearly explained! I didn't understand this from the TED video, but now, with the causal graph, it's clear!
Glad to hear it! What TED video are you talking about?
@@BradyNealCausalInference ruclips.net/video/sxYrzzy3cq8/видео.html&ab_channel=TED-Ed is about Simpson's paradox too, but it's better explained with a causal graph. We need to understand whether the severe state causes the treatment, or the treatment causes the severe state.
Many thanks for all the time & effort putting this course together and making it publicly available! During the viewing of the motivating example, I was wondering if it would be worth taking a minute to introduce (only) the very basic aspects of a causal graph for those unfamiliar with the concept. I suspect there will be many in that boat, since the only prerequisite for the course is basic probability...? I think the motivating example will resonate even better with that tiny bit of background provided to the viewer, as they won't be wondering how to make sense of the graph while you're explaining its ramifications on deciding treatment A vs B. Thanks so much again for your efforts!
All I mean with A --> B is that A is a cause of B (changing A can result in changes in B). I'll define causal graphs more completely in week 3 :)
Very well explained! One friendly piece of advice, though: edit out the mouth clicks and breathing sounds. Adobe Audition makes that super easy, as does the free software Audacity!
Thank you very much!!
Thank you Brady for teaching this great course! Looking forward to following it and learning more about causal inference.
Regarding the two causal diagrams: is it possible to represent a mixture of causal explanations? For example, treatment B does have a better effect, which is the reason doctors tend to assign patients with severe conditions to treatment B. At the same time, due to a shortage of treatment B, patients tend to wait longer and their condition tends to worsen. In this situation, if we are choosing a treatment for a particular patient, we need to consider both factors and summarize the total effect of treatment B. Does this make sense? It seems to me that it's hard to represent this situation with something like a causal diagram. Does causal inference have tools to deal with this situation?
Great question. If I understand you correctly, it sounds like you're saying the DIRECT effect of treatment B (effect once it's taken) is better than the direct effect of treatment A, but the TOTAL effect of treatment A is better than the total effect of treatment B (because of waiting). In fact, the graph in scenario 2 describes this. The direct effect is represented by the causation flowing along the arrow from T to Y (B is better). The total effect is represented by the causation flowing along both T --> C --> Y and T --> Y together (A is better).
We will see this "flow of causation" more in depth in week 3. We will see direct effects and indirect effects (in contrast to total effects) much later in the course, when we get to mediation.
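The direct-vs-total distinction in the reply above can be sketched with a tiny simulation (my own illustration, not from the course; the coefficients are made-up numbers): B has the better direct effect on the outcome, but worsens the condition via the T -> C -> Y path, so its total effect is worse.

```python
# A minimal simulation sketch of how a direct effect can favor
# treatment B while the total effect favors treatment A.
# All coefficients below are made-up illustrative numbers.
import random

random.seed(0)
N = 100_000

def mean_outcome(t):
    """t = 0 for treatment A, 1 for treatment B (hypothetical encoding)."""
    total = 0.0
    for _ in range(N):
        # Mediator: B's long wait worsens the condition (higher c = more severe).
        c = 0.8 * t + random.gauss(0, 0.1)
        # Mortality risk: B has the better direct effect (-0.05 along T -> Y),
        # but severity raises risk (+0.25 per unit of c along C -> Y).
        total += 0.15 - 0.05 * t + 0.25 * c
    return total / N

total_A, total_B = mean_outcome(0), mean_outcome(1)
# The total effect of B combines T -> Y (helps) and T -> C -> Y (hurts):
print(f"A: {total_A:.3f}, B: {total_B:.3f}")  # A has the lower average risk
```

Here the T -> C -> Y harm (+0.25 × 0.8) outweighs the T -> Y benefit (-0.05), so A wins on total effect even though B wins on direct effect.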
@@BradyNealCausalInference very clear explanation, thanks!
Hi Brady, thanks for the video. From your two causal graph scenarios, there seem to be additional variables: scarcity and treatment duration. Should these variables have been included in the graphs? Also, how do we evaluate which causal graph is the more suitable one? Thanks
Good question!
Thank you so much! - from Korea
Neal, I would like to propose an alternative explanation to yours using a scientific realism perspective on causal inference. You said,
In scenario 1, treatment has an effect on the condition which has an effect on the probability of the outcome. And in scenario 2: the condition has an effect on the treatment which has an effect on the probability of the outcome.
The treatment is the mechanism and the condition the context, or circumstances. I would argue the mechanism is the cause and the condition is the trigger. In scenario 1, the outcome is considered on the basis of the mechanism alone, and in scenario 2, the outcome is deduced on the basis of the trigger condition of the mechanism. Mechanisms are always triggered in context to create outcomes. Mechanisms are causes, but their outcomes are always shaped by context. I would also add that some mechanisms may be hidden, meaning there are other unobservable mechanisms, which under experimental conditions (closed systems) can be controlled. In open systems, the outcome may not be easily predictable because of these unobservable mechanisms.
In your medical example, these mechanisms could be environmental: co-morbidities, contra-indications, emotional state, age, etc.
So I would be cautious about relying on data driven inferencing alone.
Thank you for the lecture. I have a question: how do we know which categorization of the population is relevant for detecting causality in this case? I mean, the same Simpson's-paradox effect could have happened with the two genders, a division by age, race, you name it... Is there a universal way of detecting that there are categories correlating differently from the whole data?
I have a small question: in scenario #1, if the condition C is not a cause of the treatment T, i.e., there is no arrow pointing from C to T, the conclusion for scenario #1 still holds, correct? Meaning that, in scenario #1, Treatment B would still be preferred?
Unfortunately, it isn't that simple. It depends on what the source of the (Simpson's paradox) flipping is. The data in that example wouldn't actually be compatible with a causal graph where there is no "unblocked" path between T and Y that goes through C (to account for the flipping). Unfortunately, the definition of an "unblocked path" won't come until week 3 / Chapter 3.
@@BradyNealCausalInference I see, thanks! I should have realized that the observational data is only compatible with (can be described by) certain causal graphs. Looking forward to week 3 :))
@@BradyNealCausalInference I have a quick follow-up question... I just watched 1.5 and was wondering what happens if the causal graph is the one in scenario #2. I think in this case the causal effect E[Y | do(t)] would be the same as the conditional expectation E[Y | t]? Because do(t) in this case does not break the dependence of C on T, so you still weight E[Y | t, c] with P(c | t) instead of P(c)?
@@anonymousdragon8734 You are exactly correct, and it sounds like you understand the correct thing from two different perspectives :)
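The two computations contrasted in this exchange can be sketched numerically with the COVID-27 table numbers quoted elsewhere in the thread (1400/100 mild/severe patients under A, 50/500 under B, with mortality rates 15%/30% and 10%/20%). Under scenario 2 (T -> C), do(t) leaves P(c | t) intact, so E[Y | do(t)] coincides with E[Y | t]; under scenario 1 (C -> T), one reweights by P(c) instead:

```python
# Sketch: E[Y | do(t)] under the two candidate graphs, using the
# COVID-27 table numbers quoted in this thread.
# counts: (treatment, condition) -> patients; rates: P(death | t, c).
counts = {("A", "mild"): 1400, ("A", "severe"): 100,
          ("B", "mild"): 50,   ("B", "severe"): 500}
rates  = {("A", "mild"): 0.15, ("A", "severe"): 0.30,
          ("B", "mild"): 0.10, ("B", "severe"): 0.20}

n_total = sum(counts.values())  # 2050 patients overall

def conditional(t):
    """E[Y | t]: weight each condition's rate by P(c | t)."""
    n_t = counts[(t, "mild")] + counts[(t, "severe")]
    return sum(rates[(t, c)] * counts[(t, c)] / n_t
               for c in ("mild", "severe"))

def adjusted(t):
    """E[Y | do(t)] under scenario 1 (C -> T): weight by P(c) instead."""
    return sum(rates[(t, c)] *
               (counts[("A", c)] + counts[("B", c)]) / n_total
               for c in ("mild", "severe"))

# Scenario 2 (T -> C): E[Y | do(t)] = E[Y | t], so A is preferred.
# Scenario 1 (C -> T): the adjusted quantity flips the preference to B.
print(f"conditional: A={conditional('A'):.1%}, B={conditional('B'):.1%}")
print(f"adjusted:    A={adjusted('A'):.1%}, B={adjusted('B'):.1%}")
```

This reproduces the flip discussed in the lecture: conditioning favors A (16% vs about 19%), while adjusting by P(c) favors B.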
Thank you for the course! Could you tell me, for the situation where C -> T, whether "T" is the "prescription of the treatment" or the "subsequent reception of the treatment"? And for the two different definitions of T in this situation, is the preference always A?
It's probably best to make T the same in both scenarios, so "prescription of the treatment."
"And for the two different definitions of T in this situation, the preference is always A?"
Yes, just because the details of scenario 1 differ from those of scenario 2 (e.g. wait time could be short in scenario 1).
Thank you for your course! I have a question: are scenario 1 and scenario 2 based on the maximum number of people, regardless of the number of deaths?
I don't quite understand your question. Feel free to re-ask. The number of people that fall in each group is important. It has to be this sort of diagonal imbalance to see Simpson's Paradox.
@@BradyNealCausalInference Thanks for your reply! Sorry for my poor English. My question is: when we consider which treatment to choose, do neither scenario 1 nor scenario 2 focus on the number of deaths, but only on which causal element has more people? In scenario 1, C is the cause of T, and C only considers which T has the highest number of people in it. In scenario 2, T is the cause of C, and T only focuses on which C has the highest number of people in it.
@@BradyNealCausalInference So Simpson's paradox illustrates that, because the two sets of data are distributed differently, we cannot compare them directly?
Hi Brady, thank you for this amazing course.
I have a follow-up question on scenario 2, where the treatment T is the cause of the condition (T -> C). It was discussed that in such cases treatment A is better than treatment B. Would the equations for that case look like the following?
A: P(Y|M,A) * P(M|A) + P(Y|S,A) * P(S|A) = 0.15 * 1400/1500 + 0.30 * 100/1500 = 16%
B: P(Y|M,B) * P(M|B) + P(Y|S,B) * P(S|B) = 0.10 * 50/550 + 0.20 * 500/550 ≈ 19%
Here S = severe condition, M = mild condition.
Is this representation correct ?
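For anyone wanting to check the arithmetic in the comment above, a quick sketch (the numbers are those quoted in the comment):

```python
# Verifying the weighted averages for scenario 2 (T -> C), where the
# total (unconditional) mortality rate per treatment is what matters.
A = 0.15 * 1400 / 1500 + 0.30 * 100 / 1500
B = 0.10 * 50 / 550 + 0.20 * 500 / 550
print(f"A: {A:.1%}, B: {B:.1%}")  # A: 16.0%, B: 19.1%
```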
Why would a doctor be more likely to prescribe treatment B to a patient with severe symptoms, if not because they already think treatment B is more effective? In which case they're begging the question.
Because a “more effective” treatment is usually associated with more side effects. So why risk having them if the “less effective” treatment might help in the first place?
It seems like the wording on this was reversed, right? Scenario 1: the condition causes the treatment. Scenario 2: the treatment causes the condition.
Thank you!
Covid-27 got me 😂😂😂