I appreciate that Fight Club reference at 22:10
Fantastic Brady !! You're a fabulous tutor .. Pat your back for me :)
At 17:20, what allows us to write the equality P(y | do(t)) = sum_m[ P(m | do(t)) * P(y | do(m))] ? I understand the intuition, but what is the formal justification?
Found in section 6.1 of the book
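To sketch the justification: the step holds under the frontdoor criterion (M intercepts all directed paths from T to Y, there is no unblocked backdoor path from T to M, and all backdoor paths from M to Y are blocked by T). As a sanity check, here is a toy SCM on the frontdoor graph where both sides of the identity can be computed exactly — all probabilities below are made up for illustration:

```python
from itertools import product

# Toy SCM with the frontdoor graph: U -> T, U -> Y, T -> M, M -> Y
# (U is an unobserved confounder, M a mediator). Numbers are made up.
pU1 = 0.6                                  # P(U=1)
pM1 = {0: 0.2, 1: 0.9}                     # P(M=1 | T=t)
pY1 = {(0, 0): 0.1, (0, 1): 0.4,
       (1, 0): 0.7, (1, 1): 0.95}          # P(Y=1 | M=m, U=u)

def p_u(u): return pU1 if u else 1 - pU1
def p_m(m, t): return pM1[t] if m else 1 - pM1[t]
def p_y1(m, u): return pY1[(m, u)]

# Ground truth P(Y=1 | do(T=t)): delete the edge into T, marginalize U and M.
def p_y1_do_t(t):
    return sum(p_u(u) * p_m(m, t) * p_y1(m, u)
               for u, m in product((0, 1), (0, 1)))

# P(m | do(t)) = P(m | t): the backdoor path T <- U -> Y <- M is blocked
# by the collider at Y.
# P(Y=1 | do(M=m)): adjust for the backdoor path M <- T <- U -> Y. Here we
# adjust for U directly, just for checking; the real frontdoor derivation
# adjusts for T instead, since U is unobserved.
def p_y1_do_m(m):
    return sum(p_u(u) * p_y1(m, u) for u in (0, 1))

for t in (0, 1):
    lhs = p_y1_do_t(t)
    rhs = sum(p_m(m, t) * p_y1_do_m(m) for m in (0, 1))
    assert abs(lhs - rhs) < 1e-12   # P(y|do(t)) = sum_m P(m|do(t)) P(y|do(m))
```

This only verifies the identity on one concrete model, of course; the general proof is the frontdoor derivation in the book.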
Hi Brady ... Why a subliminal image of Brad Pitt (in Fight Club) at 22:11 ?
I'm confused about the graph on the slide for the unconfounded children criterion -- why is the presence of W2-->M1 not a problem for this criterion?
Because the unconfounded children criterion concerns backdoor paths from T to its children, not from the children to Y.
Isn't it the case that randomization only gives covariate balance in the limit (of infinite samples)? If you have a small number of samples you can still get stratification (i.e. lack of covariate balance) by chance.
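A quick simulation makes the point: with a small sample, large chance imbalance on a covariate is common even under perfect randomization. The sample size, seed, and the 0.3 "badly imbalanced" cutoff below are arbitrary choices for illustration:

```python
import random

random.seed(0)
n, trials = 10, 10_000          # small sample, many repeated experiments
big_imbalance = 0
valid = 0
for _ in range(trials):
    # Binary covariate, generated independently of the coin-flip assignment
    x = [random.random() < 0.5 for _ in range(n)]
    t = [random.random() < 0.5 for _ in range(n)]
    treated = [xi for xi, ti in zip(x, t) if ti]
    control = [xi for xi, ti in zip(x, t) if not ti]
    if treated and control:     # skip the rare all-treated/all-control draws
        valid += 1
        gap = abs(sum(treated) / len(treated) - sum(control) / len(control))
        if gap > 0.3:           # arbitrary cutoff for "badly imbalanced"
            big_imbalance += 1
print(f"fraction of runs with covariate gap > 0.3: {big_imbalance / valid:.2f}")
```

With n = 10 a sizeable fraction of runs show a large gap in the covariate mean between arms, even though the covariate has nothing to do with assignment.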
Are the answers to the last questions: no, no, and Frontdoor Adjustment?
I think the 2nd is yes
Hi Brady ... I suggest another "intuitive" explanation of why and how the RCT works, based on the potential outcome functions. For an individual u one cannot observe both Y_u(1) and Y_u(0). But suppose there exists another individual u' such that Y_u' is equal to Y_u (u and u' are equivalent w.r.t. their potential outcome functions). Suppose u is treated and u' is not. Then the observed value Y_u(1) gives us the unobserved value Y_u'(1), and reciprocally the observed value Y_u'(0) gives us the unobserved value Y_u(0).
Indeed this is what the RCT does: each equivalence class (people having the same potential outcome function) is split into two sub-classes of the same cardinality, so that there exists a bijection between the two sub-classes. So for each treated individual u there exists a peer individual u' that gives the result for non-treatment (and vice versa). From this the ATE can be computed, as illustrated below ...
With boolean outcomes we have at most 4 outcome functions (say F1, F2, F3, F4), since there are two possibilities for Y_u(1) and two for Y_u(0) ... thus at most 4 classes in the population.
Example: suppose the population is made of S = 2k + 2n = 2(k+n) individuals (for convenience we suppose the cardinality of each class is even, so that it can be divided in two).
For 2k individuals the outcome function is F1, and for the 2n others it is F2.
The ATE is then composed of 2·2·(k+n) = 4(k+n) = 2S terms, since for each individual u the ITE is the difference of two terms: Y_u(1) - Y_u(0).
The RCT will split the classes F1 and F2 into two sub-classes each (treated, not treated). We then have 4 groups:
k of F1-treated, k of F1-not-treated,
n of F2-treated, n of F2-not-treated.
The ITE for an F1 member is: F1(1) - F1(0)
The ITE for an F2 member is: F2(1) - F2(0)
Putting the observed terms within brackets, the ATE (the average of the ITEs) becomes:
ATE = 1/(2(k+n)) · ( k·([F1(1)] - F1(0)) + k·(F1(1) - [F1(0)]) + n·(F2(1) - [F2(0)]) + n·([F2(1)] - F2(0)) )
Grouping observed terms together and unobserved terms together:
ATE = 1/(2(k+n)) · ( k·([F1(1)] - [F1(0)]) + k·(F1(1) - F1(0)) + n·([F2(1)] - [F2(0)]) + n·(F2(1) - F2(0)) )
Consider the value H below, computed from the S/2 = k+n results of the RCT experiment:
H = 1/S · ( (\sum_{u ∈ Treated} Y_u(1)) - (\sum_{u ∈ Not Treated} Y_u(0)) )
Though k and n are unknown, we can affirm that H is also equal to the ATE expression above with the unobserved terms removed (note 1/(2(k+n)) = 1/S):
H = 1/S · ( k·([F1(1)] - [F1(0)]) + n·([F2(1)] - [F2(0)]) )
But we know that F1(1) - F1(0) = [F1(1)] - [F1(0)], and that F2(1) - F2(0) = [F2(1)] - [F2(0)].
So we can affirm that H is half of the ATE (whatever k and n are).
So ATE = 2H.
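A quick numeric check of the ATE = 2H identity above, with made-up values for k, n and the outcome functions:

```python
# Made-up example: 2k individuals have outcome function F1, 2n have F2,
# so the population size is S = 2(k+n). The RCT puts half of each class
# in the treated group and half in the control group.
k, n = 3, 2
S = 2 * (k + n)
F1 = {1: 1, 0: 0}   # F1(1)=1, F1(0)=0 -> ITE = 1
F2 = {1: 1, 0: 1}   # F2(1)=1, F2(0)=1 -> ITE = 0

# True ATE: average of the S individual treatment effects
ATE = (2 * k * (F1[1] - F1[0]) + 2 * n * (F2[1] - F2[0])) / S

# H: (sum of treated outcomes - sum of untreated outcomes) / S,
# using the RCT split (k treated F1's and n treated F2's, same for controls)
H = (k * F1[1] + n * F2[1] - (k * F1[0] + n * F2[0])) / S

assert abs(ATE - 2 * H) < 1e-12   # the identity holds
```

Here ATE = 0.6 and H = 0.3, regardless of which particular individuals the coin flips put in each arm, as the argument above shows.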
Randomization works asymptotically. A culture has prevailed that chooses to ignore this subtlety. Say N=2 and you do a coin flip to assign treatment. By what logic do we have covariate balance exactly? Never mind that there is no guarantee of an identical distribution of a covariate between the two groups if the support is different.
I vote for renaming this lecture into "randomization is magic"
This is a really great course. I have some limited experience in CI and this has been the most well structured introduction I've come across. Really great work.
I felt like it would be worth mentioning the flip-side to randomized experiments, given that so many positive attributes have been mentioned.
For one paper critiquing randomization, at least in practice, see www.sciencedirect.com/science/article/pii/S0277953617307359, although I believe Pearl has touched on similar issues as well.
TLDR: "This was Fisher's insight: not that randomization balanced covariates between treatments and controls but that, conditional on the caveat that no post-randomization correlation with covariates occurs, randomization provides the basis for calculating the size of the error. Getting the standard error and associated significance statements right are of the greatest importance; therein lies the virtue of randomization, not that it yields precise estimates through balance."