Aren't we looking at predicting E[Y| (D,Z)] in other words how D and Z jointly influence Y as a first step and then E[D|Z] as a second step? In the slide at 10:52, they predict E[Y|Z] instead of E[Y| (D,Z)] which is a bit confusing as treatment is not controlled and stochastic as well...
E[Y|D,Z] would be the ultimate goal (predicting the outcome as a joint function of treatment and covariates). This is what is done for instance by standard ML methods as presentend in the beginning, but in this case doesn't provide a good estimator of the treatment effect. I think the approach here is similar to multilinear regression where we first regress D on Z, then we obtain a residual which we regress on Y to isolate the effect of D independently of Z. So the question is rather here : why do we regress D-E[D|Z] on Y-E[Y|Z] instead of Y ? In Multilinear Regression, the first step ensures that the residual will not be correlated to Z, so regressing this residual on Y or Y-E[Y|Z] is equivalent. But here, since the model is semilinear (I think, but perhaps also because we use ML methods), there may be some effect of g(Z) on Y correlated to D even if we take the residual D-E[D|Z]. So we need to evaluate Y-E[Y|Z] to approach the real treatment effect.
no, because D isn't binary, D is continuous. D=m(z) + V, where V is a random variable which does not depend on z. In a perfect trial, D=V, so for example, D might be drawn from a Gaussian distribution with sufficient support to make the inferences you wish to make. You could go further and say that in a "perfect" trial, V is a uniform distribution over some sufficiently large domain. I think here, "perfect" just means "not confounded at all"
This presentation was helpful, I appreciate it, Professor.
Appreciate your works, Professor.
Aren't we looking at predicting E[Y| (D,Z)] in other words how D and Z jointly influence Y as a first step and then E[D|Z] as a second step?
In the slide at 10:52, they predict E[Y|Z] instead of E[Y| (D,Z)] which is a bit confusing as treatment is not controlled and stochastic as well...
E[Y|D,Z] would be the ultimate goal (predicting the outcome as a joint function of treatment and covariates). This is what is done for instance by standard ML methods as presentend in the beginning, but in this case doesn't provide a good estimator of the treatment effect. I think the approach here is similar to multilinear regression where we first regress D on Z, then we obtain a residual which we regress on Y to isolate the effect of D independently of Z. So the question is rather here : why do we regress D-E[D|Z] on Y-E[Y|Z] instead of Y ? In Multilinear Regression, the first step ensures that the residual will not be correlated to Z, so regressing this residual on Y or Y-E[Y|Z] is equivalent. But here, since the model is semilinear (I think, but perhaps also because we use ML methods), there may be some effect of g(Z) on Y correlated to D even if we take the residual D-E[D|Z]. So we need to evaluate Y-E[Y|Z] to approach the real treatment effect.
The presentation is awesome. Thank you!
In "perfectly set-up" randomized control trials, m_0 wouldn't vanish, but rather would be a constant value of 0.5 for all values of Z, no? (6:25)
Yes, though one can assume that that constant had been partialed out, which would give zero.
no, because D isn't binary, D is continuous. D=m(z) + V, where V is a random variable which does not depend on z. In a perfect trial, D=V, so for example, D might be drawn from a Gaussian distribution with sufficient support to make the inferences you wish to make. You could go further and say that in a "perfect" trial, V is a uniform distribution over some sufficiently large domain. I think here, "perfect" just means "not confounded at all"
The presentation is very helpful! Thank you!
great talk, wondering where to access the slides.
@@mathieumaticien Hi the slides have already been removed
@@ruizhenmai1194 why?
Most impressive.
Machine learner who worked back in the 30's 🤣
"I resisted to call it ML and I gave up " and Machine learners in 30's :) Hilarious