This channel and this video are sooo underrated!
Thank you both (and everyone behind the scenes) for this video. Strong didactic structure, very good for getting started and understanding the topic. It's a bit sad that the presenter seemed nervous while talking, even though he clearly knows what he's talking about very well! You did very well, though!
Fantastic lecture! The introduction to the DiD method is really intuitive. In my experience, it's one of the best explanations out there.
Oh my god was this useful. Thank you so much for planning it out and recording it! Amazing job.
You're so welcome!
Wouldn't COVID be a bad use case for DD because it was worldwide? There are few economies that were unaffected and could be used as a counterfactual.
You would use an event study design to measure the effects of COVID.
fantastic lecture! thanks Yuan and Emma!
Synthetic controls are pretty much the big brother of difference-in-differences. You can do so much more with SCM than you can with DD. For example, I'm writing a synthetic control command for Stata that uses LASSO or Ridge to automate donor/variable selection, and this method already outperforms classic SCM. I've even gotten it to handle staggered implementation as well as placebo inference, and the best thing is that you only need outcome data; you don't need a long list of covariates to estimate the counterfactual.
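For anyone curious what that looks like mechanically, here is a minimal Python sketch of the general idea (the Stata command itself isn't shown in this thread, and every name and number below is an illustrative placeholder): regress the treated unit's pre-treatment outcomes on the donor units' outcomes with LASSO, which zeroes out irrelevant donors, then project the fit forward as the counterfactual.

```python
import numpy as np
from sklearn.linear_model import LassoCV

T0 = 40                                        # number of pre-treatment periods (toy choice)
rng = np.random.default_rng(0)

Y_donors = rng.normal(size=(60, 25))           # 60 periods x 25 donor units (fake data)
y_treated = Y_donors[:, :3].mean(axis=1) + rng.normal(scale=0.1, size=60)
y_treated[T0:] += 1.0                          # inject a fake treatment effect after T0

# Fit on pre-treatment periods only; LASSO zeroes out irrelevant donors,
# which is the "automated donor selection" step described above.
model = LassoCV(cv=5).fit(Y_donors[:T0], y_treated[:T0])

counterfactual = model.predict(Y_donors)       # synthetic outcome for every period
effect = y_treated[T0:] - counterfactual[T0:]  # post-treatment gap = estimated effect
print("selected donors:", np.flatnonzero(model.coef_))
print("average estimated effect:", effect.mean())
```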
It seems you are using endogenous outcome variables on the right hand side of your regression.
The variables you choose to construct the synthetic group are subjective. Endogeneity, omitted variable bias, and the pre-treatment trend are all hidden in the process.
@@brotherbig4651 Yeah, you're right, the variables we choose are subjective. And you're also right that the pre-treatment regression uses the donor outcomes to predict the outcomes of the actually treated unit. In fact, the algorithm can also use other covariates; it just doesn't need to.
The cross-validation procedure, in addition to combating overfitting, also attempts to ensure we have the best out-of-sample predictions "k" time periods ahead of a point in the training data.
Initially, I was super skeptical about the approach when I read about it for Python and R; I pretty much couldn't believe it. Well, I wrote the routine myself for Stata, roughly based on their code, and it works pretty well, even under suboptimal conditions (short pre-intervention periods, hundreds or thousands of donors, that kind of thing).
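A minimal sketch of what such a forward-chaining cross-validation scheme might look like, continuing the LASSO setup from the sketch above; the actual Stata routine isn't shown here, and `k` and `alphas` are placeholder choices:

```python
import numpy as np
from sklearn.linear_model import Lasso

def rolling_cv_alpha(Y_donors, y_treated, T0, k=4, alphas=(0.01, 0.1, 1.0)):
    """Pick the LASSO penalty with the lowest k-step-ahead error in the pre-period."""
    scores = {}
    for a in alphas:
        errs = []
        # expand the training window, always validating k periods ahead of it
        for t in range(10, T0 - k):
            model = Lasso(alpha=a).fit(Y_donors[:t], y_treated[:t])
            pred = model.predict(Y_donors[t:t + k])
            errs.append(np.mean((y_treated[t:t + k] - pred) ** 2))
        scores[a] = np.mean(errs)
    return min(scores, key=scores.get)
```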
This video clears a lot of questions in my mind. Thank you!
For the Uber case, what is the argument for NOT using an A/B test? (or is it just for the example's sake) Thanks!
Because riders in the same market share drivers. Only treatment users had to walk (if they requested Express Pool), but that would reduce the average trip duration for all pool riders, even the control users who didn't walk. As a result, an A/B test can't detect the treatment effect.
Shouldn't the common trend for Uber be a proportion? That is, shouldn't the average trip duration be expected to go up by the same fraction in New York and San Francisco? How do you decide that the common trend is an absolute change that is matched in San Francisco and New York? Also, why is there a discontinuity in the blue curve? Is that needed to discuss diff-in-diff? Surely a discontinuity in gradient is already enough? And what does it even mean to have an explicit data point at the boundary?
One question I had: why do we need to do the counterfactual prediction on the donor pool (similar cities) instead of using the treatment city's own historical data from before the treatment to predict the counterfactuals for the period of interest?
Hey Andrea, I think it's mainly because the donor pool can better capture seasonality, trends, and environment changes, which makes the counterfactual prediction for the treated unit more accurate (especially for irregular time series). Imagine when the pandemic starts: there is no way for the treatment city to estimate its own counterfactual from its own historical data (prior to the pandemic). The donor pool, on the other hand, is also affected by the pandemic, so their weighted post-treatment values would be a better counterfactual for the treated unit.
My understanding is that you need both. The donor pool data should represent a world where the treatment isn't implemented, and you find it by modeling the donor pool's prior data to best represent the treatment city's. Then you track how that synthetic control performs after the treatment starts and use it as a baseline to see how the treatment city's behavior differs from it.
I guess you can, but a comparison of that kind is hardly convincing. Sometimes temporal data make better predictions and sometimes cross-sectional data make better predictions. For example, if I were interested in the effect of a tax on investment gains in the US market, I would rather base my estimation on a counterfactual derived from the JP/EU markets, etc., than on historical data.
great great video! Thank you guys!
Loved this video! Thank you both! :)
Thanks for the good information!
Phenomena like pandemics, which occur rarely but have large-scale effects, are described by power laws. It's called self-organizing criticality.
I would kindly argue that DiD and synthetic controls suffer from the same pitfalls as standard statistical controls. When these two methods are employed within observational designs, confounding can be introduced if the two groups of interest are not balanced on key covariates. We employ methods like propensity score adjustments to balance the two groups, which can then be analyzed with an eye toward providing supportive or disconfirming evidence. Synthetic controls can also suffer from confounding that is likely unobserved. Because the confounding is unobserved, you cannot use propensity methods, and must instead use something more like instrumental variable methods.
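As a concrete illustration of the propensity score balancing mentioned here, a minimal Python sketch (all names are illustrative): model treatment assignment from observed covariates, then reweight the two groups by inverse propensities so they are comparable on those covariates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_effect(X, treated, y):
    """Inverse-propensity-weighted difference in mean outcomes.
    X: covariate matrix; treated: 0/1 array; y: outcomes."""
    # model treatment assignment from the observed covariates
    ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
    w = np.where(treated == 1, 1.0 / ps, 1.0 / (1.0 - ps))   # inverse propensity weights
    t, c = treated == 1, treated == 0
    return np.average(y[t], weights=w[t]) - np.average(y[c], weights=w[c])
```

Note that this can only balance on observed covariates, which is exactly the point above: unobserved confounding calls for something like instrumental variables instead.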
Good job, guys!!! Is it possible for you to do a video on the commands used in SCM?
Hi, Percy! Thanks for your comment. I've added your suggestion to my list of potential content ideas. 😊
I think we forgot to answer the original question of "how did COVID impact our economy?" I'd probably not use diff-in-diff to answer that, but an event study design instead. The whole world was impacted by COVID, so it's difficult to find an appropriate control: for example, what country is matchable to the USA that was not impacted by COVID? An event study allows us to predict the counterfactual in this case and then compare it with the actual outcome. The residual is our effect size.
Yeah, I think it's unanswered in the video. I found your comment while looking for an answer in the comment section.
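A minimal sketch of the event-study logic described above, assuming a simple linear pre-event trend (a real analysis would model seasonality and report uncertainty): fit the trend on pre-event data only, project it forward as the counterfactual, and take the residual as the effect.

```python
import numpy as np

def event_study_effect(y, t_event):
    """Fit a linear trend on pre-event data, project it forward,
    and return the post-event residuals as the estimated effect."""
    t = np.arange(len(y))
    pre = t < t_event
    slope, intercept = np.polyfit(t[pre], y[pre], 1)    # linear pre-event trend
    counterfactual = intercept + slope * t
    return y[t_event:] - counterfactual[t_event:]

# e.g. a series that grows steadily, then drops at period 30 (toy data):
y = np.concatenate([np.linspace(100, 130, 30), np.linspace(118, 125, 10)])
print(event_study_effect(y, 30).mean())
```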
I have a question that many people may be confused about as well: other than cases where the event being estimated has already happened in the past, in what other cases is it better to use DID than A/B testing to estimate an effect?
We usually can't do controlled experiments/A/B testing for an intervention; using DD is what we do in practice when experiments aren't possible, and it has quite a lot of pitfalls that many economists don't address when writing about their methods. SCM, however, is the supreme variant of DD, a generalized version of it that offers a principled way to select donors. My variant of SCM explicitly combats overfitting and noise with machine learning estimators, for example. DD isn't quite as capable of this, yet.
@@jaredgreathouse3672 Thanks for the reply!
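For reference, the placebo inference mentioned earlier in this thread usually works something like the sketch below: re-run the estimator pretending each donor was the treated unit, and see where the real unit's post/pre fit ratio falls in that placebo distribution. The `fit_synth` helper is a hypothetical stand-in for whatever synthetic-control fit is being used.

```python
import numpy as np

def placebo_pvalue(fit_synth, Y, treated_col, T0):
    """Y: periods x units outcome matrix; T0: first treated period.
    fit_synth(Y_donors, y_target, T0) -> counterfactual series (hypothetical helper)."""
    def gap_ratio(col):
        donors = np.delete(np.arange(Y.shape[1]), col)
        synth = fit_synth(Y[:, donors], Y[:, col], T0)
        gap = Y[:, col] - synth
        # how much worse the fit gets after T0, relative to before
        return np.mean(gap[T0:] ** 2) / np.mean(gap[:T0] ** 2)
    ratios = np.array([gap_ratio(c) for c in range(Y.shape[1])])
    # p-value: share of units whose ratio is at least as extreme as the treated unit's
    return np.mean(ratios >= ratios[treated_col])
```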
By using synthetic control, we aim to satisfy the common trend assumption required by difference-in-differences.
amazing thank you!