An intuitive introduction to Difference-in-Differences

  • Published: Aug 22, 2024
  • Difference-in-Differences is one of the most widely applied methods for estimating causal effects of programs when the program was not implemented as a randomized controlled trial.
    In this video I describe the situations where the method is applicable and give you the intuition behind it. I also explain how and why you might want to use regression to estimate diff-in-diff effects. Throughout, I talk about the key assumption required for the diff-in-diff estimate to be valid.
    Intended audience: Folks who have had some exposure to linear regression models, but want to learn more statistical methods.

Comments • 106

  • @pedrocolangelo2458
    @pedrocolangelo2458 3 years ago +13

    This is probably one of the best videos on this subject that I've ever seen. Thanks!!

  • @SalehBabazadeh
    @SalehBabazadeh 9 years ago +47

    Thank you so much, Doug! I just wanted to encourage you to keep up this great work. Your videos are awesome, and I believe they are being used by many people in different fields.

    • @dougmckee673
      @dougmckee673  9 years ago +5

      +Saleh Babazadeh Thanks so much for the kind words! I really should post more of these!

    • @TaroQuispe
      @TaroQuispe 4 years ago +1

      @@dougmckee673 thanks from my side too, very clear and easy to understand. Do consider posting similar vids on regression techniques and similar, cheers!

  • @anuvaagarwal3492
    @anuvaagarwal3492 4 years ago +5

    One of the best and most lucid explanations of the DID method. Thank you for this, Doug. I especially liked how you explain the intuition behind why the DID estimate calculated by hand is the same as the one estimated by the regression model. And the part where you elaborate on the simple benefits of using a regression for a DID model is great.
    I really appreciate you sharing your understanding here.

  • @monicamu8013
    @monicamu8013 4 years ago +2

    When I watched the video for the first time, I was totally lost. The second time, I paused along the way to give myself more time to understand your super intelligent and super long sentences. It is so much clearer now. Thank you so much!

  • @sharonie
    @sharonie 3 years ago +1

    Best Diff-in-diff course I have learned. Thanks!

  • @lawrencecobb2107
    @lawrencecobb2107 2 years ago +1

    This is such a clear and helpful video. I’m taking an exam in an hour and doing last minute double checks. This makes me feel more confident, thank you

  • @bl.l1506
    @bl.l1506 4 years ago +1

    Your videos have been vital for understanding the contents of my statistics course! So far, I've supplemented every new concept with your videos. Sometimes, I even watch your video first and then do the readings. Please keep doing these videos!

  • @dianaadamczyk5273
    @dianaadamczyk5273 6 years ago +1

    Can't tell you how useful your videos are. Thanks for passing on the knowledge!

  • @brothermalcolm
    @brothermalcolm 3 years ago

    Absolutely brilliant tutorial, first result returned, wish youtube was always this helpful!

  • @zaraazami4936
    @zaraazami4936 8 years ago +4

    Thank you so much! This video was waaay more helpful than reading pages and pages on DD! Very clear and to the point! Thank you!!

  • @xb2856
    @xb2856 1 year ago

    Way more intuitive than I previously thought. Well put, thanks.

  • @Itachi0567
    @Itachi0567 4 years ago +1

    Thanks a lot for this clear explanation. You don't know how much it helped me.

  • @Josefk40
    @Josefk40 8 years ago +1

    Excellent explanation in 12 minutes. Thank you

  • @Non-disjunction
    @Non-disjunction 3 years ago

    You are such a legend mister McKee

  • @techierealestate
    @techierealestate 5 years ago +1

    Clear and right to the point. I always wondered why the multiplication coefficient is the DD coeff, Now I know :D

  • @marben7062
    @marben7062 8 years ago +1

    Thank you very much Doug.
    It helped me to analyse my data (pooled cross section).

  • @thefadingmoonlight
    @thefadingmoonlight 8 years ago

    Thank you so much for uploading this! I had looked online at DID and was confused. This made it so easy to understand and apply.

  • @kevinvandenbrink8214
    @kevinvandenbrink8214 9 years ago +3

    Thanks for the video, it really helped me in my finance research. Just one thing: when you talk about the dummy variable Dtr, I think it takes 1 if the person is in the treatment group and 0 if the person is in the control group.

    • @dougmckee673
      @dougmckee673  9 years ago +1

      Kevin van den Brink You're exactly right--When (if) I re-record this I'll fix that. Thanks!

  • @huekim589
    @huekim589 2 years ago

    Very good and funny videos bring a great sense of entertainment!

  • @digray6732
    @digray6732 2 years ago

    Thank you for this! I didn't quite understand the very last point, i.e. the difference between the points made for when DD is 'ok' (appropriate) and 'not ok'

  • @anglofranses8205
    @anglofranses8205 3 years ago

    This is pure gold. Thanks!

  • @yading9202
    @yading9202 5 years ago

    Very clear, easy to understand. Great job!

  • @GradualReportSerbia
    @GradualReportSerbia 4 years ago +1

    Abrupt ending, good video

  • @sembilanbereguler2602
    @sembilanbereguler2602 9 years ago +4

    Based on the regression result (at 8:59), what is the criterion for rejecting the null hypothesis (that is, for saying that the effect of the lunch program is statistically significant)?

  • @leopan54321
    @leopan54321 2 years ago

    Dude. This saved me thanks :)

  • @VikramSingh-sf1ev
    @VikramSingh-sf1ev 3 years ago

    Very clear to the point

  • @zhouchen7682
    @zhouchen7682 8 years ago

    Very useful, wait for more.

  • @braddoremus588
    @braddoremus588 6 years ago

    Thank you - very good explanation. Helped clear a lot up for me.

  • @rheabanerjee4938
    @rheabanerjee4938 5 years ago

    I wish you would post more, you're great!

  • @wisuraweerathunga2188
    @wisuraweerathunga2188 4 years ago

    Thanks for this one ! You made it clear !

  • @thej1091
    @thej1091 2 years ago

    Thank you kind sir! :)

  • @tuhinurrahmanchowdhury9705
    @tuhinurrahmanchowdhury9705 3 years ago

    Great video. It saved me!

  • @DavidLihm
    @DavidLihm 8 years ago +6

    Thank you so much, this has been really useful!

  • @Non-disjunction
    @Non-disjunction 3 years ago

    Amazing video

  • @hd81504
    @hd81504 7 years ago +1

    First off, thanks for the great video, Doug! I have a follow-up question to one of the comments below:
    One person commented:
    So do I understand correctly an extension of the model for 3 treatment groups and 1 control with pre and post could look the following:
    y = β0 + β1 * Dpost + β2 * Dtr1 + β3 * Dtr2 + β4 * Dtr3 + β5 * Dpost * Dtr1 + β6 * Dpost * Dtr2 + β7 * Dpost * Dtr3 + β8 * X
    β5: DiD effect for Treatment 1
    β6: DiD effect for Treatment 2
    β7: DiD effect for Treatment 3
    And you replied that is correct.
    So my question is can you do this same procedure in logistic regression when your dependent variable is dichotomous (e.g., disease vs. no disease)?

    • @dougmckee673
      @dougmckee673  7 years ago

      Interpreting coefficients on interaction terms in nonlinear models (like logistic) is tricky. If it were me, I would just estimate a linear probability model, but there's a much longer (and better) answer here: stats.stackexchange.com/questions/89513/difference-in-differences-estimator-for-logistic-regressions

    • @lemoncobra2563
      @lemoncobra2563 5 years ago

      In response to Doug, a word of caution on using the LPM: it can produce predicted probabilities outside the [0, 1] range, and its errors will be heteroskedastic. The latter can be fixed with an extra option (such as robust standard errors), but the former is a fundamental issue with the estimator itself.
      I would argue the point of using DiD is to examine the magnitude of the change from a program, etc. With a logit regression you get your coefficients, calculate the margins, and use the margins to calculate the change in probability that the DD had on your dependent variable. You're kind of muddling the point of using a logit in this regard, but it still works. It loses some explanatory power and some of its charm. Still doable, though.

  • @xingu7561
    @xingu7561 5 years ago

    It is really helpful! This video is easy to understand for new learners like me! I really appreciate your help! If I can survive my PhD program, I hope I can make videos like this in the future!

  • @saraly2
    @saraly2 2 years ago

    Thank you!

  • @linearseller2835
    @linearseller2835 8 years ago +1

    What a great video. I did miss conclusions about the example, though. Beta3 is 30, but it has a p-value equal to 0.228. So can we conclude that the free lunch program didn't have a statistically significant effect (at the 95% level)? Those 30 points could have been by chance, right?

    • @dougmckee673
      @dougmckee673  8 years ago

      +Linear Seller Absolutely correct and not that surprising given there were only 10 observations in this sample.

  • @oldtree700
    @oldtree700 7 years ago +1

    Hi, Doug! Thank you so much for your great video. I have a quick question. At the end of the video you mentioned the example for the case where DiD is not OK. If the free lunch program has already been implemented in the control group, is there any way I can still use it as a control group? Can semiparametric DiD be used?

  • @vedantss
    @vedantss 2 years ago

    Very useful!

  • @rohangopalakrishnan7417
    @rohangopalakrishnan7417 3 years ago

    Big from you Doug

  • @fritzlouw8434
    @fritzlouw8434 8 years ago

    Much appreciated. Keep it up man!

  • @Dniem
    @Dniem 3 years ago

    Hello Professor Armstrong!

  • @homayoungerami4176
    @homayoungerami4176 4 years ago

    thanks, it was easy to digest

  • @Run4un
    @Run4un 4 months ago

    In this example, are the y-scores the post-scores or the pre-post differences? I'm guessing just post-scores? Thanks for clarifying!

  • @inferno9004
    @inferno9004 8 years ago +1

    Great video, Doug!!!
    If there is just one treatment group and one control group with pre vs. post data, and we want to include many control variables, say 5, how do we fit a model with 5 control variables? What does the regression equation look like?

    • @dougmckee673
      @dougmckee673  8 years ago

      +inferno9004 It looks just like the regression model shown in the video with the addition of your control variables.

  • @libbyalthea3061
    @libbyalthea3061 7 years ago +1

    Hello! Thank you for a great video! Do you have any advice for estimating the necessary sample size before implementing a treatment? Thanks!

  • @josephdover6822
    @josephdover6822 8 years ago +1

    Hi Doug!
    Thank you so much for your video
    I just wanted to ask you a small question:
    I am also planning to use the difference-in-differences model. I am looking at the impact of the Euro (introduced in 1998 and in circulation in 2002) on trade flows between countries in Europe. I am new to Stata, so I am not too sure how to proceed.
    I did the following regression
    regress Tradeflow Governmenteffectiveness1 Unemployment1 GDPpercapita1 Populationsize1 Governmenteffectiveness2 Unemployment2 GDPpercapita2 Populationsize2 Distance1-2
    But I am not sure what I should do next?
    Any help would be very much appreciated! :)
    Best,
    Joseph

    • @dougmckee673
      @dougmckee673  8 years ago

      +joseph dover To apply a difference in difference, you'll need to divide your trade flows into some set that might be affected by the introduction of the Euro (treatment) and another set that definitely would not be (control). You will also need to reshape your data so you have observations of each trade flow before and after the Euro was introduced. Then you should be able to apply the regression method shown in the video. Good luck!

  • @Ytremz
    @Ytremz 8 years ago +2

    Brilliant

  • @emeraldwei6672
    @emeraldwei6672 1 year ago

    Thank you! I would like to know, if there isn't a comparable group, like Rio, then how can one figure out the effect of this programme?

  • @tarpinianmt
    @tarpinianmt 9 years ago +1

    Thank you so much for this, I had never heard of difference in differences until a reading I had for economic development. I'm actually planning to reference this video in a paper; do you have anything you'd want me to include for a citation?
    Thanks again.

    • @dougmckee673
      @dougmckee673  9 years ago +2

      Matthew Tarpinian I'm really glad you've found the video helpful, but it's probably not appropriate for a citation in your paper. If you want a good reference for the method, I suggest using Angrist and Pischke's _Mostly Harmless Econometrics_ instead.

  • @alfonsoga95
    @alfonsoga95 5 years ago +1

    Thanks, I have one question though, what's the name of the program you're using for the regression? I'm not familiar with it, I find it quite practical

    • @oyvsni6679
      @oyvsni6679 4 years ago +1

      Doug is using Stata

  • @bright1402
    @bright1402 5 years ago +1

    Thank you for your video! But at the time 8:06, what is the difference between \beta_0 and \epsilon?

    • @jotaeleoh
      @jotaeleoh 5 years ago +1

      Beta_0 is the intercept: the expected value of the outcome "y" when all the dummy variables are zero (the control group before the program). Epsilon is the error term, which captures everything else that affects "y" but is not included in the model.
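
To make the roles of the coefficients concrete, here is a sketch in the notation used in these comments (Dpost, Dtr). It assumes the error term has mean zero within each of the four group-by-period cells, which is an assumption added here and not something stated in the thread.

```latex
y = \beta_0 + \beta_1 D_{post} + \beta_2 D_{tr} + \beta_3 \, (D_{post} \times D_{tr}) + \epsilon

\begin{aligned}
E[y \mid D_{tr}=0,\ D_{post}=0] &= \beta_0 \\
E[y \mid D_{tr}=0,\ D_{post}=1] &= \beta_0 + \beta_1 \\
E[y \mid D_{tr}=1,\ D_{post}=0] &= \beta_0 + \beta_2 \\
E[y \mid D_{tr}=1,\ D_{post}=1] &= \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{aligned}

\text{DD} = \big[(\beta_0+\beta_1+\beta_2+\beta_3) - (\beta_0+\beta_2)\big]
          - \big[(\beta_0+\beta_1) - \beta_0\big] = \beta_3
```

Read this way, beta_0 is the control group's average before the program, and the coefficient on the interaction is exactly the difference of the two before-and-after changes.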

  • @sarapluviano410
    @sarapluviano410 7 years ago

    Hi, thanks for the video. In the beginning you say that DID is useful for estimating causal effects of programs when the program is not implemented as a randomized controlled trial. So, in a randomized controlled trial, is DID not necessary? Thanks!

  • @johndupont8596
    @johndupont8596 8 years ago +1

    Hi Doug
    Thanks a lot for the video! I just have a question. I want to estimate a difference-in-differences model in Stata comparing students who received maths lessons with those who didn't. I would like to test whether having extra maths lessons helps students achieve higher marks.
    My variables are: "StudentID" "TIME" "MATHS_LESSON" "MARKS"
    But the problem I have is that not every student received maths lessons over the period, and I would like to create two groups, one "maths_lesson" and one "Nomaths_lesson", by adding them to the variable column "StudentID". How should I proceed?
    Let me recap: what I am now trying to obtain is a graph with "time" on the x axis and "marks" on the y axis with two lines (one for the group of students who took maths classes and one for the group that didn't), but I am struggling a bit to achieve this.
    Hope I am clear in describing my problem!
    Best regards,
    John

    • @dougmckee673
      @dougmckee673  8 years ago +1

      +John Dupont Using your TIME variable, you should divide your observations into "before" and "after" groups. You've already divided your students into those that got the treatment (MATHS_LESSON) and those that didn't. Once you have that, you can compute means of the four cells and subtract them to get the DD estimate. I advise first understanding your data and computing the required numbers before worrying about communicating those numbers with a graph. Hope this helps!
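
The four-cell calculation described in this reply can be sketched in a few lines of Python. The records, the tuple layout, and the function names `cell_mean` and `did_estimate` are all hypothetical and chosen only for illustration; the numbers are invented and are not from the video.

```python
from statistics import mean

# Hypothetical records: (student_id, after, treated, marks). All values are invented.
records = [
    (1, 0, 1, 60.0), (1, 1, 1, 75.0),
    (2, 0, 1, 50.0), (2, 1, 1, 65.0),
    (3, 0, 0, 55.0), (3, 1, 0, 60.0),
    (4, 0, 0, 45.0), (4, 1, 0, 50.0),
]


def cell_mean(rows, treated, after):
    """Average outcome for one of the four (group, period) cells."""
    values = [marks for _, a, t, marks in rows if t == treated and a == after]
    if not values:
        raise ValueError(f"empty cell: treated={treated}, after={after}")
    return mean(values)


def did_estimate(rows):
    """Change in the treated group's mean minus change in the control group's mean."""
    change_treated = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)
    change_control = cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)
    return change_treated - change_control


print(did_estimate(records))
```

In this made-up data the treated group's average rises by 15 points and the control group's by 5, so the estimate is 10. Each cell is averaged before subtracting, which matters whenever the cells contain different numbers of observations.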

  • @zeinebouni8764
    @zeinebouni8764 8 years ago +1

    Hi Mr Doug,
    Thank you for this interesting Video.
    Is it possible to do DID with ordinal outcomes? My variables: firm ratings (Y); D1 (D1==1 for the treated sample, 0 for the control sample); D2 (D2==1 after treatment, 0 before).
    I haven't found any examples showing whether it is possible or how to interpret the estimators.
    Your response is very important for me.
    Thank you.

    • @dougmckee673
      @dougmckee673  8 years ago +1

      +Zeineb Ouni I haven't seen it done, but I believe you could estimate an ordered logit model (ologit) with the same covariates shown above (D1, D2, and D1*D2 in your case). You have to be careful with interpreting interactions in the ordered logit, but I think the basic idea is valid.

    • @zeinebouni8764
      @zeinebouni8764 8 years ago +1

      +Doug McKee Thank you so much.

  • @shubrathak.p.7198
    @shubrathak.p.7198 8 years ago +1

    Hi Doug. Please help me! Can I use DID if my data do not satisfy the assumption of normality? If not, is there a non-parametric DID?

    • @dougmckee673
      @dougmckee673  8 years ago +2

      If you have a large enough number of observations (at *least* 25, and I'd feel comfortable over 100), then your outcome doesn't need to be normal--The Central Limit Theorem says your estimate of the treatment effect will be approximately normal.
      I believe there are nonparametric DiD-like methods when you have a continuous treatment and you believe the effect is nonlinear, but I don't know much about them.

    • @shubrathak.p.7198
      @shubrathak.p.7198 8 years ago +1

      Thank you Doug!

  • @GoonieFridkin
    @GoonieFridkin 8 years ago +1

    Hi. Thanks so much for this! Quick question though. I've just run a DD regression on my data. The DD beta score isn't significant, but the group (test vs control) beta is. What does this mean?

    • @dougmckee673
      @dougmckee673  8 years ago +1

      The insignificant DD beta means there is no significant effect of the treatment. The significant group beta means you have significant pre-treatment differences between the groups.

  • @bright1402
    @bright1402 5 years ago

    Thank you so much for your video! But in the last slide, I could not understand the Not OK case...

  • @hassanmurtzakhan
    @hassanmurtzakhan 9 years ago +1

    I am trying to run this in Stata and it omitted Beta3 because of multicollinearity between variables. Can you guide me on how to handle it?
    Thanks

    • @dougmckee673
      @dougmckee673  9 years ago +2

      Hassan Murtza Khan I don't usually answer Stata questions on YouTube, but I'll make an exception just this once. :) There are two possibilities. The first is that you don't have observations for each group (treatment and control) in both the before and after periods. Tabulate your treatment dummy against your post-period dummy and make sure all four cells have observations. The second possibility is that you made a mistake constructing the interaction variable. Check this by tabulating the interaction with each of the dummies to make sure the result makes sense.
      Now your job is to try these and report back so everyone can learn!
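
Both checks suggested in this reply can be sketched outside Stata. The Python below uses invented data; the variable names `d_tr`, `d_post`, `d_tr_x_post` and the helper functions are hypothetical. It counts observations in each of the four cells and verifies that the interaction equals the product of the two dummies.

```python
from collections import Counter


def cell_counts(treated, post):
    """Number of observations in each (treated, post) cell, including empty cells."""
    counts = Counter(zip(treated, post))
    return {(t, p): counts.get((t, p), 0) for t in (0, 1) for p in (0, 1)}


def empty_cells(treated, post):
    """Cells with no observations; any entry here makes the DD coefficient unidentified."""
    return [cell for cell, n in cell_counts(treated, post).items() if n == 0]


def interaction_is_consistent(treated, post, interaction):
    """True when the interaction variable equals treated * post in every row."""
    return all(i == t * p for t, p, i in zip(treated, post, interaction))


# Hypothetical data in which no treated unit is observed before the program.
d_tr = [1, 1, 0, 0, 0, 0]
d_post = [1, 1, 0, 1, 0, 1]
d_tr_x_post = [t * p for t, p in zip(d_tr, d_post)]

print(empty_cells(d_tr, d_post))
print(interaction_is_consistent(d_tr, d_post, d_tr_x_post))
```

In this example the (treated, before) cell is empty, so the interaction column is identical to the treatment dummy. That is the kind of perfect collinearity that leads a regression package to drop a coefficient.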

  • @ec.juanfranulcuangolee3294
    @ec.juanfranulcuangolee3294 4 years ago

    Any impact evaluation is supposed to start with building the database, and only then should a methodology such as DID be analyzed, isn't it?

  • @JM-fr9bc
    @JM-fr9bc 3 years ago

    What are the assumptions of dif in dif?

  • @tjahangon7286
    @tjahangon7286 9 years ago +1

    Thank you very much. This video really helps me. What statistic program did you use in this video? Stata?

    • @dougmckee673
      @dougmckee673  9 years ago +1

      ***** I did use Stata to get some of the numbers shown, but the content is fairly independent of the software in this video. Stata plays a bigger role in some of my other videos.

    • @tjahangon7286
      @tjahangon7286 9 years ago +1

      Thank you very much.

    • @tjahangon7286
      @tjahangon7286 9 years ago +2

      Doug McKee May I ask one more question? I am using a binary dependent variable (dummy). I searched for information on the internet and found that it is possible to fit a regression model with a binary dependent variable (in STATA: the .probit and .logit commands). In your opinion, can this also be done in a DD regression (I mean, using the command .logit y DTr DPost DTrXDPost)?

    • @dougmckee673
      @dougmckee673  9 years ago +3

      ***** Short answer: Yes. Longer answer: If you use your binary dependent variable in a linear regression model exactly as shown here, you are estimating a linear probability model. The coefficients can be interpreted as effects on the probability of the dependent variable being one. Most economists would do this. You *could* estimate a logistic model with the same variables on the right hand side, but it is much harder to interpret the magnitude of the coefficient on the interaction.

    • @tjahangon7286
      @tjahangon7286 9 years ago +1

      Doug McKee Do you mean that if y is a binary dependent variable and:
      1. I use command [regress y DTr DPost DTrXDPost], then I am "estimating a linear probability model. The coefficients can be interpreted as effects on the probability of the dependent variable being one."
      2. I use command [.logit y DTr DPost DTrXDPost], then "it is much harder to interpret the magnitude of the coefficient on the interaction."
      I hope your answer is "yes".
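
The linear probability model point in this exchange can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the data are invented, and `solve` and `ols` are hypothetical helpers that fit least squares through the normal equations. With a binary outcome and the three dummy regressors, the interaction coefficient equals the difference-in-differences of the four cell proportions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical binary outcomes; each tuple is (d_post, d_tr, y).
data = (
    [(0, 0, 1)] * 2 + [(0, 0, 0)] * 2    # control, before: 2 of 4 have y = 1
    + [(1, 0, 1)] * 2 + [(1, 0, 0)] * 2  # control, after:  2 of 4
    + [(0, 1, 1)] * 1 + [(0, 1, 0)] * 3  # treated, before: 1 of 4
    + [(1, 1, 1)] * 3 + [(1, 1, 0)] * 1  # treated, after:  3 of 4
)
x = [[1, p, t, p * t] for p, t, _ in data]  # constant, DPost, DTr, DTr x DPost
y = [outcome for _, _, outcome in data]

b0, b1, b2, b3 = ols(x, y)
print(round(b3, 6))
```

Here the treated group's proportion rises from 0.25 to 0.75 while the control group's stays at 0.5, so the interaction coefficient comes out at 0.5, read directly as a 50 percentage point effect. A logit fitted to the same variables would report its interaction on the log-odds scale, which is why its magnitude is harder to interpret.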

  • @eiinre
    @eiinre 7 years ago

    Hi Doug, how do I add additional controls (i.e. X) into the model? I am using SPSS to do the DiD. Do I just add the control variable and regard it as an independent variable?

  • @monicabraga4344
    @monicabraga4344 2 years ago

    How did you do it? Can you share it with me? Thank you.

  • @sembilanbereguler2602
    @sembilanbereguler2602 9 years ago

    Based on the regression result (at 8:59), what is the criterion for rejecting the null hypothesis?

  • @ahmedseliem3201
    @ahmedseliem3201 3 years ago

    How do I do a difference-in-differences analysis using SPSS? I need practical steps.

  • @lauramendezcarvajal5149
    @lauramendezcarvajal5149 8 years ago

    Douglas thanks for this amazing video, it helped me so much! I just have a question:
    why (y) has only one test score? I am a little bit confused about the pre-test and post-test information. If I have the test scores before the implementation and the scores after, how do I compute them? Thanks

    • @dougmckee673
      @dougmckee673  8 years ago +1

      The key is to have (or be able to compute) the average test score of both groups before AND after the intervention.

  • @vegasastras9194
    @vegasastras9194 3 years ago +1

    What is that program at 8:17? It looks very neat.

  • @ursulapulyer916
    @ursulapulyer916 8 years ago

    thank you!

  • @Nem3siS4o
    @Nem3siS4o 7 years ago

    Thanks!

  • @brucelee7782
    @brucelee7782 5 years ago +1

    I didn't get the DiD effect of 30 from 7:35. Somebody help, please! 😓

    • @liveybeha
      @liveybeha 4 years ago +1

      I didn't either at first! Remember to average (rather than add) each set of observations before doing the DiD calculation.

  • @chocolateyum678
    @chocolateyum678 6 years ago

    Thank you!!!!!!!!

  • @matinhewing1
    @matinhewing1 6 years ago +1

    Who down voted this video? Someone who didn't get a free lunch?

    • @weoweoteo
      @weoweoteo 6 years ago +1

      lol! this vid was super helpful. especially for my econometrics exam tomorrow xd

  • @brothermalcolm
    @brothermalcolm 3 years ago

    everything made sense until @7:55 help!

  • @joaoluistbarroso6917
    @joaoluistbarroso6917 3 years ago

    Show

  • @sjhoenen
    @sjhoenen 9 years ago

    Thanks!