You are my savior! Thank you for a kind explanation and slow demonstration!
This is very beautiful. Incredible explanation and great video. Thank you Sebastian
Thanks, very professional and straight to the point! You got one more subscriber.
Thank you! This helped me so much.
This is lifesaving, thank you.
🕺🏿🕺🏿🕺🏿🕺🏿🙌🏿 You are a lifesaver, man 👍🏿
Great video and great explanation. Thanks!
Hi! Thank you for this interesting tutorial. Is there a way to interact the endogenous variable with an exogenous variable? I would very much appreciate your help.
Thank you very much, Sebastian, for the amazing video. I just want to confirm: at the end, if "e" was significant, then we say there is no endogeneity issue in the model, right? And if "e" was not significant, there is an endogeneity issue?
No. If e is significant, it shows that 2SLS pulled endogeneity out of the regression. It does not mean we fixed endogeneity entirely. The idea that any test could show whether or not there is endogeneity is very dangerous. Instead, we need to make a good argument that our instrument is valid.
Your video is really helpful. I have a question: if we want to get the return to education for Black individuals only, using the ivregress command with the same wage2 data, what should the process be, and what is the command for it? Please respond.
Please help me understand the regression at 5:00, reg wage educHat exper. I don't understand why we have to run this regression (i.e., what we want to check in this step). Thank you so much!
That is the second stage regression using the "manual" method of 2SLS. I did that to show what's going on "under the hood" of the ivregress command that I use later. You can see the coefficients I got there match up with the ones at 8:00.
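To make the "under the hood" point concrete, here is a minimal Python sketch, assuming numpy and simulated data (not the wage2 dataset; the variable names only mirror the video and every coefficient is made up). It runs the manual two steps and the one-step 2SLS matrix formula on the same data, and the two coefficient vectors should agree.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients from regressing y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simulated data; names mirror the video, but this is NOT wage2.
rng = np.random.default_rng(0)
n = 5_000
ability = rng.normal(size=n)                  # unobserved, so it ends up in the error
sibs = rng.poisson(3, size=n).astype(float)   # instrument
exper = rng.normal(10, 3, size=n)             # exogenous control
educ = 14 - 0.3 * sibs - 0.1 * exper + ability + rng.normal(size=n)
wage = 1 + 0.08 * educ + 0.02 * exper + 0.5 * ability + rng.normal(size=n)

const = np.ones(n)
Z = np.column_stack([const, sibs, exper])     # all exogenous variables
X = np.column_stack([const, educ, exper])     # second-stage regressors

# Manual method: first stage, then "reg wage educHat exper".
educ_hat = Z @ ols(Z, educ)
manual = ols(np.column_stack([const, educ_hat, exper]), wage)

# One-step 2SLS: beta = (X'Pz X)^{-1} X'Pz y, where Pz X projects X onto Z.
X_proj = Z @ ols(Z, X)
onestep = np.linalg.solve(X_proj.T @ X, X_proj.T @ wage)

print(manual)
print(onestep)
```

Only the coefficients match. The standard errors from a manual second-stage regression are computed from residuals built with educHat instead of educ, so they are not valid 2SLS standard errors; that is one reason to use ivregress for the final results.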
Very good
Thanks for this video
6:00 Ivregress
How can I export both first- and second-stage results from the ivregress to excel? thanks!
As far as I know, you have to run the first-stage regression separately to get the results, if you do need them. See my video on tables in Stata for how to actually export the results.
Hi Sebastian, first of all, thank you so much for making this easy to understand. However, I have a key problem here where my 'e' or the 'residual' gets omitted during the final step of determining endogeneity. Do you have any insights as to why that may be the case?
You need to use the residual from the first stage regression. If you use the residual from the second stage, it will get omitted.
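A minimal Python sketch of that final endogeneity step, under stated assumptions: numpy, simulated data in which an unobserved ability term drives both educ and wage, and classical (non-robust) standard errors. The residual e is taken from the first-stage regression of educ on sibs and exper, then added to the OLS wage equation; its t statistic is the test. As an algebraic check, the coefficient on educ in this augmented regression equals the 2SLS coefficient.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and classical (non-robust) standard errors."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

# Simulated data (hypothetical coefficients).
rng = np.random.default_rng(1)
n = 5_000
ability = rng.normal(size=n)
sibs = rng.poisson(3, size=n).astype(float)
exper = rng.normal(10, 3, size=n)
educ = 14 - 0.3 * sibs - 0.1 * exper + ability + rng.normal(size=n)
wage = 1 + 0.08 * educ + 0.02 * exper + 0.5 * ability + rng.normal(size=n)
const = np.ones(n)

# First stage: educ on the instrument and the control; keep ITS residual.
Z = np.column_stack([const, sibs, exper])
first_beta, _ = ols_with_se(Z, educ)
e = educ - Z @ first_beta

# Wage equation by OLS with the first-stage residual added as a regressor.
beta, se = ols_with_se(np.column_stack([const, educ, exper, e]), wage)
t_e = beta[3] / se[3]

# For comparison: the 2SLS coefficient vector.
X = np.column_stack([const, educ, exper])
X_proj = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_2sls = np.linalg.solve(X_proj.T @ X, X_proj.T @ wage)

print(beta[1], beta_2sls[1], t_e)
```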
Hello Sebastian,
I have a quick question about testing for endogeneity. I have an original regression: Y = X1 + Xi-Xn i.year i.ffi, robust. Now I want to test whether X1 is endogenous using the Hausman test for exogeneity. X1 is a binary variable, Y is continuous, and i.year and i.ffi are year fixed effects and industry fixed effects, respectively. I use the following commands:
probit X1 Xi-Xn i.year i.ffi, robust
predict shat, pr
regress Y X1 Xi-Xn shat i.year i.ffi, robust
Now, shat should be statistically significant in order for X1 to be endogenous. My question is whether I can include the year fixed effects and industry fixed effects in the probit estimation this way, as I have read some contradicting views about this. shat is only significant if I add these fixed effects to the probit estimation, which is what I actually want.
What is the difference between ivreg and reg3, please?
Hi,
Would ivregress still work if the endogenous variable were continuous while the IV were categorical (binary)?
Thanks very much.
Yes, it works exactly the same.
@@sebastianwaiecon Thank you very much. I cannot wait to try it with my data today.
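One way to see why a binary instrument needs no special treatment: with a single binary instrument and no controls, the 2SLS slope reduces to the Wald estimator, the difference in mean outcomes between the z = 1 and z = 0 groups divided by the corresponding difference in mean x. A short Python sketch, assuming numpy and simulated data (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
u = rng.normal(size=n)                        # unobserved confounder
z = (rng.random(n) < 0.5).astype(float)       # binary instrument
x = 2.0 + 1.5 * z + u + rng.normal(size=n)    # continuous endogenous regressor
y = 1.0 + 0.7 * x + u + rng.normal(size=n)

# 2SLS via the usual matrix formula: regressors (1, x), instruments (1, z).
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
X_proj = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta = np.linalg.solve(X_proj.T @ X, X_proj.T @ y)

# Wald estimator: ratio of mean differences across the two z groups.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
print(beta[1], wald)
```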
Beautifully explained. Thanks!
Hello Sebastian, I have a question. Let's say I have this model: (1) Y1 = X1 + X2 and (2) X1 = Y1 + X3. In ivreg, should I put (X1 = X3) or (X1 = Y1 X3)? To be more specific, is it ivreg Y1 X2 (X1 = X3) or ivreg Y1 X2 (X1 = Y1 X3)? I can use ivregress for Y1 X2 (X1 = X3), but I can't use ivregress for Y1 X2 (X1 = Y1 X3). Why?
In the last command you wrote, you have Y1 as both the dependent variable and an instrument, which you can't do.
@@sebastianwaiecon Thank you for your response... it's quite confusing, actually. I have this data: DV = TE, IVs (independent variables) = ROE and TA, and I also noticed that ROE can be measured by TE and DTE, so in this case I believe there will be endogeneity. To solve it, I tried to regress ROE = TE + DTE first, then predicted ROEhat, then regressed TE = ROEhat + TA (I'm not sure whether this is right or wrong). But when I run IVREG on these equations, I don't know why the answer is not the same as my regression before. I put it like this: IVREG TE (ROE=DTE) TA
Sir, please: when looking at the results, what do R-squared and adjusted R-squared mean? I mean, how should I interpret an R-squared of 0.25, for example, and an adjusted R-squared of 0.25? Thanks so much.
R-squared would be well explained in any introductory econometrics or statistics text, probably better than I could do in a YouTube comment.
I have a question: can we use the two stages when the endogenous variable is a dummy variable, for example education in your case?
Also, another question: hypothetically, let's say your study looked at the impact of education on wage and you had measured education as a dummy. How would you tell whether education has an impact on wage?
Please, my IV is a dummy variable. How do I go about it?
THANK YOU SO MUCH
Is there a way to further control the first stage? I'm doing a regression in which the first stage has multiple fixed effects and requires a combination of xi: reghdfe. How can I use the predictions from that regression in the second stage without messing up the standard errors?
You can still use FE within ivregress if you use the dummy variable method. I haven't used reghdfe myself, so I can't really comment on that, but you may find some help in the documentation.
Thank you!!!!!!!
Hey! Thank you for this video, it is very helpful. I still have a question: is it possible to run an IV regression (2SLS) with 4 endogenous variables and just one instrumental variable? I want to use one instrumental variable for all my endogenous variables. I hope you can help me out. Thanks :)
No, you can't do that.
You need at least one instrumental variable for each endogenous variable.
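The order condition behind this answer can be checked numerically. In this Python sketch (numpy assumed, simulated data, and two endogenous regressors instead of four to keep it short), the first-stage fitted values built from a single excluded instrument are all linear combinations of just a constant and z1, so the second-stage regressor matrix loses rank and the separate coefficients cannot be recovered. Adding a second excluded instrument restores full rank.

```python
import numpy as np

def fitted(Z, X):
    """First-stage fitted values of every column of X given instruments Z."""
    return Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

rng = np.random.default_rng(3)
n = 1_000
u = rng.normal(size=n)                         # common unobserved error
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
x1 = z1 + 0.5 * z2 + u + rng.normal(size=n)    # two endogenous regressors
x2 = -0.5 * z1 + z2 + u + rng.normal(size=n)
const = np.ones(n)
X = np.column_stack([const, x1, x2])           # 3 coefficients to estimate

# One excluded instrument for two endogenous regressors: under-identified.
rank_one_iv = np.linalg.matrix_rank(fitted(np.column_stack([const, z1]), X))
# One excluded instrument per endogenous regressor: exactly identified.
rank_two_iv = np.linalg.matrix_rank(fitted(np.column_stack([const, z1, z2]), X))
print(rank_one_iv, rank_two_iv)
```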
Hi, if the coefficient on the explanatory variable is no longer significant after ivreg, does it mean we reject endogeneity? estat endog suggests no endogeneity as well. Thank you!
No, you can't make that conclusion.
Could you please help interpret the situation in this case? I checked the instrument as you suggest by regressing my explanatory variable on the instrument + controls, and the instrument is significant. When I run ivreg after that, the explanatory variable is no longer significant in the model. When I run the test of endogeneity after ivreg (estat endog), I fail to reject the null that the variables are exogenous. I can't find any examples online interpreting the situation when the explanatory variable loses significance under ivreg. Thank you!
When you ran the regression with OLS, you found the explanatory variable was significant. However, the estimate could have been biased (perhaps due to omitted variables). Let's assume for now that your instrument was valid. You run the regression using 2SLS and find it's not significant. Again, assuming the instrument was valid, you've now eliminated the bias present in the original regression. What this means is that the OLS result (significant) was misleading and that the variable may not have the effect you thought it did.
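A small simulation of the scenario in this reply, written in Python with numpy. Assumptions: the true effect of x on y is zero, an omitted "ability" term affects both, and the instrument z is valid by construction, which real data never guarantee. OLS picks up a spurious positive slope, while the IV slope stays close to zero.

```python
import numpy as np

def slope_ols(x, y):
    """OLS slope of y on x, with a constant."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def slope_iv(z, x, y):
    """Just-identified IV slope of y on x with instrument z, with a constant."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

rng = np.random.default_rng(4)
n = 50_000
ability = rng.normal(size=n)                  # omitted variable
z = rng.normal(size=n)                        # instrument, unrelated to ability
x = z + ability + rng.normal(size=n)          # explanatory variable
y = 0.0 * x + ability + rng.normal(size=n)    # true effect of x on y is zero

ols_slope = slope_ols(x, y)
iv_slope = slope_iv(z, x, y)
print(ols_slope, iv_slope)
```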
Dear Sebastian, please help me explain the following. When I regress the explanatory variable on the instrument, the instrument is significant at 10%. The explanatory variable in ivreg is not significant, the partial R2 = 0.000, AND the test for endogeneity suggests no endogeneity. The explanatory variable in OLS is significant. Thank you!
For weak instruments you need to look at the significance of the instrument in the first stage regression.
Should I run the endog, overid, and first-stage tests based on the robust model, or should I run 2SLS without robust standard errors, run the 3 tests, and then run the robust regression? Thanks.
Do you mean robust standard errors? I generally always use them.
What is the difference between ivregress and reg3 if we want to run a simultaneous equations model, please?
I have never used reg3, so I can't help you with that.
Ok thank you
Hi Sebastian, I'm Sherry.
Sorry to bother you.
Could you take the time to answer some questions? I'm having trouble with this. First, I don't know how to identify which variable is the instrumental variable. Second, I have no idea whether there is an instrumental variable in every model where I use 2SLS. Finally, I find it difficult to use the instructions for 2SLS correctly. Could you recommend some websites or books to study?
My basic book recommendations would be "Mastering 'Metrics" by Angrist and Pischke and "Introductory Econometrics: A Modern Approach" by Wooldridge.
How do we do it manually using a matrix with Stata?
I wouldn't. R or Matlab are better tools for matrix calculations.
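For reference, this is the matrix form that any manual calculation implements, whichever tool is used. Here y is the n×1 outcome, X the n×k matrix of regressors (constant, endogenous variables, controls), and Z the n×ℓ matrix of all exogenous variables (constant, controls, excluded instruments), with ℓ ≥ k. Because P_Z is symmetric and idempotent, P_Z X is the matrix of first-stage fitted values, so the same expression is the OLS formula applied in the manual second stage.

```latex
\hat{\beta}_{\mathrm{2SLS}} = \left(X^{\top} P_Z X\right)^{-1} X^{\top} P_Z\, y,
\qquad P_Z = Z\left(Z^{\top} Z\right)^{-1} Z^{\top},
\qquad \hat{X} = P_Z X \;\Longrightarrow\;
\hat{\beta}_{\mathrm{2SLS}} = \left(\hat{X}^{\top} \hat{X}\right)^{-1} \hat{X}^{\top} y
```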
Is GMM another name for this test?
What do you mean by ability? I don't get why you mention ability instead of one of the variables in the data. Did you mean that siblings are not correlated with wage but are correlated with education, and that's why it's exogenous to the model? And why did you put exper in the educ regression to see if sibs is significant? Thanks, and I appreciate the video.
Ability not being in the dataset is the whole point. Here, ability is an abstract concept of how good a person is at their job. No true numerical measurement of ability exists, so we have to use techniques such as 2SLS to deal with the problems it may cause. Siblings actually is correlated with wage, but, I am arguing, only because of its relationship to education. The reason we put exper in the first stage regression is that it is a control variable. All instruments must appear in the first stage regression, and control variables typically instrument for themselves, as exper is doing here.
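A Python sketch of why exper has to appear in the first stage. Assumptions: numpy, simulated data in which exper is correlated with the instrument, hypothetical coefficients, and a true effect of educ equal to 0.5. Keeping exper in the first stage recovers roughly 0.5; dropping it from the first stage while keeping it in the second stage does not, which is one way a manual two-step can end up disagreeing with ivregress.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients from regressing y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(5)
n = 20_000
ability = rng.normal(size=n)                  # omitted variable
sibs = rng.normal(size=n)                     # instrument (standardized, hypothetical)
exper = 0.8 * sibs + rng.normal(size=n)       # control, correlated with the instrument
educ = sibs + exper + ability + rng.normal(size=n)
wage = 0.5 * educ + 0.2 * exper + ability + rng.normal(size=n)
const = np.ones(n)

# Correct: exper appears in the first stage and "instruments for itself".
Z = np.column_stack([const, sibs, exper])
educ_hat = Z @ ols(Z, educ)
b_right = ols(np.column_stack([const, educ_hat, exper]), wage)[1]

# Incorrect: exper left out of the first stage but kept in the second stage.
Z_bad = np.column_stack([const, sibs])
educ_hat_bad = Z_bad @ ols(Z_bad, educ)
b_wrong = ols(np.column_stack([const, educ_hat_bad, exper]), wage)[1]

print(b_right, b_wrong)   # the simulated true effect of educ is 0.5
```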
thanks, great help
Hi, the F stat of the first stage run manually is not the same as the one from estat firststage after ivregress 2sls. Can you please advise?
That shouldn't happen. Verify that the first stage you are running manually is exactly the same as the one in ivregress. You can use the option "first" to view the first stage regression with ivregress.
SebastianWaiEcon Thanks, we got it now! The first-stage F-stat of ivregress is now the same as that of the manual first-stage regression. However, could you explain why the command "estat firststage" gives us a significantly different F-stat?
estat is giving you the F stat for restricting just the instrumental variables you added in the first stage. This is the test for the relevance of your instruments. The F stat reported in the first stage regression output is the F stat for overall significance. The latter test involves restricting all coefficients to zero, not just the added instruments. If you have just one instrument, you can verify this by squaring the instrument's t stat. If you have multiple instruments, you can verify this by running the test command on the first stage.
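To check this numerically, here is a Python sketch (numpy assumed, simulated data, classical non-robust standard errors, hypothetical variable names) that computes both F statistics from restricted and unrestricted sums of squared residuals. With one excluded instrument, the F for that instrument equals the square of its t statistic, while the overall F also restricts exper and comes out different.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from OLS of y on X."""
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

rng = np.random.default_rng(6)
n = 2_000
sibs = rng.poisson(3, size=n).astype(float)
exper = rng.normal(10, 3, size=n)
educ = 14 - 0.2 * sibs + 0.3 * exper + rng.normal(size=n)

const = np.ones(n)
X_full = np.column_stack([const, sibs, exper])   # unrestricted first stage
k = X_full.shape[1]
ssr_u = ssr(X_full, educ)
s2 = ssr_u / (n - k)

# Instrument-only F: restrict just the coefficient on sibs to zero (1 restriction).
F_instr = (ssr(np.column_stack([const, exper]), educ) - ssr_u) / s2

# Overall F: restrict every slope (sibs and exper) to zero, keep the constant.
F_overall = ((ssr(const[:, None], educ) - ssr_u) / (k - 1)) / s2

# t statistic on sibs with classical standard errors.
beta = np.linalg.lstsq(X_full, educ, rcond=None)[0]
cov = s2 * np.linalg.inv(X_full.T @ X_full)
t_sibs = beta[1] / np.sqrt(cov[1, 1])
print(F_instr, t_sibs ** 2, F_overall)
```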
If I'm working with panel data and I use xt before all the commands, will I get appropriate results?
The ivregress command does allow you to use factor variables (i-dot structure), if you want to do fixed effects that way.
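A sketch of why the dummy-variable method is a legitimate way to get fixed effects into 2SLS, assuming Python with numpy and a hypothetical balanced panel of 200 groups observed 5 times each (all names and coefficients made up). Putting one dummy per group into both stages, as exogenous regressors do, gives the same coefficient on x as demeaning every variable within groups and running 2SLS on the demeaned data.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS coefficients: (X'Pz X)^{-1} X'Pz y."""
    X_proj = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.solve(X_proj.T @ X, X_proj.T @ y)

def demean_by(v, g):
    """Subtract the group mean of v within each group g (within transformation)."""
    sums = np.bincount(g, weights=v)
    counts = np.bincount(g)
    return v - (sums / counts)[g]

rng = np.random.default_rng(7)
G, T = 200, 5
g = np.repeat(np.arange(G), T)                # hypothetical panel id
n = G * T
alpha = rng.normal(size=G)[g]                 # group fixed effect
u = rng.normal(size=n)
z = rng.normal(size=n) + 0.5 * alpha          # instrument, may correlate with the FE
w = rng.normal(size=n)                        # exogenous control
x = z + 0.5 * w + alpha + u + rng.normal(size=n)
y = 0.3 * x + 0.2 * w + alpha + u + rng.normal(size=n)

# Dummy-variable method: one dummy per group in both stages (no separate constant).
D = (g[:, None] == np.arange(G)).astype(float)
b_dummy = two_sls(y, np.column_stack([x, w, D]), np.column_stack([z, w, D]))[0]

# Within method: demean everything by group, then 2SLS without dummies.
yd, xd, zd, wd = (demean_by(v, g) for v in (y, x, z, w))
b_within = two_sls(yd, np.column_stack([xd, wd]), np.column_stack([zd, wd]))[0]
print(b_dummy, b_within)
```

The point estimates coincide, but standard errors computed naively from the demeaned data ignore the absorbed group means and need a degrees-of-freedom correction that this sketch does not perform.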
Hi prof. A question from me :) It seems I can understand these steps, but I cannot interpret the results. After getting the 2SLS results, do I compare them with the OLS results (in my case I use this regression), and if the coefficients in both regressions are close to each other, do we decide that the variable(s) is/are endogenous and the results are not correct? Or how should it be interpreted and summed up? Thanks in advance.
If the OLS and 2SLS results are very similar, then you either didn't have endogeneity to begin with or your instruments failed to eliminate the bias. Obviously, these are very different interpretations, and it's up to you to figure out which is more likely.
@@sebastianwaiecon So if the OLS and 2SLS results are very similar and I have endogeneity, was there another bias? Maybe from control variables that I didn't put in?
From these videos, should the endogeneity test be run when conducting ivregress?
@@popi20101 You need to think about whether your instrument is valid, that is, whether your instrument is correlated with an omitted variable. You can potentially solve this by including such variables as controls.
What is the difference between ivregress and ivreg2?
They should give the same results for 2SLS, as far as I know.
@@sebastianwaiecon ok thank you, I have tried both of them
How can I save my results in Excel, or in some cleaner way?
See my video on estout tables.
Hi sir,
I need your help on how to generate instrumental variables with ivreg2h in Stata. In other words, how do I generate instrumental variables from my data, given that I don't have external instruments? This is the method developed by Lewbel (2012). Your help is highly appreciated.
I'm not familiar with that command.
Thanks very much for your quick response
Does this work for panel datasets as well?
Yes, you can do 2SLS with panels.
Can you post the link to wage2.dta?
It is one of the datasets associated with the Wooldridge textbook. You should be able to find it with Google.
qcpages.qc.cuny.edu/~rvesselinov/statafiles.html