Multiple Regression - Dummy variables and interactions - example in Excel

  • Published: Aug 21, 2024
  • In this video, I present an example of a multiple regression analysis of website visit duration data using both quantitative and qualitative variables. Variables used include gender, browser, mobile/non-mobile, and years of education. Gender and mobile each require a single dummy variable, while browser requires several dummy variables. I also present models that include interactions between the dummy variables and years of education to analyze intercept effects, slope effects, and fully interacted models. In short, I cover:
    - multiple category qualitative variables
    - dummy variables
    - intercept effects
    - slope effects
    - dummy interactions
    I hope you find it useful! Please let me know if you have any questions!
    --Dr. D.
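A minimal NumPy sketch of the three model types listed above, on synthetic data. The video's dataset is not reproduced here, so the variable names `educ`, `female`, and `duration` and all numbers are hypothetical. A female dummy shifts the intercept, a female × education term shifts the slope, and the fully interacted model allows both.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares via numpy.linalg.lstsq; returns the coefficient vector."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 200
educ = rng.integers(10, 21, size=n).astype(float)   # hypothetical years of education (10-20)
female = rng.integers(0, 2, size=n).astype(float)   # dummy: 1 = female, 0 = male (reference group)

# Made-up, noise-free "truth" so the fit can be checked exactly:
#   male:   duration = 5.0 + 0.8 * educ
#   female: duration = (5.0 + 2.0) + (0.8 - 0.3) * educ
duration = 5.0 + 2.0 * female + 0.8 * educ - 0.3 * female * educ

ones = np.ones(n)
designs = {
    # Intercept effect only: one shared slope, intercepts differ by the dummy.
    "intercept effect": np.column_stack([ones, female, educ]),
    # Slope effect only: one shared intercept, slopes differ via dummy*educ.
    "slope effect": np.column_stack([ones, educ, female * educ]),
    # Fully interacted: both intercept and slope can differ.
    "fully interacted": np.column_stack([ones, female, educ, female * educ]),
}
fits = {name: fit_ols(X, duration) for name, X in designs.items()}

b = fits["fully interacted"]
print("male:   intercept", round(b[0], 3), "slope", round(b[2], 3))
print("female: intercept", round(b[0] + b[1], 3), "slope", round(b[2] + b[3], 3))
```

In the fully interacted model, the female group's intercept is the sum of the first two coefficients and its slope is the sum of the educ and interaction coefficients; the two restricted models force one of those differences to zero.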

Comments • 142

  • @momcilomracajac2242
    @momcilomracajac2242 6 years ago

    YOU ARE THE GREATEST PERSON ALIVE!!! I HAVE BEEN SEARCHING FOR HELP ON DUMMY VARIABLES FOR WHAT TODAY WOULD BE THE FOURTH DAY FOR MY PROJECT AND EVEN MY PROFESSOR WAS OF NO HELP! i really appreciate this video...

    • @econ_drd
      @econ_drd  6 years ago

      I'm glad you found it helpful. ;)

  • @bigermie1
    @bigermie1 9 years ago +19

    Love when people perform on such a high level where the lay person can understand. Great Job

  • @EGaya90
    @EGaya90 9 years ago +2

    I'm into 7:02 and I've been nodding since the video began...thank you man! :)

  • @ellarichardson7804
    @ellarichardson7804 8 years ago

    THANK YOU SO MUCH. I've been so confused with how to do this for ages and now I finally understand it! I couldn't be more grateful.

  • @sobhitc
    @sobhitc 11 years ago

    You have no idea how much this video helped me to do my thesis work! Thank you so much!

  • @user-qt8ox4fl8h
    @user-qt8ox4fl8h 5 years ago +1

    Great many thanks, Dr Delaney! It would be nice if you discussed two more related issues. 1) Explanation of the coefficients in a regression w/o the intercept term. 2) If we define dummies differently, how do we interpret the coefficients? For example, consider the regression y = a1*D1 + a2*D2 + a3*D3 + a4*D4 + u, where the Ds are dummies for season but defined differently: the value of D1 is 1 for all observations, the value of D2 is 1 for all observations except Spring, the value of D3 is 1 for all observations except Spring and Summer, and the value of D4 is 1 only for observations in Winter. Thanks again for allowing questions and discussions.
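Under the coding in the question (assuming the seasons run Spring, Summer, Fall, Winter, with D1 playing the role of the intercept), each coefficient works out to the difference between one season's mean and the previous season's mean. A NumPy check on made-up, noise-free data:

```python
import numpy as np

# Hypothetical seasonal data: three observations per season, no noise.
seasons = ["Spring", "Summer", "Fall", "Winter"] * 3
true_means = {"Spring": 10.0, "Summer": 14.0, "Fall": 11.0, "Winter": 7.0}
y = np.array([true_means[s] for s in seasons])

# Coding from the question: D1 = 1 always; D2 = 1 except Spring;
# D3 = 1 except Spring and Summer; D4 = 1 only in Winter.
D1 = np.ones(len(seasons))
D2 = np.array([s != "Spring" for s in seasons], dtype=float)
D3 = np.array([s not in ("Spring", "Summer") for s in seasons], dtype=float)
D4 = np.array([s == "Winter" for s in seasons], dtype=float)
X = np.column_stack([D1, D2, D3, D4])

a, *_ = np.linalg.lstsq(X, y, rcond=None)
# a[0] = Spring mean; a[1] = Summer - Spring; a[2] = Fall - Summer; a[3] = Winter - Fall
print(np.round(a, 3))
```

With noisy data the same reading holds for the estimates: the model has one parameter per season, so its fitted values are the four season means, and the coefficients are differences between adjacent means.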

  • @oul1735
    @oul1735 2 years ago +1

    Thank you very much for the video. It saved my dissertation.

  • @floranshadzhieva7829
    @floranshadzhieva7829 8 years ago +1

    This is the best explanation that I've found so far. Thank you so much!

  • @econ_drd
    @econ_drd  10 years ago +4

    Gayathri Ravichandran It's not letting me reply directly to your comment. But the answer is yes, you should be able to check which independent variable contributes more.
    One approach is to run a series of 8 separate regressions with 1 independent variable in each, and check the R^2. The other is to run 8 separate regressions with all but 1 in each, and check the R^2.
    Finally, you can run the full regression and just see which has the largest coefficient (in magnitude)...this runs into the problem of different scales, so you may want to measure your variables in #'s of standard deviations from the mean value of that variable.
    Caveat: all of this assumes you have enough observations to run all these tests without running into overfitting problems. To safely run 8 regressions here (or 17, maybe), you'll want to make sure that you have at least 17*8*15 = 2040 observations.
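A minimal NumPy sketch of the three checks described above, on synthetic data. The three regressors, their scales, and their coefficients are made up to show why the raw coefficient size can mislead: the variable with the largest raw coefficient here is the one that matters least.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def importance_summary(X, y):
    k = X.shape[1]
    full_r2 = r_squared(X, y)
    # 1) One regressor at a time: R^2 of each simple regression.
    single = np.array([r_squared(X[:, [j]], y) for j in range(k)])
    # 2) All regressors but one: how much R^2 falls when regressor j is dropped.
    drop = np.array([full_r2 - r_squared(np.delete(X, j, axis=1), y) for j in range(k)])
    # 3) Full model on z-scored variables, so coefficients are in standard-deviation units.
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    beta_z, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), Xz]), yz, rcond=None)
    return single, drop, beta_z[1:]

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3)) * np.array([1.0, 10.0, 0.1])   # made-up regressors on very different scales
y = 2.0 * X[:, 0] + 0.3 * X[:, 1] + 5.0 * X[:, 2] + rng.normal(size=n)

single, drop, std_beta = importance_summary(X, y)
print("single-regressor R^2:", np.round(single, 3))
print("drop in R^2:         ", np.round(drop, 3))
print("standardized coefs:  ", np.round(std_beta, 3))
```

The raw coefficients rank the third column first (5.0), while all three checks agree that the second column contributes most once scale is accounted for.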

  • @urzolarl
    @urzolarl 4 years ago +1

    Thank you! This video was great, it is explained in a way that makes a lot of sense! The book for my business analytics class made it way more complicated!

  • @araratghazarian2354
    @araratghazarian2354 8 years ago

    I was working on my thesis and these materials were precisely helpful. Thanks!

  • @pascalejacquelinepetit5131
    @pascalejacquelinepetit5131 7 years ago

    Great explanation for new users and to refresh. Much appreciated!

  • @Andy1311100
    @Andy1311100 10 years ago

    Thx! You saved me a lot of time re-taking the statistics course.

  • @TDYoung07
    @TDYoung07 5 years ago

    Great breakdown of multiple regression and helped me greatly with educated forecasting.

  • @saeedrabbanifar7458
    @saeedrabbanifar7458 7 years ago

    So useful. I may need to watch it more than once to master it, but it's worth it.

  • @econ_drd
    @econ_drd  10 years ago +1

    Hi Fang, it's definitely a good idea to run it, and then you can use an F test for a subset of variables to see which model is better. If you search YouTube for "F test for subset," you'll see the video that outlines the process.
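A NumPy sketch of the F test for a subset of coefficients: fit an unrestricted model and a restricted one that drops the candidate variables, then compare their sums of squared residuals. The data, and the choice of `female` and `female*educ` as the tested subset, are hypothetical.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit (X must already contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def subset_f_test(X_full, X_restricted, y):
    """F statistic for H0: the coefficients on the columns dropped from X_full are all zero."""
    n, k_full = X_full.shape
    q = k_full - X_restricted.shape[1]   # number of restrictions (dropped columns)
    ssr_u = ssr(X_full, y)               # unrestricted model
    ssr_r = ssr(X_restricted, y)         # restricted model
    F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_full))
    return F, q, n - k_full

rng = np.random.default_rng(5)
n = 120
educ = rng.uniform(10, 20, size=n)                 # hypothetical regressor
female = rng.integers(0, 2, size=n).astype(float)  # hypothetical dummy
y = 3.0 + 0.5 * educ + rng.normal(size=n)          # made-up data: the dummy terms truly do not matter

ones = np.ones(n)
X_full = np.column_stack([ones, educ, female, female * educ])
X_restricted = np.column_stack([ones, educ])
F, q, df_resid = subset_f_test(X_full, X_restricted, y)
print(f"F = {F:.3f} with ({q}, {df_resid}) degrees of freedom")
```

To turn F into a p-value, `scipy.stats.f.sf(F, q, df_resid)` gives the upper-tail probability of the F distribution. A small F (large p-value) suggests the restricted model is adequate.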

  • @dermarcopetermp
    @dermarcopetermp 9 years ago

    Your explanation is awesome.
    It helps me understand interaction a lot
    Thank You!

  • @eagles51593
    @eagles51593 8 years ago +1

    thank you so much!! best tutorial I've seen by far

  • @noodzie89
    @noodzie89 11 years ago

    Thank you so much for this video! I was having quite a bit of trouble grasping this concept but now I get it! Great explanation!

  • @twolittlefish
    @twolittlefish 11 years ago

    Big thanks from Switzerland!

  • @xiaolingsundberg9182
    @xiaolingsundberg9182 2 years ago

    Wow, thank you so much!! I learned so much!

  • @sfs4708
    @sfs4708 10 years ago +3

    Thank you very much for this tutorial - so intuitive, and guides us directly to what's important. My question is: do you know a similarly intuitive way to run a regression with a dummy dependent variable? I'm trying to analyze survey responses, much of which is discrete data. Thank you!

  • @squirrellover69ify
    @squirrellover69ify 5 years ago

    I did not understand this at all, but within the first ten minutes it made so much sense. Makes me want to go play around with an actual data set. Curious if there's anyone with videos on how to do this in R?

  • @econ_drd
    @econ_drd  10 years ago

    You can use an MLE method to estimate it directly, or nonlinear least squares (Stata has an "nl" command for just such a purpose), but for Cobb-Douglas, I'm not sure why you'd want to. If you have Q = A * K^a * L^b and you take logs, you get ln(Q) = ln(A) + a*ln(K) + b*ln(L), and you can just regress that in a straightforward fashion and get estimates for your production shares...unless you know the error distribution is wrong...but the ease of this is the whole point of using Cobb-Douglas.
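A minimal sketch of the log-linear approach described above, using NumPy on synthetic data. The parameter values (A = 2.5, a = 0.3, b = 0.6) and the multiplicative log-normal error are assumptions made for illustration; that error assumption is exactly what makes OLS on the logs appropriate.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
K = rng.uniform(1.0, 100.0, size=n)   # hypothetical capital input
L = rng.uniform(1.0, 100.0, size=n)   # hypothetical labor input
A, a, b = 2.5, 0.3, 0.6               # made-up "true" parameters
# Multiplicative error, so it becomes additive after taking logs.
Q = A * K**a * L**b * np.exp(rng.normal(scale=0.05, size=n))

# ln(Q) = ln(A) + a*ln(K) + b*ln(L): an ordinary linear regression in logs.
X = np.column_stack([np.ones(n), np.log(K), np.log(L)])
coef, *_ = np.linalg.lstsq(X, np.log(Q), rcond=None)
A_hat, a_hat, b_hat = np.exp(coef[0]), coef[1], coef[2]
print(round(A_hat, 2), round(a_hat, 3), round(b_hat, 3))
```

The slope coefficients on ln(K) and ln(L) are the exponents directly; A is recovered by exponentiating the intercept.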

  • @NamTran-jd6lp
    @NamTran-jd6lp 2 years ago

    life savior!!!

  • @121mohitkumar
    @121mohitkumar 3 years ago

    Thank you mate! Really helpful video

  • @imamsuhadisuhadi8195
    @imamsuhadisuhadi8195 2 years ago

    Thank you very much for this video .. Very clear

  • @mukalumasaki4558
    @mukalumasaki4558 8 years ago

    Thanks a lot for this video! Very clear and explicit. Great job. :)

  • @econ_drd
    @econ_drd  11 years ago +1

    Hi Katie,
    For qualitative variables, you want to use a dummy variable. I have several videos on the topic. I hope that helps!
    Best regards,
    Dr. D.

  • @emerekek
    @emerekek 5 years ago

    Thanks a lot Dr D. Very insightful

  • @reidclanton4081
    @reidclanton4081 4 years ago +1

    17:37 is interactions, for those who want to know

  • @alexablanc6647
    @alexablanc6647 3 years ago +1

    Hi Jason,
    I watched your YouTube video about using dummy variables with the regression tool in Excel. I studied math in college, so I was really excited about it.
    I'm trying to use it to forecast sales. I set it up so that my Y values were previous sales and my X values were weeks 1 to 52, where each would be a 1 if it matched the sales week and 0 otherwise. I also included holidays like Easter x week, 4th of July x week, etc.
    It gave me an error that I can only have 16 columns in the X values, so I tried it with just 16 weeks and the p-values were really big. I'm wondering if you know of another way I can do this to include the seasonality from the weeks and the impact of the holidays.
    Thanks so much!
    Thanks so much!

  • @ajaxvi
    @ajaxvi 4 years ago +1

    Brilliant!

  • @nialldevine1156
    @nialldevine1156 9 years ago

    ridiculously helpful video, thank you

  • @solog10
    @solog10 11 years ago

    Excellent video. Very helpful.

  • @natyelisassis
    @natyelisassis 11 years ago

    WOW! GREAT video, and you have my respect Jason! You are GOOD! =)

  • @lamaung
    @lamaung 11 years ago

    Thanks for your wonderful explanation !!!

  • @dutchboybmx
    @dutchboybmx 5 years ago

    Thank you so much!

  • @econ_drd
    @econ_drd  11 years ago +3

    Thanks! I'm glad it was helpful!

  • @maimunahjohari9229
    @maimunahjohari9229 8 years ago

    Thanks a lot Dr Delaney, really helpful!

  • @nletizio
    @nletizio 9 years ago

    Excellent video, thanks!

  • @kathleentolentino5077
    @kathleentolentino5077 8 years ago

    A really great tool! Thank you!

  • @manhoor1200
    @manhoor1200 5 years ago

    thank you for this video

  • @Tattenlieve
    @Tattenlieve 10 years ago +1

    Brilliant video - explained really well !! You mentioned in passing that another one of your series explained some of the theory behind dummy variables. I'm interested in how contrasts can be specified, say whether there is a significant difference between each of the browsers with each other and not just with reference to Firefox as per your example? Thanks again

  • @user-hu7ov6fi9y
    @user-hu7ov6fi9y 5 years ago +1

    Thanks for sharing this brilliant video online. I would like to know: if I want to calculate the coefficient for Firefox as an independent variable, which browser should be excluded as a dummy variable? Many thanks

  • @md.sakilmahmud4751
    @md.sakilmahmud4751 8 years ago

    It's a very good video. Thanks for the help.

  • @jrippee05
    @jrippee05 7 years ago

    Good video.
    You should have posted the data set so we could follow along.
    Thanks.

  • @econ_drd
    @econ_drd  11 years ago

    Hi Kirstin,
    You would want to use a dummy variable. If you search YouTube for "dummy variable," you should find a few videos (some of which are mine). Good luck!
    --Dr. D.

  • @ndubuisimachebe764
    @ndubuisimachebe764 9 years ago

    Thanks. The video has been very helpful!!

  • @ASOT666
    @ASOT666 8 years ago

    Great video !

  • @Crimau12000
    @Crimau12000 9 years ago

    Thank you very much. It has been very helpful

  • @econ_drd
    @econ_drd  11 years ago

    Hi TheMasterkyle79. Fair enough. I recommend the video on interpreting models which may help clear things up. Good luck and let me know if I can help at all!

  • @somjitbanerjee3003
    @somjitbanerjee3003 2 years ago

    Thank you!

  • @yvesliao6004
    @yvesliao6004 5 years ago

    Thank you very much! It's really helpful! But I wonder if we can get the coefficients without a dependent variable, with only two independent dummies in the equation. And how do we apply constraints to the equation? For example, we want to examine how much of y results from factor b and how much of it results from factor c. We have a time series of y and the equation Y = a + b1*d1 + b2*d2 + ... + b50*d50 + c1*e1 + c2*e2 + ... + c34*e34, where d and e are the dummy variables. The condition is that the weighted sum of b1~b50 = 0 and the weighted sum of c1~c34 = 0. In this case, how can we get the series b1~b50 and c1~c34?

  • @RanaSharif
    @RanaSharif 11 years ago

    Can interactions be between two dummy variables, as in the example of female and has mobile? And how can we write the equation?

  • @martaarteaga5950
    @martaarteaga5950 10 years ago

    Great video it is very helpful!

  • @limfangwen1102
    @limfangwen1102 11 years ago

    Hi Dr D,
    This is a great video! Can I ask, for the last example of everything, if we found that some variables are statistically significant, and others are not, is it a good idea to run another regression analysis of only those significant variables?
    Kind Regards,
    Fang

  • @williamdonovan3679
    @williamdonovan3679 11 years ago

    Dummy variables are important because computer simulations of a complex physical process are not forecasting the results observed and therefore cannot be used to optimize the process

  • @limfangwen1102
    @limfangwen1102 10 years ago

    Hi Dr. D. For your example you explained interactions with a quantitative and a dummy variable, so what I understand is that the reference (Firefox or Male) is always omitted. Does this apply to interactions of 2 dummy variables? For instance, I would like to investigate if there is an interaction between Gender and Browser, so for my interactions, will Firefox and male be omitted?
    Regards,
    Fang

    • @econ_drd
      @econ_drd  10 years ago +1

      Dummy variables are just a way to account for every possible combination, to allow for a full complement of different intercepts, for example. In the case you mention, Gender (G) and Browser (B), if you want full interactions, you can see that you could have:
      G B = 0 0 (Male, Firefox)
      G B = 0 1 (Male, Chrome)
      G B = 1 0 (Female, Firefox)
      G B = 1 1 (Female, Chrome)
      If you had a third browser, say IE, you'd need to add another dummy just because there are more than 4 combinations, and 2 binary variables can only give you 4 combinations. Let's say we had B1 = 1 if Chrome, B2 = 1 if IE:
      G B1 B2 = 0 0 0 (Male, Firefox)
      G B1 B2 = 0 1 0 (M, Chrome)
      G B1 B2 = 0 0 1 (M, IE)
      G B1 B2 = 1 0 0 (Female, Firefox)
      G B1 B2 = 1 1 0 (F, Chrome)
      G B1 B2 = 1 0 1 (F, IE)
      You can see that we never use 011 or 111, because that would imply Chrome AND IE, which are mutually exclusive by assumption. In principle, though, you should let your intuition help you: you just want a different intercept (or slope term, depending on your application) for each case.
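A NumPy sketch of the fully interacted gender × browser model using the G, B1, B2 coding from the reply above, on invented cell means (four identical observations per cell, no noise). The interaction columns are the products G·B1 and G·B2, so there are six coefficients for the six possible combinations, and the fitted values reproduce the cell means exactly.

```python
import numpy as np
from itertools import product

# Hypothetical mean visit durations for each (G, browser) cell; all numbers invented.
cell_mean = {
    (0, "Firefox"): 5.0, (0, "Chrome"): 6.0, (0, "IE"): 4.0,
    (1, "Firefox"): 5.5, (1, "Chrome"): 7.5, (1, "IE"): 4.2,
}
rows = [(g, br) for g, br in product([0, 1], ["Firefox", "Chrome", "IE"]) for _ in range(4)]
y = np.array([cell_mean[r] for r in rows])

G = np.array([g for g, _ in rows], dtype=float)                 # 1 = female
B1 = np.array([br == "Chrome" for _, br in rows], dtype=float)  # 1 = Chrome
B2 = np.array([br == "IE" for _, br in rows], dtype=float)      # 1 = IE; Firefox is the reference
X = np.column_stack([np.ones(len(y)), G, B1, B2, G * B1, G * B2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[0]: male Firefox mean; beta[1]: female - male gap among Firefox users;
# beta[2], beta[3]: Chrome and IE vs. Firefox among men;
# beta[4], beta[5]: how the gender gap changes for Chrome and IE relative to Firefox.
print(np.round(beta, 3))
```

Each cell's mean is the sum of the coefficients whose columns equal 1 in that cell; for example, female Chrome is beta[0] + beta[1] + beta[2] + beta[4].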

  • @oluwatoba11
    @oluwatoba11 11 years ago

    Hey Jason,
    This is excellent and really helpful. Thanks. Moreover, I'd like to ask for more. Could you please do a video on exponential regression with multiple variables? E.g., the Cobb-Douglas function. I am aware you could do a log-linear regression, but is there a way of doing this directly?

  • @eddiele644
    @eddiele644 4 years ago

    So when do we actually interact our variables? Is there a way to see if it is necessary or do we just do it and then see if the coefficient on the interaction term is statistically significant?

  • @matthewthomas4620
    @matthewthomas4620 9 years ago

    Thank you so much for this video. I have not seen anything else on the web that concisely explains the underlying math, concept, and real world how to.
    Is it possible to do this type of analysis with grouped data? How would you 'weigh' the groups?

  • @abbas8646
    @abbas8646 10 years ago

    Thank you that is a great video

  • @lionelpipper1992
    @lionelpipper1992 11 years ago

    Thank you, very helpful!

  • @kathyyue370
    @kathyyue370 11 years ago

    Thanks for the video! My question is - for the later variables such as Male Female, if you are analyzing just gender, why do you still include the previous variables in the regression table? Does that make a difference? I think you said "holding all else constant"?

  • @YellowDog00
    @YellowDog00 7 years ago

    Hi Jason,
    First of all thank you for your great video.
    I have a question as to why we need an omitted variable? In your video, you didn't develop a dummy variable for Firefox. May I ask why?

    • @AmanyFaroun
      @AmanyFaroun 7 years ago

      I need an answer to your question too. How would I know the effect of FireFox?

  • @francescoguerra8917
    @francescoguerra8917 5 years ago +1

    Hi Dr D.
    Is it possible to run a multiple regression if all my variables are categorical (both my independent variables and my dependent variable are categorical, two-level variables)?

  • @5522Katie
    @5522Katie 11 years ago

    Thank-you for this video, it really helped me in my project. I have a question though: how would you do this analysis if y were qualitative (i.e., y is either yes or no?)

  • @nabinabi7007
    @nabinabi7007 10 years ago

    Thank you very much. I think that we can center only quantitative variables, not dummy variables. May I ask if you have other videos about RIDGE regression or PARTIAL LEAST SQUARES regression?

  • @anskrenes
    @anskrenes 8 years ago

    Hi Dr. Delaney, thanks for the video! Would these rules apply for moderation? For instance, if the predictor had many dummy variables, the outcome didn't, and the moderator didn't, would it work the same way? Thank you!

  • @kirstinegan2572
    @kirstinegan2572 11 years ago +1

    Hey Jason,
    Thank you for this video, it was very helpful! I do have a question though... how would you run a regression model if your dependent variable was also a categorical variable?
    Thanks!!

  • @stusexton9303
    @stusexton9303 8 years ago

    Thanks champion.

  • @Andy1311100
    @Andy1311100 10 years ago

    I have a question about general (non-linear) multiple regression. I understand general MR just needs to change x and y into some functions. But my question is: do I need to change the cross dummy into the same functions as well? Take your data as an example, if I use 1/educ as the new x, for the educ*fem dummy, should it be 1/educ*1 or still educ*1? Thanks.

  • @waleolusi8875
    @waleolusi8875 11 years ago

    This is really helpful, thanks a lot. Do you have other videos on working with EViews, and stuff like that? Thanks again.

  • @Animate-likenate23
    @Animate-likenate23 10 years ago +2

    Because both quantitative analysis and statistics use this. This is my second encounter in less than year with this type of problem.

    • @econ_drd
      @econ_drd  10 years ago

      I'm glad you found it useful! If there's more you're interested in, leave comments!

  • @ShahidKhan-gj6vy
    @ShahidKhan-gj6vy 8 years ago

    Thanks a lot!

  • @aminurabiuladodo4127
    @aminurabiuladodo4127 10 years ago

    Thanks for the tutorial.
    Please, I want to do a regression analysis between waiting time in a restaurant and profit made, to find out if an automated system can reduce the waiting time. What data do I need to collect?

    • @econ_drd
      @econ_drd  10 years ago

      You would need: waiting time, and whether the associated waiting time was under the automated system. You don't even need to use regression if it's just System A vs. System B. You can make fewer assumptions and use a 2-sample t-test, or MANY fewer assumptions and use something like a Mann-Whitney (Wilcoxon rank-sum) test if all you care about is whether one system tends to produce shorter waits, or a two-sample Kolmogorov-Smirnov test if you want the full distributional test.
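To see why a regression adds nothing for a simple System A vs. System B comparison, here is a NumPy sketch on made-up waiting times: the pooled (equal-variance) two-sample t statistic is identical to the t statistic on an "automated" dummy in a regression of waiting time on that dummy. The group sizes, means, and spreads are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
old = rng.normal(12.0, 3.0, size=40)   # hypothetical waiting times (minutes), old system
new = rng.normal(10.0, 3.0, size=35)   # hypothetical waiting times, automated system

# Pooled (equal-variance) two-sample t statistic, computed by hand.
n1, n2 = len(new), len(old)
sp2 = ((n1 - 1) * new.var(ddof=1) + (n2 - 1) * old.var(ddof=1)) / (n1 + n2 - 2)
t_pooled = (new.mean() - old.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Same comparison as a regression of waiting time on an "automated" dummy.
y = np.concatenate([new, old])
d = np.concatenate([np.ones(n1), np.zeros(n2)])
X = np.column_stack([np.ones(len(y)), d])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (len(y) - 2)
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t_regression = beta[1] / se
print(t_pooled, t_regression)   # equal up to floating-point error
```

In SciPy, `scipy.stats.ttest_ind(new, old)` (equal variances by default) gives the same statistic, and `scipy.stats.mannwhitneyu` and `scipy.stats.ks_2samp` run the other two tests mentioned in the reply.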

  • @TritoneTelephone
    @TritoneTelephone 10 years ago

    Shouldn't the regression equation include the original educ AND browser variables when testing interactions?

  • @dougmisenheimer9289
    @dougmisenheimer9289 10 months ago

    Because I’m in a Managerial Decision Making class and we have some problems to solve. I need some help! It’s a combo of statistics and business calc.

  • @raghavmodi7709
    @raghavmodi7709 3 years ago

    Sir,
    In the case of browser, if we introduce a fourth dummy variable for Firefox (which is against the theory), then what difference will it make?
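A NumPy sketch of what goes wrong with a dummy for every browser (Firefox, Chrome, IE, Safari) plus an intercept: the four dummy columns always add up to the intercept column, so the design matrix loses rank and the coefficients are not uniquely determined (the "dummy variable trap"). Dropping Firefox makes it the reference category carried by the intercept; the other browser coefficients are then differences from Firefox. The short browser list is made up.

```python
import numpy as np

browsers = ["Firefox", "Chrome", "IE", "Safari", "Chrome", "Firefox", "IE", "Safari"]
names = ["Firefox", "Chrome", "IE", "Safari"]
intercept = np.ones(len(browsers))
dummies = {name: np.array([b == name for b in browsers], dtype=float) for name in names}

# Intercept + all four browser dummies: the dummies sum to the intercept column.
X_trap = np.column_stack([intercept] + [dummies[k] for k in names])
# Intercept + three dummies, with Firefox omitted as the reference category.
X_ok = np.column_stack([intercept] + [dummies[k] for k in ["Chrome", "IE", "Safari"]])

print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])   # rank 4, but 5 columns
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])       # rank 4, 4 columns
```

Keeping all four dummies is possible only if the intercept is dropped; each browser coefficient is then that browser's own mean rather than a difference from Firefox.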

  • @lanniu8435
    @lanniu8435 11 years ago

    thanks so much!

  • @adriazka
    @adriazka 10 years ago

    Thank you

  • @aaf882010
    @aaf882010 10 years ago

    Hi, it's very helpful :). I want to do a regression analysis of home prices to see whether they are affected by the bank interest rate. In addition, I have some other variables which will be included, such as population, wages... but I want to check the relationship between interest and prices. How can I do that? Thanks a lot.

  • @dtumpal6671
    @dtumpal6671 11 years ago

    After putting the interactions in, I found that one of the main effects became non-significant (it was previously significant). How do we interpret this? Thanks in advance.

  • @outofthebox5226
    @outofthebox5226 10 years ago

    Sir, I have one dependent variable and eight independent variables. Can I use regression to see which of the independent variables contributes more to the dependent variable?

  • @arnabdada07
    @arnabdada07 8 years ago

    Hi Jason, here years of education is an independent variable, right? If that is the case, how can we put it in the X range while doing the regression?

    • @econ_drd
      @econ_drd  8 years ago

      Independent variables all go on the right hand side (i.e. are x's). Dependent variables go on the left (i.e. are y's). If you're concerned about endogeneity (probably not a huge issue in this application), you would want to take a different modeling approach.

  • @kamalbasnet8793
    @kamalbasnet8793 4 years ago

    In a regression model given as
    $\text{logpgp95}_i = \gamma_0 + \gamma_1\,\text{avexpr}_i + \gamma_2\,\text{lat\_abst}_i + \gamma_3\,\text{africa}_i + \gamma_4\,\text{asia}_i + \gamma_5\,\text{other}_i + \nu_i$
    where logpgp95 is the log of GDP per capita of country i in 1995, africa = 1 if country i is in Africa, asia = 1 if country i is in Asia, and other = 1 if country i is not in Asia, Africa, or the Americas.
    The regression coefficient for the dummy africa is -0.9163864. How do I interpret this coefficient? If I interpret it as "other factors being equal, African countries have 91.6% less GDP per capita than non-African countries," is that the right interpretation?
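In a log-level model, a dummy coefficient γ means the log of GDP per capita differs by γ, so the exact proportional difference is e^γ - 1; reading 100·γ as a percentage is only a good approximation when γ is small. The comparison group is the omitted category, which in the specification above is countries in the Americas (the only region with no dummy). A short check of the arithmetic for the coefficient in the question:

```python
import math

gamma_africa = -0.9163864                   # coefficient on the africa dummy, from the question
exact_change = math.exp(gamma_africa) - 1   # exact proportional difference in GDP per capita
approx_change = gamma_africa                # the "100 * gamma percent" shortcut, valid only for small gamma
print(f"exact: {exact_change:.1%}, shortcut: {approx_change:.1%}")
# prints "exact: -60.0%, shortcut: -91.6%"
```

So, holding the other regressors constant, the estimate implies GDP per capita roughly 60% lower for African countries than for countries in the Americas, rather than 91.6% lower than all non-African countries.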

  • @danielweaver4372
    @danielweaver4372 10 years ago

    Can your dependent variable be categorical? For example, my hypothesis is that males are more likely to use Chrome than females (a relationship between gender and browser), with both coded as categorical variables.

  • @rachaelmorrison5584
    @rachaelmorrison5584 7 years ago

    Hi there, when I have a dummy variable, a continuous variable and interaction term, does the coefficient of the dummy variable still indicate the results of when it equals 1 (regardless of the continuous variable) unlike the coefficient for the continuous variable, which only represents the values for the continuous variable when dummy =0?

  • @daniaakbar5421
    @daniaakbar5421 6 years ago

    Thanks for the informative video, it really helped me. I have some questions. I want to fit a quadratic model with one categorical and one continuous variable, including an interaction term and a squared term, but Minitab did not take the square term of the categorical variable. Can you please explain why? My second question: I want to know the theory behind model fitting with categorical variables, along with the procedure to estimate the regression coefficients. Could you tell me where I can find the material? Thanks in advance.

  • @nabinabi7007
    @nabinabi7007 10 years ago

    Thanks for the tutorial, but what about multicollinearity? You have variables in interactions, so maybe the VIF is more than 10.

    • @econ_drd
      @econ_drd  10 years ago

      The short answer is get more data. :D
      Yeah, including interaction terms can definitely lead to higher VIFs, but it's generally not something to be concerned about with interactions of dummy variables. If you are concerned, you can recenter the variable, particularly if it's a quantitative variable you're interacting with a dummy. But dealing with collinearity is more concerning with other, ostensibly unrelated variables than with interactions, in which the relationship is explicitly stated.
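A sketch of the recentering idea, assuming a hypothetical education variable interacted with a gender dummy (synthetic data, NumPy only; the `vif` helper is written here for illustration rather than taken from a library). It compares the variance inflation factor of the dummy before and after centering education.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
educ = rng.uniform(10, 20, size=n)                 # hypothetical years of education
female = rng.integers(0, 2, size=n).astype(float)  # dummy variable

educ_c = educ - educ.mean()                        # centered education

def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2 from regressing it on the other columns)."""
    y = X[:, j]
    others = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

X_raw = np.column_stack([female, educ, female * educ])      # dummy, educ, raw interaction
X_cen = np.column_stack([female, educ_c, female * educ_c])  # dummy, centered educ, centered interaction
print("VIF of female, raw:     ", round(vif(X_raw, 0), 1))
print("VIF of female, centered:", round(vif(X_cen, 0), 1))
```

Centering changes only the meaning of the dummy's own coefficient (it becomes the gender gap at mean education instead of at zero years of education); the fitted values, the educ coefficient, and the interaction coefficient are unchanged. For ready-made VIFs, statsmodels provides `statsmodels.stats.outliers_influence.variance_inflation_factor`.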

  • @4620extensa
    @4620extensa 6 years ago

    Could you tell me how to find the correlation between 1,500 categorical variables after dummy encoding?

  • @sanjayserene
    @sanjayserene 11 years ago

    But can you please tell me what happens if there are two independent variables (non-categorical) and 2 dummy variables? In the example above there is only one independent variable, education, but what happens if there is another independent variable?

  • @chaopazu4951
    @chaopazu4951 9 years ago

    Hi Dr D, I am wondering whether I can look at the interaction between 2 dummy variables? Thanks.

  • @andrewmiller9441
    @andrewmiller9441 8 years ago

    Hi, I am interested in learning how to graph a linear regression for 3 variables, as in: is weight a function of height and thickness?

  • @srikarbeechu
    @srikarbeechu 8 years ago

    Hi Jason, I have a problem at hand: I don't know the exact functional form of the model, but I must find it using the dataset I have. I have three inputs and an output, and I need to find the relationship between these input variables and the output. Could I get some short guidance on this?

  • @avinashpoojari9372
    @avinashpoojari9372 8 years ago

    Hello, there are 3 separate dummy variable columns for Internet Explorer, Safari, and Chrome...
    Is there any way to put these 3 in a single column with discrete values like 0, 1, 2...? Please help me with this.

  • @glennen
    @glennen 10 years ago

    I LOVE YOU

  • @AI-ew1rj
    @AI-ew1rj 6 years ago

    Shouldn't Internet Explorer's equation just have B2? Why are we including B1, which is Firefox, for Internet Explorer?

  • @abbaskhanyousafzai1
    @abbaskhanyousafzai1 4 years ago

    Dear sir, this is not what I am looking for. Please make a video on multiple breaks in time-series data and how to add a dummy variable for each variable to apply the ARDL test.