Statistics 101: Multiple Linear Regression, Two Categorical Variables Continued

  • Published: 28 Nov 2024

Comments • 125

  • @chandymohan
    @chandymohan 6 years ago +27

    Brandon, at 65 I am trying to keep up with new technologies. Regression analysis was not taught in my school days and, so, I was finding it difficult to follow the predictive models. Your videos not only taught me the fundamentals but also have given me the confidence to use regression for my bucket list pet projects. Thank you!

  • @audreeprovard4328
    @audreeprovard4328 4 years ago +5

    You saved me so many hours of reading. I am analyzing some data with a multiple linear regression for my job. I couldn't wrap my head around how the "lm" function in R was able to include categorical data in the analysis. Thank you for explaining all of this so clearly!

  • @rohitekka2674
    @rohitekka2674 3 years ago +1

    Came here for the concept, stayed here for the knowledge. On my way to complete PL15. Thank you Brandon!

  • @xmwang829
    @xmwang829 8 years ago +2

    I love all series of your videos because the boring/complicated statistics subjects in the classroom become so easy/interesting and visually enjoyable. You cannot imagine how helpful your videos are. Love your way of delivering the speech. I wish I had seen your videos earlier. Thank you, Brandon.

  • @shrikantprabhu799
    @shrikantprabhu799 9 years ago +11

    I stumbled upon your videos, as I had some doubts in Statistics! Nice to see that someone has taken efforts to explain the Fundamentals ... Much Thanks and Best Wishes !

  • @robinmccullough6718
    @robinmccullough6718 9 years ago +1

    I've been going through your videos as a refresher from when I studied economics over three years ago. It has been incredibly helpful. Thank you very much for doing these videos!

  • @GeorgeTel100
    @GeorgeTel100 7 years ago +1

    Why can't every teacher be like Brandon?!! What a gift!

  • @JackIsNotInTheBox
    @JackIsNotInTheBox 4 years ago +5

    Fantastic series. I watched all of them!

  • @vikashprasad557
    @vikashprasad557 4 years ago +3

    Simply awesome teaching skills. Thank you so much Sir!

  • @kingsleyakpeji5447
    @kingsleyakpeji5447 6 years ago

    Thank you Brandon. Your videos on multiple regression have significantly improved my intuition on the topic. Keep up the good work.

  • @michaelpappas3857
    @michaelpappas3857 2 years ago

    Brandon, I can't thank you enough for your excellent videos that make difficult concepts understandable and accessible! You're incredible! Thanks.

  • @abdallamohamed2374
    @abdallamohamed2374 7 years ago

    I had problems with dummy variables in multiple regression. Now they are very clear to me. Thanks a lot Brandon for this great effort!

  • @paulinagc6986
    @paulinagc6986 5 years ago

    I followed this playlist from video 1, and you sir, just saved my term. Thank you so much for creating and sharing these videos. Godspeed

  • @growhappiness9065
    @growhappiness9065 9 years ago +85

    Damn I wish I had adopted you as my remote statistics professor from the get-go, it was so infuriating and frustrating having to figure out everything for myself. The most frustrating part is looking back and realizing that none of this is actually hard at all, it all boils down to the ineptitude of teachers and the limits of the book format. I'm always puzzled how didactically challenged some people with a background in psychology can be. I hope some day Universities will stop wasting their professor's time and also my time with this incessant lecture ritual nonsense. All you need is one person that can explain things from the ground up in a didactically sound manner, and that person only has to do it once and put it online.

    • @davidjohn1944
      @davidjohn1944 7 years ago

      Growhappiness

    • @mohammadpourheydarian5877
      @mohammadpourheydarian5877 7 years ago

      Brandon is a very good teacher but most students are not ready for this level of challenge.

    • @vainbow4632
      @vainbow4632 4 years ago

      People with a background in psychology usually don't understand statistics themselves, so teaching others is quite the challenge

  • @YogeshprabhuJ
    @YogeshprabhuJ 9 years ago +2

    Fantastic and clear explanation... videos on interaction variables, logistic regression, and non-linear models would be of great help. Thanks once again.

  • @ameliajimenez6985
    @ameliajimenez6985 5 years ago

    THANK YOU! I watched this playlist and the simple linear regression playlist in two days. This has been tremendously helpful!

  • @lilyrose4056
    @lilyrose4056 9 years ago

    Dear Mr. Brandon, I'd like to thank you very much for helping us with your amazing videos. Regards

  • @gowezzmania880
    @gowezzmania880 6 years ago

    By watching this video, I can enrich my understanding of the fundamental explanation of multiple regression output. superb!!!

  • @1954Kamla
    @1954Kamla 7 years ago

    The way things are explained is just marvellous, examples are simple and to the point. Really helpful to a learner....Thanks Brandon for these wonderful videos...

  • @HSarah123321
    @HSarah123321 9 years ago

    Thank you so much for these videos, you really helped me understand multiple regression, which I never thought I would! You're a lifesaver!

  • @deborahfister1443
    @deborahfister1443 4 years ago

    I watched all the videos in this series because I edit dissertations for students and, in general, they have no idea what the data they get from SPSS means or if it is relevant. So I get all of the data and have to sort it out. I guess the best bit of information regarding multiple regression is to eliminate IVs that are not correlated to the DV. Thanks for clearly explaining this sometimes confusing topic.

  • @pradeepsharma30
    @pradeepsharma30 7 years ago

    All I can say is....it is an amazing piece of work!!

  • @xcriss2898
    @xcriss2898 9 years ago

    Thank you very much, your videos are incredibly well paced and explanatory. It would be great, if possible, to have the data at hand for the problems with bigger datasets. Cheers!

  • @valdaalleyne3661
    @valdaalleyne3661 9 years ago

    I recently started watching your videos and they are really helpful. Keep up the good work!

  • @WreathStorm
    @WreathStorm 8 years ago

    Thank you so so much! I was so confused about qualitative predictors, and your mini-series made them understandable! ^_^

  • @abhijeetparihar3349
    @abhijeetparihar3349 8 years ago

    Excellent explanation!

  • @atiferede7226
    @atiferede7226 6 years ago

    Excellent videos. Thank you very much Brandon.

  • @Sanky1982
    @Sanky1982 9 years ago

    Hi Brandon, THANK YOU for sharing these videos. You are an amazing teacher. The simplicity and ease of your explanation is simply unparalleled! Please keep up the good work. Thanks again!

  • @tamammohamad922
    @tamammohamad922 4 years ago

    I just finished them all, thanks!

  • @fhd21
    @fhd21 9 years ago

    Thank you Brandon. You have helped a lot!

  • @timedullantes571
    @timedullantes571 4 years ago

    Thank you very much Brandon.

  • @himanshudalai1028
    @himanshudalai1028 5 years ago

    Hi Brandon !! Thank you so much for these wonderful videos. These are the most useful ones.

  • @sedaaydin3000
    @sedaaydin3000 7 years ago

    you are a legend brandon!

  • @tonydinh8905
    @tonydinh8905 5 years ago

    This was fantastic! Thank you so much!

  • @Landon_R
    @Landon_R 2 years ago

    One thing I'm still not clear on..."East" is not given in the results, which is fine for the purpose of inputting numbers into the regression equation, but we don't get to see the p-value for that category. We can see whether or not North, South, and West are significant, but how would you know if East is significant or not? Also: my understanding is that in this case East is a reference for the other categories within that variable. The p-values for North, South and West indicate whether or not they are significantly different from the reference (East). What does that mean for each category's significance to the overall model, and what information tells us if the "Region" variable overall is significant to the model? It seems there might be cases in which we would remove the variable entirely, and cases where some categories are significant and we would keep the predictor but remove the non-significant categories within it.

  • @toukka4
    @toukka4 3 years ago

    Thank God you exist

  • @tobinwazzan
    @tobinwazzan 4 years ago

    Thank you Brandon.

  • @Basmareem
    @Basmareem 3 years ago

    Sir, the video was enlightening and has tremendously helped with the analysis for my dissertation. I had a doubt. When there are many categorical and continuous exposure variables, should each be analyzed separately, as you explained in the first part of the classes on multiple regression, or should they be run together, as done in this video?

  • @simondixon6586
    @simondixon6586 5 months ago

    Great videos! One question though. If you don't have Minitab or SPSS (which are expensive), how can you use Excel to calculate the comparative statistics? Do you have a video for that?

  • @agathabanga3578
    @agathabanga3578 9 years ago +5

    Thanks Brandon. The videos are great. One question, relating to dummy variables: on the results slide (output), East is not even mentioned, so it doesn't have a coefficient etc. Please explain what should be done if we happen to need it, e.g. an exam question asking for a model for a house in the East.

    • @Randyminder
      @Randyminder 8 years ago +1

      I'd like to know the same thing. Did you ever get an answer to this?

    • @alejandrohermida9318
      @alejandrohermida9318 7 years ago +4

      I'm guessing here, but I think that, since the "East" category is coded as 0, the values for East would equal those from sqft alone, without adding anything else. The same is true when a school is not exemplary (also coded as 0), so for a house in the East in a neighborhood without an exemplary school, the price would only be affected by sqft (92 dollars per additional square foot). For a house in the East with an exemplary school, we would add 27,310 dollars to the constant; that is, instead of starting our line at -30.3, we would start it at around -3 (adding the 27.3 thousand dollars from the exemplary-school coefficient). For a house in any other region, we would take the previous numbers and add the region's coefficient to the constant (so -3 + (-4) for the North, -3 + 7.2 for the South, and so on). This, however, would not give us the 4 different lines that were shown in the previous video, as the numbers shown here are all taken from the whole dataset, not from an individual regression for every variable. I hope my reasoning is right, and if anyone could correct my answer I would be happy to receive feedback.
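
Editor's sketch of the reasoning in this reply, assuming the approximate coefficients quoted in the comments (intercept -30.3, about 0.092 per square foot, 27.3 for an exemplary school, -4 for North, 7.2 for South, and the -23.1 for West mentioned in a later comment), all in thousands of dollars. The function and constant names are hypothetical, and the exact regression output from the video is not reproduced here.

```python
# Approximate coefficients quoted in the comments, in thousands of dollars.
# Illustrative values only; not the exact output shown in the video.
INTERCEPT = -30.3
SQFT = 0.092       # roughly 92 dollars per additional square foot
EXEMPLARY = 27.3   # added when the exemplary-school dummy equals 1

# East is the reference level: all three region dummies are 0 for East,
# so it contributes nothing beyond the intercept.
REGION = {"east": 0.0, "north": -4.0, "south": 7.2, "west": -23.1}


def intercept_for(region, exemplary):
    """Intercept of the price line for one region/school combination."""
    return INTERCEPT + REGION[region] + (EXEMPLARY if exemplary else 0.0)


def predict_price(sqft, region, exemplary):
    """Predicted price in thousands of dollars; the sqft slope is shared."""
    return intercept_for(region, exemplary) + SQFT * sqft
```

With four regions and two school categories this gives eight intercepts and one common slope. For instance, `intercept_for("east", True)` is about -3 and `intercept_for("north", True)` is about -7, matching the arithmetic in the reply.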

  • @TimvanWessel
    @TimvanWessel 8 years ago +1

    Hi Brandon, thanks again for making this series. I like Minitab so much compared to Excel, and even more compared to SPSS. I would like to know how you made the graph at 14:50. Did you do this in Minitab? If yes, where can I find it? And do you have the data for this example? I would like to duplicate your results to understand them even better. Thanks.

  • @alecryan8733
    @alecryan8733 4 years ago

    Amazing series. Any videos with interaction items? Can't seem to find it anywhere

  • @carlosacosta1453
    @carlosacosta1453 6 years ago

    Thanks a lot Brandon. What do you think about interaction between categorical variables? This generates additional variables in the regression model.

  • @canernm
    @canernm 5 years ago +2

    Hello Brandon, thanks for the video! One question: in this model, would we end up removing the "region" variable since it's not significant? Thanks in advance

    • @sahibakumar6180
      @sahibakumar6180 4 years ago +1

      Hey,
      Could you find the answer to this?

  • @ninonoelperez7210
    @ninonoelperez7210 4 years ago

    Good video! I have a question to ask.
    In the first part of this series, you talked about first checking the significance of each independent variable against the dependent variable. Does that also apply to categorical variables? Since the categorical predictor for the region is NOT significant, should it still be included in the model? Should I fit a new model without the region variable? I look forward to your reply, thanks :)

  • @John-kk4xk
    @John-kk4xk 5 years ago

    The videos are awesome and very informative! Just wondering if there is another video on how to tie this all together and use the coefficients on a large file to predict the value of many properties at once?

  • @frankyfernandes4567
    @frankyfernandes4567 6 years ago

    Thanks a lot; you made Stats interesting

  • @roywright3723
    @roywright3723 8 years ago

    Good morning Brandon
    I have been using Simple Linear Regression and Multiple Regression in Excel in the valuation process of real estate.
    When using Excel with multiple regression, I have been focusing on the Significance F, Adjusted R Square and the Correlation Coefficient (Multiple R). I have been ignoring the t-Stat and the P-values.
    My question is how critical are the t-Stats and the P-values?

  • @martygofast
    @martygofast 5 years ago

    Hi Brandon, I miss a section on control variables and OVB. or should I find it in a different playlist?

  • @mohsinsyed4236
    @mohsinsyed4236 4 years ago

    @brandon foltz: I am confused at 11:15. Where did we get the negative values from? Can you share some manual working too?

  • @dendeibrahimadekanmbi8022
    @dendeibrahimadekanmbi8022 6 years ago

    I really appreciate your efforts in educating people. I would like to know if you have any short course on multivariate analysis which I can attend to obtain a certificate. I prefer to have physical contact with you and the rest of your team.

  • @FuriousP14
    @FuriousP14 3 years ago

    So in the example should we remove the region predictor from our model since it isn't statistically significant?

  • @jimshoe8646
    @jimshoe8646 9 years ago

    UNPACKING THE INTERCEPT IN MULTIPLE REGRESSION. Don't know if it is possible but I will ask anyway. I created a spreadsheet for a tract of 48 homes (made up data) and input the price per square foot of land, living area, basement, baths, etc. Then I ran the multiple regression and it returned the exact coefficients expected. The intercepts for each characteristic were combined into one in the multiple regression results. I would now like to 'unpack' the results to find out how much of the 'multiple' coefficient should be allocated to each characteristic. For example, a 1,400 square foot home is worth $100/sqft but the model just gives the coefficient of $50. Is it possible to 'unpack' the intercept and find out where the other $50 went?

  • @nicholaslipanovich827
    @nicholaslipanovich827 3 years ago

    This is great and all, but I was really hoping to see you interpret the betas for the geographic regions. I realize they aren't statistically significant, but showing an example where they were statistically significant and explaining their coefficient results would've been very helpful.

    • @nicholaslipanovich827
      @nicholaslipanovich827 3 years ago

      I was able to get an example of this in a stats regression textbook from grad school so I'm good but most people watching this don't own advanced statistics books

  • @shan19key
    @shan19key 7 years ago

    One question: if a variable is not significant (region in this case), should it even be used in the regression equation, or can it be omitted?

  • @AndyRyanTX
    @AndyRyanTX 7 years ago

    Can you explain why the coefficients look different at around the 11:45 mark than they do earlier in the video? For example, the coefficient for West (when it's a NO for exemplary) is 54.4 when it was -23.1 at around the 4 min mark.

  • @amartyagupta8255
    @amartyagupta8255 7 years ago

    Hi Brandon! You didn't do any prep work here; is that because we have one predictive variable and the rest are categorical variables describing the characteristics of the house?

  • @tcmJOETrotsky
    @tcmJOETrotsky 9 years ago

    Hi Brandon,
    Thanks for posting these videos--I've been trying to brush up on my statistics after not having done some of these methods for several years. I wanted to ask if you would be willing to post the slides from your videos on your webpage, so I could quickly go through them for review.

  • @paulinaagnello6790
    @paulinaagnello6790 7 years ago

    You are awesome! But, I don't quite understand why the overall estimated regression equation is in the format of 8 different 1-variable equations, instead of that long equation with multiple Beta values and all the x-variables????

  • @TrentTube
    @TrentTube 5 years ago

    Hey, Brandon. I am building a model where I have two categorical variables that are significant as groups on my ANOVA tables, but many of the elements of the groups are not statistically significant. I am uncertain whether I ought to keep the groups in the model. Both groups explain a large share of the variation.

  • @yasmeenalbalushi2853
    @yasmeenalbalushi2853 7 years ago

    Great effort
    Thank you

  • @alexchun5944
    @alexchun5944 5 years ago

    Hey Brandon, what does the F-Value tell us about the variable?

  • @MrMarcus7447
    @MrMarcus7447 8 years ago +1

    Hi Brandon,
    Thanks for your wonderful videos... much appreciated.
    Can I ask why you retain the "location" variable in the regression equation even though it is not statistically significant? In previous videos using interval variables didn't you remove the variable if it was not correlated with the dependent variable?
    Hope this makes sense and thanks again for your great work.
    Regards, M

    • @沈宏富
      @沈宏富 8 years ago +1

      ,欸,啊啊哦一欸餓Hkkil ˙恩餓無意義欸

  • @mouhamadeldarwich
    @mouhamadeldarwich 1 year ago +1

    What does it mean when the t-test of the slope of one of the included variables is not significant? Does that mean we should drop it out of the model?

    • @BrandonFoltz
      @BrandonFoltz  1 year ago

      Hello! Significance is only meaningful in the context of the other variables in the regression model. So, it should be kept and reported as not significant. If you want to build the model that explains the most variance in the DV, you can use a true model-building process such as forward, backward, stepwise, best subsets, etc. to see how different combinations gobble up the total sum of squares.

    • @mouhamadeldarwich
      @mouhamadeldarwich 1 year ago

      Thank you for your reply @BrandonFoltz

  • @jmrjmr8254
    @jmrjmr8254 9 years ago

    Hi Brandon! Thanks for the video! When do we consider that the R-sq (pred) is significantly lower than the R-sq(adj)?

  • @susansebastian4229
    @susansebastian4229 3 years ago

    Should we only check for multicollinearity after running the model, by looking at the VIF for categorical variables? Or is there a different method to check for this while prepping the data?

    • @BrandonFoltz
      @BrandonFoltz  3 years ago

      Hi Susan! Always a good idea to check for collinearities right at the start. If features are correlated with each other and the target, these methods will likely not include it because the sum of squares is redundant. You can look at tolerance or VIF (VIF is the reciprocal of tolerance) as an additional check.
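
Editor's sketch of the tolerance/VIF relationship mentioned in this reply, assuming the simplest case of exactly two predictors, where the R-squared from regressing one predictor on the other equals their squared Pearson correlation. The data and the names `sqft` and `bedrooms` are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance_and_vif(x1, x2):
    """Tolerance = 1 - R^2 and VIF = 1 / tolerance (two-predictor case).

    With perfectly collinear predictors the tolerance is 0 and the VIF
    is undefined, so this raises ZeroDivisionError in that case.
    """
    r_squared = pearson_r(x1, x2) ** 2
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


# Hypothetical predictor values, not taken from the video's dataset.
sqft = [1, 2, 3, 4, 5]
bedrooms = [2, 4, 5, 4, 5]
tolerance, vif = tolerance_and_vif(sqft, bedrooms)
```

With more than two predictors, the R-squared would instead come from regressing each predictor on all of the others; the reciprocal relationship between tolerance and VIF stays the same.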

  • @baronhp7503
    @baronhp7503 9 years ago

    I did not fully understand beta1.
    In video 5A we studied the scatterplots and learnt that beta1, the coefficient of square feet, is different depending on region (east/north/south/west).
    Now here our model is such that beta1 is the same regardless of the region. Shouldn't we somehow have a different beta1 for different regions?

  • @irenecheca6575
    @irenecheca6575 3 years ago

    I am wondering how you know, with categorical variables with more than two levels, which of the levels are significantly different from the others for that dependent variable. Imagining that region had been significant for some of the levels, say West and East, and not for others, how do you know whether West and East are significantly different from each other?

  • @satheeshkrishnankannaiyan5577
    @satheeshkrishnankannaiyan5577 6 years ago

    Hi Brandon. If you could do a prediction video based on the categorical example, that would be helpful.

  • @gautamshenoy6170
    @gautamshenoy6170 4 years ago

    Is it necessary to include all categorical independent variables (or ignore some, like the previous example with 3 independent variables where the best fit used 1 independent variable)? Or can we ignore the non-significant ones (the region)?

  • @hardXidiot
    @hardXidiot 6 years ago

    Sir, do you have a tutorial about mediation analysis?

  • @fryeedom
    @fryeedom 3 years ago

    you make me like statistics...

  • @lakshmanpaul5523
    @lakshmanpaul5523 2 years ago

    How can I calculate multiple regression for three or more independent variables?

  • @daniaakbar5421
    @daniaakbar5421 6 years ago

    Informative video again. Can you please explain this: I want to fit a quadratic regression equation with one categorical variable and one continuous variable. When I use the regression option in Minitab and specify a quadratic model in the Model tab, Minitab does not take the square of the categorical variable. I need the square because I am using a quadratic regression equation. Can you please explain the logic behind this?

  • @wsxzaqist
    @wsxzaqist 9 years ago

    Hi Brandon
    I wanted to know: if there is a strong correlation between errors/residuals and predictors, will it affect the regression model?
    If so, how do I address it?

  • @shanmugavelrajeevan7349
    @shanmugavelrajeevan7349 6 years ago

    Hi Brandon. Your videos are great. Is there any way to run multiple regression for more than one industry type with more than one independent variable?

  • @filippocatellani7948
    @filippocatellani7948 2 years ago +1

    Hi, if I have a high p-value for the Region variable, could I decide to remove the variable from the model, or not? Thanks so much

    • @BrandonFoltz
      @BrandonFoltz  2 years ago +1

      Hello! Probably but I would recommend using a model building technique in the event there are interactions and other relationships. Stepwise. Best subsets. And such. I have videos about all those!

    • @filippocatellani7948
      @filippocatellani7948 2 years ago

      @BrandonFoltz thank you very much

  • @manuuworld
    @manuuworld 8 years ago

    Where can I find problems to solve and practice?

  • @MohamedHassan-my5ut
    @MohamedHassan-my5ut 8 years ago

    Thanks a lot for these very helpful, well-illustrated videos. Can you help me understand how I can find a curvilinear relationship in experimental data with multiple variables?
    Thanks in advance :)

  • @nickfleming3719
    @nickfleming3719 4 years ago

    Don't you think you should explain how you got the coefficient for east, since that wasn't included as a variable in the regression?
    That seems like one of the most important things to explain how to do,
    everything else is pretty intuitive
    at least a simple mention
    not a word about it
    just magic

  • @WildtuinMichael
    @WildtuinMichael 5 years ago

    What I don't understand is the fact that this multiple regression model gives parallel lines for east, west, north and south. However, when you look at the scatter plot it is clear that the lines through these regions are not parallel. Is there a way to incorporate the non-parallelism into a model?

    • @xuchuan6401
      @xuchuan6401 29 days ago

      There is no interaction term in the current video, so the relation between price and sqft is the same regardless of exemplary status and region.
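
Editor's sketch of the point made in this reply, using one dummy variable `d` and made-up coefficients (none of these numbers come from the video). Without an interaction term the slope on square feet is the same for both groups, which gives parallel lines; adding a `sqft * d` term lets the slope differ by group.

```python
# Hypothetical coefficients for the model
#   price = B0 + B1 * sqft + B2 * d + B3 * (sqft * d)
# where d is a 0/1 dummy (for example, one region versus the reference).
B0, B1, B2, B3 = 10.0, 0.09, 5.0, 0.02


def predict(sqft, d, interaction=True):
    """Predicted price; set interaction=False to drop the sqft * d term."""
    b3 = B3 if interaction else 0.0
    return B0 + B1 * sqft + B2 * d + b3 * sqft * d


def slope(d, interaction=True):
    """Change in predicted price for one extra square foot in group d."""
    return predict(1001, d, interaction) - predict(1000, d, interaction)
```

Here `slope(0, interaction=False)` and `slope(1, interaction=False)` are equal (parallel lines), while with the interaction term the slope for `d = 1` is larger by `B3`.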

  • @life-ul6ye
    @life-ul6ye 3 years ago

    Thank you very much

  • @alexp3428
    @alexp3428 7 years ago

    The one question that bugs me with regression output is the notion that the regression pvalue can be stat sig, but the underlying variables/components don't necessarily have to be. How can the regression be stat sig at p of 0.00, when one of the variables isn't even statistically significant. Does this not give indication that this variable does not aid in explaining the phenomenon?

  • @utopiasolutions8797
    @utopiasolutions8797 5 years ago

    How is the slope same for all cases? In the previous videos, the slope was visibly different.

    • @xuchuan6401
      @xuchuan6401 29 days ago

      If variables have no interaction terms, beta will be constant for continuous variables, because you will only get one beta for one Xi

  • @SyedKollol
    @SyedKollol 8 years ago +1

    excellent!!

  • @mylilmushy5628
    @mylilmushy5628 3 years ago

    What about Minitab 18? It's different, right?

  • @AndyRyanTX
    @AndyRyanTX 7 years ago

    If a p-value is not significant (greater than .05), then don't we eliminate the variable and rerun the regression for a different equation? I noticed all of your p-values for region were not significant. Do you keep variables that are not significant to develop your regression equation?

  • @عبدالزهرهعبدالرضا

    u r really great, thanx.

  • @SyedKollol
    @SyedKollol 8 years ago

    Hi +Brandon Foltz, I am a bit confused now. If there are categorical variables, unlike in ordinary multiple regression, are we supposed to use multiple equations (like all 8 equations in this case), or are we going to select one of the 8 equations as in this video?

    • @ActionSportsExtreme
      @ActionSportsExtreme 8 years ago +1

      I think you have to decide whether you want an exemplary public high school or not and in which region you want to live. Then you pick the corresponding equation.
      I hope I'm right!

  • @h.a.8965
    @h.a.8965 6 years ago

    Can anybody tell me that if we have categorical variables how can we look at a linear relationship between variables? Is it really important to have linear relationship between variables when all the independent variables are categorical?

  • @npt0112
    @npt0112 4 years ago

    I have one question. Where is the p-value of the East region?

  • @gsanjaypratap
    @gsanjaypratap 7 years ago

    Kindly provide a link (URL) to download all of the stats videos. Thanks a ton, Brandon.

  • @joshuafancher3111
    @joshuafancher3111 5 years ago

    Amazing

  • @migalejo85
    @migalejo85 9 years ago

    Thanks a lot!!!

  • @vintonchen6210
    @vintonchen6210 7 years ago +1

    What if I want the coefficient of "East"?

  • @abc-md9zj
    @abc-md9zj 9 years ago

    These are great videos. He is great! I am wondering if he has published any statistics books. I would be very interested.

  • @tn-show906
    @tn-show906 6 years ago

    How do we interpret the coefficient of region in this example? The exemplary case is fine, but region has four categories. Can we say it is an average increase compared with the other regions? If so, how do we calculate that average increase across the 4 regions?

    • @xuchuan6401
      @xuchuan6401 29 days ago

      Everything is compared to East. For example, North is encoded as 1 0 0, so the coefficient of North (-4 here) represents a decrease of 4 with all other variables held constant (i.e., 1 0 0 vs 0 0 0), and 0 0 0 represents East.
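
Editor's sketch of the dummy coding described in this reply: three 0/1 columns for four regions, with East as the reference level coded as all zeros. The reply confirms North as 1 0 0; the order of the South and West columns, and the names used here, are assumptions for illustration.

```python
# Dummy columns in order; East is the reference level and has no column.
# The position of "south" and "west" is an assumption, not from the video.
REGION_DUMMIES = ("north", "south", "west")


def encode_region(region):
    """Return the (north, south, west) dummy values for one region."""
    name = region.lower()
    if name != "east" and name not in REGION_DUMMIES:
        raise ValueError(f"unknown region: {region!r}")
    return tuple(1 if name == dummy else 0 for dummy in REGION_DUMMIES)
```

For example, `encode_region("North")` returns `(1, 0, 0)` and `encode_region("East")` returns `(0, 0, 0)`, so each region coefficient measures the difference from East.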

  • @sriharijagannath2127
    @sriharijagannath2127 9 лет назад

    @Brandon Foltz - Thanks for the amazing videos. I am trying to analyze website traffic data and I have 6 different websites. Could I use them as categorical Variables?
    It would be a great help if you could upload links to download your presentation slides. As they are much easier to go through than 6 videos of multiple regression in case we need a quick reference.

  • @Vijay-iq1fh
    @Vijay-iq1fh 3 years ago

    love it

  • @arvin_diamante
    @arvin_diamante 4 years ago

    Isn’t it just 9square foot? Kindly check pls