13.10 Multiple Linear Regression: Mean-Center & Standardization

  • Published: 9 Jan 2025

Comments • 18

  • @brandony8691 · 4 years ago

    Thank you. I'm working on making a predictive financial model with some interaction terms that were experiencing concerning levels of multicollinearity, and this helped solve my issues!

    • @ShawnJanzen · 4 years ago

      Thanks Brandon. Glad you found it useful! Good luck with your financial model.
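A minimal R sketch of why centering can help with interaction terms, assuming the built-in mtcars data and an illustrative hp × wt interaction (neither comes from the commenter's model): a raw product term tends to be strongly correlated with its own components, while the product of mean-centered variables usually is much less so. The fitted values of the two models are the same; only the parameterization changes.

```r
# Sketch: mean-centering predictors before forming an interaction term.
# Assumes the built-in mtcars data; hp and wt are illustrative choices.
data("mtcars")

# Mean-centered copies of the two predictors
mtcars$hp.mc <- mtcars$hp - mean(mtcars$hp)
mtcars$wt.mc <- mtcars$wt - mean(mtcars$wt)

# Interaction terms built from raw and from centered predictors
raw_int <- mtcars$hp * mtcars$wt
mc_int  <- mtcars$hp.mc * mtcars$wt.mc

# Correlation of each main effect with its interaction term
cor_raw <- c(hp = cor(mtcars$hp, raw_int),   wt = cor(mtcars$wt, raw_int))
cor_mc  <- c(hp = cor(mtcars$hp.mc, mc_int), wt = cor(mtcars$wt.mc, mc_int))
print(round(cor_raw, 3))
print(round(cor_mc, 3))

# Both models span the same predictor space, so fitted values agree
fit_raw <- lm(mpg ~ hp * wt, data = mtcars)
fit_mc  <- lm(mpg ~ hp.mc * wt.mc, data = mtcars)
print(all.equal(fitted(fit_raw), fitted(fit_mc)))
```

Lower correlation between main effects and the interaction usually shows up as lower VIFs for the main-effect terms, which is the kind of multicollinearity relief described in the comment above.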

  • @BlueUKLouis · 2 years ago +1

    The output tables appear to be from simple linear regression. Does this same line of code work in multiple linear regression?

    • @ShawnJanzen · 2 years ago

      Are you referring to the output table screenshots around 4:56? Yes, you're right, those are simple linear regression. Good catch, and sorry for any confusion. I teach this lesson after my students have the basics of both simple and multiple regression, which is why it ends up among the MLR-numbered videos, something I should now rethink. :)
      But to answer your question: yes, you can use the exact same process with multiple independent variables for multiple regression. Just keep in mind that you can have a mix of standardized and non-standardized variables in your regression; which ones you standardize (and with which type of standardization) is up to you, informed by the type of data you have and the questions you want to answer. So set up each variable beforehand the way you want to use it, then plug them into your regression. Good luck!
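A hedged R sketch of the "mix and match" idea in a multiple regression, assuming mtcars with mpg as the response; mean-centering hp, standardizing wt (z-score via scale()) and leaving qsec raw are arbitrary illustrative choices. It shows that the fitted values do not change, the centered slope is unchanged, and the standardized slope is the raw slope multiplied by sd(wt).

```r
# Sketch: multiple regression with a mix of transformed and raw predictors.
# Assumes mtcars; centering hp, standardizing wt and leaving qsec raw
# are illustrative choices, not a recommendation.
data("mtcars")

mtcars$hp.mc <- mtcars$hp - mean(mtcars$hp)   # mean-centered
mtcars$wt.z  <- as.numeric(scale(mtcars$wt))  # standardized: (wt - mean) / sd

fit_raw   <- lm(mpg ~ hp + wt + qsec, data = mtcars)
fit_mixed <- lm(mpg ~ hp.mc + wt.z + qsec, data = mtcars)
print(summary(fit_mixed))

# Same fitted values; only the scale/meaning of the coefficients changes
print(all.equal(fitted(fit_raw), fitted(fit_mixed)))
# Centering leaves the hp slope unchanged
print(c(raw = coef(fit_raw)[["hp"]], centered = coef(fit_mixed)[["hp.mc"]]))
# Standardizing multiplies the wt slope by sd(wt): change in mpg per 1 SD of wt
print(c(raw_times_sd = coef(fit_raw)[["wt"]] * sd(mtcars$wt),
        standardized = coef(fit_mixed)[["wt.z"]]))
```

Note that scale() divides by the sample standard deviation (n - 1 denominator), the same quantity sd() returns.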

  • @indoman123 · 3 years ago +1

    On the plots, they all have the same y-intercept, when the original is supposed to have a different one from the mean-centered version.

    • @ShawnJanzen · 3 years ago +1

      Sorry for the very late reply.
      If you draw the plot so that the y-axis cuts through x = 0, then yes, you would see a different y-intercept in the plot. For the purpose of this video, I wanted to make it clear to the viewer that the spread of the plotted points keeps the same relationship, and so does the regression line. I believe the actual y-intercept values are rather arbitrary; the relationship between the points and the regression line is far more important.
      Here is R code if you'd like to recreate the plot example.
      [Looks like YouTube is making automatic links in my code. Do not click on them; I do not intend to link you to other sites.]
      # import data
      data("mtcars")
      # mtcars mean center hp
      mtcars$hp.mc <- mtcars$hp - mean(mtcars$hp)
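A sketch in the spirit of the plot example above, assuming a simple regression of mpg on hp (the choice of mpg as the response is an assumption). It compares the raw and mean-centered fits numerically: the slope and fitted values are identical, while the intercept moves from the predicted mpg at hp = 0 to the predicted mpg at the mean hp, which equals mean(mpg).

```r
# Sketch: raw vs mean-centered simple regression of mpg on hp (mtcars).
data("mtcars")
mtcars$hp.mc <- mtcars$hp - mean(mtcars$hp)

fit_raw <- lm(mpg ~ hp, data = mtcars)
fit_mc  <- lm(mpg ~ hp.mc, data = mtcars)

b_raw <- coef(fit_raw)
b_mc  <- coef(fit_mc)

coefs <- rbind(raw = unname(b_raw), centered = unname(b_mc))
colnames(coefs) <- c("intercept", "slope")
print(coefs)  # same slope, different intercepts

# Centered intercept = predicted mpg at the mean hp = mean(mpg)
print(c(centered_intercept = b_mc[["(Intercept)"]], mean_mpg = mean(mtcars$mpg)))

# The intercepts differ by -slope * mean(hp)
print(c(difference = b_raw[["(Intercept)"]] - b_mc[["(Intercept)"]],
        minus_slope_times_mean = -b_raw[["hp"]] * mean(mtcars$hp)))
```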

  • @brandonjohnson7634 · 2 years ago

    How would you use the MC intercept and coefficients in a predictive equation? The y-intercept makes the y-hat values differ from the observed data.

    • @ShawnJanzen · 2 years ago +1

      You use them to write the regression equation the same way as if you hadn't mean-centered your independent variable (IV). The math to predict y-hat works out the same; the difference is conceptual. The comparative plots I have around 7:35 show the same regression line (all the connected y-hats) in relation to the plotted data points.
      In general, y-intercept values can vary wildly depending on how your IVs are measured and transformed. Sometimes they don't even make sense in the real world. So, don't worry about interpreting y-intercept values. If you do need to interpret the y-intercept, just keep in mind that it is the predicted value when ALL IVs equal zero; knowing what those zero-valued IVs represent is far more important.

    • @brandonjohnson7634 · 2 years ago

      @ShawnJanzen After posting my previous comment, I realized I had not formulated the question with enough detail. The change I saw in the intercept was the first thing I focused on, before examining my models. One of my variables showed curvature in the residuals, so I used a third-order polynomial to correct that. I also centered everything first because of the multicollinearity issues I was having. Both changes fixed my issues: VIF, autocorrelation, and residuals looked acceptable. However, the coefficients themselves were somewhat different in the MC model than in the non-MC model, and the intercept changed a little as well. It turns out that my non-MC model is excellent at predicting values in the observed data ranges, with an RMSE of ~9. My MC model also has an RMSE of ~9; however, it does not predict values within my observed range. It over-predicts greatly, by about 100. I have only four IVs, and mean-centering those variables is not a math-intensive or complex endeavor in Excel, so I do not understand why the MC model changes so drastically. I know this is not helpful for you, because you would need to see the model and data, but I figured I would explain myself.

    • @ShawnJanzen · 2 years ago

      @brandonjohnson7634 You're right. It is hard to say without seeing your data and model. It could be lots of things, and it's hard to speculate without possibly leading you down the wrong rabbit hole. I suggest posting to a site like stats.stackexchange.com/. I find the help there very useful, both when I have questions and for learning from others. You'll get the best help by providing clear details and reproducible outcomes, if not the initial data and problem itself. Who knows, maybe I'll see it there and can help better than via YouTube comments. :)
      And kudos to you for running stats in Excel. Are you doing the math in it by hand? If you use Excel's stats formulas, just be careful: they can do weird and unexpected things. Over the years they've added all sorts of variants (like different types of standard deviation), and other formulas lack the arguments needed to adjust properly for the type of analysis (like some chi-square stuff).
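A hedged R illustration of using mean-centered (MC) coefficients in a predictive equation, assuming mtcars and mpg ~ hp; the new hp values are hypothetical. New IV values have to be centered with the training-data mean before they go into the MC equation. Doing so reproduces the raw model's predictions exactly, while plugging raw values into the MC equation shifts every prediction by slope × training mean.

```r
# Sketch: predicting with mean-centered (MC) coefficients.
# Assumes mtcars and mpg ~ hp; the new hp values are hypothetical.
data("mtcars")
hp_mean <- mean(mtcars$hp)            # training mean, reused for new data
mtcars$hp.mc <- mtcars$hp - hp_mean

fit_raw <- lm(mpg ~ hp, data = mtcars)
fit_mc  <- lm(mpg ~ hp.mc, data = mtcars)
b <- coef(fit_mc)

new_hp <- c(100, 150, 200)            # hypothetical new observations

# Raw model: plug raw values into the raw equation
pred_raw <- predict(fit_raw, newdata = data.frame(hp = new_hp))

# MC model: y-hat = b0 + b1 * (x - training mean of x)
pred_mc <- b[["(Intercept)"]] + b[["hp.mc"]] * (new_hp - hp_mean)

# Pitfall: raw values in the MC equation shift every prediction by b1 * hp_mean
pred_wrong <- b[["(Intercept)"]] + b[["hp.mc"]] * new_hp

print(cbind(new_hp, pred_raw, pred_mc, pred_wrong))
```

With several IVs, the same rule applies to each centered IV: subtract that IV's training mean from every new value before applying the MC coefficients.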

  • @Marie-wi9hl · 2 years ago

    Hello :) For a regression, do I need to center only the predictor variables, or all variables that will be used in the regression? Thanks for the video!

    • @ShawnJanzen · 2 years ago

      Hi. You can center any continuous variable, predictor or response. You can center any number of them, from just one to all of them (but not categorical variables). Just to be clear, centering a variable isn't necessary; it's just one of many techniques we optionally use.
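A short R sketch of centering everything, assuming mtcars with mpg as the response and hp and wt as predictors (illustrative choices; the center() helper is defined here only for the example). Centering the response as well as every continuous predictor leaves the slopes unchanged and drives the intercept to zero, because a least-squares fit with an intercept passes through the point of means, which is now the origin.

```r
# Sketch: centering the response and all continuous predictors.
# Assumes mtcars with mpg as response and hp, wt as predictors.
data("mtcars")
center <- function(x) x - mean(x)     # hypothetical helper for illustration

dat <- data.frame(mpg.mc = center(mtcars$mpg),
                  hp.mc  = center(mtcars$hp),
                  wt.mc  = center(mtcars$wt))

fit_raw <- lm(mpg ~ hp + wt, data = mtcars)
fit_all <- lm(mpg.mc ~ hp.mc + wt.mc, data = dat)

# Slopes are unchanged; the intercept is zero up to rounding error
print(coef(fit_raw))
print(coef(fit_all))
```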

  • @riaa3218 · 4 years ago +1

    Isn't 5.77e-06 just an error message, like you get in Excel when the syntax is incorrect? I've never seen p-values with letters.

    • @ShawnJanzen · 4 years ago

      5.77e-06 is not an error. It is the full p-value expressed in scientific notation; written out in full, it is 0.00000577. R usually prints small numbers this way unless told to do otherwise.
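For anyone who wants to see the conversion in R, a minimal sketch using only base functions (p is just the value from the question): format() with scientific = FALSE, or sprintf() with a fixed-point format, writes the number out in full.

```r
# Sketch: the same p-value in scientific and fixed notation (base R only).
p <- 5.77e-06

print(p)                             # [1] 5.77e-06
print(format(p, scientific = FALSE)) # [1] "0.00000577"
print(sprintf("%.8f", p))            # [1] "0.00000577"

# Both notations denote the same number
print(all.equal(p, 0.00000577))      # [1] TRUE
```

Setting options(scipen = ...) to a larger value raises R's penalty for scientific notation, so later printing favors fixed notation globally.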

  • @Dr_Shiny · 4 years ago +1

    very impressive

    • @ShawnJanzen · 4 years ago +1

      Many thanks. Hope you found it useful.

  • @maryamomar4106 · 3 years ago

    I love you

    • @ShawnJanzen · 3 years ago

      Thanks! I'll take that in a love of learning kinda way. Hope you enjoyed the video! I appreciate your viewership.