Interpretation of interaction effects

  • Published: 11 Sep 2024

Comments • 9

  • @anitapallenberg80
    @anitapallenberg80 2 years ago +1

    Thank you Mikko! As always, super helpful video.
    However, I was a bit confused about not centering the variables. I thought that if we include an interaction term, we can only interpret the regression coefficients at the point where the other variables are 0? If I center, 0 always has a meaningful interpretation, and centering also reduces spurious associations caused by the scaling. Could you briefly explain that?

    • @mronkko
      @mronkko  2 years ago

      That is right. Let's assume that we have a model with effects of gender, education, and gender*education. The gender effect from this model is interpreted as the effect for a person with 0 years of education. That is not a useful thing to know, because there are very few such people. If you center the data, the effect of gender is instead reported at the mean value of education, which might be somewhere between 12 and 15 years depending on the country. But in my field we do not interpret the coefficients at all; we instead plot adjusted prediction lines. If you interpret the results this way, the usefulness of centering disappears, but centering results in the plots being scaled incorrectly. So whether you center or not depends on how you plan to interpret the results.
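A minimal numeric sketch of the point above (not from the video; the data and coefficients are made up). Fitting the same gender/education interaction model with raw and with centered education shows that centering only shifts where the "gender" coefficient is evaluated:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
gender = rng.integers(0, 2, n).astype(float)   # 0/1 indicator (hypothetical data)
education = rng.normal(13.0, 2.0, n)           # years of education
y = (1.0 + 0.5 * gender + 0.3 * education
     + 0.2 * gender * education + rng.normal(0.0, 1.0, n))

def fit(g, e):
    """OLS fit of y on [1, g, e, g*e]; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(g)), g, e, g * e])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_raw = fit(gender, education)                      # gender coef: effect at education == 0
b_ctr = fit(gender, education - education.mean())   # gender coef: effect at mean education

# The two fits are reparametrizations of the same model, so (up to rounding)
# b_ctr[1] == b_raw[1] + b_raw[3] * education.mean()
```

The adjusted prediction lines themselves are identical in both fits; only the coefficient table changes.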

  • @yuzaR-Data-Science
    @yuzaR-Data-Science 2 years ago

    Dear Mikko, thanks a lot! It's hard to find a good explanation of interactions. Yours is the one! I have a question if I may: I have 4 interactions in the final model after backward selection in R. All predictors are categorical. If I use "allEffects(model)" I get lower and more realistic estimates, whereas if I use "emmeans(model, pairwise ~ one | two)" I get very high (unrealistic) estimates. Using "emmeans(model, pairwise ~ one | two, weights = "prop")" seems to solve the problem, but now I am no longer sure how to interpret the multiple interactions and which package (R code) to trust. So, my question is: are several interactions in an (explanatory, not predictive) model legitimate, and how do I interpret them correctly / which estimates are correct?

    • @mronkko
      @mronkko  2 years ago +1

      First, I would not use backward selection. I think it is better to build your models based on theory rather than empirics.
      Second, I have never used the emmeans function, so I cannot comment on that. But generally, you need to understand the difference between adjusted predictions and marginal effects. It is possible that one package gives you marginal effects and the other adjusted predictions (marginal means). See this for an explanation:
      ruclips.net/video/kddPY0jpMgU/видео.html
      You can also check out the explanation in my article on visualisation: journals.sagepub.com/doi/full/10.1177/1094428121991907
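A tiny sketch of the distinction the reply draws, using made-up coefficients for a model Y = b0 + b1*x + b2*m + b3*x*m (none of these numbers come from the thread):

```python
b0, b1, b2, b3 = 1.0, 0.5, 0.2, 0.3   # hypothetical fitted coefficients

def adjusted_prediction(x, m):
    """Expected Y at specific values of x and the moderator m."""
    return b0 + b1 * x + b2 * m + b3 * x * m

def marginal_effect(m):
    """dY/dx: the slope of x, which changes with the moderator m."""
    return b1 + b3 * m
```

An adjusted prediction is a point on the regression surface; a marginal effect is its slope in one direction. Two functions can report very different numbers for the same model because they answer different questions.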

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 2 years ago

      @@mronkko Thank you very much for your answer! emmeans indeed produces estimated marginal means, and the estimates are often strange. For a single interaction the adjusted and marginal estimates are apparently identical, but with several interactions in the model they differ a lot. Thanks for the references, I'll study them! Do you know a good reference for the usual statistical procedure for multiple linear regression? E.g. if I have 5 variables, what should the steps be: 1) univariate analyses, 2) single interactions, 3) ??? multiple regression without interactions (with only p

    • @mronkko
      @mronkko  2 years ago +1

      ​@@yuzaR-Data-Science I cannot give a good book recommendation; I have a problem with regression books generally. There appear to be two kinds of books: 1) "regression for dummies" books, which explain applications but are often also "by dummies", meaning they contain errors and cannot be recommended; 2) econometrics books that focus on theory more than on application. I recommend the following workflow for my students: 1) univariate analyses, 2) regression with all variables and no interactions, 3) regression with all interactions. The reason why you need all variables is that if the interactions are correlated, you run into omitted variable bias if you do not include them all.
      When I get weird results, I troubleshoot them by replicating the analysis "manually". In regression, the prediction at the means is easy to calculate by hand, and it is identical to the marginal means (adjusted predictions). If the hand calculation and the software do not agree, you need to study the documentation more and try replicating simpler models by hand to understand why the results differ.
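The hand check described above can be sketched like this (hypothetical simulated data, not from the thread). With an intercept in the model, the OLS prediction at the column means of the design matrix must equal the sample mean of the outcome, which gives a quick sanity check against any software output:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.7 * x2 + 0.4 * x1 * x2 + rng.normal(size=n)

# Design matrix with intercept and interaction, fitted by least squares.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Hand calculation: prediction at the means of all columns (including the
# mean of the x1*x2 column). For OLS with an intercept this equals mean(y).
pred_at_means = X.mean(axis=0) @ beta
```

If a package reports something different for "the prediction at means", it is probably plugging in the product of the means rather than the mean of the product, or averaging over a different grid; the documentation should say which.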

    • @yuzaR-Data-Science
      @yuzaR-Data-Science 2 years ago

      @@mronkko thanks a lot!

  • @larissacury7714
    @larissacury7714 1 year ago

    Hi, thank you once more! I have a regression with Y ~ YEAR * TEST(A/B). It turns out that YearB has a significant positive slope at the reference level (Year 1, Test A), but Test (B) does not (its slope is negative and non-significant). However, the interaction Year * Test is still significant (Year 2, LangB: a negative, significant interaction). Any pointers on this? I'm a bit confused about how to interpret it; the plot is very similar to figure 2a 3:45

    • @mronkko
      @mronkko  1 year ago +1

      A significant interaction means that the slope of one variable depends on the other variable. I would just calculate slopes or expected values at specific values of Year and Test and interpret those.
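The slope calculation the reply suggests is just arithmetic on the coefficients. A sketch with made-up numbers (these are not the asker's estimates), for a model Y = b0 + b1*Year + b2*TestB + b3*(Year*TestB) with Test A as the reference level:

```python
b0, b1, b2, b3 = 2.0, 0.8, -0.3, -1.1   # hypothetical coefficients

# Simple slope of Year at each level of Test:
slope_year_testA = b1        # Test == A (reference level): just the main effect
slope_year_testB = b1 + b3   # Test == B: main effect plus the interaction
```

This also shows how the asker's pattern can arise: a positive, significant main slope (here 0.8) can coexist with a negative slope in the other group (here 0.8 - 1.1 = -0.3) when the interaction term is negative and large.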