I'm stoked! I'm analyzing a dataset of roughly 100 patients with a rare disease. I began with stepwise logistic regression without interactions and got a nice predictive model for disease remission, with many variables that contributed to prediction accuracy, but I just can't provide a proper explanation for many of them (they don't make sense). Right now I'm running glmulti with what I learned from your "Find the best model" video, and I will definitely try emmeans after that to understand these relationships. Thank you very much!
Glad my content is useful! That's exactly what I hoped for - to share useful tools with the world! I think when a variable does not make sense, it should not be in the model in the first place, because the method (e.g. glmulti) can't think; it only calculates and provides answers. When the predictors make sense, then, to my current knowledge, the emmeans package is the best one to make sense of the result. Thanks for your feedback and for watching!
You make data analysis so smooth and easier to understand. Thanks for your wonderful tutorials.
Glad to hear that! Thanks for such nice feedback and thanks for watching, Felix!
I just came across your channel. I thought I had watched all your videos already, but I saw this one is brand new. You must have heard it before, but you're doing an excellent job here. I already subscribed. I wish I could subscribe twice. Please keep them coming. Thanks for what you do.
Thanks a ton! It means the world to me! And motivates to continue! Thanks for watching - it's the best support! Cheers
Kudos to you, @YuzaR! I think you should really consider creating some courses on educational platforms (Coursera, Udemy, etc.), if you don't have any yet. It would be really helpful! Thanks for the amazing job you are doing!
Thanks a lot for such nice feedback! Great idea. I will consider producing a course in the future. Until then, thanks for watching!
Very useful! Thanks!
Glad it was helpful! Thanks for watching!
so impressive with your knowledge and video, thank you so much.
Glad you enjoyed it! I just sent you the link in another comment too: we.tl/t-tBLvcJ55xT
Great, excellent job, thank you very much !!!
Glad you liked it! Thank you for watching!
I also subscribed. Your videos are always very informative and helpful. Thank you.
Thanks for the sub! And for watching! I am happy you like my content!
Thanks so much for sharing these useful resources, sir
So nice of you! Thanks 🙏
Thanks for the great videos. I only started programming 3 weeks ago, for the first time in my life. It reminded me of when I was a 7-year-old boy and wanted to play FIFA 1994 on "dos"; that was 30 years ago, and since then I had never typed any command.
By the way, for 2 categorical predictors (e.g., "age_cat" and "jobclass"), is there any difference between a linear model ("lm") and ANOVA models (aov, or aov_test), i.e., generating an object for the ANOVA model and piping it through the emmeans function in the same way you dealt with your linear model object?
Thanks 🙏 Itamar, anova and lm usually produce identical results. But emmeans works generally better with models, like lm etc.
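For anyone who wants to check this on their own machine, here is a minimal sketch. It assumes the built-in warpbreaks data as a stand-in (wool and tension play the role of "age_cat" and "jobclass"; the names m_lm and m_aov are made up for illustration), so it is not the model from the video:

```r
# Two categorical predictors from the built-in warpbreaks data
# (wool and tension stand in for "age_cat" and "jobclass").
m_lm  <- lm(breaks ~ wool * tension, data = warpbreaks)
m_aov <- aov(breaks ~ wool * tension, data = warpbreaks)

# aov() fits the same least-squares model, so the coefficients match
all.equal(coef(m_lm), coef(m_aov))

# the ANOVA table can be pulled from either object
anova(m_lm)
summary(m_aov)

# both objects can be handed to emmeans in the same way
# (only runs if the emmeans package is installed)
if (requireNamespace("emmeans", quietly = TRUE)) {
  print(emmeans::emmeans(m_lm,  pairwise ~ tension | wool))
  print(emmeans::emmeans(m_aov, pairwise ~ tension | wool))
}
```

An aov object is built on top of lm (it inherits the "lm" class), which is why the two fits agree here.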
Great stuff, Yury! A quick question: Does the whole analysis by emmeans you go through here produce correct results when lme with a random factor is used instead of lm with only fixed factors?
Sure, I use emmeans for mixed models all the time. Thanks for watching!
Hi. Thanks. Quick question:
1. Is this interaction also what is known as subgroup analysis, or is it different? If it is different, what makes them different, and do you have a video on subgroups?
2. Is this interaction the same as what is known as a moderator, or is it different?
Hi, yes, this is similar to subgroup modelling, or stratification, but I don't know what you mean by moderator
Thanks for this! Do you recommend any variable prep (centering, etc.) before running models?
I usually don't do any of that, but it is probably due to my field - medical stats, which wants highly interpretable results. Folks here do not even like log-transformations. But if your goal is machine learning prediction, some scaling or centering might be useful... good idea for a future video actually. Cheers mate
@yuzaR Is there another way to send you a tip other than through Koji? Doesn't work for me.
Hey man, wow, I would highly appreciate that! Yes, there is an easy way through "Thanks" under the video, near the download and share buttons. YouTube takes a share of it, but I'll still receive most of it. Thank you sooo much for your support!!! 🙏 And for watching!
One simple question: Let's assume I want to test hypotheses regarding the influence of x1 on Y, x2 on Y, and that there is an interaction effect between x1 and x2. The coefficients b1 and b2 from the model y = b1x1 + b2x2 + b3x1x2 won't tell me the unconditional effect of x1 and x2 on Y. How could I obtain these unconditional effects in R, to report for my hypothesis testing, using an OLS with interactions? Thanks in advance.
If you use an interaction, you don't interpret the main effects. I do not recommend mixing additive and interaction effects in the same model for different predictors. Hard to interpret. Similarly, triple interactions are hard to interpret; emmeans can help though.
@yuzaR, thank you, your videos are so helpful! One thing is, I can't get the graphs at 1:58 in the video to work with my glmers. Is it because my outcome variable is categorical? Thank you!
Hi, Melissa. You usually don't model a categorical outcome with glms. At least in the video, the outcome (wage) is numeric. That could already be the problem. Thanks for your feedback!
Amazing videos! Could you also explain how the "cov.keep" parameter works in emtrends? thanks
Thanks! I would recommend looking it up in the manual of the emmeans package: cran.r-project.org/web/packages/emmeans/emmeans.pdf
Thanks for the great video! Is there a way to customize contrasts in this package ??
Thanks! It depends on what you mean by customize. It’s definitely possible to get contrasts between levels of one variable inside levels of another, or vice versa
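A minimal sketch of both directions, plus one hand-written contrast. It assumes the built-in warpbreaks data and a made-up contrast name, so treat it as an illustration rather than the code from the video:

```r
library(emmeans)

m <- lm(breaks ~ wool * tension, data = warpbreaks)

# tension levels compared inside each wool type
emm_tension <- emmeans(m, ~ tension | wool)
pairs(emm_tension)

# or vice versa: wool types compared inside each tension level
pairs(emmeans(m, ~ wool | tension))

# a hand-written contrast: low tension vs. the average of medium and high
# (the weights follow the level order L, M, H)
custom <- list("L vs mean(M, H)" = c(1, -0.5, -0.5))
contrast(emm_tension, method = custom)
```

The list passed to method can hold several named weight vectors; each one needs as many weights as there are levels inside a group.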
Thanks so much for this, it has been a huge help. I do have a question though. I am trying to use the emmip and emtrends functions, but the outputs look inverted because I am using a negative binomial glm and I need to back-transform the coefficients (I think). Is there a way to incorporate that into my emmip code to make it correct? Thank you!!
i.e.: emmip(GalTMBX, Treatment ~ Week, CIs = TRUE, cov.reduce = FALSE, ylab = "Galleries")
Hi mate, have you tried type = "response"? Here is an example for logistic regression:
bla
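For readers landing here, a minimal sketch of how type = "response" could be used with a logistic regression. The data are simulated and every name (d, success, the effect sizes) is made up for illustration; only Treatment and Week are borrowed from the question above. The plots also assume ggplot2 is installed:

```r
library(emmeans)

set.seed(1)
# hypothetical data: two treatments followed over six weeks
d <- expand.grid(Treatment = c("A", "B"), Week = 1:6, rep = 1:30)
d$Treatment <- factor(d$Treatment)
p <- plogis(-1 + 0.3 * d$Week * (d$Treatment == "B"))
d$success <- rbinom(nrow(d), 1, p)

m <- glm(success ~ Treatment * Week, data = d, family = binomial)

# default: predictions on the link (logit) scale
emmip(m, Treatment ~ Week, CIs = TRUE, cov.reduce = FALSE)

# type = "response" back-transforms the predictions to probabilities
emmip(m, Treatment ~ Week, CIs = TRUE, cov.reduce = FALSE,
      type = "response", ylab = "Probability of success")

# the same argument works for the numbers behind the plot
emmeans(m, ~ Treatment | Week, at = list(Week = c(1, 6)), type = "response")
```

The same argument should carry over to a count model with a log link, such as a negative binomial glm, where it returns expected counts instead of probabilities.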
Hi @YuzaR. This is amazing; we have ignored interactions for too long. I was wondering whether you could show us how to CONDUCT reliability and validity checks IN R or STATA [for OLS and logistic regression or other models like multivariable...]. Conducting RELIABILITY AND VALIDITY checks is quite challenging due to the lack of systematic, pedagogical guidance.
I partially addressed your question in a previous video on the {performance} package. Have a look at two functions, "compare_performance" and "check_model". Thanks for your feedback and thanks for watching!
@@yuzaR-Data-Science Thank you so much
@@kwizeralambert1316 you are very welcome!
What if adding an interaction leads to multicollinearity? Can the results still be trusted?
It's a great question! I often see "multicollinearity" with interactions (VIF > 10), but I always accept and ignore it, because for me the question is more important than the multicollinearity. If I didn't ignore it, a lot of questions would not be answerable. I speculate that multicollinearity is not an issue when no collinearity arises in the model without the interaction. Then, such predictors may interact. For me, multicollinearity only makes sense when we check predictors without interactions, because only they can provide similar information (be multicollinear). In the interaction, one predictor is checked inside the levels of another predictor, so they can't provide similar information if they didn't before the interaction. However, if I have three predictors which are collinear, then the interactions between them would definitely skew the result. That's my train of thought on the issue, since I never explicitly found the answer to that. But if you find one, please comment here for the whole stats community. Cheers and thank you for watching
@@yuzaR-Data-Science , thank you for responding to my question, and apologies for my late reply in return. My question partly stems from a situation where I was dealing with a small dataset (~170 observations). I attempted to include an interaction, along with several other categorical variables, which resulted in 'noisy' estimates. My model therefore had to estimate a large number of parameters relative to the sample size. I suspect that if I had a much larger sample size, the estimates for the interaction term would have been stable even if multicollinearity was present. Given my small sample, I ultimately ended up trimming the number of variables, including the interaction term, from the model. That said, I still kept the two variables in the model - just not as an interaction. I suppose it's worth keeping in mind that the sample size required to detect an interaction would be larger than that required to detect a main effect of the same size. Anyway, your content is a goldmine, and I really appreciate the insights you share via your videos and website 🙂
PS: perhaps a video on a priori power analysis would be of interest to your viewers (nudge nudge, wink wink 😉)
Yeah, then it's rather an overfitting problem. Then I would also reduce the number of predictors and avoid interactions. Thanks for your feedback, Marcell!
Sure! It's totally on the list. But it will take some time. Until then you can have a look at my old blog, but I think the quality of it is low. I mainly did power analysis in R, without any deep understanding. When I create a new blog post on the topic, I'll try to go deeper. Anyway, here is the link: yury-zablotski.netlify.app/post/power-analysis-vol-1/
Thanks
Thank you! 🙏
Awesome presentation. Please keep making these videos and keep inspiring. Try to promote your channel on various platforms to gain more likes. I am very glad that I stumbled upon your channel! God bless you! :) I just subbed to your channel. I started with one vid and now watched over 6 vids. Amazing skills you got! :)👏 and thank you so much for doing these videos.
Thanks for such generous feedback! :) I am happy my content is useful! Good suggestion, I do promote it a bit on twitter, facebook and linkedin... what else do you think I can do? I would appreciate any suggestions! Cheers
really quality content, if only there were subtitles..
Thanks, Roma! Sorry, I can't switch off my accent 🙈 😂, but I think there are automatic subtitles from Google. I don't know whether they are any good. Did you try them?
@@yuzaR-Data-Science
yes there are subtitles but sometimes they don't work properly
it's just that with human English subtitles, it's much easier for non-English speakers (I'm one of them, from Ukraine) to understand the information, which is good for both consumers and content producers.
Thanks for the answer, I wish you success!
Thanks 🙏 I’ll do my best to speak more clearly. By the way, in the description to the video there is a link to a blog post where you can read what I say and get the code
@@yuzaR-Data-Science
thanks, too bad I didn't see this earlier
also links in the code, etc
very high quality work, incredible thank you!
(but tables and long lines are sometimes not displayed well
if you fix it, in my opinion, it will be absolutely perfect)
Thanks for the improvement advice! I’ll try to fix it