If x1 and x2 are strongly correlated, we should check each one's correlation with the target and keep the variable that is more highly correlated with it. We can also check the p-values of the variables.
Correct. Thank you.
What's a p-value?
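A p-value for a regression coefficient is the probability of seeing an estimate at least as far from 0 as the one observed if the true coefficient were 0. The sketch below computes it for an ordinary least squares fit using only NumPy. The helper name `ols_p_values`, the synthetic data, and the large-sample normal approximation (used instead of the exact t distribution that statistical packages use) are all assumptions for illustration.

```python
import math

import numpy as np


def ols_p_values(X, y):
    """Fit OLS with an intercept; return (coefficients, two-sided p-values).

    Assumption: p-values use a large-sample normal approximation to the
    t distribution; statistical packages use the exact t distribution.
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])            # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares estimates
    resid = y - A @ beta
    dof = n - A.shape[1]                            # residual degrees of freedom
    sigma2 = resid @ resid / dof                    # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)           # covariance of the estimates
    t_stats = beta / np.sqrt(np.diag(cov))          # estimate / standard error
    # Two-sided p-value under N(0, 1): P(|Z| >= |t|) = erfc(|t| / sqrt(2))
    p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    return beta, p_values


# Synthetic data (assumption): y depends on x1 only; x2 is unrelated noise
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 * x1 + rng.normal(size=n)

beta, p_values = ols_p_values(np.column_stack([x1, x2]), y)
print("coefficients [intercept, x1, x2]:", np.round(beta, 3))
print("p-values     [intercept, x1, x2]:", np.round(p_values, 4))
```

Here x1's p-value should come out essentially 0, while x2's will usually be well above 0.05. Keep in mind that strong correlation between features inflates the standard errors, so both of two correlated features can show large p-values.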
Thanks a lot for the detailed discussion on this topic. For the question asked in the video (which feature to remove in case of high correlation), I guess that of the two, we should remove the one that contributes least to (is less correlated with) the target variable. That way, we preserve the feature with the higher contribution.
Thanks Sanjeev. True.
How do we know which one contributes least? Help?
@@babareddy44 From R², the F-value, or the p-value?
@@babareddy44 You can use a random forest model to see the importance of each feature, i.e. which ones contribute the most.
First, great explanation! Now, coming to your question: we have to check the bivariate strength between the dependent variable and each independent variable. The independent variable with the weakest relationship should be removed from the model.
Awesome. Thank you. :)
What is bivariate?
@@jamiainaga5853 the two variables that have been found to be highly correlated with each other
How do we find this bivariate strength?
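Bivariate strength here is usually measured with the Pearson correlation between each feature and the target. A minimal sketch, assuming NumPy, a hypothetical helper `pick_feature_to_drop`, made-up data, and an illustrative 0.8 cut-off for "strongly correlated". Absolute values are used, so strong negative correlations count as well.

```python
import numpy as np


def pick_feature_to_drop(X, y, names, threshold=0.8):
    """For every pair of features whose |Pearson r| exceeds `threshold`,
    propose dropping the one with the weaker |r| against the target.
    The 0.8 cut-off is an illustrative assumption, not a rule."""
    corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    target_r = np.abs(corr[:-1, -1])            # |r| of each feature with y
    to_drop = set()
    k = X.shape[1]
    for i in range(k):
        for j in range(i + 1, k):
            if abs(corr[i, j]) > threshold:     # strongly correlated pair
                weaker = i if target_r[i] < target_r[j] else j
                to_drop.add(names[weaker])
    return sorted(to_drop), dict(zip(names, np.round(target_r, 3)))


# Synthetic data (assumption): x2 is a noisy copy of x1, y is driven by x1 and x3
rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)
x3 = rng.normal(size=n)
y = 2.0 * x1 + x3 + rng.normal(size=n)

drop, strength = pick_feature_to_drop(np.column_stack([x1, x2, x3]), y, ["x1", "x2", "x3"])
print("drop:", drop)
print("|r| with target:", strength)
```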
I've never seen such a clear, understandable explanation... thank you so much!
Amazing pace, crisp word selection and good examples; thank you, Aman, for the great videos!!
Thanks Swati.
Wow, I think I owe you my mark on the Econometrics final, you blew my mind, I had no idea it was so simple. Thank you!
Thank you for the detailed explanation. I tried learning this concept from other channels, but it was a bit difficult to understand. Your way of explaining terms is very simple, which helps me understand the subject. Really glad that I visited your channel.👍
Thanks Bala. Keep learning
Simple, Clear and Amazing explanation!!!
I think we can decide which column to remove by looking at the p-values. If p > 0.05, we fail to reject the null hypothesis that the variable's coefficient is 0, so we have no evidence that the variable contributes significantly.
Sir, please make a video on how to use Ridge and Lasso regression to handle multicollinearity.
Thanks Samruddhi,
here are the videos you asked for:
ruclips.net/video/7XvBwQeT9OI/видео.html
ruclips.net/video/21TgKhy1GY4/видео.html
To omit either X1 or X2, we can use PCA and remove the variable with low variance.
Why the one with low variance?
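One way to read the PCA suggestion: on two standardized, highly correlated features, almost all the variance lands on the first principal component, and the second (low-variance) component carries little extra information, which is why it is the one discarded. Below is a sketch using NumPy's SVD on synthetic data (an assumption). Note that the components are combinations of x1 and x2, so PCA drops a component rather than one of the original columns, and with two standardized features the first component weights them equally.

```python
import numpy as np

# Synthetic data (assumption): x2 is x1 plus a little noise
rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)

X = np.column_stack([x1, x2])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize before PCA

# PCA via SVD: rows of Vt are the principal directions (loadings)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = s**2 / np.sum(s**2)                     # share of variance per component
print("explained variance ratio:", np.round(explained, 3))
print("loadings (one row per component):")
print(np.round(Vt, 3))
```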
It is evident that a lot of work goes into developing these very informative videos. Thank you!
The best explanation on the whole of RUclips! Thank you.
awesome video! very clear and beginner friendly, no broken train of thought, very problem-focused
Glad you liked it!
09:11 The variable with the higher coefficient value is the one we have to consider for analysis.
I don't know which one I should take. By the way, the video is great.
If X1 and X2 are highly correlated, can I drop the one with the lower correlation to Y, based on the correlation matrix?
Yes, right.
@@UnfoldDataScience Thank you kind sir. High quality content as always!
👍❤️
Great explanation, sir. Thanks for sharing and making my fundamentals strong.
Most Welcome.
Thank you so much for the explanation!
Thank you sir... Best explanation
Most welcome
Best explanation... keep up the good work.
Thanks a lot.
We can create and compare two models, each using one of the correlated explanatory variables, and select the model with the better R-squared value.
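A sketch of that comparison, assuming NumPy and synthetic data where x1 is the true driver and x2 a noisy copy of it. Both candidate models have the same number of predictors, so plain R² is a fair yardstick here; with models of different sizes, adjusted R² or a held-out set would be safer. `r_squared` is a hypothetical helper.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


# Synthetic data (assumption)
rng = np.random.default_rng(3)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)     # strongly correlated with x1
x3 = rng.normal(size=n)
y = 2.0 * x1 + x3 + rng.normal(size=n)

# Model A keeps x1 (drops x2); model B keeps x2 (drops x1)
r2_a = r_squared(np.column_stack([x1, x3]), y)
r2_b = r_squared(np.column_stack([x2, x3]), y)
keep = "x1" if r2_a >= r2_b else "x2"
print(f"R^2 with x1: {r2_a:.3f}, R^2 with x2: {r2_b:.3f} -> keep {keep}")
```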
Amazing explanation, Aman. I have a question: are VIF and auxiliary regression both used to detect multicollinearity?
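They are linked: the VIF of feature j comes from an auxiliary regression of x_j on all the other features, VIF_j = 1 / (1 - R_j^2). A NumPy sketch on synthetic data follows; the helper `vif` is hypothetical. statsmodels ships `variance_inflation_factor` in `statsmodels.stats.outliers_influence`, which regresses on exactly the columns you pass, so add a constant column yourself if you use it.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X.

    Auxiliary regression: regress column j on all other columns (with an
    intercept), then VIF_j = 1 / (1 - R_j^2).
    """
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


# Synthetic data (assumption): x2 is a noisy copy of x1, x3 is independent
rng = np.random.default_rng(4)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)
x3 = rng.normal(size=n)

vifs = vif(np.column_stack([x1, x2, x3]))
print("VIF [x1, x2, x3]:", np.round(vifs, 2))
```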
this channel is so underrated
Very informative, Aman.
Thanks a lot.
If the value is less than the threshold (0.5 or 0.7, as the reference suggests), then we can remove those values.
Such a great explanation, sir. Thanks a lot!
Thanks Roshini.
Really loved what you said, sir. I can say that you have a great way of explaining concepts. I can blindly follow you, sir.
Thanks Shivam.
Thanks for the clear explanation, and God bless!
Welcome.
Will multicollinearity be a problem in correlation analysis too? I mean focusing only on finding which variables correlate, not on regression, like in PCA.
You are a genius. Thanks
You are a great teacher! I learnt something new today.
Please share within data science groups.
great, easily understandable
Hi Aman, What should we do when the constant term p-value is high? Mostly I see that people keep it without worrying about it. Could you please give an explanation for this?
Aman, your videos are great. But many of them are connected to each other, so could you please make a video explaining the order in which to follow the playlists to learn machine learning from the basics? It would be really helpful😅
Noted. Thank you for suggesting.
Hi Aman,
As far as I know, we can use the VIF (Variance Inflation Factor), a heatmap, or the corr() function to detect and then remove multicollinearity. Please confirm other techniques.
Yes, Sushir. Apart from these, some other regression techniques can be used.
Thanks Aman. May I know which regression techniques deal with multicollinearity? I will definitely learn them, and it will be helpful for me.
We will drop the feature whose correlation with the dependent variable is lower than the other one's.
Sir, whoever gives you whatever answer, it's all correct? You are saying yes to everyone.
Mostly, the answers are correct.
Multicollinearity is a problem in classification as well, right? @3:57
Yes, if it's a linear model like logistic regression.
Simple and to-the-point explanation 🤘
Thanks Harshad.
Sir, if there are 3 predictors and one dependent variable, and all three independent variables are highly correlated, which type of regression model can be used? Multiple regression cannot be used, right? Can we use linear regression? Aren't a tolerance above 0.1 and a VIF below 10 good enough to indicate that there is no multicollinearity?
For your question, I think the one that is more weakly correlated with the target should be removed.
My questions:
1. Is multicollinearity a concern for predictive modeling? I mean, is the prediction altered if we ignore this phenomenon?
2. In the case of a GAM, do we have to worry about multicollinearity?
3. How does collinearity inflate the variance?
Thanks for asking, Kunal. The answer to the first question: predictions will not be impacted much; however, the coefficients will be.
The 2nd and 3rd I will cover in another video.
@@UnfoldDataScience thanks 👍. Really appreciate your videos.
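On questions 1 and 3, a small simulation can illustrate the reply above. It refits OLS on many fresh samples and compares the spread of the x1 coefficient with the spread of a prediction at a fixed point, once with x2 only moderately correlated with x1 and once with x2 almost a copy of it. The data-generating process, sample sizes, and the helper names `fit` and `simulate` are assumptions for illustration. The coefficient spread should grow sharply with collinearity while the prediction spread stays small; that holds for points that follow the same x1/x2 relationship as the training data, not for combinations the model never saw.

```python
import numpy as np


def fit(X, y):
    """OLS with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def simulate(noise_x2, reps=300, n=200, seed=5):
    """Refit OLS on `reps` fresh samples and return
    (std of the x1 coefficient, std of the prediction at x1 = x2 = 1)."""
    rng = np.random.default_rng(seed)
    x_new = np.array([1.0, 1.0, 1.0])               # [intercept, x1, x2]
    coefs, preds = [], []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = x1 + noise_x2 * rng.normal(size=n)     # smaller noise = more collinear
        y = x1 + x2 + rng.normal(size=n)            # assumed true model
        beta = fit(np.column_stack([x1, x2]), y)
        coefs.append(beta[1])
        preds.append(x_new @ beta)
    return np.std(coefs), np.std(preds)


results = {noise: simulate(noise) for noise in (1.0, 0.05)}
for noise, (sd_coef, sd_pred) in results.items():
    print(f"x2 noise={noise}: sd(x1 coef)={sd_coef:.3f}, sd(prediction)={sd_pred:.3f}")
```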
Which one should we eliminate? Compute the VIF of each feature and set the threshold at >5.
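A sketch of that elimination loop, assuming NumPy and synthetic data. It uses the fact that the VIFs equal the diagonal of the inverse correlation matrix, drops the feature with the largest VIF, recomputes, and stops once every VIF is at or below the threshold (5 here, as in the comment; 10 is another cut-off you will see). The helper names are hypothetical.

```python
import numpy as np


def vif_from_corr(X):
    """VIFs as the diagonal of the inverse correlation matrix
    (equivalent to 1 / (1 - R_j^2) from the auxiliary regressions)."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))


def drop_by_vif(X, names, threshold=5.0):
    """Repeatedly drop the column with the largest VIF until all VIFs
    are at or below `threshold`; return the names that were kept."""
    names = list(names)
    while X.shape[1] > 1:
        v = vif_from_corr(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names


# Synthetic data (assumption): b and c are noisy copies of a; d is independent
rng = np.random.default_rng(6)
n = 1000
a = rng.normal(size=n)
b = a + 0.2 * rng.normal(size=n)
c = a + 0.2 * rng.normal(size=n)
d = rng.normal(size=n)

kept = drop_by_vif(np.column_stack([a, b, c, d]), ["a", "b", "c", "d"])
print("kept:", kept)
```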
Do we need to remove multicollinearity while building a time series model?
Not necessarily.
Sir, can you give some ideas on how to know which types of ML models are affected by multicollinearity?
Regression-based models.
Hi Aman, it's a very good explanation... please make a video on penalised regression like Lasso, Ridge and Elastic Net. There is too much mathematics in those; please explain them in a simple way. Thank you
Thanks Bhavani, sure, will do.
Great content.
Thanks a lot.
Hello sir, after reading the comments I saw the answer to your question: they said we have to remove the one with the lower correlation coefficient with the target variable in the correlation matrix. One point confused me: can we say that the coefficient of each feature that we get after fitting the regression model indicates that feature's impact on the target? I mean, can I use these coefficients to decide which of two correlated features to remove, instead of the correlation-matrix value with the target variable? In this context, do the coefficients say the same thing as the correlation with the target variable?
Regarding the question of which variable to remove out of a set of highly correlated variables: can this be answered by PCA (principal component analysis)? Or will PCA weight them the same because they are highly correlated?
Hi Anmol, not in terms of PCA; I asked in general.
Hi Aman, hope you are doing well !
I want to ask one thing: the regression models you are mentioning are linear models, not tree-based regression models, am I correct?
Does multicollinearity affect tree-based models?
Please, please make a video on how to select the drivers in linear regression that drive sales.
Can we also remove highly negatively correlated features, or not? Someone reply, please.
Hello sir! Firstly thank you for the video!
I have 2 questions; I would be glad if you answered them:
1) Can we say that we don't need to be concerned about correlated features in, for example, decision-tree-based models? I mean, do we need this concept only for linear models?
2) Is it true that we don't need to handle correlated features when we use Lasso or Ridge regression? Will the model take care of that by itself?
1. This is a problem with regression based models where coefficients come into picture.
2. You still need to take care.
@@UnfoldDataScience From your first answer, why then is multicollinearity not a problem in neural networks? Please make a video on this, sir...
@@hemanthkumar42 Note that multicollinearity does not really affect the prediction accuracy of linear regression; it mainly makes interpretation harder. We mostly use linear regression for interpretation, whereas with a neural network we already know it is a kind of black box: we don't want to interpret it, we want good prediction results. That's why we don't bother about multicollinearity in neural networks.
Thank you for the excellent explanation. I have a few questions, please:
1. I used the PolynomialFeatures method in sklearn, and it significantly improved the accuracy of my linear regression prediction model, but I found that the newly created features are correlated with the existing features, since I created squares and cubes. As per your explanation, I understand that this leads to a multicollinearity problem, so the coefficients do not give the true picture. However, can I use this type of model for predictions?
2. What threshold correlation value would you suggest for multicollinearity?
Thanks
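On question 1: squares and cubes of a positive-valued x are strongly correlated with x itself. One common mitigation (an addition here, not something the video covers) is to centre x before building the powers, which removes the x/x² correlation for a symmetric range, though odd powers such as x³ stay correlated with x. A NumPy sketch on the made-up values 1 to 10:

```python
import numpy as np

x = np.arange(1, 11, dtype=float)           # made-up values 1..10, all positive

# Correlations of x with x^2 and x^3 (rows of the input are the variables)
raw = np.corrcoef([x, x**2, x**3])
print("raw x:     corr(x, x^2), corr(x, x^3) =", np.round(raw[0, 1:], 3))

# Centre x first, then build the powers
xc = x - x.mean()
centred = np.corrcoef([xc, xc**2, xc**3])
print("centred x: corr(x, x^2), corr(x, x^3) =", np.round(centred[0, 1:], 3))
```

Since the goal in the question is prediction, the more direct check is the model's error on held-out data; the collinearity mainly makes the individual coefficients hard to read.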
Is multicollinearity a problem for neural networks?
Not always.
Good and perfect
Thank you
Remove the variable that has low impact on the target variable...
Sir, I have 3 questions:
1. If there is multicollinearity in a classification problem, how do we handle it?
2. What is VIF, and how is standardization done?
3. Can we use StandardScaler in a regression problem?
There are three questions; I will cover them in a separate video. Thanks for asking.
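On questions 2 and 3: standardization (what scikit-learn's `StandardScaler` does) subtracts each column's mean and divides by its standard deviation, and it can be used in regression as well. A NumPy sketch with made-up numbers; the helper `standardize` is hypothetical. Note that rescaling leaves the correlation between features unchanged, so standardizing alone does not remove multicollinearity.

```python
import numpy as np


def standardize(X):
    """z-score each column using the column mean and the population standard
    deviation (ddof=0, the same convention StandardScaler uses)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std


# Made-up training data: two features on very different scales
X_train = np.array([[1.0, 200.0],
                    [2.0, 500.0],
                    [3.0, 400.0]])
Z, mu, sigma = standardize(X_train)
print(np.round(Z, 3))

# New rows must be scaled with the *training* mean and std, not their own
X_new = np.array([[2.0, 250.0]])
z_new = (X_new - mu) / sigma
print(np.round(z_new, 3))

# Rescaling does not change the correlation between the features
print(np.allclose(np.corrcoef(X_train, rowvar=False), np.corrcoef(Z, rowvar=False)))
```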
Great Explanation...
At 7:50 you said "that's why we should not have multicollinearity in regression". So, is it okay to have multicollinearity in classification? Could you please clarify?
When I say that, I mean the regression family of algorithms, including logistic regression.
@@UnfoldDataScience Thank you Aman Sir
Great.
Thanks Surya.
You always rock :)
Thanks for watching Sudheesh.
great explanation
Thank you 🙂
I think we should keep the variable that is more highly correlated with the target variable.
Hi Aman, I have 9 categorical and 6 numerical columns and it's a regression problem.
I can find the correlation between the numerical columns using a correlation heatmap, but how do I find the relationship between the categorical ones?
Can I use a chi-square test?
If I use it, I find that all 9 categorical columns are dependent on each other. So what should my next step be?
Please guide me.
Thanks
Yes, chi-square can be used; I have a dedicated video on this topic.
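A sketch of the chi-square statistic between two categorical columns, using NumPy only. Because the test flags almost any association as significant once the dataset is large, it may help to also look at an effect size such as Cramér's V (a suggestion beyond the reply above), which runs from 0 (no association) to 1 (perfect association). The helper name and the toy columns are hypothetical. `scipy.stats.chi2_contingency` returns the statistic and p-value; it applies Yates' continuity correction to 2×2 tables by default, so its statistic will differ from this uncorrected one on 2×2 data.

```python
import numpy as np


def chi_square_and_cramers_v(a, b):
    """Pearson chi-square statistic for two categorical arrays, plus Cramér's V."""
    a_levels, a_idx = np.unique(a, return_inverse=True)
    b_levels, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((len(a_levels), len(b_levels)))
    np.add.at(table, (a_idx, b_idx), 1)             # contingency table of counts
    n = table.sum()
    # Expected counts under independence: row total * column total / n
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = float(((table - expected) ** 2 / expected).sum())
    k = min(table.shape) - 1
    v = (chi2 / (n * k)) ** 0.5 if k > 0 else 0.0
    return chi2, v


# Toy columns (assumption): zone is fully determined by city, noise is not
city = np.array(["A", "A", "B", "B", "A", "B", "A", "B"])
zone = np.array(["x", "x", "y", "y", "x", "y", "x", "y"])
noise = np.array(["p", "q", "p", "q", "q", "p", "p", "q"])

city_zone = chi_square_and_cramers_v(city, zone)
city_noise = chi_square_and_cramers_v(city, noise)
print("city vs zone  (chi2, V):", city_zone)
print("city vs noise (chi2, V):", city_noise)
```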
Should I check for multicollinearity in a classification problem?
For logistic regression, yes.
@@UnfoldDataScience Is it necessary to check multicollinearity between categorical features, or between numerical and categorical features?
Sorry, so does it mean that when there is multicollinearity, for example between x2 and x3, if I increase x2, x3 will automatically increase? Great video, by the way!
Internally, at some level, yes. Thank you.
Hi... thanks for the nice explanation. I have a question: is multicollinearity a problem for linear regression only? If not, how is it a problem for non-linear regression?
For regression-based models like linear/logistic regression, etc.
Hi Aman, nice work, keep it up..... I have a doubt: why is the normal distribution so important? Why do our independent variables need to show a normal distribution for a good model? I am not finding a satisfying answer. Can you please help?
Hi Sharad, in simple language, it's easier for the model to learn a pattern if you give it examples from a wide range (that is your normal distribution).
Take the example below:
Predict the salary of an individual (Y, the target) based on his/her expenses (X variable).
Scenario 1: in your training set you have Y values like 10 LPA, 15 LPA, 20 LPA. Here the model won't be able to learn the pattern for people earning 3 LPA; maybe the income/expense pattern is different for junior people.
Scenario 2: you give many values of Y from across the range, like 2 LPA, 4 LPA, 5 LPA, 100 LPA, as if they are normally distributed.
Here it's easier for the model to learn the pattern, as it sees a range of values, and the resulting model will be more reliable.
Hope it's clear now.
@@UnfoldDataScience Thanks for the clarification. Does a huge dataset always follow a normal distribution?
No, not always... it depends on the data.
The independent variable with higher correlation among the similar factors should be removed
Awesome explanation. I have a question: if I have nominal, ordinal and continuous variables, how can I find multicollinearity among them?
Hi Sidrah, answered.
I can't find your answer. I understand that we should use VIF for continuous variables, but what if I need to see the correlation among ordinal, numeric and nominal variables all together?
Remove the feature that is less correlated with the target.
Thank you!
Welcome.
Sir, what should we do if multicollinearity is affecting a binary classification problem?
There are many ways to take care of it. I have discussed them in the classification videos.
Great video. Please create a video on the Python implementation of Lasso and Ridge regression.
Thanks Ameer. Sure!!
Thank you
Thanks again.
Thanks, sir. I think the unnecessary variable should be removed.
Yes, true.
I will find the correlation between x1 and y and between x2 and y individually, see which one is lower, and delete the one with the lower correlation.
Sir, please do a video on post-pruning decision trees.
OK, Karthik.
Sir, please explain Lasso and Ridge; if you have made a video on it, please share the link.
ruclips.net/video/7XvBwQeT9OI/видео.html
ruclips.net/video/21TgKhy1GY4/видео.html
Thanks
finished watching
Is machine learning better than deep learning, or is deep learning better than machine learning?
It depends on the problem statement, data availability, infrastructure availability, etc.; we can't say one is better than the other.
@@UnfoldDataScience oh ok got it ✌️
I want to be a data analyst, and I want sequential courses from you. Please guide me.
www.unfolddatascience.com
Hi Aman,
Could you please make a detailed video explaining the difference between Gradient Boost, AdaBoost and ExtremeGradientBoosting?
Why is AdaBoost called adaptive? Is it only because it updates the weights of the misclassified instances? XGBoost and GradientBoost are also adaptive in that way, aren't they?
Also, why are XGBoost and GBoost more robust to outliers than AdaBoost, despite all of them having a log term in their loss functions?
Would really appreciate your reply.
Thanks
It depends on feature importance: the feature with lower importance will be dropped.
Correct me if I'm wrong :0
Correct, Sujith.
The one that is less correlated should be removed.
True. "less correlated with target" :)
at least two variables!
this shit is pure gold
Neither x1 nor x2
excellent explanation