Hands-on Multicollinearity Treatment | Variance Inflation Factor | Data Preprocessing in Python
- Published: 30 Nov 2024
- Welcome to the next instalment of our Data Pre-processing series! In this practical tutorial, we work through Multicollinearity treatment hands-on. If you missed our previous video covering the theory and the different approaches, you may refer to the links provided below.
Dataset Link - archive.ics.uc...
Complete Data Pre-processing Playlist - tinyurl.com/5c...
Multicollinearity theory - • The A to Z of Multicol...
PCA Hands-on - • MASTER Principal Compo...
Ridge and Lasso Hands-on - • Hands-on Data Science ...
In this video, we kick things off by introducing a real-world dataset that boasts 30+ features. But here's the twist: many of these features are strongly correlated with one another, which can make the coefficient estimates of our predictive models unstable and hard to interpret.
To address this issue effectively, we explore powerful Multicollinearity treatment approaches:
1) Feature Pruning based on Pearson's Correlation Coefficient:
We start by identifying pairs of highly correlated features by applying a threshold to Pearson's correlation coefficient. When two features are closely related, we make an informed decision to drop one of them, keeping our dataset lean and mean. This method can improve model stability and interpretability.
2) Recursive Feature Elimination with Variance Inflation Factor (VIF):
The second approach is a more advanced technique. We demonstrate how to apply VIF smartly using a while loop. This iterative process allows us to systematically eliminate features with high VIF, effectively mitigating multicollinearity. By doing so, we improve the stability and reliability of our models.
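The first approach, correlation-based pruning, can be sketched roughly as follows. This is a minimal illustration and not the exact code from the video: the function name `drop_correlated_features`, the 0.9 threshold, and the synthetic data are assumptions made for the example. Note that this greedy variant drops any column that correlates strongly with an earlier column, so column order affects which feature of a pair survives.

```python
import numpy as np
import pandas as pd


def drop_correlated_features(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from each pair whose absolute Pearson r exceeds `threshold`.

    Hypothetical helper for illustration; the threshold value is an assumption.
    """
    corr = df.corr(method="pearson").abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # A column is dropped if it correlates strongly with any earlier column.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Synthetic data (assumption): "b" is nearly a copy of "a", "c" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
pruned = drop_correlated_features(demo, threshold=0.9)
print(list(pruned.columns))
```

Here "b" is removed because it is almost identical to "a", while "c" is kept.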
Finally, we compare the two approaches, highlighting the advantages and potential trade-offs of each. Understanding when to use Pearson's correlation threshold and when to employ VIF-based recursive elimination is crucial for tackling multicollinearity.
Happy Learning!
Thanks for this video; your way of explaining the topic is awesome. I can't believe your channel hasn't reached more people yet.
Thank you! The journey of a thousand miles begins with a single step. We are happy it has reached you. 😊
10:52 I am running this and I get an error on the vif["value"] = [variance_inflation_factor(df.values, i)… line.
It's saying: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types
We'll check this.
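One possible cause, offered as an assumption since the commenter's data is not shown: this TypeError typically appears when `df.values` is an object-dtype array, for example because the DataFrame still holds a string column or mixed column types. NumPy's `isfinite` cannot operate on such an array. A minimal sketch of the situation and one way around it, using a made-up DataFrame:

```python
import numpy as np
import pandas as pd

# Made-up data (assumption): a numeric column, a boolean dummy, and a string column.
df = pd.DataFrame({
    "x": [1, 2, 3],
    "flag": [True, False, True],
    "label": ["a", "b", "c"],
})

# Mixed column types make df.values an object array, which np.isfinite rejects.
try:
    np.isfinite(df.values)
except TypeError as exc:
    print("object array rejected:", type(exc).__name__)

# Keep only numeric/boolean columns and cast them to float before computing VIF.
numeric = df.select_dtypes(include=["number", "bool"]).astype(float)
print(numeric.dtypes)

# numeric.values is now a float64 array and could be passed to
# variance_inflation_factor in place of df.values.
```

If a categorical column is needed in the model, it would have to be encoded numerically first rather than simply dropped.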