13.4.1 Recursive Feature Elimination (L13: Feature Selection)

  • Published: 26 Dec 2021
  • In this video, we start our discussion of wrapper methods for feature selection. In particular, we cover Recursive Feature Elimination (RFE) and see how we can use it in scikit-learn to select features based on linear model coefficients. (A minimal usage sketch is included right after the description below.)
    Slides: sebastianraschka.com/pdf/lect...
    Code: github.com/rasbt/stat451-mach...

    Logistic regression lectures:
    L8.0 Logistic Regression - Lecture Overview (06:28)
    • L8.0 Logistic Regressi...
    L8.1 Logistic Regression as a Single-Layer Neural Network (09:15)
    • L8.1 Logistic Regressi...
    L8.2 Logistic Regression Loss Function (12:57)
    • L8.2 Logistic Regressi...
    L8.3 Logistic Regression Loss Derivative and Training (19:57)
    • L8.3 Logistic Regressi...
    L8.4 Logits and Cross Entropy (06:47)
    • L8.4 Logits and Cross ...
    L8.5 Logistic Regression in PyTorch - Code Example (19:02)
    • L8.5 Logistic Regressi...
    L8.6 Multinomial Logistic Regression / Softmax Regression (17:31)
    • L8.6 Multinomial Logis...
    L8.7.1 OneHot Encoding and Multi-category Cross Entropy (15:34)
    • L8.7.1 OneHot Encoding...
    L8.7.2 OneHot Encoding and Multi-category Cross Entropy Code Example (15:04)
    • L8.7.2 OneHot Encoding...
    L8.8 Softmax Regression Derivatives for Gradient Descent (19:38)
    • L8.8 Softmax Regressio...
    L8.9 Softmax Regression Code Example Using PyTorch (25:39)
    • L8.9 Softmax Regressio...
    -------
    This video is part of my Introduction to Machine Learning course.
    Next video: • 13.4.2 Feature Permuta...
    The complete playlist: • Intro to Machine Learn...
    A handy overview page with links to the materials: sebastianraschka.com/blog/202...
    -------
    If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka
  • Science
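
For readers who want to try it right away, here is a minimal RFE usage sketch in scikit-learn (the dataset and the number of selected features are illustrative assumptions, not necessarily the ones used in the video):

# Sketch: coefficient-based RFE with a logistic regression estimator
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=123)

# Standardize so that coefficient magnitudes are comparable across features.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Recursively drop the feature with the smallest absolute logistic regression
# coefficient until only the 5 highest-ranked features remain.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X_train_std, y_train)

print("Selected feature mask:", rfe.support_)
print("Feature ranking (1 = selected):", rfe.ranking_)
print("Test accuracy:", rfe.score(X_test_std, y_test))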

Comments • 22

  • @djethereal99
    @djethereal99 2 years ago +4

    Prof Raschka, this is fantastic content! I've been following your course for a while now and always end up learning something new. Really appreciate all your hard work that goes into creating this material and sharing it with the world.

  • @afreenbegum9098
    @afreenbegum9098 1 year ago +1

    Thank you, it helped me a lot in my major project.

  • @emsif
    @emsif 1 month ago

    Thank you for this great lecture. In your other lecture about sequential feature selection you showed that backward selection (SBS) is superior to forward selection (SFS) according to a study. How does recursive feature elimination compare to SBS and SFS?

  • @camilafragoso2178
    @camilafragoso2178 2 years ago +1

    Thank you so much for the lectures, Prof Sebastian!
    I was wondering: what is the difference between the L1-regularized method and RFE? I understand that one is embedded and the other is a wrapper, but they look pretty similar. Thanks in advance!

    • @SebastianRaschka
      @SebastianRaschka  2 years ago +1

      Let's take logistic regression as an example.
      So for a logistic regression classifier with L1 regularization, you modify the loss function. This modification will push some of the weights towards zero.
      RFE works a bit differently. You run a regular logistic regression classifier, and then after training, you remove the feature with the smallest (absolute) weight. You can repeat this a number of times until you reach a user-specified number of iterations. E.g., if you have 5 rounds, 5 features will be eliminated.
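
      A minimal sketch contrasting the two approaches in scikit-learn (the dataset, the penalty strength C, and the feature counts below are illustrative assumptions, not values from the lecture):

      # Embedded (L1) vs. wrapper (RFE) feature selection -- illustrative sketch
      import numpy as np
      from sklearn.datasets import load_breast_cancer
      from sklearn.preprocessing import StandardScaler
      from sklearn.linear_model import LogisticRegression
      from sklearn.feature_selection import RFE

      X, y = load_breast_cancer(return_X_y=True)
      X_std = StandardScaler().fit_transform(X)

      # Embedded: the L1 penalty in the loss pushes some weights exactly to zero.
      l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
      l1_model.fit(X_std, y)
      print("Features kept by L1 :", np.sum(l1_model.coef_ != 0))

      # Wrapper: fit a regular model, drop the feature with the smallest
      # absolute coefficient, refit, and repeat until 10 features remain.
      rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10, step=1)
      rfe.fit(X_std, y)
      print("Features kept by RFE:", np.sum(rfe.support_))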

  • @ocamlmail
    @ocamlmail 1 year ago

    Thank you very much!
    1) Can you explain a bit about wrapper methods: can I use a fraction of the training set, since running them on the whole dataset can sometimes be very expensive? If yes, what are the best practices?
    2) Is there any use of the test data for feature selection, or should I select features based on the training set only?

  • @weii321
    @weii321 9 months ago

    Is it a must to split the data into x_train, x_test, y_train, y_test when using RFE?

  • @shubhamtalks9718
    @shubhamtalks9718 2 years ago +1

    Don't we need to do hyperparameter tuning of the estimator inside the RFE? If not, why not?

    • @SebastianRaschka
      @SebastianRaschka  2 years ago +2

      Yes, you can tune the estimator inside as well. In practice, for linear models it is probably not necessary, as there is not much to tune except the regularization penalties (and those would be weird to combine with RFE). And for random forests you usually also don't have to tune much in practice. For other estimators, you might consider tuning their parameters.
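
      If you do want to tune the inner estimator, its parameters are reachable through the usual double-underscore syntax; a rough sketch (the pipeline layout and the C grid are illustrative assumptions):

      # Sketch: tuning the estimator inside RFE with a grid search
      from sklearn.datasets import load_wine
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.linear_model import LogisticRegression
      from sklearn.feature_selection import RFE
      from sklearn.model_selection import GridSearchCV

      X, y = load_wine(return_X_y=True)

      pipe = make_pipeline(
          StandardScaler(),
          RFE(LogisticRegression(max_iter=1000), n_features_to_select=5),
      )

      # The inner estimator's parameters are exposed as rfe__estimator__<param>.
      param_grid = {"rfe__estimator__C": [0.01, 0.1, 1.0, 10.0]}
      grid = GridSearchCV(pipe, param_grid, cv=5)
      grid.fit(X, y)
      print(grid.best_params_, grid.best_score_)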

  • @rajeshkalakoti2434
    @rajeshkalakoti2434 2 years ago +1

    Doesn't RFE work with non-linear models like decision trees, extra-trees classifiers, random forests, and k-NN, since every model has coefficient weights?

    • @SebastianRaschka
      @SebastianRaschka  2 years ago +1

      Yeah. Originally, it is based on the weight coefficients of a linear model via .coef_. I think it was modified to work more generally now via .feature_importances_. So, yeah, you could combine it with decision tree classifiers etc. now, but not sure how "good" the results are.
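
      A quick sketch of that importance-based route (the forest size and the number of kept features are illustrative assumptions):

      # Sketch: RFE driven by a random forest's feature_importances_
      from sklearn.datasets import load_wine
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.feature_selection import RFE

      X, y = load_wine(return_X_y=True)

      forest = RandomForestClassifier(n_estimators=200, random_state=123)

      # Each round drops the feature with the smallest impurity-based importance
      # and refits the forest on the remaining features.
      rfe = RFE(forest, n_features_to_select=5, step=1)
      rfe.fit(X, y)
      print("Selected features:", rfe.support_)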

    • @rajeshkalakoti2434
      @rajeshkalakoti2434 2 years ago +1

      @@SebastianRaschka Yes, the correlation-coefficient feature-ranking method was used with an SVM when the paper was first published. But I am still confused: is the correlation feature-ranking method the same for all the remaining classifiers (estimators) in scikit-learn's RFE? Thank you. Your answer is so valuable.

    • @SebastianRaschka
      @SebastianRaschka  2 years ago +1

      @@rajeshkalakoti2434 Afaik RFE-based feature selection is usually based on model coefficients. Correlation-based feature selection is usually based on p-values (the p-values here are in the context of the null hypothesis that the feature has no effect on the target variable). This is usually also done with a linear regression model, but you could maybe also use an SVM regression model for that. Also note that what I am describing above is more of a general description and not specific to a paper (sorry, I haven't read the paper you are referring to).
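
      For contrast, a univariate p-value-style selection in scikit-learn could look roughly like this (using the ANOVA F-test via f_classif; the dataset and k are illustrative assumptions):

      # Sketch: univariate selection based on F-test p-values, as a contrast
      # to coefficient-based RFE
      from sklearn.datasets import load_wine
      from sklearn.feature_selection import SelectKBest, f_classif

      X, y = load_wine(return_X_y=True)

      # Each feature is scored independently against the target; a small p-value
      # is evidence against "this feature has no effect on the target".
      selector = SelectKBest(score_func=f_classif, k=5)
      X_selected = selector.fit_transform(X, y)

      print("F-statistics:", selector.scores_)
      print("p-values:", selector.pvalues_)
      print("Reduced shape:", X_selected.shape)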

  • @adityask277
    @adityask277 1 year ago

    Hi Prof Raschka. Can we use coefficients for feature importance even when features are normalized? It feels like the wrong parameter to use when p-values are present.

    • @SebastianRaschka
      @SebastianRaschka  1 year ago

      To me, it's more the opposite: if you want to use the coefficients for feature importance, I'd recommend normalizing the features. E.g., if you have a feature like meters and kilometers, the meters number is always 1000 times higher. So if you use meters instead of kilometers in your model, the corresponding weight coefficient will probably end up 1000x smaller. So, if you want to compare feature coefficients, it makes sense to me to have the features all on the same scale.
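
      A tiny sketch of the meters-vs-kilometers effect (the synthetic data and the linear regression model are illustrative assumptions):

      # Sketch: the same feature in different units gets a very different coefficient
      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.preprocessing import StandardScaler

      rng = np.random.RandomState(123)
      dist_km = rng.uniform(1, 50, size=200)          # distance in kilometers
      y = 3.0 * dist_km + rng.normal(0, 2, size=200)  # target depends on distance
      dist_m = dist_km * 1000                         # same information, in meters

      coef_km = LinearRegression().fit(dist_km.reshape(-1, 1), y).coef_[0]
      coef_m = LinearRegression().fit(dist_m.reshape(-1, 1), y).coef_[0]
      print(coef_km, coef_m)  # coef_m comes out roughly 1000x smaller

      # After standardization both versions get (nearly) identical coefficients,
      # which is why scaling is recommended before comparing coefficient magnitudes.
      z_km = StandardScaler().fit_transform(dist_km.reshape(-1, 1))
      z_m = StandardScaler().fit_transform(dist_m.reshape(-1, 1))
      print(LinearRegression().fit(z_km, y).coef_[0],
            LinearRegression().fit(z_m, y).coef_[0])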

  • @sasaglamocak2846
    @sasaglamocak2846 1 year ago

    Hello Sebastian, really great content! I have two questions and it would be great to get an answer.
    1. Is RFE actually model agnostic? Why or why not?
    2. Could I use random forests, gradient boosting, or neural networks as the RFE core algorithm, and is it recommended? If not, why?
    Thank you a lot!!!

    • @SebastianRaschka
      @SebastianRaschka  1 year ago +2

      It's model agnostic to some extent, but not completely. You need parameters for the selection, and tree-based models don't have those.

    • @sasaglamocak2846
      @sasaglamocak2846 1 year ago +2

      @@SebastianRaschka Thank you! Also, I think it would be great if you could give us some kind of roadmap, recommendations, and suggestions to achieve our goal and become machine learning experts. I am lost in the ocean of tutorials, and there is no full path from start to finish.

    • @SebastianRaschka
      @SebastianRaschka  1 year ago +3

      @@sasaglamocak2846 That's a very good point, and at the same time this is a very hard thing to make a recommendation about, because the path is different for everyone. I have many successful colleagues who only studied the computational aspects, and I have many successful colleagues who mostly studied mathematical topics. Both are successful in their own way. I think the best path is completing 1 or 2 introductory courses or books and then seeing what interests you most and then working on that (and reading more along the way)

    • @sasaglamocak2846
      @sasaglamocak2846 1 year ago

      @@SebastianRaschka Thank you!