Thank you very much for the video. A simpler formula for computing the sum of m choose k, which gives the number of exhaustive search runs, is 2^m - 1. Cheers
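The identity in the comment above can be sanity-checked with a few lines of Python (a quick sketch, not from the video):

```python
from math import comb

# Number of candidate subsets an exhaustive search over m features must
# evaluate: sum over k = 1..m of C(m, k), which equals 2^m - 1.
def n_exhaustive_runs(m: int) -> int:
    return sum(comb(m, k) for k in range(1, m + 1))

for m in (3, 5, 10):
    assert n_exhaustive_runs(m) == 2**m - 1

print(n_exhaustive_runs(10))  # 1023
```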
This was a great help; I was just trying to determine the best FS method for my dataset. I've now subscribed and I'm looking forward to checking out all the videos on the playlist, thank you!
Mr. Raschka, Please complete the series, Lots of Love from India
Thank you for this playlist it was very helpful and everything is very well explained! Are the feature extraction videos posted yet?
Thanks! Regarding the Feature Extraction vids, I was unfortunately running out of time! Hopefully one day, though!
Thanks, Sebastian for making this playlist!
Glad to hear you are liking it!
@@SebastianRaschka What should be the next steps after completing this playlist?
@@abhishek-shrm That's a good question. I'd say focusing on some personal project and applying those techniques, or Kaggle. When you are doing that, you'll also automatically want to learn more specific things, look for specific papers, and so forth. I think working on a project is really a worthwhile learning experience.
Could you please finish series 14,15, and 16? Loved the playlist.
Glad to hear you like it! I am currently juggling too many projects, but I hope I can get to it some day!
Kindly update the playlist. There is no video on feature extraction, which you mentioned in this video. Thanks for your playlist, really helpful!
Amazing content Sebastian! Your explanations in all your videos are amazingly clear :)
One short question, however: it seems that both of the presented feature selection functions base the selection on scores from the training data set, and that they do not make use of a validation/test data set. Is this right, or am I overlooking something? Wouldn't it be better to base feature selection on test-set scores, to avoid overfitting?
Yes and no. Yes, the selection is based on the training set, but it uses k-fold cross-validation. E.g., if you set cv=5, 5 classifiers are fitted for each feature combination, and the score is the average over the 5 validation folds. You can of course override this and use the regular holdout method, as you would in GridSearchCV etc., by providing a custom object to the cv argument.
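A small sketch of the two options described in the reply above, using scikit-learn's SequentialFeatureSelector on a toy dataset (the same cv-argument idea applies to mlxtend's SFS; the estimator and dataset here are just illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import ShuffleSplit
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Default: scores are averaged over k-fold cross-validation (here, 5 folds).
sfs_cv = SequentialFeatureSelector(knn, n_features_to_select=2, cv=5)
sfs_cv.fit(X, y)

# Override: a single train/validation split (the regular holdout method),
# passed as a custom splitter object via the cv argument.
holdout = ShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
sfs_holdout = SequentialFeatureSelector(knn, n_features_to_select=2, cv=holdout)
sfs_holdout.fit(X, y)

print(sfs_cv.get_support(), sfs_holdout.get_support())
```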
@@SebastianRaschka Thanks Sebastian, this is clear now!
Hello, Sebastian, I watched all the lectures and I wanted to thank you for the course! Would it be possible for you to share the lecture notes? The link you posted under one of the previous videos does not seem to work
thank you so much. your videos are really helpful!
Great video! Say we want to choose 2 features out of, for example, 60 features. Although this reduces the accuracy, does it mean that the method would choose the BEST two features, i.e., the 2-feature combination that is more accurate than all other 2-feature combinations?
Yes, that's exactly right. Among the different 2-feature subsets, it will choose the best one (but that best one might still be worse than the full 60-feature subset).
@@SebastianRaschka okay thank you!
Thanks for the great content. I find that the SFS method can overfit on CV as the selected features are sensitive to the random seed when N features is large (say N > 100). Is it possible to do nested CV, where you'd perform SFS based on a subset of the data (train, val) then evaluate generalisation on the test data? In that case, a 5-fold outer CV would have 5 different selected feature sets. Are there any approaches to then select the top features?
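A rough sketch of the nested-CV setup described in this question, using scikit-learn on a small toy dataset. The feature selection runs only on the outer training portion, and each outer fold yields its own selected set; ranking features by selection frequency across folds is just one heuristic for reconciling them, not an established answer:

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=3)

selected_sets, outer_scores = [], []
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in outer_cv.split(X):
    # Inner loop: run SFS (with its own CV) on the outer training fold only.
    sfs = SequentialFeatureSelector(clf, n_features_to_select=3, cv=3)
    sfs.fit(X[train_idx], y[train_idx])
    cols = np.flatnonzero(sfs.get_support())
    selected_sets.append(frozenset(cols))
    # Generalization estimate: refit and score on the untouched outer fold.
    clf.fit(X[train_idx][:, cols], y[train_idx])
    outer_scores.append(clf.score(X[test_idx][:, cols], y[test_idx]))

# One heuristic to combine the 5 (possibly different) feature sets:
# rank features by how often they were selected across outer folds.
counts = Counter(f for s in selected_sets for f in s)
top_features = [f for f, _ in counts.most_common(3)]
print(top_features, np.mean(outer_scores))
```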
First of all, thank you very much for your understandable video!
However, your example is a classification problem, and I wonder how to use it for a regression problem. Can I still use KNeighborsClassifier, or should I replace it with another estimator?
And is there any way to display run progress in scikit-learn (like verbose in mlxtend)? I have a very large input (tens of thousands of features), so I want to know how long it takes.
Yes, you can replace it with any other scikit-learn estimator for classification or regression. E.g., you could use a RandomForestRegressor or anything you like, really. Btw, if you use a scikit-learn regressor, it will automatically use the R² score for scoring, as is done in other scikit-learn methods. You can of course change it to something else; there is a "scoring" parameter for that, similar to GridSearchCV.
Also, you can show the progress via the "verbose" parameter:
verbose : int (default: 0). Level of verbosity to use in logging: if 0, no output; if 1, the number of features in the current set; if 2, detailed logging including timestamp and CV scores at each step.
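To illustrate the regressor swap and the "scoring" parameter mentioned in the reply, here is a small sketch using scikit-learn's SequentialFeatureSelector (the verbose parameter quoted above belongs to mlxtend's SFS; sklearn's selector does not have one, so only the estimator/scoring part is shown, with an illustrative estimator and dataset):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector

X, y = load_diabetes(return_X_y=True)
rf = RandomForestRegressor(n_estimators=20, random_state=0)

# With a regressor, the default scoring is R^2; it can be overridden via
# `scoring` (sklearn scorers are "higher is better", hence the negated MSE).
sfs = SequentialFeatureSelector(
    rf, n_features_to_select=4, scoring="neg_mean_squared_error", cv=3
)
sfs.fit(X, y)
print(sfs.get_support())
```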
Thanks again for such an informative and exhaustive video explanation!
I had one query on SequentialFeatureSelector in sklearn.feature_selection. Unlike mlxtend, in sklearn there's no option for using "best" as the number of features to be selected. Instead, there's an "auto" option which has to be used with another parameter, tol:
n_features_to_select: "auto", int or float, default='warn'
If "auto", the behaviour depends on the tol parameter: if tol is not None, then features are selected until the score improvement does not exceed tol; otherwise, half of the features are selected.
I am using this in a regression setting; any suggestion on how to get the best features the way we do in mlxtend? I am using 'neg_mean_squared_error' as the score.
That's a good question, and I don't know the answer to that, sorry. Actually, I have only very limited experience with the sklearn implementation, since I typically use the original SFS in mlxtend.
@@SebastianRaschka no issues! Will use mlxtend in that case 😊
Thanks for the video. How does one determine the best estimator for the SFS object? I'm using it for a regression problem, but I have a hard time determining whether to use LASSO, RF, or something else, without any source to support the choice. All the best!
Happy New Year! Thank you for the very interesting video. I have an applied question about SFS from mlxtend. KNN is a simple algorithm that has no stopping mechanism. I want to use XGBClassifier, which is prone to overfitting, so I usually stop it early using a validation sample. My question is: how can I stop my XGBClassifiers during the SFS procedure? On the one hand, if I use a small number of trees, I will get underfitted models, which will not allow choosing an optimal set of features. On the other hand, if I use too many trees, many models will be overfitted and the choice of features will again be incorrect (such a procedure will be subject to large fluctuations). So I need some mechanism to stop each XGBClassifier at the right moment independently. Can you advise me on something useful to resolve this issue?
Thank you so much for this video! It is really helpful! May I ask if you could kindly remind me what this score is? Is it a p-value? I am grateful for your reply!
Dear Sebastian, thank you so much for the video! I was wondering whether there is a way to use a scikit-learn pipeline as the estimator for the Sequential Feature Selector? My pipeline only consists of a ColumnTransformer (one-hot encoder) and a StandardScaler, and I can't seem to get it to work. An answer would be much appreciated!
same
Can we use more than 1 scoring method? Let's say we use accuracy and f1-score.
No, that's currently not possible. What you can do is provide a custom scoring method (scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html) and pass that as input. There you could do something like the average of accuracy and F1 score.
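A sketch of the make_scorer approach from the reply above, combining accuracy and F1 into one score (the metric function, dataset, and estimator here are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import accuracy_score, f1_score, make_scorer
from sklearn.neighbors import KNeighborsClassifier

def acc_f1_mean(y_true, y_pred):
    # Combine two metrics into a single score via a simple average.
    return (accuracy_score(y_true, y_pred) + f1_score(y_true, y_pred)) / 2.0

# make_scorer turns the metric function into a scorer object that can be
# passed wherever a `scoring` argument is accepted.
combined_scorer = make_scorer(acc_f1_mean)

X, y = load_breast_cancer(return_X_y=True)
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(), n_features_to_select=5,
    scoring=combined_scorer, cv=3,
)
sfs.fit(X, y)
print(sfs.get_support().sum())  # 5
```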
Does the model have to be fitted before being passed to the SFS?
No, it is an unfitted model. It then gets fitted on the training set with the "modified" feature set in each round.