At 9:20, wouldn't there be a total of 29 subsets (each containing 28 variables), and not 28 subsets? Each subset consists of the entire dataset minus one feature. So there would in total be one subset for each feature in the original dataset, right? That makes 29 subsets each consisting of 28 features. Am I missing something, or is this an error?
Sebastian, you are a blessing.
I'm using the product of your efforts, mlxtend, in my research, and it's awesome to have your lecture along with the package; can't ask for more! Please continue the series.
Big thanks from water sciences community!
Thanks for the kind words, glad to hear that both are useful to you (and the water sciences community!). I am hoping I'll find more time to make additional videos in the future for sure!
I appreciate the video, your explanation style and your mlxtend package. Thank you very much for all the work you do!!
Awesome to hear! Thanks for the kind words!
High-end video quality & thorough content 💙 Really enjoy your lecture 👨💻 Thanks for posting, Dr. Sebastian!
Thanks a lot, I'm really glad to hear that! I spent the last couple of days learning Final Cut Pro and how to improve the audio quality (removing background noise and room echoes). Glad to hear the time was well spent :)
this video is really helpful. Highly recommended!
In sequential backward selection (time = 10:55), stage 02, though we remove 1 feature, making the feature count 28, we still get 29 feature subsets, right? (It says 28.) Can you help me clarify this?
Hm, I am not quite sure I understand the question correctly. So if we have 29 features, we have feature subsets of 28 features each after the first round.
@@SebastianRaschka I have the same question.
If we have 5 features, the subsets will be [x,2,3,4,5], [1,x,3,4,5], [1,2,x,4,5], [1,2,3,x,5], [1,2,3,4,x]. We can see that we get 5 subsets, and each subset has 4 features.
Therefore, if we have 29 features at the beginning, 29 subsets would be obtained, while the number of features in each subset is 28.
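If it helps to see the counting in code, here is a minimal Python sketch (not from the video, just an illustration) of the candidate subsets considered in one round of sequential backward selection; the same logic gives 29 subsets of 28 features when starting from 29 features.

```python
from itertools import combinations

features = [1, 2, 3, 4, 5]  # start with 5 features

# One round of sequential backward selection: each candidate subset
# leaves out exactly one feature, so it has len(features) - 1 features.
candidates = list(combinations(features, len(features) - 1))

print(len(candidates))  # 5 candidate subsets
for subset in candidates:
    print(subset)       # (1, 2, 3, 4), (1, 2, 3, 5), ..., (2, 3, 4, 5)
```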
During Sequential Forward Selection (around 16:30): Do you also add a new feature to the set if the performance is worse than without any new feature?
Great explanation. Very clear. Keep going!
Thanks for the kind words!
Hi, thank you for the excellent explanation.
One question, please: in sequential floating forward selection, when does the floating round happen?
Does the algorithm do it every round after adding a new feature, or does it do it randomly?
Good question. It's actually every round.
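To make "every round" concrete, here is a simplified sketch of sequential floating forward selection; `evaluate` is a placeholder for a cross-validated scoring function (for example, mean CV accuracy of an estimator on the given columns) and is not mlxtend's internal API. The conditional exclusion (floating) check runs right after each forward step.

```python
def sffs(all_features, evaluate, k):
    """Simplified sketch of sequential floating forward selection (SFFS).

    `evaluate(subset)` is assumed to return a (cross-validated) score for a
    feature subset; it is a placeholder, not mlxtend's internal API.
    """
    selected = []
    best_score = {}  # best score seen so far for each subset size

    while len(selected) < k:
        # Forward step: add the best remaining feature.
        remaining = [f for f in all_features if f not in selected]
        added = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(added)
        n = len(selected)
        best_score[n] = max(best_score.get(n, float('-inf')), evaluate(selected))

        # Floating step, run in every round right after the forward step:
        # conditionally drop the least significant feature, but only while the
        # reduced subset beats the best subset of that size found so far.
        while len(selected) > 2:
            least_sig = max(selected,
                            key=lambda f: evaluate([x for x in selected if x != f]))
            trial = [x for x in selected if x != least_sig]
            score = evaluate(trial)
            if score > best_score.get(len(trial), float('-inf')):
                selected = trial
                best_score[len(trial)] = score
            else:
                break
    return selected
```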
Thanks so much for the videos. Great presentation.
I believe in your Feature Permutation Importance video you stated that the process was model agnostic.
Is SFS also model agnostic? I would like to use this with an LSTM model but am not sure if it would be a correct application.
Yes, SFS is also model agnostic. My implementation at rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/ is only scikit-learn compatible at the moment, but the concept itself is model agnostic.
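For reference, a minimal usage sketch with a scikit-learn estimator (the iris data, the k-NN classifier, and k_features=3 are arbitrary illustrative choices, not taken from the video):

```python
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Wrapper-based selection: any scikit-learn-compatible estimator works here.
knn = KNeighborsClassifier(n_neighbors=3)
sfs = SFS(knn,
          k_features=3,      # target number of features
          forward=True,      # sequential *forward* selection
          floating=False,    # set True for the floating variant
          scoring='accuracy',
          cv=5)

sfs = sfs.fit(X, y)
print(sfs.k_feature_idx_)    # indices of the selected features
print(sfs.k_score_)          # cross-validated score of that subset
```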
@@SebastianRaschka I've attempted several different implementations and done some research, but it always seems that these wrapper methods only work with 2D arrays and not the 3D arrays LSTMs expect.
Is there an existing feature selection process, other than correlation, that you like best for LSTMs? Thanks
@@russwedemeyer5602 I wish I had a good answer for that, but I have not tried something like this yet.
What if you have a mix of several categorical and continuous variables?
That's a good question. In this case, you can one-hot encode the categorical variables. And then, optionally, you can treat each set of binary variables (that belong to the original categorical variable) as a fixed feature set. I have an example for that at the bottom here: rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/
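As a rough sketch of the one-hot encoding part (the toy data here is made up for illustration, and the column grouping mentioned above is only noted in a comment, since its availability depends on the mlxtend version):

```python
import numpy as np
import pandas as pd
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.linear_model import LogisticRegression

# Toy frame mixing one categorical and two continuous columns (made up for illustration).
df = pd.DataFrame({
    'color':  ['red', 'blue', 'red', 'green', 'blue', 'green', 'red', 'blue'],
    'width':  [1.2, 0.7, 1.5, 0.9, 1.1, 0.4, 1.3, 0.6],
    'height': [3.1, 2.2, 3.5, 1.9, 2.8, 1.5, 3.2, 2.0],
})
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])

# One-hot encode the categorical column: each level becomes its own binary column.
X = pd.get_dummies(df, columns=['color'])

# Forward selection over the expanded columns. If your mlxtend version supports
# grouping columns (see the linked user guide), the dummy columns derived from
# 'color' can additionally be treated as a single unit.
sfs = SFS(LogisticRegression(), k_features=2, forward=True, floating=False, cv=2)
sfs = sfs.fit(X, y)
print(sfs.k_feature_names_)  # names of the selected columns
```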
@Sebastian Raschka Thank you very much, I will certainly read up on this. I attacked this problem (not sure if it was the most optimal approach) by separating the categorical variables and the continuous variables (the key is scanning them separately):
First scanning through the categoricals only (removing the less significant ones), then doing the same for the continuous variables, using in both cases:
---- scipy.stats.pointbiserialr.html
What do you think about my approach? Thanks
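For anyone curious what that point-biserial screening looks like in code, here is a minimal sketch on synthetic data (the variable names and data are made up; `scipy.stats.pointbiserialr` correlates a binary variable with a continuous one):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic binary target and two candidate continuous features
# (made-up data, purely to illustrate the screening step).
y = rng.integers(0, 2, size=200)
informative = y * 0.8 + rng.normal(size=200)  # related to the target
noise = rng.normal(size=200)                  # unrelated to the target

for name, x in [('informative', informative), ('noise', noise)]:
    r, p = stats.pointbiserialr(y, x)
    print(f'{name}: r = {r:.2f}, p = {p:.3g}')
```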
Hi Sebastian, thank you so much for the videos. I really loved watching them. Just a few questions on feature selection techniques.
1. How do I pick one of the wrapper methods? I mean, how do I select a feature selection technique (wrapper)? 😅
2. Why do I even have to use wrapper methods? Can't I simply put all the features into a random forest model, use its feature importance to select features, and then train a new model with the selected features? It seems a lot simpler and faster to me than training 10-20 different models in any wrapper method.
Glad you found them useful! Sure, if you use a random forest, then you can use the random forest feature importance for selection. However, the advantage of wrapper methods is that they work with any model, not just random forests.
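As a point of comparison, here is a minimal scikit-learn sketch of that embedded (random forest importance) approach; the wine dataset and the "mean importance" threshold are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_wine(return_X_y=True)

# Fit a forest, then keep the features whose importance exceeds the mean importance.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
selector = SelectFromModel(forest, threshold='mean', prefit=True)
X_selected = selector.transform(X)

print(X.shape, '->', X_selected.shape)
print(np.argsort(forest.feature_importances_)[::-1][:5])  # indices of the top-5 features
```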
@@SebastianRaschka Thanks for replying. Just a follow-up question on feature selection.
Let's say I'm working on a binary classification problem on tabular data. Now, how do I decide whether I should use Forward Selection, Recursive Feature Elimination, Permutation Importance, or simple feature importance from a Random Forest? There are multiple ways to do the same thing, but the end results won't be the same; we can end up with different combinations of features from different approaches. So, should I try all of them or select one only? If it's the latter, then how do I pick one in this scenario?
Very good, thank you
Excellent video, thank you!
It's a bit of an out-of-the-box question, but can you tell me: if we treat feature selection as a multiobjective optimization problem (one of the wrapper-based approaches), where most people take sensitivity and specificity, or accuracy and the number of features, as the objectives, what other objectives might we look into? Thank you in advance, sir.
Thank you so much, sir!!
You are welcome!
Great lesson, thanks!!