13.3.2 Decision Trees & Random Forest Feature Importance (L13: Feature Selection)

  • Published: Oct 27, 2024

Comments • 19

  • @bobrarity
    @bobrarity 7 months ago +5

    Even though this video is pretty old, it helped me; the way you explain things is easy to understand. Subscribed already 😊

    • @SebastianRaschka
      @SebastianRaschka  7 months ago +2

      Nice. Glad it's still useful after all these years!

  • @anikajohn6924
    @anikajohn6924 2 years ago +3

    Preparing for an interview and this is a great refresher. Thanks!

    • @SebastianRaschka
      @SebastianRaschka  2 years ago +1

      Glad to hear! And best of luck with the interview!!

  • @MrMAFA99
    @MrMAFA99 2 years ago +3

    really good video, you should have more subscribers!
    Greetings from Germany

  • @lekhnathojha8537
    @lekhnathojha8537 2 years ago +2

    Very nicely explained. So informative. Thank you for making this video.

  • @kenbobcorn
    @kenbobcorn 1 year ago

    Glad you added Lecture 13 after the fact for those who are interested. Also, do you have a list of the equipment you use for video recording? The new tablet setup looks great.

  • @nooy2228
    @nooy2228 1 year ago

    Thank you so much!!! Your videos are super great..

  • @abdireza1298
    @abdireza1298 11 months ago

    Professor Raschka, please allow me to ask: can statistical test procedures be applied to feature importance values (such as those based on Gini impurity)? Like at 13:04 in the video, could we compare the values statistically if we obtained the mean and confidence interval of each feature importance (Proline, Flavanoids, etc.) from cross-validation instead of the train-test split (using a Friedman test, t-test, or Wilcoxon test)? I do not see any statistical restriction on applying such tests to feature importance coefficients, since they are numeric in nature, but I am afraid I am missing something, because I have never come across a paper that tests feature importance coefficients statistically. An expert opinion such as yours is my referee in this case. Thank you, Professor.

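A minimal sketch of the kind of comparison asked about above, assuming scikit-learn's Wine dataset (the one containing proline and flavanoids), importances collected on each cross-validation training fold, and a paired Wilcoxon signed-rank test; note that the training folds overlap, so the per-fold importances are not strictly independent samples, and all concrete choices below (fold count, feature pair) are illustrative rather than taken from the lecture.

```python
# Illustrative sketch: per-fold random forest importances, then a paired
# Wilcoxon signed-rank test comparing two features' importance distributions.
import numpy as np
from scipy.stats import wilcoxon
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

data = load_wine()
X, y = data.data, data.target

importances = []  # one importance vector per CV training fold
for train_idx, _ in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    forest.fit(X[train_idx], y[train_idx])
    importances.append(forest.feature_importances_)
importances = np.array(importances)  # shape: (n_folds, n_features)

# Compare two specific features across folds (paired samples).
i = data.feature_names.index("proline")
j = data.feature_names.index("flavanoids")
stat, p = wilcoxon(importances[:, i], importances[:, j])
print(f"proline vs. flavanoids: W={stat:.3f}, p={p:.3f}")
```
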
  • @Jhawk_LA
    @Jhawk_LA 1 year ago

    Really great!

  • @yashodharpathak189
    @yashodharpathak189 9 months ago

    Which method of feature selection is best if a dataset has many categorical variables? I have a dataset that comprises continuous as well as categorical variables. What should the approach be in this case?

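One possible approach for mixed feature types, sketched below on made-up toy data, is mutual information with an explicit mask marking which columns are discrete (scikit-learn's mutual_info_classif accepts such a mask); the column names and values here are purely illustrative.

```python
# Illustrative sketch: mutual information scores for a mix of continuous and
# categorical (integer-encoded) columns, using a discrete-feature mask.
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

df = pd.DataFrame({
    "age": [25, 32, 47, 51, 62, 23, 44, 36],                      # continuous
    "income": [40e3, 52e3, 80e3, 75e3, 90e3, 30e3, 60e3, 48e3],   # continuous
    "region": [0, 1, 2, 1, 0, 2, 1, 0],                           # categorical (encoded)
    "owns_car": [1, 0, 1, 1, 1, 0, 0, 1],                         # categorical (binary)
})
y = [0, 0, 1, 1, 1, 0, 1, 0]

discrete_mask = [False, False, True, True]  # which columns are categorical
scores = mutual_info_classif(df.values, y, discrete_features=discrete_mask, random_state=0)
for name, score in zip(df.columns, scores):
    print(f"{name}: {score:.3f}")
```
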
  • @AyushSharma-jm6ki
    @AyushSharma-jm6ki 1 year ago

    @Sebastian Can DTs and RFs also be used to select features for Regression models?

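Tree-based importances are also defined for regression (the impurity being reduced is the variance of the target rather than Gini impurity, but the importance computation is otherwise the same). A minimal sketch, assuming scikit-learn's diabetes dataset and SelectFromModel as the wrapper, both illustrative choices:

```python
# Illustrative sketch: feature selection for regression via random forest
# importances wrapped in SelectFromModel.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

X, y = load_diabetes(return_X_y=True)

selector = SelectFromModel(
    RandomForestRegressor(n_estimators=200, random_state=0),
    threshold="median",  # keep features whose importance exceeds the median
)
X_selected = selector.fit_transform(X, y)
print("kept", X_selected.shape[1], "of", X.shape[1], "features")
```
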
  • @cagataydemirbas7259
    @cagataydemirbas7259 1 year ago

    Hi, when I use RandomForest, DecisionTree, and XGBoost with RFE and look at feature_importances_, they return completely different orders even though all of them are tree-based models. My dataset has 13 columns; one feature's importance rank is 1 with XGBoost, the same feature ranks 10 with DecisionTree, and 7 with RandomForest. How can I tell which feature is better than the others for general purposes? If a feature is more predictive than the others, shouldn't it have the same rank across all tree-based models? I am so confused about this. It's the same with sequential feature selection.

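Impurity-based importances from different tree models often do disagree, since they depend on how each particular model happens to split the data. One way to at least quantify how much two importance rankings agree, sketched below with an assumed dataset, is a rank correlation between the importance vectors:

```python
# Illustrative sketch: Spearman rank correlation between the importance
# vectors of two tree-based models fit on the same data.
from scipy.stats import spearmanr
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
importances = {name: m.fit(X, y).feature_importances_ for name, m in models.items()}

rho, _ = spearmanr(importances["decision_tree"], importances["random_forest"])
print(f"rank correlation between importance vectors: {rho:.2f}")
```

For comparisons that do not depend on each model's split criterion, sklearn.inspection.permutation_importance measures importance the same way for any fitted estimator, which tends to make cross-model comparisons easier to interpret.
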
  • @jamalnuman
    @jamalnuman 7 months ago +1

    great

  • @bezawitdagne5756
    @bezawitdagne5756 1 year ago

    I was using a correlation heatmap, p-values, and information gain for feature selection; the values were pretty similar, but when I used the resulting features with all the algorithms I was working with, the accuracy decreased. Then I tried random forest feature importance, and the result I got from RF feature importance improved my accuracy. Please help me understand why?

    • @SebastianRaschka
      @SebastianRaschka  1 year ago +1

      I think it may depend on what your downstream model looks like. The correlation method may work better for generalized linear models than for tree-based methods, because tree-based methods already have feature selection built in.

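A minimal sketch of the comparison discussed in the reply above, assuming scikit-learn's Wine dataset, an ANOVA-style filter versus random forest importances as the selection step, and logistic regression as the downstream model; all of these are illustrative choices, not the commenter's actual setup.

```python
# Illustrative sketch: two selection strategies in front of the same
# downstream classifier, compared by cross-validated accuracy.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

selectors = {
    "anova_filter": SelectKBest(f_classif, k=5),
    "rf_importances": SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=0), threshold="median"
    ),
}

for name, selector in selectors.items():
    pipe = make_pipeline(StandardScaler(), selector, LogisticRegression(max_iter=1000))
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```
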
  • @pulkitmadan6381
    @pulkitmadan6381 2 years ago