Multicollinearity in Decision Trees

  • Published: Dec 2, 2024

Comments • 24

  • @Septumsempra8818
    @Septumsempra8818 2 years ago +4

    This type of video is my favorite. It's unique on YouTube.
    s/o from South Africa

    • @DimitriBianco
      @DimitriBianco 2 years ago

      Thanks! I have a similar one coming out in a few weeks on how not to do cross validation and sampling.

  • @seanmichael6579
    @seanmichael6579 2 years ago +4

    Absolutely love this content and these kinds of videos. This was nuanced and came with real-world examples and experience. Beats anything I ever got from my stats grad school years. All the best.

    • @DimitriBianco
      @DimitriBianco 2 years ago +1

      Thanks! I'll try and make a few more of these types of videos.

  • @andresrossi9
    @andresrossi9 2 years ago +1

    I missed this one for some reason. Anyway, as someone who knows decision trees quite "in-depth", I'd say this is a very clear lesson and very good material, as always.

  • @Yasharghami
    @Yasharghami 2 years ago +3

    Can't wait.

  • @Rizzickk
    @Rizzickk 2 years ago +1

    Please make more ✅

  • @QuantPy
    @QuantPy 2 years ago +1

    Like always, great quality video Dimitri!
    If you are looking for video suggestions, I would really like to see a video on the possible risks of building models on observed Granger causality between financial time series (perhaps not explainable because of the high number of independent variables) that may nonetheless have produced good out-of-sample prediction performance. I would also like to hear a practical example of model monitoring (perhaps some of the more popular metrics you have used previously) that could help detect whether the model is deteriorating.
    Thanks again for the effort you put into these types of videos.

    • @DimitriBianco
      @DimitriBianco 2 years ago +2

      I'll look into making some videos around these ideas.

  • @Shawro
    @Shawro 2 years ago +1

    Hi Dimitri. Great video. I’m currently halfway through my first year of undergrad. I’m doing a dual math cs degree.
    I’ve chosen the Stats ‘track’ for the math part of my degree, but I’m not sure what the optimal ‘track’ for CS would be if I’m looking to best prepare myself for quant work.
    My options are data science, machine learning and scientific computing. I’m sure they’re all valuable skills to learn, but which do you think is the best foundation for quant work?
    Thanks in advance.

    • @DimitriBianco
      @DimitriBianco 2 years ago +1

      I would do scientific computing, but they are all decent choices, as ML and data science are taking off. Scientific computing should give you some nice math overlap, and numerical analysis is a key part of quant finance.

  • @bhargav7476
    @bhargav7476 2 years ago +2

    Great stuff as always! I was wondering what's the average age group of your viewers?

    • @DimitriBianco
      @DimitriBianco 2 years ago +1

      85% is between 18 and 34 years old.

  • @didierdupont5784
    @didierdupont5784 2 years ago +2

    Great video!
    How would one avoid such a situation? In a scenario where there are thousands of predictors, I can hardly imagine looking at correlations before building the model could help, as there are just too many to manually go through. The same would apply when pruning a tree.

    • @DimitriBianco
      @DimitriBianco 2 years ago +4

      Cluster analysis. You create clusters of variables based on their statistical relationships, using something like PCA. There will be a point where the value added by more clusters becomes trivial. Often we end up with around 20 clusters for 500 variables. Then you manually review the top few variables in each cluster and build a model with those, which gives you around 60 final variables (about three per cluster).

    • @didierdupont5784
      @didierdupont5784 2 years ago

      @@DimitriBianco Makes sense, thank you!
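
The variable-clustering idea in the reply above can be sketched roughly as follows. This is a minimal illustration, not the method described in the reply: it swaps the PCA-based clustering for a simpler greedy grouping on absolute pairwise correlation, and it replaces the manual review step with a ranking by correlation with the target. The function names (`group_correlated`, `pick_representatives`), the 0.8 threshold, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def group_correlated(X, threshold=0.8):
    """Greedily group columns of X: each group holds the columns whose
    absolute correlation with the group's first member is >= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = list(range(corr.shape[0]))
    groups = []
    while unassigned:
        seed = unassigned[0]
        members = [j for j in unassigned if corr[seed, j] >= threshold]
        groups.append(members)
        unassigned = [j for j in unassigned if j not in members]
    return groups

def pick_representatives(X, y, groups, top_k=1):
    """Keep the top_k columns per group, ranked by |corr| with the target.
    (A stand-in for the manual review step; an assumption of this sketch.)"""
    keep = []
    for members in groups:
        strength = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in members]
        order = np.argsort(strength)[::-1][:top_k]
        keep.extend(members[i] for i in order)
    return sorted(keep)

# Synthetic data: 3 hidden drivers, each observed through 3 noisy copies,
# so columns 0-2, 3-5 and 6-8 form three highly correlated blocks.
rng = np.random.default_rng(0)
n = 2000
drivers = rng.normal(size=(n, 3))
X = np.repeat(drivers, 3, axis=1) + 0.2 * rng.normal(size=(n, 9))
y = drivers[:, 0] - 0.5 * drivers[:, 1] + 0.1 * rng.normal(size=n)

groups = group_correlated(X, threshold=0.8)
selected = pick_representatives(X, y, groups, top_k=1)
print("groups:", groups)
print("selected columns:", selected)
```

With thousands of predictors, the point is the same as in the reply: the review effort scales with the number of clusters rather than the number of raw variables.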

  • @Septumsempra8818
    @Septumsempra8818 1 year ago

    How do we fix it?

  • @Jay-xb5du
    @Jay-xb5du 2 years ago

    Hi Dimitri, informative and great video as always! Just a quick question: do you personally think that a degree in statistics followed by an MFE would give me a better chance of becoming a quant analyst, or a financial mathematics degree followed by an MFE? Which degree do you think would also prepare me better for an MFE? Thanks

  • @FatmaNurAydin-p7r
    @FatmaNurAydin-p7r 1 year ago

    Do you have any suggestions for scientific articles on the topic you mentioned in the video? Thank you...

    • @DimitriBianco
      @DimitriBianco 1 year ago

      No, but you could search Google and see if any come up. The multicollinearity problem follows logically from the math and the mechanics of trees. You don't need a paper to come to this conclusion.

  • @jasdeepsinghgrover2470
    @jasdeepsinghgrover2470 2 years ago +1

    If one of the correlated variables is used in a split, then the others automatically become unlikely to be chosen, since they won't reduce impurity much further. Won't this help a decision tree be more robust?
    I've added this question from the premiere in case someone has the same doubt.

    • @DimitriBianco
      @DimitriBianco 2 years ago

      The strength of a decision tree is that it will prevent multicollinearity further down a branch. The issue arises when variables are selected blindly based on correlation. If the wrong variable is used, it is highly likely the tree will fail quickly, which reduces its robustness.

    • @nyboret6384
      @nyboret6384 2 years ago +3

      @@DimitriBianco It is true: blindly selecting variables for a model is a very dangerous business in ML/AI, and especially in XAI, where the partial dependence plot we wish to interpret can come out with a wrong and/or noisy sign. Thanks for the good explanations. From an Asian (Cambodian) applied and theoretical economist, econometrician, and mathematical statistician

    • @Felix-vg4mv
      @Felix-vg4mv 2 years ago +1

      Imagine you were the FBI and you predicted crime stats based on ice cream sales. Suddenly, in November, a video is posted on Facebook and mass riots ensue. Ice cream sales wouldn't change, yet crime would rise.
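
The point made in this thread can be illustrated with a small simulation. This is a hedged sketch under stated assumptions, not anything taken from the video: it uses a hand-written one-split tree (a decision stump) with Gini impurity instead of a full tree library, and the variables `driver` (the true cause) and `proxy` (a merely correlated stand-in, like ice cream sales) are hypothetical. In training, both variables offer almost the same impurity decrease, so the choice between them is close to arbitrary. Once the proxy decouples from the driver, only the stump built on the driver keeps working.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary 0/1 label array."""
    if y.size == 0:
        return 0.0
    p = y.mean()
    return 2.0 * p * (1.0 - p)

def best_split(x, y):
    """Return (threshold, impurity_decrease) of the best single split on x,
    searching over the 5%..95% quantiles of x."""
    parent = gini(y)
    best_t, best_gain = None, -1.0
    for t in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        left, right = y[x <= t], y[x > t]
        child = (left.size * gini(left) + right.size * gini(right)) / y.size
        if parent - child > best_gain:
            best_t, best_gain = t, parent - child
    return best_t, best_gain

def stump_accuracy(x_train, y_train, x_test, y_test):
    """Fit a one-split classifier on the training data, score it on the test data."""
    t, _ = best_split(x_train, y_train)
    left_label = int(y_train[x_train <= t].mean() >= 0.5)
    right_label = int(y_train[x_train > t].mean() >= 0.5)
    pred = np.where(x_test <= t, left_label, right_label)
    return float((pred == y_test).mean())

rng = np.random.default_rng(42)
n = 5000

# Training regime: the proxy tracks the true driver closely.
driver = rng.normal(size=n)
proxy = driver + 0.1 * rng.normal(size=n)
y = (driver + 0.3 * rng.normal(size=n) > 0).astype(int)

# New regime: the outcome still depends on the driver, but the proxy
# no longer moves with it (the correlation has broken down).
driver_new = rng.normal(size=n)
proxy_new = rng.normal(size=n)
y_new = (driver_new + 0.3 * rng.normal(size=n) > 0).astype(int)

_, gain_driver = best_split(driver, y)
_, gain_proxy = best_split(proxy, y)
acc_driver = stump_accuracy(driver, y, driver_new, y_new)
acc_proxy = stump_accuracy(proxy, y, proxy_new, y_new)

print(f"impurity decrease in training: driver={gain_driver:.3f}, proxy={gain_proxy:.3f}")
print(f"accuracy after regime change:  driver={acc_driver:.3f}, proxy={acc_proxy:.3f}")
```

Nothing in the training data distinguishes the two candidates, which is why the choice between correlated variables has to come from domain knowledge rather than from the split criterion alone.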