Applied ML 2020 - 08 - Gradient Boosting


Comments • 4

  • @Han-ve8uh · 3 years ago

    Thanks for the analogy between gradient descent and gradient boosting, which helps understanding, and the point that GradientBoostingClassifier is built on regression trees is eye-opening. Could you help with some questions I can't find answers to online? (Hedged sketches addressing these follow after this comment.)
    1. 4:50 — why is the logreg objective written like that? I was expecting something like y*log(p(y)) + (1-y)*log(1-p(y)) + an L2 term. Here there is a C, a +1, and no lambda on the coefficient penalties, and there seems to be a missing pair of () around yhat compared to the logreg equation at 25:40.
    2. The previous lecture on DT/RF mentioned warm_start. Is partial_fit a totally orthogonal concept to warm_start, or is there some similarity? (I'm guessing it's something about how model parameters are initialized/learned/used.) My understanding is that both reuse existing parameters rather than restarting from 0, but warm_start applies when adding new models while growing an ensemble, or when searching over the number of trees, using the same dataset, whereas partial_fit focuses on tuning the parameters of a single model using more and more new data?
    3. 8:54 says SGDClassifier is faster than non-SGD linear models with millions of data points. Is this specifically referring to a high number of columns rather than a high number of rows, because it's the number of columns that determines the Hessian computation time, not the rows? How would more rows make the SGD version faster, or the non-SGD version slower?
    4. 25:40 says "fit the gradient with a regression tree". I'm trying to understand how the learning problem is set up in this case: what would the X and y values be for fitting the regression tree? It seems strange to differentiate w.r.t. yhat, since yhat is an output that should not be controllable, whereas differentiating w.r.t. w and b makes more sense because they are controllable inputs.
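On question 1, a hedged pointer: scikit-learn's documented LogisticRegression objective uses labels y_i ∈ {−1, +1} and puts the regularization strength C on the data term instead of a λ on the penalty, which would explain the C, the +1 inside the log, and the missing λ:

```latex
\min_{w,\,b}\; \frac{1}{2}\, w^\top w \;+\; C \sum_{i=1}^{n} \log\!\left( \exp\!\left( -y_i \left( x_i^\top w + b \right) \right) + 1 \right)
```

With y ∈ {−1, +1}, the term log(1 + exp(−y·f(x))) is algebraically the same cross-entropy as y·log(p) + (1−y)·log(1−p) with 0/1 labels, and C plays the role of 1/λ.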
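On question 2, a minimal sketch of the distinction (the data and sizes are invented for illustration): warm_start reuses what fit has already built when fit is called again, typically to grow an ensemble on the same data, while partial_fit incrementally updates one model's parameters across successive batches of new data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=2000, random_state=0)

# warm_start: calling fit again on the SAME data keeps the trees already
# built and only adds the missing ones.
gbrt = GradientBoostingClassifier(n_estimators=10, warm_start=True,
                                  random_state=0)
gbrt.fit(X, y)          # builds trees 1..10
gbrt.n_estimators = 20
gbrt.fit(X, y)          # adds trees 11..20, reusing the first 10

# partial_fit: each call updates the SAME coefficient vector with a NEW
# batch of data; classes must be declared on the first call.
sgd = SGDClassifier(random_state=0)
for idx in np.array_split(np.arange(len(y)), 4):
    sgd.partial_fit(X[idx], y[idx], classes=np.unique(y))
```

So both continue from existing parameters, much as the comment guesses: warm_start for growing the same model on the same data, partial_fit for streaming new data through one model.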
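On question 3, a rough intuition plus a sketch you can time yourself (the sizes are invented): full-batch solvers must sweep all rows for every update, and quasi-Newton steps add per-iteration cost that grows with the number of columns, whereas SGD updates after each sample, so its cost per update is independent of n_samples; with millions of rows it can reach a good solution in a fraction of an epoch.

```python
from time import perf_counter
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

X, y = make_classification(n_samples=200_000, n_features=100, random_state=0)

# Fit both models on the same data and print the measured wall-clock time.
for clf in (LogisticRegression(solver="lbfgs"),
            SGDClassifier(max_iter=5)):
    start = perf_counter()
    clf.fit(X, y)
    print(type(clf).__name__, f"{perf_counter() - start:.2f}s")
```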
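And on question 4, a hedged sketch of how one boosting step is set up (squared loss is chosen to keep the gradient simple; the data is invented): the loss is differentiated w.r.t. the current predictions f(x_i), which are free per-sample quantities during training, and the regression tree is then fit with X as inputs and the negative gradient — here just the residual y − f — as the target.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
f = np.full_like(y, y.mean())      # initial prediction f_0(x)
trees = []

for _ in range(50):
    # Negative gradient of the squared loss 0.5*(y - f)**2 w.r.t. f is
    # the residual (y - f); that residual is the tree's regression target.
    residual = y - f
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    f += learning_rate * tree.predict(X)
    trees.append(tree)
```

The tree never sees w or b; it sees (X, negative gradient), and adding its predictions to f is the gradient step taken in function space.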

  • @TristanGueck29 · 1 year ago

    Really good content, Andreas! Thanks. It also helps to extend the Statistical Learning Theory book I devoured (not least because that one was applied with R).

  • @yuhuang8447 · 4 years ago

    Why, on slide 25, after binning do we get the number "1" representing the original value "5" in the first row, while in the 6th row the number "3" represents the original "5"?

    • @AndreasMueller · 4 years ago · +2

      Each column has separate binning based on the quantiles in that column. I think you're comparing the first column with the 3rd column. All 5s in the first column are represented by 1.
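A small sketch of the per-column binning being described, with invented data rather than the actual matrix from slide 25: each column is discretized against its own quantile boundaries, so the same raw value (e.g. a 5) can land in different bins in different columns.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randint(0, 10, size=(8, 3)).astype(float)

n_bins = 4
binned = np.empty_like(X, dtype=int)
for j in range(X.shape[1]):
    # Quantile edges are computed per column, so bin indices are only
    # comparable within a column, never across columns.
    edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
    binned[:, j] = np.searchsorted(edges, X[:, j])

print(np.hstack([X, binned]))   # raw values next to their bin indices
```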