CS480/680 Lecture 24: Gradient boosting, bagging, decision forests

  • Published: Jan 12, 2025

Comments • 11

  • @vihaanrajput8082 • 3 years ago

    Prof, can't we do regularization like L1 or L2 to overcome overfitting in gradient boosting?
    Some penalty on k!
    25:51
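
    A minimal sketch of what such regularization looks like in practice, assuming scikit-learn (not part of the question or the lecture): shrink each tree's contribution with a learning rate, subsample the data, and effectively penalize the number of trees k via early stopping on a held-out split.

    ```python
    # Sketch only: common regularization knobs for gradient boosting (scikit-learn assumed).
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor

    X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

    gbr = GradientBoostingRegressor(
        n_estimators=500,        # upper bound on the number of trees k
        learning_rate=0.05,      # shrinkage: damps each tree's contribution
        max_depth=2,             # keep each weak learner weak
        subsample=0.5,           # stochastic gradient boosting: each tree sees half the data
        validation_fraction=0.2, # hold out data to monitor overfitting
        n_iter_no_change=10,     # early stopping: an implicit penalty on k
        random_state=0,
    )
    gbr.fit(X, y)
    print("trees actually used:", gbr.n_estimators_)
    ```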

  • @vihaanrajput8082 • 3 years ago

    Prof, please drop a computational linear algebra playlist.
    Thanks

  • @Michael-kt3tf • 4 years ago

    I don't understand the regression case: why does averaging the predictions work for regression? Since each predictor is a weak learner, won't averaging them still give a poor result?

    • @fredliu168 • 4 years ago

      For bagging or random forests, each predictor has high variance (due to overfitting). Taking an average of these trees reduces the variance.
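
      A quick, hedged illustration of that point (assuming scikit-learn; the dataset and settings here are made up): a single fully grown tree overfits, while bagging a hundred such trees and averaging their predictions gives a noticeably lower test error.

      ```python
      # Sketch: one overfit (high-variance) tree vs. an average of 100 bagged trees.
      from sklearn.datasets import make_regression
      from sklearn.ensemble import BaggingRegressor
      from sklearn.metrics import mean_squared_error
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeRegressor

      X, y = make_regression(n_samples=500, n_features=5, noise=20.0, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      single = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)  # fully grown: overfits
      bagged = BaggingRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)  # default base learner is a decision tree

      print("single tree test MSE:", mean_squared_error(y_te, single.predict(X_te)))
      print("bagged trees test MSE:", mean_squared_error(y_te, bagged.predict(X_te)))
      ```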

    • @itsmekrazzy41 • 4 years ago • +1

      The key idea is that the individual predictors have low bias but suffer from high variance. Upon averaging, the bias (which was already small) remains unchanged, while the variance scales down. As a result, the mean squared error (which is bias^2 + variance) decreases because of the averaging operation.
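
      A small numerical check of that decomposition (a sketch with made-up numbers, not from the lecture): simulate many unbiased but noisy predictions of a fixed target and compare the squared error of a single prediction with that of their average.

      ```python
      # Sketch: bias stays (roughly) the same under averaging, variance shrinks by ~1/n.
      import numpy as np

      rng = np.random.default_rng(0)
      truth = 3.0
      n_predictors, n_trials = 50, 10_000

      # Each predictor is unbiased but noisy: prediction = truth + N(0, 2^2).
      preds = truth + rng.normal(0.0, 2.0, size=(n_trials, n_predictors))

      single = preds[:, 0]           # one weak predictor
      averaged = preds.mean(axis=1)  # ensemble average of 50 predictors

      print("single MSE:  ", np.mean((single - truth) ** 2))    # ~ 4.0 (variance 2^2)
      print("averaged MSE:", np.mean((averaged - truth) ** 2))  # ~ 4 / 50 = 0.08
      ```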

    • @Michael-kt3tf • 4 years ago

      Ashish Katiyar, yeah, but isn't the regression output a continuous value? I understand how it works for classification.

    • @itsmekrazzy41 • 4 years ago • +2

      @Michael-kt3tf The idea is exactly the same for regression too. Since the individual predictors are unbiased and (hopefully) close to independent, one can expect the outputs of the predictors to be spread around the ideal prediction. In such a scenario, averaging should give you something close to the ideal prediction. Note that it is important for the predictors to be independent. If they are not, in particular, if they are positively correlated, one can expect all of them to have similar errors, and hence averaging wouldn't be very useful.
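
      To put a number on the independence point (again just a sketch; the formula Var(average) = ρσ² + (1−ρ)σ²/n for n predictors with common variance σ² and pairwise correlation ρ is standard, everything else here is made up): positive correlation puts a floor of ρσ² on how much averaging can reduce the variance.

      ```python
      # Sketch: variance of the ensemble average for independent vs. correlated predictors.
      import numpy as np

      rng = np.random.default_rng(0)
      n, sigma2, trials = 50, 4.0, 20_000

      def avg_variance(rho):
          # n x n covariance matrix with variance sigma2 and pairwise correlation rho.
          cov = sigma2 * (rho * np.ones((n, n)) + (1 - rho) * np.eye(n))
          errors = rng.multivariate_normal(np.zeros(n), cov, size=trials)
          return errors.mean(axis=1).var()

      for rho in (0.0, 0.5, 0.9):
          print(f"rho={rho}: empirical {avg_variance(rho):.3f}, "
                f"theory {rho * sigma2 + (1 - rho) * sigma2 / n:.3f}")
      ```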

    • @Michael-kt3tf • 4 years ago

      Ashish Katiyar got it, thank you

  • @user-skxfwmm • 4 months ago

    finished. 2024/9/2

  • @anujshah645 • 4 years ago

    So in gradient boosting, is there just one learner that gets optimized by fitting to the residuals at each subsequent step? Unlike AdaBoost, where we have multiple weak learners.

    • @itsmekrazzy41 • 4 years ago • +1

      In gradient boosting, too, there will be multiple weak learners. It is a forward stagewise additive process: at each step, you fit a new (high-bias) weak learner to the negative gradient (which, for squared loss, is just the residuals) and add it to the current estimate of the model.
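
      A bare-bones sketch of that forward stagewise process for squared loss (assuming scikit-learn decision stumps as the weak learners; the variable names are illustrative, not the lecture's notation): each round fits a shallow tree to the current residuals, i.e. the negative gradient, and adds a shrunken copy of it to the running model.

      ```python
      # Sketch: gradient boosting for squared loss, where the negative gradient = residuals.
      import numpy as np
      from sklearn.datasets import make_regression
      from sklearn.tree import DecisionTreeRegressor

      X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

      n_rounds, lr = 100, 0.1
      F = np.full_like(y, y.mean(), dtype=float)  # initial constant model
      trees = []

      for _ in range(n_rounds):
          residuals = y - F                          # negative gradient of squared loss
          tree = DecisionTreeRegressor(max_depth=2)  # weak (high-bias) learner
          tree.fit(X, residuals)
          F += lr * tree.predict(X)                  # add shrunken correction
          trees.append(tree)

      def predict(X_new):
          # Sum the initial constant and all shrunken tree corrections.
          return y.mean() + lr * sum(t.predict(X_new) for t in trees)

      print("training MSE:", np.mean((y - F) ** 2))
      ```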