Machine Learning Lecture 33 "Boosting Continued" -Cornell CS4780 SP17

  • Published: 22 Nov 2024

Comments • 26

  • @amuro9616
    @amuro9616 5 years ago +13

    I think the intuition explained in all his lectures is amazing and so helpful. This is probably one of the most approachable lectures on ML there is.

  • @ioannisavgeros167
    @ioannisavgeros167 5 years ago +10

    Thanks a lot, professor, for the entire course, and greetings from Greece. I studied data science and machine learning in my master's, but your lectures are a pure masterpiece.

  • @varunjindal1520
    @varunjindal1520 3 years ago +4

    The fact that all the topics are covered so exhaustively makes it a must-watch. I started from Decision Trees, but I will re-watch the whole series.
    Thank you, Kilian, for posting the videos.

  • @projectseller702
    @projectseller702 5 years ago +7

    I watched some of the videos and they are really amazing. He explains things in a way that is very easy to understand. Thank you, Sir, for sharing them. I appreciate it.

  • @sudhanshuvashisht8960
    @sudhanshuvashisht8960 4 years ago +1

    Unlike the squared loss, to my understanding, the exponential loss won't be minimal at H = Y (the vector of labels). Take the very simple case of a dataset with two labels (say [+1,+1]): the loss at H = [+1,+1] is 2*exp(-1*1), whereas at H = [+2,+2] it is 2*exp(-1*2), and the latter is clearly smaller. Does that contradict your contour lines at 5:05, Dr. Kilian? Grateful for all the explanations you've provided so far.

    • @kilianweinberger698
      @kilianweinberger698  4 years ago +2

      Yes, good point: for the exponential loss the minimizer always lies somewhere in the limit. Still, the principle is the same ...
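
    A quick numeric check of the exchange above (a minimal sketch in Python; the labels y = [+1, +1] and the candidate predictions are taken from the comment, the rest is illustrative):

        import numpy as np

        # Exponential loss: L(H) = sum_i exp(-y_i * H(x_i))
        def exp_loss(y, H):
            return np.sum(np.exp(-y * H))

        y = np.array([+1.0, +1.0])                   # the two training labels
        print(exp_loss(y, np.array([1.0, 1.0])))     # 2*exp(-1) ~= 0.736
        print(exp_loss(y, np.array([2.0, 2.0])))     # 2*exp(-2) ~= 0.271
        print(exp_loss(y, np.array([10.0, 10.0])))   # ~9.1e-05: loss only approaches 0 as H grows

    So the commenter is right that H = Y is not a minimizer; the exponential loss keeps decreasing as the correctly signed predictions grow, which is exactly the "solution in the limit" Kilian refers to.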

  • @manikantabandla3923
    @manikantabandla3923 1 year ago

    Such an amazing lecture on gradient boosting.
    Can you provide any reference on Gradient Boosted Classification Trees?
    -- For example, what loss function is used in that case?
    -- What training dataset is used for building the classifier h_t?
    Thanks in advance.

  • @saad7417
    @saad7417 4 years ago +3

    Sir, could I please, please, pretty please get the coding assignments too? The intuition building in these lectures is perfect; the only thing missing, I think, is the coding assignments.

  • @Theophila-FlyMoutain
    @Theophila-FlyMoutain 8 months ago

    Hi Professor, thank you so much for the lecture. I wonder whether AdaBoost could simply stop once the training error is zero? I see from your demo that after the training error gets close to zero and the exponential loss keeps shrinking, the test error doesn't change much. I guess we don't need to waste time making the exponential loss smaller and smaller.

    • @kilianweinberger698
      @kilianweinberger698  6 months ago

      No, it typically doesn't stop when zero training error is reached. The reason is that even if the training error is zero, the training LOSS will still be >0 and can be further reduced (e.g. by increasing the margin of the decision boundary).
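
    A small sketch of the error-versus-loss distinction above (illustrative numbers, not from the lecture): once every margin y_i * H(x_i) is positive, the 0/1 training error is already zero, yet the exponential loss keeps shrinking as the margins grow.

        import numpy as np

        y = np.array([+1, -1, +1, -1], dtype=float)

        def report(H):
            margins = y * H
            train_err = np.mean(margins <= 0)       # 0/1 training error
            exp_loss = np.sum(np.exp(-margins))     # exponential training loss
            print(f"error={train_err:.2f}  loss={exp_loss:.4f}")

        report(np.array([0.2, -0.1, 0.3, -0.2]))    # error 0.00, loss ~3.28
        report(np.array([1.0, -1.0, 1.0, -1.0]))    # error 0.00, loss ~1.47
        report(np.array([3.0, -3.0, 3.0, -3.0]))    # error still 0.00, loss ~0.20

    This is why AdaBoost keeps running past zero training error: later rounds still reduce the loss by growing the margins.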

  • @Gauravkriniitkgp
    @Gauravkriniitkgp 4 years ago +1

    Hi professor, great intuition for explaining why AdaBoost overfits slowly and why this is best observed on a log(#iterations) scale. One question, though: I work with GBM and XGBoost all the time, and they also behave very similarly to AdaBoost when it comes to overfitting. Do you have any intuition for this?

    • @kilianweinberger698
      @kilianweinberger698  4 years ago +1

      Hmm, good question ... XGBoost in its vanilla form is just GBRT with squared loss - so the same rationale doesn't apply here. My guess would be that your data set is large enough that it may take a long time to overfit.
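
    For context on "GBRT with squared loss", here is a bare-bones sketch (assuming scikit-learn regression trees as weak learners; the data and constants are purely illustrative). With squared loss, the negative gradient is simply the residual y - H(x), so each new tree is fit to the current residuals:

        import numpy as np
        from sklearn.tree import DecisionTreeRegressor

        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, size=(200, 1))
        y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

        H = np.zeros_like(y)          # current ensemble prediction
        alpha = 0.1                   # step size / learning rate
        for _ in range(100):
            residual = y - H          # negative gradient of 1/2*(y - H)^2 w.r.t. H
            tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
            H += alpha * tree.predict(X)

        print("training MSE:", np.mean((y - H) ** 2))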

  • @vocabularybytesbypriyankgo1558
    @vocabularybytesbypriyankgo1558 1 month ago

    Thanks a lot !!

  • @AmitKumar-vy3so
    @AmitKumar-vy3so 5 years ago +1

    The two losses are the global loss and the one for the weak learners, right, Sir?

  • @galhulli5081
    @galhulli5081 4 years ago

    Hi Prof Kilian. Quick question on the boosting method.
    I watched the videos twice already (and I was also a certificate program student) but couldn't find an explanation.
    The previous lecture mentions that one of the requirements of boosting is that the weak learners must at least point in the right direction.
    How can we check that the weak learners point in the right direction before running boosting? Does this happen by trial and error, or is there a method?
    Thanks for the great class!!!

    • @kilianweinberger698
      @kilianweinberger698  4 years ago +4

      In AdaBoost, the error on the re-weighted training set must be <0.5 (or you would just flip the predictions), so you stop the moment your best weak learner has an error of 0.5 (which means you just cannot learn anything useful anymore).
      In AnyBoost, the inner product of the weak learner's predictions and the negative gradient should be >0. (Same thing: it can never be <0, because you could just flip the predictions, so you stop the moment it reaches 0.) A code sketch of this check appears after this thread.

    • @galhulli5081
      @galhulli5081 4 years ago

      Thank you very much for the explanation, I will check my runs accordingly!
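
    The code sketch referenced in Kilian's reply above: a minimal AdaBoost loop (assuming scikit-learn decision stumps as weak learners; all names here are illustrative) that stops once the best weak learner's weighted error reaches 0.5.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        def adaboost(X, y, T=50):
            """y must be in {+1, -1}."""
            n = len(y)
            w = np.ones(n) / n                 # weights on the training points
            learners, alphas = [], []
            for _ in range(T):
                stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
                pred = stump.predict(X)
                err = np.sum(w * (pred != y))  # weighted training error
                if err >= 0.5:                 # no weak learner better than chance: stop
                    break
                alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
                w *= np.exp(-alpha * y * pred) # up-weight the mistakes
                w /= w.sum()
                learners.append(stump)
                alphas.append(alpha)
            return learners, alphas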

  • @gregmakov2680
    @gregmakov2680 2 years ago

    Why does this video have only 10 thousand views, while those junk videos over there get tons of viewers?! :D:D:D Students just don't know how to pick quality at all.

  • @jiahao2709
    @jiahao2709 4 years ago +1

    There's a built-in safeguard against overfitting, because the \alpha values decrease.

  • @sanjaykrish8719
    @sanjaykrish8719 4 years ago

    Can we say AdaBoost is like coordinate descent in function space?

  • @gregmakov2680
    @gregmakov2680 2 years ago

    Hahahaha, AdaBoost never overfits, of course!! Fixing one bug tends to create another bug :D:D

  • @MrAngryCucaracha
    @MrAngryCucaracha 4 years ago

    How can AdaBoost work with SVMs? Wouldn't a linear combination of such linear classifiers still be linear?

    • @kilianweinberger698
      @kilianweinberger698  4 years ago

      In AdaBoost we assume each weak learner only outputs +1 / -1. So you have to take the output of each linear classifier and apply the sign() function. Now you are combining multiple linear classifiers in a non-linear fashion.

    • @MrAngryCucaracha
      @MrAngryCucaracha 4 years ago

      @@kilianweinberger698 Thank you very much for your answer Prof. Weinberger. I see now that by taking only the sign of each classifier you are applying the step function and they are no longer linear, similar to an activation function in a neural network (now I wonder if there would be any advantage of using other functions for boosting).
      I would also like to thank you for your amazingly clear and understandable course. I can say that I have understood all the topics (even Gaussian processes, which I previously believed to be impossible), and I will be very interested to watch any further videos you decide to make, whether other courses or opinion pieces, and to contribute to any Patreon-like funding. Best regards.
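
    A tiny illustration of the sign() point discussed in this thread (made-up weights, purely illustrative): each weak learner outputs sign(w·x + b), so the weighted vote is a piecewise-constant, non-linear function of x even though every member is a linear classifier.

        import numpy as np

        # Two linear classifiers, each squashed through sign() before voting.
        def h1(X): return np.sign(X[:, 0] - X[:, 1])        # sign(w1 . x + b1)
        def h2(X): return np.sign(X[:, 0] + X[:, 1] - 1.0)  # sign(w2 . x + b2)

        def ensemble(X, alphas=(0.7, 0.4)):
            # Weighted majority vote over the sign outputs.
            return np.sign(alphas[0] * h1(X) + alphas[1] * h2(X))

        X = np.array([[0.0, 0.5], [1.0, 0.2], [0.3, 0.9], [2.0, 1.5]])
        print(ensemble(X))   # the decision boundary is no longer a single hyperplane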