Gradient Boost Part 3 (of 4): Classification

  • Published: 19 Jun 2024
  • This is Part 3 in our series on Gradient Boost. At long last, we are showing how it can be used for classification. This video focuses on the main ideas behind this technique. The next video in this series will focus more on the math and how the underlying algorithm works.
    This StatQuest assumes that you have already watched Part 1:
    • Gradient Boost Part 1 ...
    ...and it also assumes that you understand Logistic Regression pretty well. Here are the links for...
    A general overview of Logistic Regression: • StatQuest: Logistic Re...
    how to interpret the coefficients: • Logistic Regression De...
    and how to estimate the coefficients: • Logistic Regression De...
    Lastly, if you want to learn more about using different probability thresholds for classification, check out the StatQuest on ROC and AUC: • THIS VIDEO HAS BEEN UP...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    This StatQuest is based on the following sources:
    A 1999 manuscript by Jerome Friedman that introduced Stochastic Gradient Boosting: statweb.stanford.edu/~jhf/ftp...
    The Wikipedia article on Gradient Boosting: en.wikipedia.org/wiki/Gradien...
    The scikit-learn implementation of Gradient Boosting: scikit-learn.org/stable/modul...
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    RUclips Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    #statquest #gradientboost

Comments • 517

  • @statquest
    @statquest  4 years ago +26

    NOTE: Gradient Boost traditionally uses Regression Trees. If you don't already know about Regression Trees, check out the 'Quest: ruclips.net/video/g9c66TUylZ4/видео.html Also NOTE: In Statistics, Machine Learning and almost all programming languages, the default base for the log function, log(), is log base 'e' and that is what I use here.
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @parijatkumar6866
      @parijatkumar6866 3 years ago

      I am a bit confused. The first log that you took, log(4/2): was that to some base other than e? Because e^(log(x)) = x for log base e,
      and hence the probability would simply be 2/(1+2) = 2/3 = number of Yes / total observations = 4/6 = 2/3.
      Please let me know if this is correct.

    • @statquest
      @statquest  3 years ago +2

      @@parijatkumar6866 The log is to the base 'e', and yes, e^(log(x)) = x. However, sometimes we don't have x, we just have the log(x), as is illustrated at 9:45. So, rather than use one formula at one point in the video, and another in another part of the video, I believe I can do a better job explaining the concepts if I am consistent.

    • @jonelleyu1895
      @jonelleyu1895 1 year ago

      For Gradient Boost for CLASSIFICATION, because we convert the categorical targets (No or Yes) to probabilities (0-1) and the residuals are calculated from the probabilities, when we build a tree we still use a REGRESSION tree, which uses the sum of squared residuals to choose its splits. Is that correct? Thank you.

    • @statquest
      @statquest  1 year ago +1

      @@jonelleyu1895 Yes, even for classification, the target variable is continuous (probabilities instead of Yes/No), and thus, we use regression trees.
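
To make the exchange above concrete, here is a minimal Python sketch. The six-person dataset is made up to mirror the video's 4 "Yes" / 2 "No" example, and the single column is a stand-in for Age. It shows that log() is the natural log, that the initial probability works out to 4/6 ≈ 0.67 as the comment calculates, and that the residuals are continuous, which is why a regression tree is fit to them.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up data mirroring the video's example: 1 = "loves Troll 2", 0 = "does not".
X = np.array([[12.0], [87.0], [44.0], [19.0], [32.0], [14.0]])  # stand-in for Age
y = np.array([1, 1, 1, 1, 0, 0])

# Initial prediction: a single leaf holding the log(odds) of the labels.
# np.log() is the natural log (base e), the convention used in the video.
log_odds = np.log(y.sum() / (len(y) - y.sum()))       # log(4/2) ~= 0.69
prob = np.exp(log_odds) / (1 + np.exp(log_odds))      # 4/6 ~= 0.67, as worked out above

# Residuals are continuous (observed 0/1 minus a probability), so the tree
# fit to them is a regression tree, even though the task is classification.
residuals = y - prob
tree = DecisionTreeRegressor(max_leaf_nodes=3).fit(X, residuals)
print(log_odds, prob, residuals)
```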

  • @weiyang2116
    @weiyang2116 3 years ago +159

    I cannot imagine the amount of time and effort used to create these videos. Thanks!

    • @statquest
      @statquest  3 years ago +27

      Thank you! Yes, I spent a long time working on these videos.

  • @primozpogacar4521
    @primozpogacar4521 3 years ago +22

    Love these videos! You deserve a Nobel prize for simplifying machine learning explanations!

  • @sameepshah3835
    @sameepshah3835 8 days ago +1

    Thank you so much Josh, I watch 2-3 videos from your machine learning playlist every day and it just makes my day. Also, the fact that you reply to most of the people in the comments section is amazing. Hats off. I genuinely wish only the best for you.

  • @debsicus
    @debsicus 3 years ago +52

    This content shouldn’t be free Josh. So amazing Thank You 👏🏽

    • @statquest
      @statquest  3 years ago +2

      Thank you very much! :)

  • @OgreKev
    @OgreKev 4 years ago +52

    I'm enjoying the thorough and simplified explanations as well as the embellishments, but I've had to set the speed to 125% or 150% so my ADD brain can follow along.
    Same enjoyment, but higher bpm (bams per minute)

  • @jagunaiesec
    @jagunaiesec 4 years ago +34

    The best explanation I've seen so far. BAM! Catchy style as well ;)

    • @statquest
      @statquest  4 years ago

      Thank you! :)

    • @arunavsaikia2678
      @arunavsaikia2678 4 years ago +1

      @@statquest are the individual trees which are trying to predict the residuals regression trees?

    • @statquest
      @statquest  4 years ago

      @@arunavsaikia2678 Yes, they are regression trees.

  • @dhruvjain4774
    @dhruvjain4774 4 years ago +10

    You really explain complicated things in a very easy and catchy way.
    I like the way you BAM.

  • @xinjietang953
    @xinjietang953 10 months ago +2

    Thanks for all you've done. Your videos are a first-class and precise learning source for me.

    • @statquest
      @statquest  10 months ago

      Great to hear!

  • @user-gr1qk3gu4j
    @user-gr1qk3gu4j 5 years ago +1

    Very simple and practical lesson. I created a worked example based on this with no problems.
    It might be obvious, though it isn't explained here, that the initial mean odds should be greater than 1; put another way, the odds of the rarer event should be closer to zero.
    Glad to see this video arrived just as I became interested in this topic.
    I guess it will become a "bestseller".

  • @Valis67
    @Valis67 2 years ago +2

    That's an excellent lesson and a unique sense of humor. Thank you a lot for the effort in producing these videos!

  • @igormishurov1876
    @igormishurov1876 4 years ago +7

    I will recommend this channel to everyone studying machine learning :) Thanks a lot, Josh!

  • @asdf-dh8ft
    @asdf-dh8ft 3 years ago +2

    Thank you very much! Your step-by-step explanation is very helpful. It gives people like me, with poor abstract thinking, a chance to understand all the math behind these algorithms.

    • @statquest
      @statquest  3 years ago +1

      Glad it was helpful!

  • @umeshjoshi5059
    @umeshjoshi5059 4 years ago +2

    Love these videos. Starting to understand the concepts. Thank you Josh.

  • @soujanyapm9595
    @soujanyapm9595 3 years ago +1

    Amazing illustration of a complicated concept. This is the best explanation. Thank you so much for all your efforts in helping us understand the concepts so well!!! Mega BAM!!

  • @lemauhieu3037
    @lemauhieu3037 2 years ago +2

    I'm new to ML and these contents are gold. Thank you so much for the effort!

  • @tymothylim6550
    @tymothylim6550 3 years ago +2

    Thank you Josh for another exciting video! It was very helpful, especially with the step-by-step explanations!

    • @statquest
      @statquest  3 years ago +1

      Hooray! I'm glad you appreciate my technique.

  • @juliocardenas4485
    @juliocardenas4485 1 year ago +1

    Yet again. Thank you for making concepts understandable and applicable

  • @rishabhkumar-qs3jb
    @rishabhkumar-qs3jb 3 years ago +1

    Fantastic video! I was confused about gradient boosting; after watching all the parts on this technique from this channel, I understood it very well :)

  • @gonzaloferreirovolpi1237
    @gonzaloferreirovolpi1237 5 years ago +1

    Already waiting for Part 4...thanks as always Josh!

    • @statquest
      @statquest  5 years ago +1

      I'm super excited about Part 4 and should be out in a week and a half. This week got a little busy with work, but I'm doing the best that I can.

  • @cmfrtblynmb02
    @cmfrtblynmb02 2 years ago +1

    Finally, a video that shows the process of gradient boosting. Thanks a lot.

  • @marjaw6913
    @marjaw6913 2 years ago +1

    Thank you so much for this series, I understand everything thanks to you!

  • @narasimhakamath7429
    @narasimhakamath7429 3 years ago +4

    I wish I had a teacher like Josh! Josh, you are the best! BAAAM!

  • @dankmemer9563
    @dankmemer9563 3 years ago +2

    Thanks for the video! I’ve been going on a statquest marathon for my job and your videos have been really helpful. Also “they’re eating her...and then they’re going eat me!....OH MY GODDDDDDDDDDDDDDD!!!!!!”

  • @sidagarwal43
    @sidagarwal43 3 years ago +1

    Amazing and Simple as always. Thank You

    • @statquest
      @statquest  3 years ago

      Thank you very much! :)

  • @amitv.bansal178
    @amitv.bansal178 2 years ago +1

    Absolutely wonderful. You are my guru; a true salute to you.

  • @AmelZulji
    @AmelZulji 5 years ago +1

    First of all, thank you for such great explanations. Great job!
    It would be great if you could make a video about the Seurat package, which is a very powerful tool for single-cell RNA analysis.

  • @rrrprogram8667
    @rrrprogram8667 5 years ago +2

    I have beeeeennnn waiting for this video..... Awesome job Joshh

  • @yulinliu850
    @yulinliu850 5 years ago +1

    Excellent as always! Thanks Josh!

  • @mayankamble2588
    @mayankamble2588 2 months ago +1

    This is amazing. This is the nth time I have come back to this video!

  • @ayahmamdouh8445
    @ayahmamdouh8445 2 years ago +1

    Hi Josh, great video.
    Thank you so much for your great effort.

  • @siyizheng8560
    @siyizheng8560 4 years ago +2

    All your videos are super amazing!!!!

  • @CC-um5mh
    @CC-um5mh 5 years ago +1

    This is absolutely a great video. Will you cover why we can use residual/(p*(1-p)) as the log of odds in your next video? Very excited for the part 4!!

    • @statquest
      @statquest  5 years ago +1

      Yes! The derivation is pretty long - lots of little steps, but I'll work it out entirely in the next video. I'm really excited about it as well. It should be out in a little over a week.
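
For readers following this exchange: the leaf output formula in question is simply the sum of the residuals in a leaf divided by the sum of p×(1−p) over the same rows. A tiny sketch with made-up numbers (two rows assumed to land in the same leaf, each with a previous predicted probability of 0.7):

```python
import numpy as np

# Hypothetical rows that ended up in the same leaf.
residuals  = np.array([0.3, -0.7])   # observed label minus previous predicted probability
prev_probs = np.array([0.7,  0.7])   # previous predicted probabilities for those rows

# Output value of the leaf (on the log(odds) scale), as shown in the video;
# the derivation is worked out in Part 4.
output_value = residuals.sum() / (prev_probs * (1 - prev_probs)).sum()
print(output_value)   # (0.3 - 0.7) / (0.21 + 0.21) ~= -0.95
```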

  • @SergioPolimante
    @SergioPolimante 2 years ago +1

    Man, your videos are just super good, really.

  • @TheAbhiporwal
    @TheAbhiporwal 5 years ago +2

    Superb video without a doubt!!!
    One query, Josh: do you have any plans to make a video on "LightGBM" in the near future?

  • @tumul1474
    @tumul1474 5 years ago +2

    amazing as always !!

  • @user-ut3sy6hy4p
    @user-ut3sy6hy4p 3 months ago +1

    Thanks a lot, your videos have helped me so much, please keep going.

  • @user-be1hp3xo1b
    @user-be1hp3xo1b 1 year ago +1

    Great video! Thank you!

  • @joeroc4622
    @joeroc4622 4 years ago +1

    Thank you very much for sharing! :)

  • @Just-Tom
    @Just-Tom 3 years ago +3

    I was wrong! All your songs are great!!!
    Quadruple BAM!

  • @abissiniaedu6011
    @abissiniaedu6011 1 year ago +1

    Your are very helpful, thank you!

  • @IQstrategy
    @IQstrategy 5 years ago +1

    Great videos again! XGBoost next? As this is supposed to solve both variance (RF) & bias (Boost) problems.

  • @pranaykothari9870
    @pranaykothari9870 5 years ago +2

    Can GB for classification be used for multiple classes? If so, how does the math work? The video explains the binary case.

  • @sajjadabdulmalik4265
    @sajjadabdulmalik4265 4 years ago

    Hi Josh, thanks a lot for your clearly explained videos. I had a question about 12:17, where the second tree splits on Age twice, so both the root and a later decision node use Age. If that's correct, wouldn't a continuous variable create a kind of bias? My second question: when we classify the new person at 14:40, does the initial log(odds) still remain 0.7? That example is essentially a test set, but in a real-world scenario where we have more records, does the log(odds) change with the new data we want to predict, i.e., do the train and test sets each depend on their own average log(odds)?

  • @rodrigomaldonado5280
    @rodrigomaldonado5280 5 years ago +4

    Hi Statquest would you please make a video about naive bayes? Please it would be really helpful

  • @vans4lyf2013
    @vans4lyf2013 3 years ago +7

    I wish I could give you the money that I pay in tuition to my university. It's ridiculous that people who are paid so much can't make the topic clear and comprehensible like you do. Maybe you should do teaching lessons for these people. Also you should have millions of subscribers!

    • @statquest
      @statquest  3 years ago

      Thank you very much!

  • @sid9426
    @sid9426 4 years ago +1

    Hey Josh,
    I really enjoy your teaching. Please make some videos on XG Boost as well.

    • @statquest
      @statquest  4 years ago

      XGBoost Part 1, Regression: ruclips.net/video/OtD8wVaFm6E/видео.html
      Part 2 Classification: ruclips.net/video/8b1JEDvenQU/видео.html
      Part 3 Details: ruclips.net/video/ZVFeW798-2I/видео.html
      Part 4, Crazy Cool Optimizations: ruclips.net/video/oRrKeUCEbq8/видео.html

  • @user-tk6bz6lw4e
    @user-tk6bz6lw4e 4 years ago +1

    Thank you for good videos!

  • @Mars7822
    @Mars7822 1 year ago +1

    Super Cool to understand and study, Keep Up master..........

  • @junaidbutt3000
    @junaidbutt3000 5 years ago +1

    Another superb video Josh. The example was very clear and I’m beginning to see the parallels between the regression and classification case.
    One key distinction seems to be in calculating the output value of the terminal nodes for the trees.
    In the regression case the average was taken of the values in the terminal nodes (although this can be changed based on the loss function selected). In the classification case it seems that a different method is used to calculate the output values at the terminal nodes but it seems a function of the loss function (presumably a loss function which takes into account a Bernoulli process?).
    Secondly we also have to be careful in converting the output of the tree ensemble to a probability score. The output is a log odds score and we have to convert it to a probability before we can calculate residuals and generate predictions.
    Is my understanding more or less correct here? Or have I missed something important? Thanks again!

    • @statquest
      @statquest  5 years ago +1

      You are correct! When Gradient Boost is used for Classification, some liberties are taken with the loss function that you don't see when Gradient Boost is used for Regression. The difference being that the math is super easy for Regression, but for Classification, there are not any easy "closed form" solutions. In theory, you could use Gradient Descent to find approximations, but that would be slow, so, in practice, people use an approximation based on the Taylor series. That's where that funky looking function used to calculate Output Values comes from. I'll cover that in Part 4.
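
To tie the points in this thread together, here is a hedged, from-scratch sketch of a few boosting rounds for binary classification. The data is made up, the learning rate of 0.8 matches the video, and the leaf outputs use the Σresidual / Σp(1−p) formula discussed above: fit a regression tree to the residuals, replace each leaf's value, add the scaled output to the running log(odds), and convert to a probability only when computing residuals or making predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up data: one feature, binary target (1 = "loves Troll 2").
X = np.array([[12.0], [87.0], [44.0], [19.0], [32.0], [14.0]])
y = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0])
learning_rate = 0.8                                   # value used in the video

# Start every row at the log(odds) of the training labels.
log_odds = np.full(len(y), np.log(y.sum() / (len(y) - y.sum())))

for m in range(3):                                    # a few boosting rounds
    prob = np.exp(log_odds) / (1 + np.exp(log_odds))  # log(odds) -> probability
    residuals = y - prob                              # pseudo-residuals
    tree = DecisionTreeRegressor(max_leaf_nodes=3).fit(X, residuals)

    # Each leaf's output value is sum(residuals) / sum(p * (1 - p)) over the
    # rows in that leaf, which puts the tree's output on the log(odds) scale.
    leaf_ids = tree.apply(X)
    leaf_out = {leaf: residuals[leaf_ids == leaf].sum()
                      / (prob[leaf_ids == leaf] * (1 - prob[leaf_ids == leaf])).sum()
                for leaf in np.unique(leaf_ids)}

    # Add the scaled tree output to the running log(odds).
    log_odds += learning_rate * np.array([leaf_out[leaf] for leaf in leaf_ids])

print(np.exp(log_odds) / (1 + np.exp(log_odds)))      # final predicted probabilities
```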

  • @ulrichwake1656
    @ulrichwake1656 5 years ago +3

    Thank you so much. Great videos again and again.
    One question: what is the difference between XGBoost and Gradient Boost?

    • @mrsamvs
      @mrsamvs 4 years ago

      please reply @statQuest team

  • @abdelhadi6022
    @abdelhadi6022 5 years ago +1

    Thank you, awesome video

  • @dhruvarora6927
    @dhruvarora6927 5 years ago

    Thank you for sharing this, Josh. I have a quick question: the subsequent trees, which predict the residuals, are regression trees (not classification trees), since we are predicting continuous values (residuals of probabilities)?

  • @61_shivangbhardwaj46
    @61_shivangbhardwaj46 3 years ago +1

    You r amazing sir! 😊 Great content

  • @vinayakgaikar154
    @vinayakgaikar154 1 year ago +1

    nice explanation and easy to understand thanks bro

  • @sandralydiametsaha9261
    @sandralydiametsaha9261 5 years ago +1

    Thank you very much for your videos!
    When will you post the next one?

  • @kevindeng3576
    @kevindeng3576 5 years ago

    Do we have a video on neural networks? It seems to me we just throw a bunch of functions together and get an output. What is the idea behind it? Why does it work at all?

  • @jrgomez7340
    @jrgomez7340 5 years ago

    Very helpful explanation. Can you also add a video on how to do this in R? Thanks

  • @suryan5934
    @suryan5934 3 years ago +4

    Now I want to watch Troll 2

    • @statquest
      @statquest  3 years ago +1

      :)

    • @AdityaSingh-lf7oe
      @AdityaSingh-lf7oe 3 years ago +2

      Somewhere around the 15 min mark I made up my mind to search this movie on google

    • @suryan5934
      @suryan5934 3 years ago

      @@AdityaSingh-lf7oe bam

  • @koderr100
    @koderr100 2 years ago +1

    Thanks for the videos, the best I have seen. I will use this 'pe-pe-po-pi-po' as the message alarm on my phone :)

  • @yjj.7673
    @yjj.7673 4 years ago +1

    This is great!!!

  • @anusrivastava5373
    @anusrivastava5373 3 years ago +1

    Simply Awesome!!!!!!

  • @aweqweqe1
    @aweqweqe1 2 years ago +1

    Respect and many thanks from Russia, Moscow

  • @ElderScrolls7
    @ElderScrolls7 4 years ago +1

    Another great lecture by Josh Starmer.

    • @statquest
      @statquest  4 years ago +1

      Hooray! :)

    • @ElderScrolls7
      @ElderScrolls7 4 years ago +1

      @@statquest I actually have a draft paper (not submitted yet) and included you in the acknowledgements if that is ok with you. I will be very happy to send it to you when we have a version out.

    • @statquest
      @statquest  4 years ago +1

      @@ElderScrolls7 Wow! that's awesome! Yes, please send it to me. You can do that by contacting me first through my website: statquest.org/contact/

    • @ElderScrolls7
      @ElderScrolls7 4 years ago +1

      @@statquest I will!

  • @JoaoVictor-sw9go
    @JoaoVictor-sw9go 2 years ago +2

    Hi Josh, great video as always! Can you explain to me or recommend a material to understand the GB algorithm when we are using it for a non-binary classification? E.g. we have three or more possible outputs for classification.

    • @statquest
      @statquest  2 years ago +1

      Unfortunately I don't know a lot about that topic. :(

  • @user-qu7sh1kb1e
    @user-qu7sh1kb1e 4 years ago +1

    very detailed and convincing

  • @abyss-kb8qy
    @abyss-kb8qy 4 years ago +2

    God bless you , thanks you so so so much.

  • @vijaykumarlokhande1607
    @vijaykumarlokhande1607 2 years ago

    I salute your hard work, and mine too.

  • @user-ll8dr9bm5v
    @user-ll8dr9bm5v 6 months ago

    @statquest You mentioned at 10:45 that we build a lot of trees. Are you referring to bagging, or to building a different tree at each iteration?

    • @statquest
      @statquest  6 months ago

      Each time we build a new tree.

  • @deepakmehta1813
    @deepakmehta1813 3 years ago

    Fantastic song, Josh. I have started picturing that I am attending a class and the professor/lecturer walks by in the room with the guitar, and the greeting would be the song. This could be the new norm following stat quest. One question regarding gradient boost that I have is why it restricts the size of the tree based on the number of leaves. What would happen if that restriction is ignored? Thanks, Josh. Once again, superb video on this topic.

    • @statquest
      @statquest  3 years ago

      If you build full sized trees then you would overfit the data and you would not be using "weak learners".
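
As a concrete, hedged illustration of the reply above: scikit-learn's gradient boosting implementation (linked in the description) exposes the number of leaves directly, so you can see the "many small, weak trees" idea in code. The data below is made up; lifting the leaf restriction or growing full-depth trees turns each learner into a strong one and invites overfitting.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Made-up data: columns are stand-ins for, e.g., [Age, Likes Popcorn].
X = np.array([[12, 1], [87, 1], [44, 0], [19, 1], [32, 0], [14, 0]])
y = np.array([1, 1, 1, 1, 0, 0])

# Many small ("weak") trees, each scaled by the learning rate.
clf = GradientBoostingClassifier(
    n_estimators=100,    # number of trees
    learning_rate=0.1,
    max_leaf_nodes=4,    # restrict tree size; unrestricted trees tend to overfit
)
clf.fit(X, y)
print(clf.predict_proba(X))
```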

  • @cmfrtblynmb02
    @cmfrtblynmb02 2 years ago +2

    How do you create the classification trees using residual probabilities? Do you stop using some kind of purity index during the optimization in that case? Or do you use regression methods?

    • @statquest
      @statquest  2 years ago

      We use regression trees, which are explained here: ruclips.net/video/g9c66TUylZ4/видео.html

  • @sebastianlinaresrosas3278
    @sebastianlinaresrosas3278 5 years ago +2

    How do you create each tree? In your decision tree video you use them for classification, but here they are used to predict the residuals (something like regression trees)

  • @mengdayu6203
    @mengdayu6203 5 years ago +17

    How does the multi-class classification algorithm work in this case? Using the one-vs-rest method?

    • @bharathbhimshetty8926
      @bharathbhimshetty8926 4 years ago +2

      It's been over 11 months and no reply from josh... bummer

    • @AnushaCM
      @AnushaCM 4 years ago +2

      have the same question

    • @ketanshetye5029
      @ketanshetye5029 4 years ago +1

      @@AnushaCM well, we could use one vs rest approach

    • @Andynath100
      @Andynath100 3 years ago +3

      It uses a Softmax objective in the case of multi-class classification. Much like Logistic(Softmax) regression.
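
Picking up on the reply above, one way to see this in practice is scikit-learn's GradientBoostingClassifier, which handles more than two classes by turning per-class scores into probabilities with a softmax, much like softmax regression. A hedged sketch with made-up 3-class data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Made-up 3-class data.
X = np.array([[1.0], [1.2], [3.1], [3.3], [5.0], [5.2]])
y = np.array([0, 0, 1, 1, 2, 2])

# With more than two classes, the per-class scores are combined with a
# softmax to give one probability per class (a multinomial objective
# rather than the two-class log(odds) shown in the video).
clf = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1)
clf.fit(X, y)
print(clf.predict_proba([[3.0]]))   # three probabilities that sum to 1
```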

  • @rungrawin1994
    @rungrawin1994 2 years ago +2

    Listening to your song makes me think of Phoebe Buffay haha.
    Love it, anyway!

    • @statquest
      @statquest  2 years ago +1

      See: ruclips.net/video/D0efHEJsfHo/видео.html

    • @rungrawin1994
      @rungrawin1994 2 years ago +1

      ​@@statquest Smelly stat, smelly stat, It's not your fault (to be so hard to understand)

    • @rungrawin1994
      @rungrawin1994 2 years ago

      @@statquest btw i like your explanation on gradient boost too

  • @hamzael2200
    @hamzael2200 3 years ago

    Hey! Thanks for this awesome video. I have a question: around the 12:00 mark, how did you build this new tree? What was the criterion for choosing Age < 66 as the root?

    • @statquest
      @statquest  3 years ago

      Gradient Boost uses Regression Trees: ruclips.net/video/g9c66TUylZ4/видео.html

  • @siddharthvm8262
    @siddharthvm8262 2 years ago +1

    Bloody awesome 🔥

  • @shashiramreddy9896
    @shashiramreddy9896 3 years ago

    @StatQuest Thanks for the great content you provide. It's a great explanation of binary classification, but how does all of this apply to multi-class classification?

    • @statquest
      @statquest  3 years ago

      Usually people combine multiple models that each test one class vs. everything else.

  • @rvstats_ES
    @rvstats_ES 4 years ago +1

    Congrats!! Nice video! Ultra bam!!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @abhilashsharma1992
    @abhilashsharma1992 4 years ago +5

    Best original song ever in the start!

    • @statquest
      @statquest  4 years ago +2

      Yes! This is a good one. :)

  • @haitaowu5888
    @haitaowu5888 3 years ago

    Hi, I have a few questions: 1. How do we know when the GBDT algorithm should stop (other than M, the number of trees)? 2. How do I choose a value for M, and how do I know it is optimal?
    Nice work by the way, the best explanation I have found on the internet.

    • @statquest
      @statquest  3 years ago

      You can stop when the predictions stop improving very much. You can try different values for M and plot predictions after each tree and see when predictions stop improving.

    • @haitaowu5888
      @haitaowu5888 3 years ago +1

      @@statquest thank you!
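
One practical way to act on the advice in this thread (a sketch, assuming scikit-learn and synthetic data, not anything shown in the video): fit more trees than you think you need, score the model on held-out data after each tree, and take M where the validation loss stops improving.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic data, only to illustrate choosing M (the number of trees).
X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=300, learning_rate=0.1)
clf.fit(X_tr, y_tr)

# staged_predict_proba() yields predictions after each tree, so we can see
# where the validation loss flattens out and pick that many trees.
val_loss = [log_loss(y_val, p) for p in clf.staged_predict_proba(X_val)]
best_m = int(np.argmin(val_loss)) + 1
print(best_m, min(val_loss))
```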

  • @hitesh8383
    @hitesh8383 7 days ago

    Thanks for this video. One question: does the tree that you constructed for predicting residuals at 5:30 use the sum of squared errors, as in regression trees, or the Gini index, as in decision trees? Since we have only two target values.

    • @statquest
      @statquest  7 days ago

      In a pinned comment I wrote "Gradient Boost traditionally uses Regression Trees. If you don't already know about Regression Trees, check out the 'Quest: ruclips.net/video/g9c66TUylZ4/видео.html"

  • @Martin-so8gb
    @Martin-so8gb 1 year ago

    Hey Josh, when these trees are being built using the variables, how do you determine how to build them? Are you using Gini impurity to choose each split, as in the decision tree videos? The same question goes for the regression trees in the gradient boosting videos. Thanks in advance, brother!

    • @statquest
      @statquest  1 year ago

      For both regression and classification problems, Gradient Boost traditionally uses Regression Trees. If you don't already know about Regression Trees, check out the 'Quest: ruclips.net/video/g9c66TUylZ4/видео.html

  • @patrickyoung5257
    @patrickyoung5257 4 years ago +5

    You save me from the abstractness of machine learning.

  • @rohitbansal3032
    @rohitbansal3032 3 years ago +1

    You are awesome !!

  • @HarpreetKaur-qq8rx
    @HarpreetKaur-qq8rx 3 years ago

    Hi Josh,
    Does Gradient Boost use Gini impurity to select the best node to split on, does it split on a random node, or does it use some other criterion to split the data?

    • @statquest
      @statquest  3 years ago

      Gradient Boost almost always uses Regression Trees because they are fit to the residuals, which are continuous values. Regression Trees are described here: ruclips.net/video/g9c66TUylZ4/видео.html

  • @jwc7663
    @jwc7663 4 years ago

    Thanks for the great video! One question: Why do you use 1-sigmoid instead of sigmoid itself?

    • @statquest
      @statquest  4 years ago

      What time point in the video are you asking about?

  • @josherickson5446
    @josherickson5446 4 years ago

    Hey Josh, just trying to clarify how the root node in a gradient boosting machine (GBM) is decided (I'm sure different packages/model types differ) compared to random forest. From what I understand, RF uses a random 'mtry' subset of predictors to choose the root node, then uses Gini or entropy to pick the variable, splits, and so on. But how do gradient boosting machines do this? Is it like a regular decision tree, where all predictors are available and some statistic is used to choose the best one? Thanks as always for your awesome videos and have a good one!

    • @statquest
      @statquest  4 years ago

      Since the trees in gradient boost predict the residuals, which are continuous, and because it doesn't have its own special type of tree (like XGBoost does), it uses regression trees. Here's the StatQuest that explains regression trees: ruclips.net/video/g9c66TUylZ4/видео.html

  • @timothygorden7689
    @timothygorden7689 1 year ago +1

    absolute gold

  • @HamidNourashraf
    @HamidNourashraf 7 months ago +1

    the best video for GBT

  • @vijayendrasdm
    @vijayendrasdm 4 years ago

    Hi Josh,
    Great video.
    I have a question.
    In the classification example for AdaBoost, the misclassified data points were sampled with higher probability in the next iteration; that was very clear for AdaBoost.
    Where and how exactly are the misclassified points assigned higher weight in GBM, so that they can be sampled with higher probability in the next iteration of GBM?

    • @statquest
      @statquest  4 years ago

      The answer to your question is in this video and more details can be found in the follow up: ruclips.net/video/StWY5QWMXCw/видео.html

  • @CodingoKim
    @CodingoKim 1 year ago +3

    My life has been changed three times. First, when I met Jesus. Second, when I found my true love. Third, it's you, Josh.

  • @123chith
    @123chith 5 years ago +16

    Thank you so much! Can you please make a video on Support Vector Machines?

  • @jayyang7716
    @jayyang7716 2 years ago +1

    Thanks so much for the amazing videos as always! One question: why does the loss function for Gradient Boost classification use residuals instead of cross entropy? Thanks!

    • @statquest
      @statquest  2 years ago

      Because we only have two different classifications. If we had more, we could use soft max to convert the predictions to probabilities and then use cross entropy for the loss.

    • @jayyang7716
      @jayyang7716 2 years ago

      @@statquest Thank you!

  • @rrrprogram8667
    @rrrprogram8667 5 years ago +2

    So finallyyyy the MEGAAAA BAMMMMM is included.... Awesomeee

    • @statquest
      @statquest  5 years ago +2

      Yes! I was hoping you would spot that! I did it just for you. :)

    • @rrrprogram8667
      @rrrprogram8667 5 years ago +1

      @@statquest i was in office when i first wrote the comment earlier so couldn't see the full video...

  • @parthsarthijoshi6301
    @parthsarthijoshi6301 3 years ago +1

    THIS IS A BAMTABULOUS VIDEO !!!!!!

  • @rohitverma1057
    @rohitverma1057 2 years ago

    Hey Josh, great videos!! I want to ask about something around 6:40. To add the leaf's and the tree's predictions, we convert the tree's prediction with that formula so that it is in log(odds) form, the same scale as the leaf, and we keep doing the same for each subsequent tree, right?
    My question is: why not convert the initial leaf's output to a probability once, and spare all the predictions of the later trees from that conversion formula?

    • @statquest
      @statquest  2 years ago +1

      Because the log(odds) goes from negative infinity to positive infinity, allowing us to add as many trees as we please without having to worry that we will go too far. In contrast, if we used probability, then we would have hard limits at 0 and 1, and then we would have to worry about adding too many trees and going over or under etc.
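
A small sketch of the point in the reply above (the initial 0.7 and the learning rate 0.8 come from the video's example; the tree output values are made up): every term being added lives on the unbounded log(odds) scale, and the conversion to a probability happens only at the end.

```python
import numpy as np

initial_log_odds = 0.7            # initial leaf from the video's example
learning_rate = 0.8               # learning rate used in the video
tree_outputs = [1.4, -0.6, 0.9]   # made-up output values of three trees for one person

# log(odds) is unbounded, so we can keep adding scaled tree outputs without
# ever hitting the hard 0-to-1 limits a probability would impose.
log_odds = initial_log_odds + learning_rate * np.sum(tree_outputs)

# Convert to a probability only once, at the very end.
prob = np.exp(log_odds) / (1 + np.exp(log_odds))
print(log_odds, prob)
```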

  • @fvviz409
    @fvviz409 3 years ago

    Hello Josh, I have a little question. How would we make the first leaf if we have more than 2 labels? You said that to calculate the first leaf we need the log(odds), but log(odds) only works for classification with 2 labels. What would we do if we had more than 2? Do we use one-vs-all classification like we do in logistic regression, or something else?

    • @statquest
      @statquest  3 years ago

      You can do one-vs-all, or change the loss function see the "objective" parameter here: xgboost.readthedocs.io/en/latest/parameter.html

  • @keylazy
    @keylazy 2 years ago

    Thanks for the great video. I wonder, if the output of each leaf were a probability instead of log(odds), would that simplify the math a little?

    • @statquest
      @statquest  2 years ago +2

      It would actually make it more complicated. This is because probabilities have hard limits at 0 and 1. So this makes adding the output from an unknown number of trees tricky. In contrast, the log(odds) has no limits (we can add values all the way up to positive infinity if we wanted to), and that gives us the flexibility to add as many trees as need to the model.

  • @jongcheulkim7284
    @jongcheulkim7284 2 years ago +1

    Thank you so much.

  • @user-sq2zw4un3q
    @user-sq2zw4un3q 3 months ago

    Best video ever. Quick question on building the next tree: once we have the new residuals, how do we decide the nodes for the next tree? Is it still the same as calculating Gini, but on the residuals?

    • @statquest
      @statquest  3 months ago

      Gradient Boost traditionally uses Regression Trees. If you don't already know about Regression Trees, check out the 'Quest: ruclips.net/video/g9c66TUylZ4/видео.html