XGBoost in Python from Start to Finish

  • Published: Jul 2, 2024
  • NOTE: You can support StatQuest by purchasing the Jupyter Notebook and Python code seen in this video here: statquest.gumroad.com/l/uroxo
    NOTE: This StatQuest assumes that you are already familiar with:
    XGBoost for Regression: • XGBoost Part 1 (of 4):...
    XGBoost for Classification: • XGBoost Part 2 (of 4):...
    XGBoost: Crazy Cool Optimizations: • XGBoost Part 4 (of 4):...
    Regularization: • Regularization Part 1:...
    Cross Validation: • Machine Learning Funda...
    Confusion Matrices: • Machine Learning Funda...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Buying my book, The StatQuest Illustrated Guide to Machine Learning:
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    RUclips Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    0:00 Awesome song and introduction
    2:56 Import Modules
    4:34 Import Data
    13:43 Missing Data Part 1: Identifying
    18:37 Missing Data Part 2: Dealing with it
    24:03 Format Data Part 1: X and y
    25:55 Format Data Part 2: One-Hot Encoding
    33:25 XGBoost - Missing Data and One-Hot Encoding
    36:43 Build a Preliminary XGBoost Model
    45:01 Optimize Parameters with Cross Validation (GridSearchCV)
    49:44 Build and Draw Final XGBoost Model
    #StatQuest #ML #XGBoost
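The chapters above clean and format the data before XGBoost sees it. Below is a minimal pandas sketch of those steps. It uses a tiny invented table: the column names `City`, `Total_Charges`, and `Churn_Value` echo names mentioned in the comments, but the values are made up. It assumes that blank strings mark missing values and replaces them with 0, which is only one possible choice.

```python
import pandas as pd

# Invented stand-in for the churn table; values are made up
df = pd.DataFrame({
    "City": ["Los Angeles", "San Diego", "Los Angeles", "Fresno"],
    "Total_Charges": ["29.85", " ", "108.15", "1840.75"],
    "Churn_Value": [0, 1, 0, 1],
})

# Missing Data Part 1: blank strings are not NaN, so count them explicitly
n_blank = (df["Total_Charges"] == " ").sum()

# Missing Data Part 2: replace the blanks with "0" (one choice among several),
# then convert the column from text to numbers
df.loc[df["Total_Charges"] == " ", "Total_Charges"] = "0"
df["Total_Charges"] = pd.to_numeric(df["Total_Charges"])

# Format Data Part 1: separate the features (X) from the target (y)
X = df.drop("Churn_Value", axis=1).copy()
y = df["Churn_Value"].copy()

# Format Data Part 2: one-hot encode the categorical column
X_encoded = pd.get_dummies(X, columns=["City"])
```

`pd.get_dummies` creates one column per city. In recent pandas versions these columns are boolean rather than 0/1 integers.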

Comments • 713

  • @statquest
    @statquest  3 years ago +50

    NOTE: You can support StatQuest by purchasing the Jupyter Notebook and Python code seen in this video here: statquest.gumroad.com/l/uroxo
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @PetStuBa
      @PetStuBa 3 years ago +2

      Dear Josh, I have a request for new videos or live chats: could you maybe explain these tests to us? Tukey, Bonferroni, and Scheffé. They're hard for me to understand, and you explain everything so well. It could be very helpful for a lot of people out there. Have a nice day, greetings from Europe!

  • @znull3356
    @znull3356 3 years ago +97

    Please keep doing these long-form Python tutorials on the various ideas we've covered in earlier 'Quests. They're great for those of us working in Python, and they give me another way to support the channel. It has been a more-than-pleasant surprise that as I've grown from learning the basics of stats to machine learning and eventually deep learning, StatQuest has grown along with me into those very same fields.
    Thanks Josh.

  • @kelvinhsueh5434
    @kelvinhsueh5434 3 years ago +25

    You are amazing. Can't imagine how much work you put into those step-by-step tutorials.
    Just bought the Jupyter Notebook code and it's beyond worth it! Thank you :)

    • @statquest
      @statquest  3 years ago +3

      Thank you very much for your support! :)

  • @abhinaym5923
    @abhinaym5923 3 years ago +4

    I am purchasing the Jupyter Notebook to contribute to your work! Thanks a lot for this video! You are awesome! I will be very, very happy to have more ML tutorials. Thank you, Josh!

    • @statquest
      @statquest  3 years ago

      Thank you very much! :)

  • @zheyizhao4865
    @zheyizhao4865 3 years ago +14

    Hey Josh, I just purchased all 3 of your Jupyter Notebooks! I transferred from an Econ major to Data Science, and it was a nightmare before I found your channel. Your channel shed light on my academic career! I look forward to more of the 'Python from Start to Finish' series, and I will definitely support it!

    • @statquest
      @statquest  3 years ago +3

      Awesome! Thank you!

  • @tantalumCRAFT
    @tantalumCRAFT 2 years ago +4

    This is hands down the best Python tutorial on RUclips, not just for XGBoost, but for overall Python logic and syntax. Nice work, subscribed!!

  • @AdamsJamsYouTube
    @AdamsJamsYouTube 1 year ago +5

    Josh, this video is epic and really helped me understand the actual process of tuning hyperparameters, something that had been a bit of a black box until I saw this video. Your channel is awesome too - great jingles as well :D

  • @markrauschkolb5370
    @markrauschkolb5370 3 years ago +10

    Extremely helpful - would love to see more from the "start to finish" series

    • @statquest
      @statquest  3 years ago +1

      I'm working on it.

  • @josephhayes9152
    @josephhayes9152 3 years ago +1

    Thanks for the great tutorial! You covered a lot of details (mostly data cleaning) that are often overlooked or skipped as 'trivial' steps.

    • @statquest
      @statquest  3 years ago +1

      Thank you! Yes, "data cleaning" is 95% of the job.

  • @danielmagical6298
    @danielmagical6298 3 years ago +2

    Hi Josh, great job! Really helpful material, as I'm just now discovering XGBoost.
    Thank you and keep up the great work!

    • @statquest
      @statquest  3 years ago

      Thank you very much! :)

  • @thomsondcruz
    @thomsondcruz 1 year ago

    Absolutely loved this video, Josh. It breaks everything down into understandable chunks. Thank you and God bless. BAM! The only thing I missed (and it's very minor) was taking in a new data row and making an actual prediction with the model.

    • @statquest
      @statquest  1 year ago +1

      Thanks! For new data, you just call clf_xgb.predict() with the row of new data.
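Here is a sketch of that last step. It assumes a model already trained on numeric, one-hot-encoded columns. So that the example runs without XGBoost installed, it uses scikit-learn's `GradientBoostingClassifier` as a stand-in; `xgboost.XGBClassifier` exposes the same `fit`/`predict`/`predict_proba` methods. The column names and values are invented. The key detail is that the new row must be two-dimensional (one row, many columns) and must have the same columns as the training data.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Tiny invented training set with already-encoded numeric features
X_train = pd.DataFrame({
    "Tenure_Months": [1, 2, 30, 45, 3, 60],
    "Monthly_Charges": [70.0, 80.0, 20.0, 25.0, 90.0, 19.5],
})
y_train = [1, 1, 0, 0, 1, 0]

# Stand-in for clf_xgb; XGBClassifier would be fit and called the same way
clf = GradientBoostingClassifier(n_estimators=20, random_state=42)
clf.fit(X_train, y_train)

# One new customer, wrapped in a DataFrame so it is 2-D and keeps column names
new_row = pd.DataFrame([{"Tenure_Months": 2, "Monthly_Charges": 85.0}])
prediction = clf.predict(new_row)            # array holding one class label
probabilities = clf.predict_proba(new_row)   # one row of class probabilities
```

Passing a single row as a 1-D object (for example `X_train.iloc[0]` instead of `X_train.iloc[[0]]`) is a common source of shape errors.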

  • @minseong4644
    @minseong4644 3 years ago +1

    Such an amazing job Josh.. Couldn't find any better explanation than this!
    Mesmerizing!

  • @fernandes1431
    @fernandes1431 2 years ago +3

    Can't thank you enough for the clearest and best explanation on RUclips

  • @darksoul1381
    @darksoul1381 3 years ago +1

    I was wondering how to find stuff regarding dealing with actual churn data and sampling issues. The tutorial addressed a lot of them. Thanks!
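The comment mentions sampling issues in churn data. Below is a sketch of one common way to handle an imbalanced target; it is not necessarily the exact code from the video. It uses a stratified train/test split on invented labels. XGBoost also has a `scale_pos_weight` parameter, and its documentation suggests sum(negative instances) / sum(positive instances) as a typical value to consider.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Invented imbalanced labels: 80 customers stayed (0), 20 churned (1)
y = np.array([0] * 80 + [1] * 20)
X = np.arange(100).reshape(-1, 1)  # placeholder feature matrix

# stratify=y keeps the 20% churn rate in both the training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Ratio of negatives to positives, a typical starting value for scale_pos_weight
scale_pos_weight = (y_train == 0).sum() / (y_train == 1).sum()
```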

  • @maurosobreira8695
    @maurosobreira8695 2 years ago +1

    A true, real Master Class - You got my support!

  • @jinwooseong2862
    @jinwooseong2862 3 years ago +1

    I watched all of your videos on XGBoost. They helped me a lot. Much appreciated!

  • @francovega7089
    @francovega7089 2 years ago +1

    I really appreciate your content Josh. Thanks for your time

  • @romanroman5226
    @romanroman5226 2 years ago +1

    Awesome video! The cleanest XGBoost explanation I have ever seen.

  • @RahulEdvin
    @RahulEdvin 3 years ago +10

    Josh, you’re well and truly phenomenal ! Love from Madras !

    • @prashanthb6521
      @prashanthb6521 3 years ago +1

      Chennai

    • @statquest
      @statquest  3 years ago +2

      BAM! Thank you very much!!!

    • @starmerf
      @starmerf 3 years ago +6

      Hi Rahul, I taught at IIT Madras 19192-1993 and lived on campus across from the post office. Josh visited us there.

    • @RahulEdvin
      @RahulEdvin 3 years ago +1

      Frank Starmer Hello Frank, wow! That’s great to know ! :) I’m sure you must have had a good time here. Cheers :)

    • @prashanthb6521
      @prashanthb6521 3 years ago +1

      @@starmerf wow the world is a small place ☺

  • @samxu5320
    @samxu5320 2 years ago +1

    Your pronunciation is the most authentic and clearest that I have ever heard

  • @navyasreepinjala1582
    @navyasreepinjala1582 2 years ago +2

    I love your teaching style. Extremely helpful for a beginner like me. Really helped me a lot in my exams. No words. You are the best!!!!

  • @marceloherdy2379
    @marceloherdy2379 3 years ago +1

    Man, this video is awesome! Congratulations!

  • @sudheerrao07
    @sudheerrao07 3 years ago +6

    Wow. Finally I see a face for the name. Your previous videos have been immensely helpful. I assumed you were a very senior person. I'm not commenting on your age; I mean that your way of explaining seemed like that of a professor with half a century of experience. But in reality, you are quite young. Thank you for all your simple-yet-detailed videos. No words can quantify how much I appreciate them. 🙏

  • @dillonmears6696
    @dillonmears6696 1 year ago +1

    Great video! You did a wonderful job of explaining the process. Thanks!

  • @parismollo7016
    @parismollo7016 3 years ago +3

    I haven't watched it yet but I know this will be great!!!!!!!! Thank you Josh.

  • @PradeepMahato007
    @PradeepMahato007 3 years ago +2

    BAMMMMM !!!
    This is awesome 👍 Josh !! Thank you for your contribution, really helpful for new learners.😊😊😊

    • @statquest
      @statquest  3 years ago

      Glad you liked it!

    • @statquest
      @statquest  2 years ago

      @@salilgupta9427 Thanks!

  • @godoren
    @godoren 3 years ago +2

    Thank you for your work; the explanation of the topic is very clear and transparent.

    • @statquest
      @statquest  3 years ago

      Thank you very much! :)

  • @SergioPolimante
    @SergioPolimante 2 years ago +1

    This kind of content is SUPER HARD to produce. I really understand and appreciate your effort here. Thanks and congratulations.

    • @statquest
      @statquest  2 years ago

      Thank you very much!

  • @julieirwin3288
    @julieirwin3288 3 years ago +3

    What did we do to deserve a great guy like Josh ? Thank you Josh!

  • @keyurshah8451
    @keyurshah8451 2 years ago +1

    Hey mate, amazing tutorial. A very complex problem explained in a really simple and effective way. I am using XGBoost for one of my classification models, and watching your video made me realise I can further improve it. So thank you again, and keep making those videos. Kudos to you and long live data science 🙏🙏

  • @mdaroza
    @mdaroza 3 years ago +2

    Amazingly organized and well explained!

  • @miguelbarajas9892
    @miguelbarajas9892 1 year ago +1

    Freaking amazing! You explain everything so well. Thank you!

  • @felixwhise4165
    @felixwhise4165 3 years ago +1

    just here to say thank you! will come back in a month when I have time to watch it. :)

  • @marekslazak1003
    @marekslazak1003 2 years ago +4

    Jesus, I just learned more in 10 minutes of this than I did throughout an entire semester of a similar CS course. ++ tutorial

  • @sreejaysreedharan4085
    @sreejaysreedharan4085 3 years ago +1

    Lovely and priceless video Josh...BAM BAM BAM as usual !! :) God bless. .

    • @statquest
      @statquest  3 years ago

      Thank you very much! :)

  • @haskycrawford
    @haskycrawford 3 years ago +1

    I love the channel! I learn more here than in my undergraduate degree! You're great, Josh!

  • @Fressia94
    @Fressia94 3 years ago +1

    Many thanks for your great and very understandable video. It literally helps me a lot with Python and the XGBoost package.

  • @muskanroxx22
    @muskanroxx22 2 years ago +1

    You're a very kind human being Josh!! Thank you so much for making these videos. Your content is gold!!! I am new to data science and this is exactly what I needed!! :)
    Much love from India!

    • @statquest
      @statquest  2 years ago +1

      Glad you like my videos!! BAM! :)

    • @muskanroxx22
      @muskanroxx22 2 years ago

      @@statquest Hey Josh! I am learning about Bayesian Optimizer and I don't seem to get it even after watching tons of tutorials, can you suggest where I should learn it from please? I couldn't find a video on your channel on this.

    • @statquest
      @statquest  2 years ago

      @@muskanroxx22 Unfortunately I don't know of a good source for that.

  • @coolmusic4meyee
    @coolmusic4meyee 5 months ago +1

    Great explanation and walk-through, big thanks!

    • @statquest
      @statquest  5 months ago

      Glad you enjoyed it!

  • @KukaKaz
    @KukaKaz 3 years ago +5

    Yes, please, more videos with Python ❤ Thank you for the webinar

  • @daniloyukihara2143
    @daniloyukihara2143 3 years ago +1

    Hooray, I pictured you totally differently!
    Thanks a lot for all the videos!

  • @aaltinozz
    @aaltinozz 3 years ago +1

    I searched for this all week, thank you very much

  • @codinghighlightswithsadra7343
    @codinghighlightswithsadra7343 10 months ago +2

    Thank you so much for the work you put into this step-by-step tutorial. It was amazing.

    • @statquest
      @statquest  10 months ago +1

      You're very welcome!

  • @andrewxie9896
    @andrewxie9896 2 years ago +1

    you are simply an amazing human being, also the notebooks are great! :D

  • @VarunKumar-pz5si
    @VarunKumar-pz5si 3 years ago +1

    I'm very grateful to have you as my teacher.

  • @williamTjS
    @williamTjS 1 year ago +1

    Amazing! Thanks so much for the detailed video

  • @jimmyrico5364
    @jimmyrico5364 3 years ago +1

    This is a great piece of work, thanks for sharing it!
    Maybe the only additional piece I'd add, which I've found useful in the XGBoost documentation, is that one can take advantage of parallel computing (more CPU cores, on your machine or in the cloud) by simply passing the parameter n_jobs=-1 both in the RandomizedSearchCV stage and when setting up the XGBoost model (XGBRegressor, for example).
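Here is a sketch of the commenter's suggestion using scikit-learn's `RandomizedSearchCV`, where `n_jobs=-1` runs the candidate fits on all available CPU cores. To keep it runnable without XGBoost, `RandomForestRegressor` stands in for `XGBRegressor`; both accept an `n_jobs` argument. The data and parameter ranges are made up.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Invented regression data: y depends on the first feature plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Stand-in estimator; XGBRegressor(n_jobs=-1) would be passed the same way
model = RandomForestRegressor(random_state=0, n_jobs=-1)

# Candidate values are sampled from these lists
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [2, 4, None],
}

# n_jobs=-1 here runs the cross-validation fits in parallel
search = RandomizedSearchCV(
    model,
    param_distributions,
    n_iter=4,
    cv=3,
    random_state=0,
    n_jobs=-1,
)
search.fit(X, y)
best_params = search.best_params_
```

Parallelizing both the search and the estimator can oversubscribe the CPU. In practice, it may be enough to set `n_jobs=-1` at only one level.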

  • @aksharkottuvada
    @aksharkottuvada 1 year ago +1

    Thank you Josh. Needed this tutorial to better solve a ML Problem as part of my internship :)

  • @jessehe9286
    @jessehe9286 3 years ago +6

    Great video! Love it!
    I'd like to request a comparison of XGBoost, CatBoost, and LightGBM, and a quest on ensemble learning.

    • @statquest
      @statquest  3 years ago +4

      I'll keep those topics in mind.

  • @danielpinzon9284
    @danielpinzon9284 3 years ago +1

    Love u Josh.... you are a TRIPLE BAM!!! Greetings from Bogotá, Colombia.

    • @statquest
      @statquest  3 years ago

      Muchas gracias!!! :)

  • @sane7263
    @sane7263 1 year ago +2

    That's the Best video I've ever seen. Period.
    TRIPLE BAM! :)

  • @saeedesmailii
    @saeedesmailii 3 years ago +2

    It was extremely helpful. Please continue making these videos. I suggest making a video explaining clustering of unlabeled data, and predicting future trends in time-series data.

    • @statquest
      @statquest  3 years ago +1

      I'll keep that in mind. :)

  • @harshbordekar8564
    @harshbordekar8564 2 years ago +1

    Thank you for the awesomeness!!

  • @gisleberge4363
    @gisleberge4363 2 years ago +1

    Appreciate the Python related videos...helps to manoeuvre the code when I try to replicate the method later on...easy to follow the whole thing, also for beginners... 🙂

    • @statquest
      @statquest  2 years ago +1

      Thanks! There will be a lot more python stuff soon.

  • @cszthomas
    @cszthomas 2 years ago +1

    Thank you for the great work!

    • @statquest
      @statquest  2 years ago +1

      Wow! Thank you so much for supporting StatQuest!!! BAM! :)

  • @k44zackie
    @k44zackie 3 years ago +1

    Thank you very much for the nice video! Very helpful for me.

    • @statquest
      @statquest  3 years ago +1

      Glad it was helpful!

  • @jwxdxd
    @jwxdxd 3 years ago +1

    You are amazing! Thank you so much !!

  • @azecem6187
    @azecem6187 3 years ago +1

    Thanks a lot Josh!

    • @statquest
      @statquest  3 years ago +1

      Any time! And thanks for your support! :)

  • @Krath1988
    @Krath1988 3 years ago +107

    Liked, favorited, recommended, shared, and sacrificed my first-born to this video.

  • @yurimartins1499
    @yurimartins1499 3 years ago +1

    Thank you Josh!!
    As a suggestion, could you do a StatQuest explaining the measures used in market basket analysis?

    • @statquest
      @statquest  3 years ago +1

      I'll keep that in mind.

  • @chiragpalan9780
    @chiragpalan9780 3 years ago +5

    This guy is amazing.
    DOUBLE BAM 💥 💥

  • @Azureandfabricmastery
    @Azureandfabricmastery 3 years ago +1

    Thanks for sharing! Informative.

  • @jameswilliamson1726
    @jameswilliamson1726 11 months ago +1

    Another great tutorial. Thx

    • @statquest
      @statquest  11 months ago

      Glad you liked it!

  • @harshavardhanasrinivasan3125
    @harshavardhanasrinivasan3125 3 years ago +1

    Reaaally amazing!!

  • @fgfanta
    @fgfanta 3 years ago

    This is gold, thank you! I'm a rookie at this stuff, but I'm still unsure one-hot encoding is the best choice, especially for encoding the city: since it is a category with high cardinality, all those one-hot variables will (I guess) require many splits. Perhaps a different encoding, like mean encoding or frequency encoding, would be better and might allow a good fit with fewer splits.

    • @statquest
      @statquest  3 years ago

      Maybe. Try it out and let me know if you get something that works better.
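For readers who want to try the commenter's idea, here is a sketch comparing the two encodings on an invented `City` column. Frequency encoding is swapped in as the alternative; whether it beats one-hot encoding for XGBoost on this dataset is untested. It replaces each city with the fraction of rows in which that city appears, producing one column instead of one column per city.

```python
import pandas as pd

# Invented city column with repeated values
cities = pd.Series(
    ["Los Angeles", "San Diego", "Los Angeles", "Fresno", "Los Angeles", "San Diego"],
    name="City",
)

# One-hot encoding: one new column per distinct city
one_hot = pd.get_dummies(cities, prefix="City")

# Frequency encoding: one numeric column holding each city's share of the rows
freq = cities.map(cities.value_counts(normalize=True))
```

Mean (target) encoding works similarly but uses the average target value per city. It should be computed from the training data only; otherwise the encoding leaks the answer into the features.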

  • @ketanshetye5029
    @ketanshetye5029 3 years ago +1

    I couldn't help you with money right now, but I watched all the ads in the video; I hope that helps you financially. Love your videos. Keep it up!!

  • @theredflagisgreen
    @theredflagisgreen 3 years ago +1

    This is magical.

  • @mssnal
    @mssnal 3 years ago +1

    Awesome man

  • @yelyzavetatymoshenko1572
    @yelyzavetatymoshenko1572 2 years ago +1

    Great one!

  • @dr.kingschultz
    @dr.kingschultz 1 year ago +1

    Another very good video!

  • @nehabalani7290
    @nehabalani7290 3 years ago +2

    Good to also see you sing rather than just hear you :) I had to comment this even before starting the training

  • @pacificbloom1
    @pacificbloom1 2 years ago +1

    Wonderful video, Josh... pleasee pleasee pleasee make more start-to-finish Python videos for different models... I have actually submitted my assignments using your techniques and got better results than with what I learned in my class
    Waiting for more to come, especially on Python :)

    • @statquest
      @statquest  2 years ago

      Thanks! There should be more python coming out soon.

  • @viniantunes5944
    @viniantunes5944 3 years ago +4

    Josh, you're didactics personified.
    Thanks!

  • @abdulkayumshaikh5411
    @abdulkayumshaikh5411 2 years ago +1

    Hello Josh, you are doing amazing work, keep it up

  • @SmithnWesson
    @SmithnWesson 1 year ago +1

    BAM! Well done.

  • @nepalm222
    @nepalm222 3 years ago +1

    Great Content, subscribed

    • @nepalm222
      @nepalm222 3 years ago +1

      Also, the single best Python package run-through I've seen.

    • @statquest
      @statquest  3 years ago

      Thank you very much! :)

  • @erichganz4605
    @erichganz4605 2 years ago +1

    This guy is just amazing

  • @dagma3437
    @dagma3437 3 years ago

    I'm so glad you are a bad-ass stats guru and a teacher waaaaaaaaay before a singer and a guitarist ...Thank you! ;)

    • @statquest
      @statquest  3 years ago +1

      joshuastarmer.bandcamp.com/

    • @dagma3437
      @dagma3437 3 years ago

      StatQuest with Josh Starmer ...not bad. A poor man’s Jack Johnson 🤔

    • @dagma3437
      @dagma3437 3 years ago +1

      Just pulling your leg. Thanks for all the content on stats

  • @yoniziv
    @yoniziv 3 years ago +1

    Triple Bam! thanks for your great tutorial

  • @brandonterrell9680
    @brandonterrell9680 1 year ago +1

    # very helpful and informative, thank you!

  • @Dollar123Bills
    @Dollar123Bills 2 years ago +1

    That was amazing

  • @user-hj6zn8js3i
    @user-hj6zn8js3i 8 months ago +1

    Thanks a lot!

  • @arnaiztech
    @arnaiztech 3 years ago +1

    Really cool!! BAM BAM BAM!!

  • @jongcheulkim7284
    @jongcheulkim7284 2 years ago +1

    Thank you so much^^

  • @goelnikhils
    @goelnikhils 1 year ago +1

    Amazing Content

  • @thiagotanure2212
    @thiagotanure2212 3 years ago +3

    amazing tutorial Josh! Shared with my friends =D
    Could you do one of these about pygam? It would be amazing :)

    • @statquest
      @statquest  3 years ago +1

      I'll keep that in mind.

  • @RahulVarshney_
    @RahulVarshney_ 3 years ago +3

    "25:36" that's what i was waiting for from the beginning...Truly amazing.. You are providing precious information..CHEERS

    • @statquest
      @statquest  3 years ago +1

      Glad it was helpful!

    • @RahulVarshney_
      @RahulVarshney_ 3 years ago

      @@statquest One small request: could you make a video with some guidance on which model to choose for different datasets, i.e., how we decide which model to choose? Thanks in advance

    • @statquest
      @statquest  3 years ago

      @@RahulVarshney_ I'll keep that in mind. In the meantime, check out: scikit-learn.org/stable/tutorial/machine_learning_map/index.html

    • @RahulVarshney_
      @RahulVarshney_ 3 years ago +1

      @@statquest That is amazing... I will complete it today itself. Thanks again for your prompt reply.
      Can I get your email?

  • @trendytrenessh462
    @trendytrenessh462 2 years ago +2

    It is really lovely to be able to put a face to the "Hooray!", "BAM !!!" and "Note:"s 😄❤

  • @user-dj6fs7tx4l
    @user-dj6fs7tx4l 10 months ago

    Thank you so much for your hard work! I've learned so much watching your channel. Could you please explain why I shouldn't use one-hot encoding for linear regression and what I should use instead?

    • @statquest
      @statquest  10 months ago

      I explain how to encode things for linear regression in this video: ruclips.net/video/CqLGvwi-5Pc/видео.html
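One common reason is sketched below on invented data. This is the usual "dummy variable trap" argument, not a summary of the linked video. A linear model with an intercept cannot use all of the one-hot columns, because they always sum to 1 and so duplicate the intercept column. The design matrix then loses rank, and the coefficients are not uniquely determined. Dropping one level, which then becomes the reference level, fixes this.

```python
import numpy as np
import pandas as pd

# Invented categorical feature with three levels
df = pd.DataFrame({"Color": ["red", "green", "blue", "red", "green", "blue"]})

# Full one-hot encoding: in every row the three columns sum to 1,
# which duplicates the intercept column of the design matrix
full = pd.get_dummies(df["Color"], dtype=float)
X_full = np.column_stack([np.ones(len(df)), full.to_numpy()])

# drop_first=True drops one level ("blue"), which becomes the reference
reduced = pd.get_dummies(df["Color"], drop_first=True, dtype=float)
X_reduced = np.column_stack([np.ones(len(df)), reduced.to_numpy()])

rank_full = np.linalg.matrix_rank(X_full)        # 3, but X_full has 4 columns
rank_reduced = np.linalg.matrix_rank(X_reduced)  # 3 with 3 columns: full rank
```

Tree-based models such as XGBoost choose splits rather than solving for coefficients, so they can use all of the one-hot columns.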

  • @HardikShah17
    @HardikShah17 1 year ago +1

    Excellent video, @StatQuest! Can we please have more Start to Finish Python videos? Like LightGBM, maybe?

    • @statquest
      @statquest  1 year ago

      I'll keep that in mind! :)

  • @uwo7130
    @uwo7130 2 months ago +1

    Thank you!

  • @felixzhao3435
    @felixzhao3435 2 years ago +1

    Thanks!

    • @statquest
      @statquest  2 years ago

      WOW! Thank you so much for supporting StatQuest!!! BAM! :)

  • @marcelocoip7275
    @marcelocoip7275 1 year ago +1

    Hard work here. It's funny how the responsible scientist and the funny guy coexist. Very useful lesson, thanks!

  • @shazm4020
    @shazm4020 2 years ago +1

    Thank you so much Josh Starmer! BAM!

  • @NLarsen1989
    @NLarsen1989 2 years ago +1

    Yikes, if I ever understand something well enough to explain it as succinctly as you do, I'd be very happy. I've been smashing through a lot of your videos the last few days after spending countless months on Python, sklearn, and all the usual plug-and-play solutions, and it wasn't until I started watching these that things started to click into place.

    • @statquest
      @statquest  2 years ago

      Awesome! I'm glad my videos are helpful! :)

  • @user-xn3lf3dg1h
    @user-xn3lf3dg1h 1 year ago

    First, you've saved me; this is super clear! I love all your videos so much 😊
    I do have two questions…
    1. How would you handle a classification problem with time series data?
    2. Is there any other evaluation test you should or could do to evaluate the effectiveness of your model?

    • @statquest
      @statquest  1 year ago

      1. I've never used XGBoost with time series (or done much of any time series stuff before), so I can't answer this question.
      2. There are lots of ways to evaluate a model. I only present a few, but there are many more, and they really depend on what you want your model to do. Just google it.
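As a starting point, here is a sketch of a few standard classification metrics from scikit-learn, computed on invented labels and predictions. Confusion matrices are listed as a prerequisite in the video description. Which metric matters most depends on the goal; for example, recall matters most if missing a customer who churns is costly.

```python
from sklearn.metrics import (
    balanced_accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Invented true labels and model predictions (1 = churned)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)          # rows = actual, columns = predicted
precision = precision_score(y_true, y_pred)    # of predicted churners, fraction correct
recall = recall_score(y_true, y_pred)          # of actual churners, fraction found
balanced = balanced_accuracy_score(y_true, y_pred)  # mean of per-class recall
```

Here `cm` is `[[5, 1], [1, 3]]`, and precision and recall are both 0.75.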

  • @TheSoonAnn
    @TheSoonAnn 1 year ago +1

    Very good explanation

  • @hollyching
    @hollyching 3 years ago

    Thanks Josh for another GREAT video! Just some sharing and minor questions.
    1. try pandas_profiling when doing EDA. I personally love it. :)
    2. some features are highly correlated (eg: city name and zip code). Do we need to handle that before running XGB?
    3. Why choose 10 for early_stopping_rounds
    4. What’s the difference between
    - df.loc[df['Total_Charges']==' ']
    - df[df['Total_Charges']==' ']
    5. What’s the difference between
    - y=df['Churn_Value'].copy()
    - y=df['Churn_Value']
    Many thanks in advance!
    H

    • @statquest
      @statquest  3 years ago +1

      1) Thanks for the tip on pandas_profiling.
      2) No.
      3) It's a commonly used number
      4) I don't know.
      5) I believe the former is copy by value and the latter is copy by reference.
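Here is a small check of questions 4 and 5 on invented data. For a boolean mask, `df.loc[mask]` and `df[mask]` return the same rows. `.loc` also lets you pick columns or assign values in the same call, e.g. `df.loc[mask, 'Total_Charges'] = 0`.

For question 5, note that `.copy` needs parentheses; without them, `y` would be the method itself, not a Series. With `.copy()`, `y` owns its data, so changing it leaves `df` alone. Without the copy, whether edits to `y` reach `df` depends on the pandas version and its Copy-on-Write setting.

```python
import pandas as pd

# Invented data using the column names from the question
df = pd.DataFrame({
    "Total_Charges": ["29.85", " ", "108.15"],
    "Churn_Value": [0, 1, 0],
})

# Boolean mask marking rows whose Total_Charges is a single blank space
mask = df["Total_Charges"] == " "

# With a boolean mask, both forms select the same rows
via_loc = df.loc[mask]
via_brackets = df[mask]

# .copy() (with parentheses) returns an independent Series
y = df["Churn_Value"].copy()
y.iloc[0] = 99  # modifies only the copy, not df
```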

  • @sunsiney7014
    @sunsiney7014 2 years ago +1

    Great video! Very informative and clearly explained! Could you please also present BART?

    • @statquest
      @statquest  2 years ago

      I'll keep that in mind.

  • @starmerf
    @starmerf 3 years ago +1

    Awesome hooray