XGBoost in Python from Start to Finish
- Published: Jul 2, 2024
- NOTE: You can support StatQuest by purchasing the Jupyter Notebook and Python code seen in this video here: statquest.gumroad.com/l/uroxo
NOTE: This StatQuest assumes that you are already familiar with:
XGBoost for Regression: • XGBoost Part 1 (of 4):...
XGBoost for Classification: • XGBoost Part 2 (of 4):...
XGBoost: Crazy Cool Optimizations: • XGBoost Part 4 (of 4):...
Regularization: • Regularization Part 1:...
Cross Validation: • Machine Learning Funda...
Confusion Matrices: • Machine Learning Funda...
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying my book, The StatQuest Illustrated Guide to Machine Learning:
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statques...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
0:00 Awesome song and introduction
2:56 Import Modules
4:34 Import Data
13:43 Missing Data Part 1: Identifying
18:37 Missing Data Part 2: Dealing with it
24:03 Format Data Part 1: X and y
25:55 Format Data Part 2: One-Hot Encoding
33:25 XGBoost - Missing Data and One-Hot Encoding
36:43 Build a Preliminary XGBoost Model
45:01 Optimize Parameters with Cross Validation (GridSearchCV)
49:44 Build and Draw Final XGBoost Model
#StatQuest #ML #XGBoost
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
Dear Josh, I have a request for new videos or livechats: could you maybe explain these tests? Tukey, Bonferroni and Scheffé. They're hard for me to understand, and you explain everything so well. It could be very helpful for a lot of people out there. Have a nice day, greetings from Europe.
Please keep doing these long-form Python tutorials on the various ideas we've covered in earlier 'Quests. They're great for those of us working in Python, and they give me another way to support the channel. It has been a more-than-pleasant surprise that as I've grown from learning the basics of stats to machine learning and eventually deep learning, StatQuest has grown along with me into those very same fields.
Thanks Josh.
That's the plan!
You are amazing. Can't imagine how much work you put into those step-by-step tutorials.
Just bought the Jupyter Notebook code and it's beyond worth it! Thank you :)
Thank you very much for your support! :)
I am purchasing the Jupyter notebook to contribute to your work! Thanks a lot for this video! You are awesome! I will be very, very happy to have more ML tutorials, and thank you Josh!
Thank you very much! :)
Hey Josh, I just purchased all 3 of your Jupyter Notebooks! I transferred from an Econ major to Data Science, and it was a nightmare before I found your channel. Your channel shed light on my academic career! I look forward to more of the 'Python from Start to Finish' series, and I will definitely support it!
Awesome! Thank you!
This is hands down the best Python tutorial on YouTube, not just for XGBoost, but for overall Python logic and syntax. Nice work, subscribed!!
Wow! Thank you!
Josh, this video is epic and really helped me understand the actual process of tuning hyperparameters, something that had been a bit of a black box until I saw this video. Your channel is awesome too - great jingles as well :D
Thank you!
Extremely helpful - would love to see more from the "start to finish" series
I'm working on it.
Thanks for the great tutorial! You covered a lot of details (mostly data cleaning) that are often overlooked or skipped as 'trivial' steps.
Thank you! Yes, "data cleaning" is 95% of the job.
Hi Josh, great job really helpful material as I'm discovering XGBoost just now.
Thank you and keep up your great work!
Thank you very much! :)
Absolutely loved this video Josh. It breaks down everything into understandable chunks. Thank you and God bless. BAM! The only thing I missed (and it's very minor) was taking in a new data row and making an actual prediction by using the model.
Thanks! For new data, you just call clf_xgb.predict() with the row of new data.
Such an amazing job Josh.. Couldn't find any better explanation than this!
Mesmerizing!
Wow, thanks!
Can't thank you enough for the clearest and best explanation on YouTube
Thank you!
I was wondering how to find stuff regarding dealing with actual churn data and sampling issues. The tutorial addressed a lot of them. Thanks!
Thanks!
A true, real Master Class - You got my support!
Thank you! :)
I watched all your videos on XGBoost. They helped me a lot. Much appreciated!
Glad it helped!
I really appreciate your content Josh. Thanks for your time
Thank you!
Awesome video! The cleanest xgboost explanation I have ever seen.
Wow, thanks!
Josh, you’re well and truly phenomenal ! Love from Madras !
Chennai
BAM! Thank you very much!!!
Hi Rahul, I taught at IIT-Madras 19192-1993 and lived on campus across from the post office. Josh visited us there.
Frank Starmer Hello Frank, wow! That’s great to know ! :) I’m sure you must have had a good time here. Cheers :)
@@starmerf wow the world is a small place ☺
Your pronunciation is the most authentic and clearest that I have ever heard
Wow! Thank you!
I love your teaching style. Extremely helpful for a beginner like me. Really helped me a lot in my exams. No words. You are the best!!!!
Thank you!
Man, this video is awesome! Congratulations!
Thank you! :)
Wow. Finally I see a face for the name. Your previous videos have been immensely helpful. I assumed you were a very senior person. I am not measuring your age. I mean, your way of explaining seemed like a professor with half a century of experience. But in reality, you are quite young. Thank you for all your simple-yet-detailed videos. No words to quantify how much I appreciate them. 🙏
Wow, thanks!
Great video! You did a wonderful job of explaining the process. Thanks!
Thanks!
I haven't watched it yet but I know this will be great!!!!!!!! Thank you Josh.
BAM! :)
BAMMMMM !!!
This is awesome 👍 Josh !! Thank you for your contribution, really helpful for new learners.😊😊😊
Glad you liked it!
@@salilgupta9427 Thanks!
Thank you for your job, the explanation of the topic is very clear and transparent.
Thank you very much! :)
This kind of content is SUPER HARD to produce. I really understand and appreciate your effort here. Thanks and congratulations.
Thank you very much!
What did we do to deserve a great guy like Josh ? Thank you Josh!
Thanks! :)
Hey Mate, amazing tutorial. Very complex problem explained in really simple and effective way. I am using XGBOOST for one of the classification model and after watching your video it made me realise I can further improve my model. So thank you again and keep making those videos. Kudos to you and long live data science 🙏🙏
Glad it helped!
Amazingly organized and well explained!
Thank you!
Freaking amazing! You explain everything so well. Thank you!
Thank you!
just here to say thank you! will come back in a month when I have time to watch it. :)
BAM! :)
Jesus, I just learned more in 10 minutes of this than I did throughout an entire semester of a similar subject in CS. ++ tutorial
Thank you!
Lovely and priceless video Josh...BAM BAM BAM as usual !! :) God bless. .
Thank you very much! :)
I love the channel! I learn more here than in my undergraduate degree! You're great, Josh!
Muito obrigado! :)
Many thanks for your great and so understandable video. It literally helps me a lot with Python and the XGBoost package
Glad it helped!
You're a very kind human being Josh!! Thank you so much for making these videos. Your content is gold!!! I am new to data science and this is exactly what I needed!! :)
Much love from India!
Glad you like my videos!! BAM! :)
@@statquest Hey Josh! I am learning about Bayesian Optimizer and I don't seem to get it even after watching tons of tutorials, can you suggest where I should learn it from please? I couldn't find a video on your channel on this.
@@muskanroxx22 Unfortunately I don't know of a good source for that.
Great explanation and walk-through, big thanks!
Glad you enjoyed it!
Yes pls more videos with python❤thank u for the webinar
Thanks! :)
hurray, i picture you totally different!
Thanks a lot for all the videos!
Glad you like them!
all week searched for this thank u very much
Enjoy!
Thank you so much for the work that you used in step by step tutorial. it was amazing.
You're very welcome!
you are simply an amazing human being, also the notebooks are great! :D
Thanks!
I'm very grateful to have you as my teacher.
Thanks!
Amazing! Thanks so much for the detailed video
Thanks!
This is a great piece of work, thanks for sharing it!
Maybe the only additional piece I'd add, which I've found useful in the XGBoost documentation, is that one can take advantage of parallel computing (more CPU cores, on your own machine or in the cloud) by simply passing the parameter n_jobs=-1 in both places: the RandomizedSearchCV stage and the XGBoost model itself (XGBRegressor, for example).
Great tip! BAM!
Thank you Josh. Needed this tutorial to better solve a ML Problem as part of my internship :)
Glad it helped!
Great video! Love it!
request that you do a comparison of XGBoost, CatBoost, and LightGBM, and a quest on ensemble learning.
I'll keep those topics in mind.
Love u Josh.... you are a TRIPLE BAM!!! Greetings from Bogotá, Colombia.
Muchas gracias!!! :)
That's the Best video I've ever seen. Period.
TRIPLE BAM! :)
Wow, thanks!
It was extremely helpful. Please continue making these videos. I suggest making a video to explain the clustering with unlabeled data, and predicting the future trend in time-series data.
I'll keep that in mind. :)
Thank you for the awesomeness!!
bam!
Appreciate the Python related videos...helps to manoeuvre the code when I try to replicate the method later on...easy to follow the whole thing, also for beginners... 🙂
Thanks! There will be a lot more python stuff soon.
Thank you for the great work!
Wow! Thank you so much for supporting StatQuest!!! BAM! :)
Thank you very much for nice video! Very helpful for me.
Glad it was helpful!
You are amazing! Thank you so much !!
You're so welcome!
Thanks a lot Josh!
Any time! And thanks for your support! :)
Liked, favorited, recommended, shared, and sacrificed my first-born to this video.
TRIPLE BAM! :)
😂😂😂
Thank you Josh!!
As a suggestion, you could do a StatQuest explaining the measures in market basket analysis?
I'll keep that in mind.
This guy is amazing.
DOUBLE BAM 💥 💥
Thank you! :)
Thanks for sharing! Informative.
Thanks!
Another great tutorial. Thx
Glad you liked it!
Reaaally amazing!!
Thanks!
This is gold, thank you! I am a rookie at this stuff, but I am unsure one-hot encoding is the best choice, especially for encoding the city. Since it is a category with high cardinality, all those one-hot variables will require many splits (I guess). Perhaps a different encoding, like mean encoding or frequency encoding, would be better and might allow a good fit with fewer splits.
Maybe. Try it out and let me know if you get something that works better.
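For anyone who wants to try the suggestion, here is a small sketch comparing one-hot encoding with frequency encoding for a city column. The tiny DataFrame and its values are invented for illustration; whether frequency encoding actually improves the model is an open question that only an experiment on the real data can answer.

```python
import pandas as pd

# Invented data; the column names are placeholders for this sketch.
df = pd.DataFrame({
    "City": ["Los Angeles", "San Diego", "Los Angeles", "Fresno",
             "San Diego", "Los Angeles"],
    "Churn_Value": [1, 0, 0, 1, 0, 1],
})

# One-hot encoding: one new column per distinct city.
one_hot = pd.get_dummies(df[["City"]], columns=["City"])

# Frequency encoding: replace each city with the share of rows it appears in,
# so the whole category collapses into a single numeric column.
city_freq = df["City"].value_counts(normalize=True)
df["City_freq"] = df["City"].map(city_freq)

print(one_hot.shape[1])  # number of one-hot columns
print(df[["City", "City_freq"]])
```

In a real workflow the frequencies would normally be computed from the training split only and then mapped onto the test split, so that no information from the test data leaks into the features.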
Could not help you with money right now, but I watched all the ads in the video, hope that helps you financially. Love your videos. Keep it up!!
I appreciate that
This is magical.
Bam! :)
Awesome man
Thanks!
Great one!
Thanks!
Another very good video!
Thank you!
Good to also see you sing rather than just hear :).. i had to comment this even before starting the training
😊 thanks
Wonderful video Josh..... please, please, please make more start-to-finish videos in Python for different models..... I have actually submitted my assignments using your techniques and got better results than with what I learned in my class
Waiting for more to come especially on python :)
Thanks! There should be more python coming out soon.
Josh, you're the didactic in person form.
Thanks!
I appreciate that!
Hello josh, you are doing amazing work keep doing
Thanks!
BAM! Well done.
:)
Great Content, subscribed
Also, single best python package run through Ive seen.
Thank you very much! :)
This guy is just amazing
Thanks!
I'm so glad you are a bad-ass stats guru and a teacher waaaaaaaaay before a singer and a guitarist ...Thank you! ;)
joshuastarmer.bandcamp.com/
StatQuest with Josh Starmer ...not bad. A poor man’s Jack Johnson 🤔
Just pulling your leg. Thanks for all the content on stats
Triple Bam! thanks for your great tutorial
Any time!
# very helpful and informative, thank you!
Thank you! :)
That was amazing
Thanks!
Thanks a lot!
Thanks!
Really cool!! BAM BAM BAM!!
Thanks! :)
Thank you so much^^
bam!
Amazing Content
Thanks!
amazing tutorial Josh! Shared with my friends =D
Could you do one of these about pygam? It would be amazing :)
I'll keep that in mind.
"25:36" that's what i was waiting for from the beginning...Truly amazing.. You are providing precious information..CHEERS
Glad it was helpful!
@@statquest One small request: can you provide some guidance in a video on which model to choose for different datasets, and how we decide which model to choose? Thanks in advance
@@RahulVarshney_ I'll keep that in mind. In the mean time, check out: scikit-learn.org/stable/tutorial/machine_learning_map/index.html
@@statquest that is amazing ...i will complete it today itself thanks again for your prompt reply
Can i get your email
It is really lovely to be able to put a face to the "Hooray!", "BAM !!!" and "Note:"s 😄❤
bam!
Thank you so much for your hard work! I've learn so much watching your channel. Could you please explain why I shouldn't use one hot encoding while doing linear regression and what should I use instead?
I explain how to encode things for linear regression in this video: ruclips.net/video/CqLGvwi-5Pc/видео.html
Excellent Video @StatQuest ! Can we please have more Start to Finish python videos? Like Lightgbm maybe?
I'll keep that in mind! :)
Thank you!
You're welcome!
Thanks!
WOW! Thank you so much for supporting StatQuest!!! BAM! :)
Hard work here. It's funny how the responsible scientist and the funny guy coexist. Very useful lesson, thanks!
Thanks! 😃!
Thank you so much Josh Starmer! BAM!
bam!
Yikes, if I ever understand something enough to explain it as succinctly as you do then I'd be very happy. I've been smashing through a lot of your videos the last few days after spending countless months on python, sklearn and all the usual plug and play solutions and it's not been until I've started watching these that I've started to feel things click into place
Awesome! I'm glad my videos are helpful! :)
First you’ve saved me this is super clear! I love all your videos so much 😊
I do have two questions…
1. How would you handle a classification problem with time series data?
2. Is there any other evaluation test you should or could do to evaluate the effectiveness of your model?
1. I've never used XGBoost with time series (or done much of any time series stuff before), so I can't answer this question.
2. There are lots of ways to evaluate a model. I only present a few, but there are many more, and they really depend on what you want your model to do. Just google it.
very good explanation
Thank you!
Thanks Josh for another GREAT video! Just some sharing and minor questions.
1. try pandas_profiling when doing EDA. I personally love it. :)
2. some features are highly correlated (eg: city name and zip code). Do we need to handle that before running XGB?
3. Why choose 10 for early_stopping_rounds?
4. What’s the difference between
- df.loc[df['Total_Charges']==' ']
- df[df['Total_Charges']==' ']
5. What’s the difference between
- y=df['Churn_Value'].copy()
- y=df['Churn_Value']
Many thanks in advance!
H
1) Thanks for the tip on pandas_profiling.
2) No.
3) It's a commonly used number
4) I don't know.
5) I believe the former is copy by value and the latter is copy by reference.
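Questions 4 and 5 can be checked with a few lines of pandas. This is a sketch on a made-up DataFrame that only borrows the column names from the question. For a boolean mask, both indexing forms select the same rows, and .copy() gives an independent Series.

```python
import pandas as pd

# Made-up rows; a single space marks a "missing" Total_Charges value.
df = pd.DataFrame({
    "Total_Charges": ["29.85", " ", "108.15", " "],
    "Churn_Value": [0, 1, 1, 0],
})

# Question 4: with a boolean mask, .loc[mask] and [mask] return the same rows.
mask = df["Total_Charges"] == " "
with_loc = df.loc[mask]
with_brackets = df[mask]
print(with_loc.equals(with_brackets))  # True

# Question 5: .copy() creates an independent Series,
# so changing y leaves the original DataFrame untouched.
y = df["Churn_Value"].copy()
y.iloc[0] = 99
print(df["Churn_Value"].tolist())  # [0, 1, 1, 0]
```

The practical difference in question 4 is that .loc can also select columns in the same call, for example df.loc[mask, 'Total_Charges'], which is the usual form when assigning new values. For question 5, without .copy(), whether a change to y shows up in df depends on the pandas version and its copy-on-write settings, so making an explicit copy is the safer habit.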
Great video! Very informative and clearly explained! Could you please also present BART?
I'll keep that in mind.
Awesome hooray
BAM! :)