A comprehensive yet succinct tutorial. And, having only just finished my Data Science degree, I found it very reassuring to see that you do get faster and more proficient with time.
I absolutely love messages like this. Glad to hear you found this helpful and that it gave you the reassurance that things get faster. I can tell you that they do! The goal of my channel is to "spark curiosity in data science," and I hope this video did that for you.
Yes, it is very reassuring, but most probably he had everything prepared in advance.
It might be better to use the icdst AI LSTM prediction model.
Second time watching this and doing every step in my notebook as Rob goes through the task. I am still blown away by the intricacy of his approach and how he investigates the case. Fascinating how he makes it look effortless. Many thanks
Hi Rob! Your tutorials helped me get a job offer! When I was searching for a job, I received a take-home technical exercise about time series forecasting. I watched this video and finished my exercise. Finally, I got my dream job! Thank you so much!!! I really appreciate your tutorials! 🥰
Whoa, I really love hearing stories like this. That's amazing and I wish you the best in the rest of your career.
Hands down, the bestest (if that is a word) video on the entire internet about implementation. No fancy stuff. No overly beginner-level toy examples. Just the right thing a budding data scientist needs to see. And it is definitely reassuring to see that one can really get better and faster at doing these after a while. It takes me a lot of time to reach what you have done in under 30 minutes. Debugging things takes a lot of time.
I really appreciate your positive feedback! Glad to hear you find it encouraging that eventually things will get faster.
I worked with time series before, and this tutorial is very thorough and well made.
Additional features you could think about are lag/window features, where you basically try to let the model cheat from the previous consumption, by giving it a statistical grouping of previous values, let's say the mean of consumption within a window of 8 hours, or by outright giving the previous value (lag), let's say the actual consumption 24 hours ago.
This will greatly improve performance, because it helps the model to go follow the expected trend.
Thanks for the comment! Glad you enjoyed the video even though you already have experience with time series. You are 100% correct about the lag features. Check out part 2 where I go over this and a few other topics in detail.
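A minimal sketch of the window feature described in this thread, on a hypothetical hourly dataframe (the tutorial's real column names differ). Shifting by one step before taking the window keeps it strictly in the past, so nothing leaks from the row being predicted:

```python
import pandas as pd

# Hypothetical hourly consumption; in the tutorial the index is a
# DatetimeIndex and the target is the energy load.
df = pd.DataFrame(
    {"consumption": range(48)},
    index=pd.date_range("2015-01-01", periods=48, freq="h"),
)

# Window feature: mean consumption over the previous 8 hours.
# shift(1) ensures only past values enter the window (no leakage).
df["rolling_mean_8h"] = df["consumption"].shift(1).rolling(window=8).mean()
```

The first rows come out as NaN because a full 8-hour history does not exist yet; those rows are typically dropped before training.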
Hi Rob, I am a fresh data science graduate, and I find this tutorial very well done and very helpful for those that approach TS for the first time as well as for those that want to refresh the topic
Amazing flow, comprehensive yet smooth. Detailed yet generic. I love the way you think and how you flow across the entire process. I did this project myself and thoroughly enjoyed it. Can't wait to apply this to other datasets. A big thumbs up👍
Being a sort of early-intermediate data scientist myself, it's very cool watching him do all these things. The most amazing thing is how everybody's mind works differently and how proficient you become not only in coding but also in your approach to a problem. Keep that up, man
Hey, have you landed a job in data science field?
Also curious to know; recent data science graduate here @@paultvshow
Rob, what you have done in less than 25 minutes, and the way you explained your approach, is just effortless yet very effective. Thanks for this gem of content
That's probably the best tutorial I've ever seen in this area. Hope it helps me do my final degree project. Thanks from Spain!
This was a very nice introduction to this topic. You might consider turning this into a miniseries, since it's such a large topic; the next video might be on how to create the best cross-validation splits for timeseries
Thanks so much. There is so much to cover with time series. I may consider a miniseries; that's a great idea. I'd like to make one on Prophet, which is a great package for time series forecasting too.
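On the cross-validation idea raised above, scikit-learn's `TimeSeriesSplit` is one common way to build forward-chaining splits; a toy sketch (the real data would be the hourly series from the video):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy ordered series standing in for the hourly data.
X = np.arange(10).reshape(-1, 1)

# Forward-chaining splits: each validation fold starts strictly after
# its training fold ends, so the model never trains on the future.
splits = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, val_idx in splits:
    assert train_idx.max() < val_idx.min()
```

Unlike a shuffled K-fold, the folds here grow forward in time, which matches how the model would actually be deployed.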
Love these videos. As a data engineer I love seeing other peoples workflows. Thanks so much for posting.
Glad you liked it. Thanks for watching Jackson.
I have never seen a better data science video. You are a savant at this
what an amazing tutorial! I just had to give a thumbs up even before finishing the video.
Really appreciate that Sandeep. Please share the link with anyone else you think might also like it.
As someone just getting introduced to time series analysis, this video was gold, thank you for making it!
Thank you for teaching me. It allows me to understand the time series XGBoost in the shortest time.
Dude your channel is a gold mine ..
Thanks so much for that feedback. Now share it with anyone you think might appreciate it too!
@@robmulla Actually I have shared it to my friends . Cheers !
Don’t use features like year which will not have the same value in the future. It is a bad idea for prediction purposes. Instead use the difference from the minimum date to see if there is an increasing trend year by year.
Please elaborate
Can you provide an example?
Can I have ur social media handle so I can ask you some questions
I get it. The year increments and provides no value to the model.
The difference from minimum date also won't have the same value in the future. I don't know what you mean.
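For illustration, the elapsed-time trend feature suggested in this thread might look like the following in pandas (hypothetical index; note the thread's caveat that a tree-based model still cannot extrapolate such a trend beyond the training range):

```python
import pandas as pd

# Elapsed days since the start of the data, as a trend feature in place
# of the raw calendar year (made-up dates for illustration).
idx = pd.date_range("2015-01-01", periods=5, freq="D")
df = pd.DataFrame(index=idx)
df["days_since_start"] = (df.index - df.index.min()).days
```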
Really well focused and clearly explained. Love your work!
I appreciate the feedback Julian
Wow! I'm trying to get up to speed on XGBoost, so I clicked on this video. There are a lot of meh data science tutorials out there, so it was such a treat to come across this one after slogging through youtube. I immediately subscribed and am headed to your channel to watch more videos on time series prediction!
I love your content. Liked the video before watching it because I know this is gonna be a great tutorial.
Thanks for making these tutorials. 😊
Thanks! Glad you find it helpful.
Man, I am seeing this after a year and your teaching style is just unreal .. now sub done and will follow you on other things :) for sure
Thanks! One of the best videos I've ever seen. Simple, clear, and it explains why each concept is used.
Amazing. We'd learned time series prediction only via statistical methods and/or by making ML models act like ARIMA, creating lags to feed them. This approach is very interesting and intuitive. Thanks, Rob
Thanks for the wonderful video. It's very insightful ❤️ from India .
Keep inspiring and aspiring always!!
My pleasure! So happy you liked it!
I like this dude's videos. They are informative and to the point.
Best video on the subject I've found so far!
Love your videos Rob!! cheers from Argentina ♥
Sending my ❤ back to Argentina. Thanks for watching!
Wow, this is exactly what I needed to learn to improve my COVID death predictor. Great job!
So glad you found this helpful. Thanks for watching!
I'm getting to know Time Series and your vid has loads of great starter points.
I am new to time series and this by far is very informative and quite succinct!
Great Video ROB, Thanks for sharing with us!!
Thanks for watching!
This is incredible! Instantly subscribed!! Thanks for your knowledge
Thanks for watching!
What a quality tutorial! Thank you so much
Glad you learned something new!
You have helped me so much with this video, you don't even know!!! Thanks so much :)
Incredible content and explanation. You definitely have a knack for this. I subscribed for more videos like this! Thanks :)
Thanks for watching and the feedback!
Great content! Thanks a lot for the explanations, they are a great incentive to dive deeper into the subject.
Glad you think so! My hope is that short videos like this, explaining a topic at a high level, will spark curiosity in people so they dive deeper into the topic, just like you said.
Great video!
If the goal was prediction only, and not inference (meaning you don't care about what's driving the energy consumption), you can use the energy consumption of the previous days as features for the model.
When predicting consumption at T, you can use T-1, T-2, ..., T-x.
A moving average works as a feature as well.
I totally agree! It all depends on how far in the future (forecasting horizon) you are attempting to predict.
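A hedged sketch of the lag and moving-average features described above, using a hypothetical daily series (column names are made up for the demo):

```python
import pandas as pd

# Hypothetical daily consumption series.
s = pd.Series(range(10), index=pd.date_range("2015-01-01", periods=10, freq="D"))

features = pd.DataFrame({
    "lag_1": s.shift(1),                   # consumption at T-1
    "lag_7": s.shift(7),                   # consumption at T-7
    "ma_3": s.shift(1).rolling(3).mean(),  # moving average of the last 3 past values
})
```

As the reply notes, which lags are usable depends on the forecasting horizon: a 1-day lag is only available if you are predicting at most one day ahead.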
Such an amazing video, thank you Rob and keep 'em coming! ;)
Very illuminating! Learned a whole lot in just 23 minutes.
Such an excellent video. Thanks for sharing!
Glad you liked it!
Simply awesome tutorial😀
Thanks so much!
I just started studying ML and this tutorial is super helpful. I would like to see how you would use the model for forecasting future energy consumption though
Welcome to the wonderful world of ML Liliya! Yes, I did forget to cover that in detail but I may in a future video. It's just a simple extra step to create the future dates dataframe and run the predict and feature creation on it.
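A rough sketch of the "future dates dataframe" step Rob describes, with assumed feature names mirroring the kind of calendar features built in the video; the final predict call is left as a comment since it needs the fitted model:

```python
import pandas as pd

def make_time_features(df):
    # Calendar features derived from the DatetimeIndex (names here are
    # assumptions, not necessarily the video's exact ones).
    df = df.copy()
    df["hour"] = df.index.hour
    df["dayofweek"] = df.index.dayofweek
    df["month"] = df.index.month
    df["year"] = df.index.year
    return df

# Empty frame of future timestamps: 30 days of hourly steps from 2018-08-03.
future = pd.DataFrame(index=pd.date_range("2018-08-03", periods=24 * 30, freq="h"))
future = make_time_features(future)

# predictions = reg.predict(future)  # `reg` would be the fitted XGBRegressor
```

Since the features are pure functions of the timestamp, no observed target values are needed to build them for future dates.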
Great lesson on machine learning. Thank you.
Thank you for watching. Share with a friend!
FYI for anybody doing this recently: the part where the combined training set and test set graphic uses a dotted line has to be modified.
Before: '01-01-2015'
After: ax.axvline(x=dt.datetime(2015, 1, 1))
Matplotlib now needs it as a datetime, I guess because the index was converted with to_datetime.
from datetime import datetime
ax.axvline(x=datetime(2015,1,1), color='black', ls='--')
Very informative and easy-to-understand tutorial.... Thank you
You are welcome! Thanks for watching.
Hello Rob, great tutorial! I have a question: in eval_set you're using [(x_train, y_train), (x_test, y_test)], whereas in most data-split practices I've seen, the validation set is separated from the training data (and is not part of either the training or the testing set). Can you please check at timestamp 14:02?
I'm trying to implement something similar on an interesting dataset and this is a great tutorial!!
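One way to implement the split this commenter describes is a chronological three-way split, with XGBoost's eval_set watching only the validation slice. A sketch on a hypothetical hourly frame (the fit call is shown as a comment since it needs real features and a model):

```python
import pandas as pd

# Hypothetical hourly target; split chronologically so the validation set
# used for early stopping is distinct from the held-out test set.
df = pd.DataFrame({"y": range(1000)},
                  index=pd.date_range("2015-01-01", periods=1000, freq="h"))

train = df[df.index < "2015-01-22"]
val = df[(df.index >= "2015-01-22") & (df.index < "2015-01-29")]
test_set = df[df.index >= "2015-01-29"]

# With XGBoost, early stopping would then watch only the validation slice:
# reg.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_val, y_val)])
# final_score = mean_squared_error(y_test, reg.predict(X_test))
```

The test slice is then touched exactly once, for the final score, so early stopping cannot tune the model to it.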
Great video! Very clear and easy to understand! Thanks a lot for the clear explanation! I've got a few questions regarding lagging data for better prediction, but I'll jump into the next video; it seems I'll get an answer there. Thanks again!
Glad you liked it. Yes, the next video covers it in more detail!
I enjoyed watching this as it has given me more insight into prediction.
Kindly do a video on GDP growth forecasting using machine learning.
Thank you.
Such a clear explanation, thanks for sharing!
Glad it was helpful!
"And depending who you ask" 🤣Great video!
I’m glad you got the reference. I was hoping he would see and appreciate that part of the video.
Thank you for this tutorial, definitely helped me out
Glad it helped!
Perfectly explained, thanks a lot
You are welcome! Glad you found it helpful. Check out parts 2 and 3 and share with a friend!
Informative and well-structured. Thanks!
Short and potent; a great, fluid presentation!!
Best one I've ever seen ❤ Thanks so much.
So glad you like it. Thanks for the comment.
LEGEND...no other words needed
Thank you 🙏
Thanks!
Whoa!! Super thanks! I appreciate that. Glad you liked the video.
Very well explained and useful. Thank you!
Thanks! Love your explanations.
Cool video Rob!
Thanks for watching!
Well done, you are so fast; I guess it's the experience.
Rob, are you aware that you have made a crucial forecasting mistake? You used the test set for validating the model when fitting, then you used the same test set when you made the final predictions and evaluated it on the same set. The problem is that during the fitting, the model gets to see the test set so you have data leaked into the past, from the future. What you should do is to split the data into train/val/test where the test has never been seen by the model.
Totally agree. It's data snooping. Nevertheless, there are some cases where you can use all the data for validation if you then receive a separate test set on which to check how the model generalizes.
Another question: he uses the test set as the input for the model's predictions. Is this correct? In a real environment this "test" period is the future; how can we use this in the predict function?
Yes, this is data leakage
Yes, this is data leakage
This is the best!! Thank you so much :D Thank you!!
Thank you for the great presentation
I appreciate you watching and commenting. Share with a friend!
Great explanation, bro! But I do have a question: at minute 13:23 you declare the features and the target from the dataset. What if the dataset is univariate? Should the features be nothing other than (functions of) the target, or should the series be decomposed first?
Fantastic video tutorial 👏👏🙏
Thank you!
Just came across your channel, awesome content!
Welcome aboard! Glad you like it.
First of all, thank you for this comprehensive video. It helped me a lot to understand this kind of prediction better. However, what I still don't understand is how can I make predictions on new data that the model hasn't seen before? Let's say I want to make predictions from 2018-08-03 for the next 30 days.
As far as I can remember, you need to use a rolling-window methodology. With this, your test set will be, for example, the last 7 days, used to predict the 7 days ahead.
Brilliant video, thank you :)
Thanks for taking the time to watch.
Amazing season ❤
I appreciate the feedback.
Thanks!
I love this video. Please make more. Thanks
Thanks! I appreciate the comment. Have you seen part 2 that I have on this topic?
Much more is needed when you do a serious time series analysis!!!! And I suggest forgetting Python and using R instead!!
Should you not split the training data into train and validation sets, so that you can use the validation set instead of the test set during training (when you use the "eval_set" parameter)?
I also thought of that. I suppose it was out of the scope of the tutorial to keep it simple.
Next LSTM ,autoencoder, Deep Reinforcement Learning for finance ?
I could make a video about LSTMs however those types of complicated models tend to be outperformed by models like XGBoost on these types of datasets. There are a few papers on the topic and forecasting is both an art and science.
@@robmulla Yes, but an LSTM should pick up the long-lag influence better, while XGBoost is randomization-based and can be tuned in some directions, but not as finely for this aspect as the former. And it seems that SARIMAX is out of favor, shame. A try-out with VAR was also pretty OK for roughly known time lags. Any thoughts?
PS: add more max depth to get more feature coverage...
Thank you so much. you are a LEGEND!!
Sir, you are a legend. Thank you. I was banging my head against an LSTM model in PyTorch previously, but this is way better
Glad you got something working.
Very good explanation.
This is so helpful. Thank You!!
Question: aren't you involving the validation dataset in the training process when including it in the eval_set?
I don’t think so. What do you mean?
This is incredible!!
Great job sincerely!
Thanks for the feedback!
Very good introduction to time series forecasting in XGBoost. One thing to note is that xgboost has a function for plotting the feature importance; xgb.plot_importance() would have done the trick for you
I can't believe xgboost can do time series analysis as well
Yes! It works well for time series data that is stationary. It wouldn't work well for time series that will have future values way outside of what has occurred in the past.
@@robmulla To be fair, there is no model that would work if the process is non-stationary: SARIMAX, random forest, linear regression, etc. How about addressing autocorrelation in the underlying process when using xgboost? I think you should have plotted the PACF and ACF, and added some lagged power consumption to the features accordingly.
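statsmodels' `plot_acf`/`plot_pacf` are the usual tools for the diagnostic this commenter suggests. As a dependency-light sketch, the sample autocorrelation those plots visualize can be computed directly, and it peaks at the seasonal lag for a daily-periodic signal:

```python
import numpy as np

def autocorr(x, lag):
    # Sample autocorrelation at a given lag (the quantity plot_acf shows).
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Toy hourly signal with a 24-hour cycle: the ACF peaks at lag 24,
# suggesting a 24-hour lagged consumption feature would carry signal.
t = np.arange(24 * 14)
signal = np.sin(2 * np.pi * t / 24)
```

Lags with large autocorrelation are natural candidates for the lagged-feature approach discussed elsewhere in the thread.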
Thanks for this video Rob. I am quite new to data science and this was really clear. Have you done a video on optimization maybe using light GBM?
How rational is it to use a tree-based model for time series forecasting? Not sure about XGBoost, but in general tree-based models cannot extrapolate, meaning the predictions would be bounded by the minimum and maximum of the training target variable. If we have a time series with an increasing trend, is that a good option? Btw, just subscribed :D
Thanks for subscribing. You are correct that this type of model will not do well predicting values outside the training range. However, for this type of dataset it can work well. I mention this earlier in the video when talking about the different types of time series. Hope that helps.
Thanks so much for this video.
It would be cool as well to see a video with XGBoost mainly about feature engineering using aggregate data (for example, the average of the last 30 days) while using cross-validation appropriately to avoid data leakage.
Would hyperparameter tuning with GridSearchCV have to be sacrificed, since you can't easily control creating these aggregate features within each dataset split made during cross-validation?
Thanks so much for your enlightening and amazing videos. I highly appreciate your work.
How did you spot overfitting w/o a marker?
A question. I see the prediction was done on test data which are already available. This is good to see how accurate the model is but I am wondering how we can use this model (and xgboost in general) to forecast the upcoming years for which we do not have any data.
Maybe you need a rolling window methodology. The video’s methodology doesn’t make sense to me.
It would be useful to look at feature importance at the inflexion point of the test-set performance and at the end of training, and compare. Features highly ranked in both are the ones useful for understanding patterns in the data and that also satisfy labelling requirements.
My understanding was that you actually need to go beyond the default feature-importance method within XGBoost, as this feature importance was not designed for time series. A clustered mean-decrease algorithm or SHAP values are much better suited for time series feature engineering.
Great video, thanks.
Glad you liked it! Thanks for the feedback.
WOW! This is AMAZING content. May I know what book you studied. Thank you.
Glad you liked it. Most of what I shared in this video I learned through kaggle and working in industry, there was no one book I could point to.
Nice tutorial 👍
Thank you 👍
Nice explanation..
Thanks for liking
Very helpful, thanks a lot
Very nice introduction and tutorial; can you do LSTM too?
Thanks so much. Yes I plan to release one on LSTMs soon.
Amazing video
Thanks!
Excellent video ! For weather, I suggest you look into HDD and CDD (heating degree days and cooling degree days) which focus on the amount of heating and cooling rather than the mean temperature.
Thanks for the tips! I'm not familiar with those but I will look into it. The main issue I see when people train forecasting models like this is using the ground-truth weather for future dates, which is not available at prediction time. That's why I think it's best to train on the forecast values that were issued for the historic dates.
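A minimal sketch of the HDD/CDD features mentioned above, using the conventional 65 °F base (the base temperature is an assumption; conventions vary by market and by Celsius/Fahrenheit usage):

```python
import pandas as pd

# Daily mean temperatures in Fahrenheit (made-up values for illustration).
temps = pd.Series([30.0, 50.0, 65.0, 80.0, 95.0])

# Degree days relative to a 65 F base: HDD measures how much heating was
# needed, CDD how much cooling; both are zero at the base temperature.
hdd = (65 - temps).clip(lower=0)
cdd = (temps - 65).clip(lower=0)
```

Unlike raw mean temperature, these two features separate heating demand from cooling demand, which is why they tend to correlate better with energy load.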