Time Series Forecasting with XGBoost

  • Published: 19 Jun 2024
  • Forecasting with regression
    Follow me on M E D I U M: towardsdatascience.com/likeli...
    INVESTING
    [1] Webull (You can get 3 free stocks by setting up a Webull account today): a.webull.com/8XVa1znjYxio6ESdff
    CODE: github.com/ajhalthor/Time-Ser...

Comments • 56

  • @arianvc8239
    @arianvc8239 3 years ago +2

    Great video! Thank you!
    I look forward to seeing what you do with Prophet.

  • @shivam_dxrk
    @shivam_dxrk 1 month ago

    The Best Creator for DS I've found, Thanks a lot!

  • @abhinavkhushraj5487
    @abhinavkhushraj5487 2 years ago +10

    Great video. Hands-on code, but each step is explained really well. You would make a good teacher!

  • @0xsuperman
    @0xsuperman 1 year ago +3

    Can you also add a feature such as the number of orders from last week (autoregressive)? And if you do that, should you use the actual values from last week, or the predicted values within the test set? If you use actual values, does that count as data leakage?

  • @DrJohnnyStalker
    @DrJohnnyStalker 1 year ago

    You can have a custom objective function to optimize for MAE:

    def mae_loss(y_pred, y_val):
        grad = np.sign(y_val - y_pred) * np.repeat(1, y_val.shape[0])
        hess = np.repeat(0, y_val.shape[0])
        return grad, hess
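
    (For anyone wanting to try this, below is a minimal, self-contained sketch of wiring such an objective into XGBoost's native training API. The names and toy data are illustrative, not from the video; note the gradient is taken with respect to the prediction, so its sign is flipped relative to the snippet above, and a small constant hessian avoids the all-zero case.)

    import numpy as np
    import xgboost as xgb

    def mae_obj(y_pred, dtrain):
        # xgb.train passes (predictions, DMatrix); labels come from the DMatrix
        y_true = dtrain.get_label()
        grad = np.sign(y_pred - y_true)    # d|y_true - y_pred| / d y_pred
        hess = np.full_like(y_pred, 1.0)   # MAE's true second derivative is 0; a constant keeps updates stable
        return grad, hess

    # Toy stand-ins for the notebook's two features and the weekly order counts
    rng = np.random.default_rng(0)
    X = rng.random((200, 2))
    y = 50 * X[:, 0] + 30 * X[:, 1] + rng.normal(0, 2, 200)

    dtrain = xgb.DMatrix(X, label=y)
    booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain, num_boost_round=100, obj=mae_obj)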

  • @zakiakmal85
    @zakiakmal85 2 years ago

    Really loved your explanation. Thanks. Also, I can see only a few time-series-related videos; I would love to see more content on this particular topic.

    • @zakiakmal85
      @zakiakmal85 2 years ago

      One question though. In the ML approach, do we not take care of the stationarity component like we do in traditional forecasting?

  • @salaisivamal7465
    @salaisivamal7465 2 years ago

    Wonderful explanation. Great detail for understanding boosting and how it fits time series.

  • @user-pt3gb8dt2p
    @user-pt3gb8dt2p 3 years ago +2

    Great video Ajay. Regarding your comment on shuffling the training set leading to data leakage, I don't think that would be an issue. XGB isn't regressing on the sequence of prior values, only the 2 features based on past data you are giving it. It would still predict the same values (assuming the random seed is fixed so that XGB runs are deterministic) regardless of the order of the table. AFAIK this is only problematic in traditional TS models. Let me know if I'm way off base; interested to hear your thoughts.

    • @CodeEmporium
      @CodeEmporium 3 years ago +2

      Great point. My main concern here is that the training set can still pick up on trend even though every sample is technically independent of the others. For example, say in January 2020 there was a major policy change that affected the restaurant's order volume. The 2 features for any sample after January 2020 will be influenced by the policy change, but those before January 2020 would not have been. Shuffling can thus make our model look better than it actually would have been, since it has seen the effects of the policy change in the form of those 2 features when it probably shouldn't have been able to.
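
      (To make the time-ordered alternative concrete, a minimal sketch with toy data; the column names and window sizes are illustrative, not the notebook's:)

      import numpy as np
      import pandas as pd

      df = pd.DataFrame({
          "week_start": pd.date_range("2019-01-07", periods=50, freq="W-MON"),
          "orders_7d": np.random.rand(50) * 700,
          "orders_30d": np.random.rand(50) * 3000,
          "next_week_orders": np.random.rand(50) * 700,
      }).sort_values("week_start")

      cutoff = int(len(df) * 0.8)                        # train on the first 80% of weeks
      train, test = df.iloc[:cutoff], df.iloc[cutoff:]   # no shuffling: every test week follows every train week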

    • @abhirama
      @abhirama 1 year ago

      @CodeEmporium Can we not just construct a new feature out of this? policy_change = 0/1?

  • @kumarkushagra7054
    @kumarkushagra7054 1 year ago

    Great video!! How can we implement this for predicting future values? In this case we would need to predict the next-to-next week's sales based on the predicted sales of next week. I hope you understand my question. Thanks!!

  • @kabeerjaffri4015
    @kabeerjaffri4015 3 years ago +1

    Just in time! Also, if you want, please cover time-series databases.

  • @salaisivamal7465
    @salaisivamal7465 2 years ago +1

    clear and crisp.

  • @emmanuel3047
    @emmanuel3047 3 years ago

    Would it not be easier to create new features by lagging the labels? Also, how do you predict future values?

  • @shwetakulkarni815
    @shwetakulkarni815 2 years ago

    Great video... how do you do global XGBoost time series forecasting for multiple time series?

  • @joapen
    @joapen 3 years ago

    very nice explanation, many thanks for the video!!

  • @MrRugbyferdinand
    @MrRugbyferdinand 3 years ago

    Thanks for the video and the presented code, very intuitive.
    Regarding your general underprediction at the end of the presented timeframe: instead of replacing your NaN values with zeros, you could replace them with an average of orders from the preceding weeks, where the exact number of weeks would itself be subject to hyperparameter optimization. Maybe that solves the issue.
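
    (A rough sketch of that suggestion; the 4-week window and the toy series are placeholders to be tuned, not values from the video:)

    import numpy as np
    import pandas as pd

    weekly = pd.Series([620, 655, np.nan, 700, np.nan, 710], dtype=float)  # toy weekly order counts
    # Fill each missing week with the mean of the previous (up to) 4 observed weeks instead of 0
    prior_mean = weekly.rolling(window=4, min_periods=1).mean().shift(1)
    filled = weekly.fillna(prior_mean)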

  • @patite3103
    @patite3103 3 years ago

    You're an ML guru! Great video!
    I don't understand the point of using the query function. Could you do a video on this topic?

    • @gazergaming1248
      @gazergaming1248 2 years ago

      This is a little late, but in case you're still wondering: the idea behind the query function is to make the code more adaptable to what a typical data scientist's data and environment look like. Most companies primarily store their data in SQL (or other databases), not as downloadable CSV or Excel files. Accessing individual files like that is pretty inefficient overall and results in a messy notebook that breaks if any files are moved around on your personal hard drive. It also makes the code dependent on that hard drive, so your notebook is not cross-compatible with other devices or with other employees who might want to work on it or take it over from you. It's an overall better practice.
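
      (For anyone curious what that looks like in practice, a tiny sketch assuming a pandasql-style setup; the table and column names are made up, not the video's:)

      import pandas as pd
      from pandasql import sqldf

      orders = pd.DataFrame({
          "order_date": ["2019-01-05", "2019-01-05", "2019-01-06"],
          "order_id": [1, 2, 3],
      })
      daily = sqldf("SELECT order_date, COUNT(*) AS num_orders FROM orders GROUP BY order_date", locals())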

  • @deepanshudashora5887
    @deepanshudashora5887 3 years ago

    It is clear mr. .....

  • @NierAutomata2B
    @NierAutomata2B 3 years ago +4

    Thanks for the educational video! I'm new to time series forecasting, so I have a naive question: I see you labeled each row (weekly order count) with its next week's order count, and for each row the features you used are [order_count_7_day, order_count_30_day]. In that case, it seems that for each row the model only has those two features to make a prediction. How can we leverage more past signals? I'm thinking that for time t we could use the values all the way from [t-k, t-k+1, t-k+2 ... t-1]. Is that a better way? But the features for each row will then overlap a lot with the surrounding rows, so I'm not sure what a reasonable way of feature engineering for this is. Any suggestions?

    • @donnik7064
      @donnik7064 3 years ago +1

      I have the exact same question! Looking forward to an answer :)

    • @MrDjRoKoLoKo
      @MrDjRoKoLoKo 2 years ago +1

      More variables will not necessarily improve model performance. If adding more past instances of the target (auto-regressive terms) is informative of future values of the time series, then it makes sense to add them; otherwise it can lead to overfitting your model on spurious correlations with the new terms. EDA and iterating over different variables can shed some light on what yields the best results. ACF and PACF plots can help determine a starting point for the right number of auto-regressive terms.
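
      (A small sketch of both ideas, lag features plus ACF/PACF plots; the series and column names are toy examples, not the video's data:)

      import numpy as np
      import pandas as pd
      from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

      orders = pd.Series(np.random.poisson(700, 120).astype(float))  # toy weekly order counts
      df = pd.DataFrame({"orders": orders})
      for k in range(1, 5):                                          # candidate auto-regressive terms
          df[f"lag_{k}"] = df["orders"].shift(k)

      plot_acf(orders, lags=20)    # which lags still correlate with the present
      plot_pacf(orders, lags=20)   # direct contribution of each lag after removing shorter ones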

  • @riosaputra2979
    @riosaputra2979 2 years ago

    Hi, I checked the restaurant order data and found that the last order is only up to 03/08/2019 (3rd of August 2019). I couldn't find any more data beyond 03/08/2019, so how come the last date on the daily number of orders graph is 2019-12-07 (7th December 2019)?
    Also, on the weekly number of orders the last date is 2019-12-02 (2nd December 2019).
    Is there a mistake in the datetime or timeframe calculation in the pandas SQL? Thanks

  • @richarda1630
    @richarda1630 3 years ago +3

    If I do this, I'm going to get hungry :P give me some chicken Tikka Masala or some mutton :D

  • @tobiasmuenchow9884
    @tobiasmuenchow9884 8 months ago

    How can I improve the accuracy? I want to add things like holidays, days when people get their salary, and other factors. Is this possible with XGB?

  • @nara.titan28
    @nara.titan28 1 year ago

    Hello!!! Where is case 2 with the Prophet model? Can you share the video with me? :)

  • @user-kf9tp2qv9j
    @user-kf9tp2qv9j 3 years ago +3

    Can you explain how an XGBoost model makes predictions later?

  • @ccuuttww
    @ccuuttww 3 years ago +2

    Hey, can we do cross-validation in this case?

    • @MrDanituga
      @MrDanituga 3 years ago

      You can, but you need to be careful not to include past data in the validation set. You should split it with time in mind.
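
      (A minimal sketch of time-aware cross-validation with scikit-learn's TimeSeriesSplit; the feature and label arrays are toy stand-ins, assumed to already be in time order:)

      import numpy as np
      from sklearn.metrics import mean_absolute_error
      from sklearn.model_selection import TimeSeriesSplit
      from xgboost import XGBRegressor

      X = np.random.rand(100, 2) * 1000   # e.g. the 7-day and 30-day order counts
      y = np.random.rand(100) * 800

      for train_idx, val_idx in TimeSeriesSplit(n_splits=5).split(X):
          # each validation fold comes strictly after its training fold in time
          model = XGBRegressor(n_estimators=200).fit(X[train_idx], y[train_idx])
          print(mean_absolute_error(y[val_idx], model.predict(X[val_idx])))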

  • @pratiknarkhede1287
    @pratiknarkhede1287 2 years ago +1

    We didn't predict next week's orders; you just tested your predictions on the test data.
    If I want to predict next week's orders, how can I do that?

    • @CodeEmporium
      @CodeEmporium 2 years ago +1

      Nice question. You get the number of sales that happened from 7 days ago until today and from 90 days ago until today, and pass those through the model to get next week's order count projection.

  • @sabinkhdk
    @sabinkhdk 3 years ago +1

    Great video. How do we use this model to forecast beyond 2019-12-06?

    • @shivangiraj9822
      @shivangiraj9822 2 years ago

      Yes same question

    • @MrDjRoKoLoKo
      @MrDjRoKoLoKo 2 years ago

      One solution is to use your forecasted values as inputs (Xs) for the next batch of forecasts, and so on. That being said, predictions have errors, so this will yield a noisy and likely inaccurate forecast if extrapolated for a lot of data points into the future.
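
      (A rough sketch of that recursive idea with toy data; the features, window sizes, and model settings are placeholders, not the video's exact setup:)

      import numpy as np
      from xgboost import XGBRegressor

      weekly = list(np.random.poisson(700, 60).astype(float))  # toy weekly order counts
      # toy model mapping [last week's orders, mean of last 4 weeks] -> next week's orders
      X_hist = np.array([[weekly[i - 1], np.mean(weekly[i - 4:i])] for i in range(4, 60)])
      y_hist = np.array(weekly[4:60])
      model = XGBRegressor(n_estimators=100).fit(X_hist, y_hist)

      for _ in range(4):                                        # forecast 4 weeks beyond the data
          x_next = np.array([[weekly[-1], np.mean(weekly[-4:])]])
          weekly.append(float(model.predict(x_next)[0]))        # each forecast feeds the next step's features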

  • @sooryaprakash6390
    @sooryaprakash6390 2 years ago

    I don't think this can be used for multi-step forecasting. Am I right or is it possible?

    • @CodeEmporium
      @CodeEmporium 2 years ago +1

      You're gonna have to add a categorical variable that signifies the number of days out you want to forecast. So it doesn't do multistep forecasts in the traditional sense, but you can hack your way around it

    • @siddhant17khare
      @siddhant17khare 2 years ago

      @CodeEmporium So does that mean there will be a 3rd feature in the training dataset: the number of days to forecast?
      Can you please elaborate a bit more on the hack you mentioned here?
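
      (A rough sketch of the horizon-feature hack described above, with toy data; the actual notebook only uses the two rolling features, so everything here, names, windows, and horizons, is illustrative:)

      import numpy as np
      import pandas as pd
      from xgboost import XGBRegressor

      weekly = pd.Series(np.random.poisson(700, 60).astype(float))  # toy weekly order counts
      rows = []
      for t in range(30, 56):
          for h in (1, 2, 3, 4):                                     # horizon in weeks becomes a feature
              rows.append({"orders_last_1": weekly.iloc[t - 1],
                           "orders_last_4": weekly.iloc[t - 4:t].mean(),
                           "horizon": h,
                           "target": weekly.iloc[t + h - 1]})
      df = pd.DataFrame(rows)

      model = XGBRegressor(n_estimators=100).fit(df[["orders_last_1", "orders_last_4", "horizon"]], df["target"])
      # at prediction time, reuse one feature row and vary "horizon" to get 1-, 2-, 3-, and 4-week-ahead forecasts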

  • @adityarajora7219
    @adityarajora7219 2 years ago

    What do you do for a living?

    • @CodeEmporium
      @CodeEmporium 2 years ago

      I'm just your friendly neighborhood Data Scientist :)

    • @adityarajora7219
      @adityarajora7219 2 years ago

      @CodeEmporium haha :), still I wanna know [-_-]

  • @factsschoolofficial
    @factsschoolofficial 2 years ago

    Dude, how do you extrapolate to future data?

    • @CodeEmporium
      @CodeEmporium 2 years ago +1

      Say you have features like in the video, "num orders last 7/30 days". Then you compute the number of orders there were in the past 7 days from today, and also the number of orders in the last 30 days (also from today). These are the features you pass into the model to get results for the future.

    • @siddhant17khare
      @siddhant17khare 2 years ago

      @CodeEmporium Thanks for the explanation. However, this will only help us predict orders for day t+1, right?
      How do we do it for t+2, t+3 ... t+n days?
      Does it mean there will be one model for each time step?

    • @MD-uy5bo
      @MD-uy5bo 1 year ago

      @siddhant17khare We have to use walk-forward validation so that forecast values get added to the training dataset for further forecasts.

  • @philwebb59
    @philwebb59 2 years ago

    You need to increase your volume. The commercials are much, much louder than your video.

  • @midnight6371
    @midnight6371 2 years ago

    0:00 hi sentdex

  • @WonPeace94
    @WonPeace94 1 year ago

    please talk next time without this strong accent /s