Multiple Time Series modeling using Apache Spark and Facebook Prophet
- Published: Sep 18, 2024
- #datascience #machinelearning #timeseries
This video is part of Time Series playlist here - • Time Series Modelling ...
One major challenge with time series in the real world is dealing with many series at once, be it retailers with millions of products, each with a different sales cycle, or manufacturers running hundreds of machines. In such cases we need systems that can distribute time series model building across nodes to achieve high parallelism. In this video we will see how to use Facebook Prophet to model each series and Apache Spark to distribute the work across multiple nodes.
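The approach described above can be sketched roughly as follows. This is a minimal sketch, not the notebook from the video: it assumes Spark 3.0+ (for `applyInPandas`), the `prophet` package (older releases were published as `fbprophet`), and an input Spark DataFrame with hypothetical columns `series_id`, `ds` and `y`. The horizon value is also an assumption.

```python
import pandas as pd

# Hypothetical names: the column layout and the horizon are assumptions.
RESULT_SCHEMA = "series_id string, ds timestamp, yhat double"
HORIZON_DAYS = 30

def forecast_one_series(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit one Prophet model on a single series and return its forecast."""
    # Imported here so the import happens on the Spark worker that fits the model.
    from prophet import Prophet

    model = Prophet()
    model.fit(pdf[["ds", "y"]])
    # Covers the training dates plus HORIZON_DAYS new daily steps.
    future = model.make_future_dataframe(periods=HORIZON_DAYS)
    forecast = model.predict(future)[["ds", "yhat"]]
    series_id = pdf["series_id"].iloc[0]
    return forecast.assign(series_id=series_id)[["series_id", "ds", "yhat"]]

def forecast_all_series(sdf):
    """Fit every series in parallel: Spark hands each series_id group to a worker."""
    return sdf.groupBy("series_id").applyInPandas(
        forecast_one_series, schema=RESULT_SCHEMA
    )
```

The per-series function takes exactly one argument on purpose: Spark treats a two-argument function as `(key, pdf)`, so extra settings such as the horizon are kept in module-level constants here.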
I saw your previous video about Prophet, this version is insane. It ran in a matter of milliseconds! 😃
You are a gem !
Returning back a year later. Thank you
Hello, this is a great tutorial where you have multiple products and one model. Could you please do a tutorial of multiple products testing multiple models?
Thanks
Thank you very much for this video !
Thanks for the video first of all.
I would like to ask: if I am using such models for time series analysis of financial data, how do I keep updating the data and retraining the model to avoid data drift?
Retraining in this case means rebuilding the models from scratch, as incremental training might not work. So if we see drift, we take the old data, add the new data, and train again.
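The "old data plus new data" step can be sketched in plain pandas as below. This is only an illustration under assumptions: the helper name `merge_history` is hypothetical, the frames are assumed to use Prophet-style `ds` and `y` columns, and a revised value for an existing date is assumed to override the older one.

```python
import pandas as pd

def merge_history(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Combine existing history with newly arrived rows for a full refit."""
    combined = pd.concat([old, new], ignore_index=True)
    # If a date appears in both frames, keep the newer (later) row.
    combined = combined.drop_duplicates(subset="ds", keep="last")
    return combined.sort_values("ds").reset_index(drop=True)

old = pd.DataFrame({"ds": pd.to_datetime(["2024-01-01", "2024-01-02"]), "y": [1.0, 2.0]})
new = pd.DataFrame({"ds": pd.to_datetime(["2024-01-03", "2024-01-02"]), "y": [3.0, 2.5]})
training_frame = merge_history(old, new)
```

The merged frame is then passed to a brand-new model object, rather than calling fit again on the old one.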
Thank you so much for this video.
@AIEngineering, can you tell me how to run multiple time series using SARIMAX or XGBoost with PySpark? Can you also recommend any literature on multiple time series forecasting?
For XGBoost you would convert each time series to rows anyway and model on those. You can check my pyspark-xgboost video; the modeling is the same as regular ML, since we are converting multiple time series into multiple observations.
For SARIMAX you can follow the same approach as this video, but instead of Facebook Prophet use statsmodels or whichever package you use to fit SARIMAX.
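A rough sketch of the SARIMAX variant, assuming statsmodels, daily data, Spark 3.0+, and hypothetical columns `series_id`, `ds` and `y`. The model orders below are placeholders rather than tuned values, and the horizon is an assumption.

```python
import pandas as pd

# Hypothetical output layout and horizon.
FORECAST_SCHEMA = "series_id string, ds timestamp, yhat double"
STEPS = 14

def sarimax_one_series(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit one SARIMAX model on a single series and forecast STEPS days ahead."""
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # SARIMAX treats rows as an ordered sequence, so sort explicitly.
    pdf = pdf.sort_values("ds")
    model = SARIMAX(pdf["y"].to_numpy(), order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
    fitted = model.fit(disp=False)
    yhat = fitted.forecast(steps=STEPS)

    # Build the forecast dates by hand, assuming a daily frequency.
    last_date = pdf["ds"].iloc[-1]
    dates = pd.date_range(start=last_date, periods=STEPS + 1, freq="D")[1:]
    return pd.DataFrame(
        {"series_id": pdf["series_id"].iloc[0], "ds": dates, "yhat": yhat}
    )
```

It would be applied the same way as the Prophet function, for example `sdf.groupBy("series_id").applyInPandas(sarimax_one_series, schema=FORECAST_SCHEMA)`.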
@AIEngineering, Thanks a lot for the video, it will help me in my current project.
Can we save these models and use them for prediction in Spark itself? I will try it anyway, but your view on it would be much appreciated.
Yes, you can. You can change the function that creates the Prophet model so that it loads a saved model and runs inference with it.
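One possible way to do this, sketched under assumptions: Prophet ships `model_to_json` and `model_from_json` in `prophet.serialize`, so each fitted model can be returned from the grouped function as a JSON string and stored as an ordinary column. The schemas, column names and horizon below are hypothetical, and this is not necessarily how the video's notebook would do it.

```python
import pandas as pd

# Hypothetical column names and horizon; adjust to your data.
MODEL_SCHEMA = "series_id string, model_json string"
FORECAST_SCHEMA = "series_id string, ds timestamp, yhat double"
HORIZON_DAYS = 30

def fit_and_serialize(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit one series and return a single row holding the model as JSON."""
    from prophet import Prophet
    from prophet.serialize import model_to_json

    model = Prophet()
    model.fit(pdf[["ds", "y"]])
    return pd.DataFrame(
        {"series_id": [pdf["series_id"].iloc[0]], "model_json": [model_to_json(model)]}
    )

def predict_from_saved(pdf: pd.DataFrame) -> pd.DataFrame:
    """Rebuild the saved model for one series and forecast without refitting."""
    from prophet.serialize import model_from_json

    model = model_from_json(pdf["model_json"].iloc[0])
    future = model.make_future_dataframe(periods=HORIZON_DAYS, include_history=False)
    forecast = model.predict(future)[["ds", "yhat"]]
    series_id = pdf["series_id"].iloc[0]
    return forecast.assign(series_id=series_id)[["series_id", "ds", "yhat"]]
```

The training job would run `sdf.groupBy("series_id").applyInPandas(fit_and_serialize, schema=MODEL_SCHEMA)` and write the result out (for example as Parquet). A later job reads that table back and applies `predict_from_saved` per `series_id` with `FORECAST_SCHEMA`.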
Hi Srivatsan,
I noticed one small issue in the pandas UDF code. We should sort the values by the date column inside the pandas UDF, since we do a groupby. When we do a groupby and apply the pandas UDF, it can jumble the order of the data instead of keeping each time series in sequence.
Vigneshwar, Facebook Prophet takes the date as a column and orders by it internally before fitting the model. So in this case it will not be an issue, but if we are using a model that requires sorted input then yes, we might have to sort inside the function before feeding it. Are you seeing an issue where the data is not getting sorted?
@AIEngineeringLife I haven't used FB Prophet, but other stats models require the data to be sorted. Is there any way to validate, or any documentation showing, that the data is sorted when fitting FB Prophet?
@vigneshwart2203 You can check this issue tracker response in the FB Prophet git repo - github.com/facebook/prophet/issues/1412
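For models that do depend on row order, a defensive check and sort at the top of the grouped function is cheap. A minimal pandas sketch, assuming the date column is named `ds` (the helper name is hypothetical):

```python
import pandas as pd

def prepare_series(pdf: pd.DataFrame) -> pd.DataFrame:
    """Return one group's rows in chronological order with a clean index."""
    if pdf["ds"].is_monotonic_increasing:
        return pdf.reset_index(drop=True)  # already ordered, nothing to do
    return pdf.sort_values("ds").reset_index(drop=True)

# A group as it might arrive after a shuffle: rows out of order.
shuffled = pd.DataFrame(
    {"ds": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02"]), "y": [3.0, 1.0, 2.0]}
)
ordered = prepare_series(shuffled)
```

Calling `prepare_series(pdf)` as the first line of the function passed to the groupby keeps the fit independent of whatever order Spark delivers the rows in.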
How do we get the trained models out of the Spark tasks so we can predict later without retraining?
Hello Sir, I am working on a solar energy time series (5-minute granularity) where the night-time energy values are zeros. Values only appear during the daytime. I applied Facebook Prophet models to this data, but the results are not getting better. If I remove those values, Prophet still won't give satisfactory results. Do you recommend any good time series model for such data?
Can you tell me what result you got after removing those values? Did you create the future dataframe only for the times that were present when training the model?
@AIEngineeringLife Night-time values were removed from both test and train. In the future dataframe the time periods stretched out to some weird points (I defined the period and frequency as given for test).
Is the data available in the open domain so that I can try it? Do you see any pattern in the data that varies with other factors like temperature? If it is constant or like white noise, it might not be easy to model.
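On the future dataframe question in this thread: when night-time rows are dropped from training, the future timestamps passed to `predict` should cover the same hours only. A small pandas sketch of building such a frame, where the 06:00 to 18:00 window, the helper name and the example timestamp are all assumptions:

```python
import pandas as pd

def daytime_future(last_ts, periods, start_hour=6, end_hour=18):
    """Build 5-minute future timestamps after last_ts, keeping daytime hours only."""
    idx = pd.date_range(start=last_ts, periods=periods + 1, freq="5min")[1:]
    future = pd.DataFrame({"ds": idx})
    hours = future["ds"].dt.hour
    return future[(hours >= start_hour) & (hours < end_hour)].reset_index(drop=True)

# One day ahead at 5-minute steps (288 points), filtered to the daytime window.
future = daytime_future("2024-06-01 17:50", periods=288)
```

The filtered frame can be passed to `model.predict(future)` in place of the unfiltered output of `make_future_dataframe`, so the model is never asked about hours it has not seen.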
Where can I find this notebook to download?
Here - github.com/srivatsan88/End-to-End-Time-Series
@AIEngineeringLife thx!