Time Series Analysis using Python | ARIMA & SARIMAX Model Implementation | Stationarity Handling

  • Published: 27 Aug 2024

Comments • 97

  • @iustinatorul7579
    @iustinatorul7579 1 year ago +11

    One of the best ARIMA implementation tutorials I have seen. I’m a bit frustrated I found it after I had used ARIMA for a project. I can’t even tell you how much time I had wasted going online and on forums, trying to understand how it works.
    But hey, now that I learned it the hard way it better be sticking. 😂
    Appreciate it!

  • @Beanzmai
    @Beanzmai 6 days ago +1

    Incredible video, thank you! I kept trying to train my model with the differenced data and was not getting good results, but I caught my error because of this video.

  • @rajaganesh3462
    @rajaganesh3462 1 year ago +5

    I have come across many blogs and videos to understand the time series process, but I didn't get a clear picture. However, this video gave me a clear understanding of the process. Really great work! Much appreciated.

  • @cvrbcheppali8214
    @cvrbcheppali8214 9 months ago +4

    This is one of the best videos on time series on YouTube. Well explained, and the content is very nice.

    • @learnerea
      @learnerea  8 months ago

      Glad you liked it

  • @fayezullah655
    @fayezullah655 4 months ago +1

    One of the best videos I have ever seen on time series on YT. Thanks for making it.

    • @learnerea
      @learnerea  4 months ago

      Glad it was helpful

  • @rishabhpandey3609
    @rishabhpandey3609 4 months ago +1

    It's really a crazy explanation. I would recommend this in my org, Jio. Keep it up, man. God bless you!

    • @learnerea
      @learnerea  4 months ago

      Thank you, I will

  • @oladayoojekunle1732
    @oladayoojekunle1732 11 months ago +3

    You really did justice to this topic. Very well done!

    • @learnerea
      @learnerea  11 months ago

      thank you very much

    • @oladayoojekunle1732
      @oladayoojekunle1732 11 months ago

      @@learnerea Please, can you make a video on how to use the transformed data, especially data obtained using log, sqrt and shift? I have been trying to figure that out. The part that confused me is how to transform the data back to the original format. Thank you.

  • @erinbai8510
    @erinbai8510 21 days ago

    Thanks for the video. A minor mistake I noticed in the ADF section: you cannot accept the null hypothesis; you can only reject it or fail to reject it.

  • @pepsibrandambassador
    @pepsibrandambassador 4 months ago +1

    you are great! helped me with my project last minute thanks for the video!!

    • @learnerea
      @learnerea  4 months ago

      Glad I could help!

  • @nothing_to_love
    @nothing_to_love 1 month ago +1

    Thanks for this amazing video!!!

    • @learnerea
      @learnerea  17 days ago

      Glad it was helpful!!

  • @user-mz2fd1dr9g
    @user-mz2fd1dr9g 1 year ago +1

    Thank you so much for this video. I have been studying this for the last 3 years and have taken some expensive courses, but this is the best explanation; it kept me motivated to explore and learn throughout the video. Let us know how we can support you to make more learning videos. Thanks.

    • @learnerea
      @learnerea  1 year ago

      You are most welcome, and I'm glad that it was helpful..
      keep watching

  • @surendrabera2878
    @surendrabera2878 5 months ago +2

    Your content is really good. I am not able to understand why you have such low views on this video. One suggestion: please make the thumbnail a little more eye-catching.

  • @melainetape
    @melainetape 5 months ago +2

    So informative. However, I do not see the relation between the transformations (log, sqrt, shift) which make the data stationary and the ARIMA model you build; I'm confused at this step. I tried with my data and noted that the shift-diff transformation makes my data stationary, but when it comes to building the model, it does not fit well. Thanks in advance.

  • @borisgisagara
    @borisgisagara 4 months ago +1

    As you said, you were trying to keep it at a beginner's level, which is why it's understandable to the smallest degree possible. But I think you got one thing wrong about the model: it's not the ARIMA model that is performing badly, it's that you are trying to predict a whole range of values with the same training data. That means it would work well on the first few values but not for all of them. You have to use the walk-forward variation, which basically means updating your training set each time you predict a new value. That's my idea.
    And thank you for the good video.

  • @erinbai8510
    @erinbai8510 21 days ago

    I have a doubt: at 54 minutes, when you are using the ARIMA model, you started with the original data. Then why did you transform the data to stationary data, since you used the original data instead? Thank you so much.

  • @queenx3572
    @queenx3572 5 months ago +1

    If you use the time shift method, d will be the interval for the shift. What happens if you use any other method, like the log or square root? What will d be?

  • @julianatorressanchez5250
    @julianatorressanchez5250 3 months ago +1

    You are amazing. I love the way you explain. Can you do the same for multidimensional data sets?

  • @madhuripatel5250
    @madhuripatel5250 6 months ago +1

    If I am dealing with time series data collected at hourly frequency over 2 years, what should I take as the lag (shift) value?

  • @vanikmalhotra6586
    @vanikmalhotra6586 4 months ago

    Basic question: why did we run the model on the original set, while towards the end you mentioned running the model on the altered data set, basically the diff/square root?

  • @user-xp5bx1tw7x
    @user-xp5bx1tw7x 8 months ago +2

    Hi, the content is very good and very well explained; thanks for sharing it. Can you please help me understand: we tried to identify stationarity but did not use it in the modelling, and even the stationarity check was not concluded; we did not get the desired results.

    • @learnerea
      @learnerea  8 months ago

      Thank you very much for watching it. Yes, that was primarily because it was a beginner-level video, and hence we did not want to spend a lot of time on reverting the transformation back. We will certainly make another one where we conclude and utilize the stationarity.

  • @user-ur3iz4em1c
    @user-ur3iz4em1c 6 months ago +1

    great tutor thanks for the video ❤❤

    • @learnerea
      @learnerea  6 months ago

      Glad you liked it!

  • @saurabharbal2684
    @saurabharbal2684 1 year ago +1

    Hello sir,
    I don't know what your mistake was,
    but I got the desired results using the ARIMA model at time 1:13:45.
    Instead of the flat line at the bottom, I got the desired results,
    and I followed everything you taught.

  • @razinust2579
    @razinust2579 3 months ago

    Brother, your work is extremely helpful. I looked for the rolling statistics video link but couldn't find it; please share it. Thanks in anticipation.

  • @sellamimohamedkhaled4527
    @sellamimohamedkhaled4527 1 year ago +1

    really good work👌, keep it up

    • @learnerea
      @learnerea  11 months ago

      Thanks a lot 😊

  • @timetraveller7513
    @timetraveller7513 7 months ago +1

    Can't thank you enough 🙏

    • @learnerea
      @learnerea  7 months ago +1

      Glad it was helpful

  • @Vizia219
    @Vizia219 6 months ago

    Hi, I was using your tutorial to learn how to implement ARIMA models. I then went about and implemented my own with some of my own data that I'm using for a school project. However, while my model fit my data very well, my forecasts are flat and they're strange. Could you help me in any way?

  • @Shiva-zn4nz
    @Shiva-zn4nz 1 year ago +1

    This was so informative. Thank you a bunch! I understood time series. Do you have similar videos for regressions? Thank you!
    Subscribed

    • @learnerea
      @learnerea  1 year ago +1

      Glad it was helpful. The one below is on linear regression -
      ruclips.net/video/IigoyVON0eM/видео.html
      here is a problem we solved using the regression and other best fit models -
      ruclips.net/video/2YAheiIHNzI/видео.html
      I recommend you to have a look at the whole datascience playlist -
      ruclips.net/p/PL4GjoPPG4VqOmyh7hQ730evtLaz04LwSf

    • @Shiva-zn4nz
      @Shiva-zn4nz 1 year ago

      @@learnerea Thank you so much. Love you guys!

  • @PriyeshM-yj8wi
    @PriyeshM-yj8wi 4 months ago

    I have sales data consisting of a time period and other features, including different schemes as features; almost 7-8 of those are active in some months, so basically they are categorical variables containing 0 or 1. Should I go ahead with ARIMA for forecasting? If yes, then how do I consider those categorical variables?

  • @thegroup3261
    @thegroup3261 8 months ago

    the best tutorials bro

    • @learnerea
      @learnerea  7 months ago

      Glad it was helpful

  • @ismailhosni7760
    @ismailhosni7760 1 year ago +1

    Hello Dr, thanks a lot for sharing the information and teaching us.
    I have a little question, with your permission.
    The question is: if we estimate our ARIMA model and find that there is autocorrelation between the residuals of the model, how can we fix this problem?
    thanks again 🤗🙏🙏🧡❤

    • @learnerea
      @learnerea  1 year ago +1

      There are several potential approaches you can take if you find autocorrelation in the residuals of your ARIMA model. Here are a few options you could consider:
      - Adding additional AR or MA terms to the model: if the autocorrelation is due to a pattern that has not been captured by the current model, adding additional terms may help to capture this pattern and improve model performance.
      - Differencing the data: if the autocorrelation is due to a trend in the data, differencing the data may help to remove this trend and improve model performance.
      - Using a different model: if the ARIMA model is not suitable for the data, you may need to consider using a different model altogether. For example, a seasonal ARIMA (SARIMA) model may be more appropriate for data with seasonal patterns.
      - Modeling the residuals: if none of the above approaches work, you can try modeling the residuals as a separate time series. This can help to capture any remaining patterns in the data that are not accounted for by the primary model.

    • @ismailhosni7760
      @ismailhosni7760 1 year ago

      @@learnerea 🥰🥰🥰🥰🥰❤❤🧡💛 thanks a lot 🙏🙏

  • @arnabmodak3377
    @arnabmodak3377 6 months ago

    ARIMA Model Building starts here: 56:47

  • @sanjaisrao484
    @sanjaisrao484 4 months ago +1

    thanks

  • @scientensity
    @scientensity 1 year ago +1

    In a SARIMA model, while doing an analysis, I found that for d=0, D=1 (as I did one round of seasonal differencing and no non-seasonal differencing), the prediction fits the whole data except the initial 22 values (it predicts almost 0 for those), which is the seasonality of my data.
    Can you explain why this is happening?
    I hope you got my question.

    • @learnerea
      @learnerea  11 months ago

      Assuming you are using the same data as in the video, please share your code at learnerea.edu@gmail.com so that we can have a look and guide you more specifically. Include the data as well if it's different from the video's.

  • @Cs11-CanhNau
    @Cs11-CanhNau 4 months ago

    The original data series is not stationary yet, and I see you did some work to convert it to a stationary series. But why do you use the initial data when training the model, when it is not a stationary sequence?

    • @NutritionandMetabolism-uq5kf
      @NutritionandMetabolism-uq5kf 3 months ago

      I have the same query as well. I can understand the section on checking stationarity, but I don't see how that gets incorporated into the subsequent training and model fitting. If the original dataset can be used for training rather than the transformed dataset, what's the use of determining whether the data is stationary or not? Did I miss something? Otherwise, excellent video, clearly explained. I would be interested to see videos on time series analysis using other models such as XGBoost and Prophet. Thank you, sir.

  • @MrDevnandan
    @MrDevnandan 4 months ago

    Did you mistakenly plot the PACF of airP['arimaPred'] at timestamp 1:15:52?
    I am not sure why you would plot the PACF of predicted values. 😕

    • @srinivasreddy8134
      @srinivasreddy8134 4 months ago

      We have to take airP['12diff'] for that, as it is the seasonal difference.

  • @2380raj
    @2380raj 7 months ago +1

    👌

  • @mattsamelson4975
    @mattsamelson4975 8 months ago

    I have a situation where I can make reasonable training and predictions with the original (non-stationary) data. When I transform the data, I am able to successfully make it stationary, BUT it loses all autocorrelation, so predictions are junk. Have you ever seen this? I have found some things online that say this is possible, but it depends very much on the characteristics of the time series.

    • @learnerea
      @learnerea  8 months ago

      Yes, the situation you're describing is not uncommon in time series analysis, and it's often a delicate balance to strike between achieving stationarity and preserving important characteristics like autocorrelation.
      When you difference or transform a time series to achieve stationarity, you are essentially altering the original data to make it more amenable to modeling. However, as you've observed, too aggressive a transformation can result in the loss of autocorrelation, which is crucial for capturing temporal dependencies in the data.
      Here are a few considerations and potential approaches to handle this situation:
      Selective Transformation:
      Instead of applying a uniform transformation to the entire time series, consider selectively applying transformations to specific components. For example, you might difference the data only where it's necessary or apply different transformations to different seasonal components.
      Partial Transformation:
      Rather than making the entire time series stationary, consider transforming only certain parts of it. For instance, you might apply differencing or another transformation to the trend component while leaving the seasonal component untouched.
      Different Models for Different Components:
      If your time series exhibits both trend and seasonality, you might consider using models that can handle each component separately. Seasonal decomposition of time series (STL) is one such approach where the time series is decomposed into trend, seasonal, and residual components, and each can be modeled independently.
      Advanced Models:
      Explore advanced models that can handle non-stationary data more effectively. Long Short-Term Memory (LSTM) networks and other recurrent neural networks (RNNs) are known for their ability to capture temporal dependencies in data.
      Ensemble Approaches:
      Combine predictions from models trained on the original data and models trained on the transformed data. Ensemble methods can sometimes capture the strengths of different models.
      Grid Search and Cross-Validation:
      Systematically experiment with different combinations of transformations and models. Use grid search and cross-validation to evaluate the performance of various configurations and find the optimal solution.
      It's worth noting that the ideal approach can vary depending on the specific characteristics of your time series data. Experimentation and a deep understanding of the data's behavior are key. If possible, consider consulting with domain experts or seeking feedback from colleagues who have experience with similar time series patterns.
      Remember that achieving stationarity is a means to an end (better model performance), and the goal is to strike a balance that preserves the essential characteristics of the data while making it amenable to modeling.

    • @mattsamelson4975
      @mattsamelson4975 8 months ago

      Thanks for your detailed reply. How do you conduct a partial transformation? for example, do I difference only a section of the source data that I’m training the model on? How would I even then reverse transform predictions?

  • @user-rz2zl8iz3v
    @user-rz2zl8iz3v 1 year ago +1

    great

    • @learnerea
      @learnerea  1 year ago

      thank you very much for watching

  • @esranurgunay1776
    @esranurgunay1776 1 year ago +1

    Hello sir, at 35:35, I didn't get the same result as you when I executed the line df.head().

    • @learnerea
      @learnerea  1 year ago +1

      >> You may like to revisit the code you have created
      >> You can also put the code here; we will analyze the difference and can help

  • @siddharthakar9369
    @siddharthakar9369 3 months ago

    Where is the dataset ?

  • @esranurgunay1776
    @esranurgunay1776 1 year ago +1

    If we were not going to use the stationarity results, why did we calculate them?

    • @learnerea
      @learnerea  1 year ago +1

      Being a data scientist, you've got to explore all the possibilities.
      As explained in the video as well, the decision was taken based on analysis, where it was observed that it wouldn't perform better comparatively. It has also been suggested that we will try making another video where we utilize the stationary data to see how it performs.
      As a learner, your question makes sense; keep asking questions for clarity.

    • @user-gp8ww1xf3e
      @user-gp8ww1xf3e 8 months ago

      I was wondering the same.

  • @user-me1gh3ki4n
    @user-me1gh3ki4n 11 months ago +1

    What does diff(12) mean?

    • @learnerea
      @learnerea  11 months ago

      diff computes the difference of a set of values, essentially subtracting each value from the subsequent value in an array or list. If you can provide the timestamp here, we will be able to give you specific guidance.
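
A toy illustration (not from the video): in pandas, diff(k) computes s[t] - s[t-k], so diff(12) on monthly data is the year-over-year change (same month, previous year).

```python
# diff(k) subtracts the value k positions earlier: s[t] - s[t-k].
import pandas as pd

s = pd.Series(range(24))             # pretend these are 24 monthly values
print(s.diff(1).head(3).tolist())    # [nan, 1.0, 1.0]
print(s.diff(12).dropna().tolist())  # twelve 12.0 values for this toy series
```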

  • @user-xn7lm9to1y
    @user-xn7lm9to1y 10 months ago

    Suppose the month attribute is missing and you only have a year attribute; in that case, how can you make the data stationary? I mean, you only have year and passenger attributes; in that case, how do you make the data stationary? Please reply.

    • @learnerea
      @learnerea  10 months ago

      Stationarity can be on year basis as well..
      When you're dealing with time series data that only has a yearly frequency, the approach to making the data stationary is similar to what you'd do with more frequent data, but with some specifics to consider.
      Visualizing the Data:
      Start by plotting the data. This will give you an idea of the overall trend, seasonality, and variance. Since the data is yearly, you might not observe any distinct seasonality.
      python code -
      import matplotlib.pyplot as plt
      plt.plot(year, passenger)
      plt.xlabel('Year')
      plt.ylabel('Passenger')
      plt.title('Yearly Passenger Count')
      plt.show()
      Differencing:
      A common approach to making time series data stationary is by differencing the data. Differencing helps to remove trends in the data. You subtract the previous year's observation from the current year's observation.
      python code-
      passenger_diff = passenger.diff().dropna()
      After differencing, plot the data again to see if it appears more stationary.
      Checking for Stationarity:
      The Augmented Dickey-Fuller test is commonly used to check the stationarity of a time series.
      python code -
      from statsmodels.tsa.stattools import adfuller
      result = adfuller(passenger_diff)
      print('ADF Statistic:', result[0])
      print('p-value:', result[1])
      A low p-value (typically ≤ 0.05) indicates that the time series is stationary.
      Transformations:
      If differencing isn't enough, consider other transformations like:
      Log transformation: To stabilize variance.
      python code
      import numpy as np
      passenger_log = np.log(passenger)
      Rolling means: To smooth out short-term fluctuations and highlight longer-term trends.
      python code -
      rolling_mean = passenger.rolling(window=5).mean() # 5-year window as an example
      passenger_detrended = passenger - rolling_mean
      passenger_detrended.dropna(inplace=True)
      Decomposition:
      Even though the data is yearly, if you suspect any seasonality or a strong trend, you can use decomposition. The Seasonal Decomposition of Time Series (STL) from the statsmodels library can be useful.
      python code -
      from statsmodels.tsa.seasonal import STL
      stl = STL(passenger, period=12, seasonal=13)  # period = length of the suspected cycle; required if the index has no set frequency
      result = stl.fit()
      trend = result.trend          # extracted trend component
      seasonal = result.seasonal    # extracted seasonal component
      residuals = result.resid      # the remainder
      You can then work with the residuals from the decomposition process, which should ideally be stationary.

  • @anghulingalolop3630
    @anghulingalolop3630 7 months ago

    Can you then forecast this?

  • @user-lh4wg2zm4z
    @user-lh4wg2zm4z 1 year ago

    Hi, you did not upload a video where the stationary data was used.

  • @abhilashpatel1361
    @abhilashpatel1361 1 year ago

    Hi, can you please help me understand why the lag for the PACF is 20?

    • @learnerea
      @learnerea  1 year ago

      It will be great if you can share the time stamp where you spot this point

  • @user-mt2wx9ir8i
    @user-mt2wx9ir8i 5 months ago

    My data is in the form of year, week.

  • @amazonamazon6510
    @amazonamazon6510 10 months ago

    How should I approach forecasting with the lockdown data?

    • @learnerea
      @learnerea  9 months ago

      That's an excellent problem statement to choose. A little more detail you might have provided:
      >> what sort of model you want to develop
      >> what the main purpose/scope of the model is, etc.
      Let's assume that you want to build a credit risk model and the data you are considering includes the COVID period as well. (Before you start, make sure that the data is of relatively balanced quantity and period.) Below is the approach you can take -
      Data Collection:
      Gather historical credit risk data, including loan performance, defaults, delinquencies, and relevant economic indicators.
      Include data specific to the COVID-19 period, such as unemployment rates, government stimulus programs, and financial relief measures.
      Data Preprocessing:
      Clean and preprocess the data by addressing missing values, outliers, and data inconsistencies.
      Create relevant features, such as lagged values of credit risk indicators and economic variables, to capture time dependencies.
      Exploratory Data Analysis (EDA):
      Perform EDA to understand the data's characteristics and relationships.
      Explore trends, seasonality, and patterns, paying specific attention to changes during the COVID-19 period.
      Define the Target Variable:
      Define the credit risk metric you want to predict, such as default probability or loan delinquency.
      Feature Selection:
      Identify relevant features that may influence credit risk. This includes economic indicators, loan characteristics, borrower information, and external factors.
      Time Series Decomposition:
      Decompose the time series data to understand underlying trends, seasonality, and residuals, considering the effects of COVID-19.
      Create a Historical Train-Test Split:
      Split the data into training and testing sets, ensuring that the testing set includes the COVID-19 period.
      Model Selection:
      Choose a suitable forecasting model. In this case, time series models like ARIMA, SARIMA, or Prophet may be appropriate.
      Consider using machine learning models like Gradient Boosting, Random Forest, or LSTM if you have sufficient data.
      Model Training:
      Train the selected model on the historical data, excluding the testing period.
      Model Validation:
      Evaluate the model's performance using the testing data, specifically during the COVID-19 period.
      Use appropriate evaluation metrics, such as Mean Absolute Error (MAE), Mean Squared Error (MSE), or classification metrics for binary outcomes.
      Model Interpretation:
      Interpret the model's predictions to understand which factors contribute to credit risk during the COVID-19 period.
      Feature Importance:
      Analyze feature importance to identify key drivers of credit risk during the pandemic.
      Model Refinement:
      Fine-tune the model and hyperparameters if the initial model's performance is suboptimal.
      Scenario Analysis:
      Conduct scenario analysis to assess credit risk under different economic conditions related to COVID-19, such as varying levels of unemployment or government interventions.
      Model Deployment:
      Deploy the trained model for ongoing credit risk assessment and predictions.
      Monitoring and Feedback Loop:
      Continuously monitor the model's performance and retrain it as new data becomes available.
      Regulatory Compliance:
      Ensure that your credit risk model complies with regulatory requirements and standards relevant to your industry.
      Documentation:
      Document the entire modeling process, including data sources, preprocessing steps, model selection, and evaluation metrics.
      Keep in mind that the unique challenges posed by the COVID-19 pandemic may require you to adapt your model and data sources to reflect changing economic conditions and government policies. Regularly update and refine your credit risk prediction model to account for these dynamics.

  • @user-gp8ww1xf3e
    @user-gp8ww1xf3e 8 months ago

    Hi, I cannot find the dataset; could you help me please! =D

    • @learnerea
      @learnerea  8 months ago

      The dataset is part of the seaborn library; you can just run the code -
      import seaborn as sns
      df = sns.load_dataset('flights')
      You can also download the notebook from the GitHub link provided in the description.

  • @Devra380
    @Devra380 1 year ago

    But sir, the new statsmodels seems to have different functions.

    • @learnerea
      @learnerea  1 year ago

      You can mention the function names that were used in the video from statsmodels but that you do not find in the module now.
      We will try to find the closest alternative functions and help you if they no longer exist.

    • @Devra380
      @Devra380 1 year ago

      @@learnerea Can you make a new video on the implementation of ARIMA, on a share market dataset or a weather dataset?

  • @saniyashahin-zp6oz
    @saniyashahin-zp6oz 9 months ago

    Please share your Python notebook, sir @Learnerea.

    • @learnerea
      @learnerea  9 months ago

      Here you go -
      github.com/LEARNEREA/Data_Science/blob/main/Scripts/time_series_air_passengers.py

  • @meronika1400
    @meronika1400 1 year ago

    Can you share this Jupyter notebook with me?
    Via mail.

    • @learnerea
      @learnerea  1 year ago +1

      Hi Meronika,
      you can find that using -
      file name - time_series_air_passengers.py
      url - github.com/LEARNEREA/Data_Science/tree/main/Scripts

  • @micahdelaurentis6551
    @micahdelaurentis6551 3 months ago

    The D parameter is the number of differences you take on your data, which is not what you said. This is as basic as it gets, man, come on.