Are Transformers Effective for Time Series Forecasting? Machine Learning Made Simple

  • Published: 25 Oct 2024
  • For a more in-depth look into the research, read the following- artificialinte...
    Get 20% off for 1 year- codingintervie...
    Models like ChatGPT can predict the next number in a sequence. But can they do it well? You might think so, because Transformers were created to handle sequences of data without forgetting older tokens. That makes them a natural match for historical data, right?
    You'd be making a huge mistake. Researchers compared the performance of DLinear, a very simple linear model, to various Transformers. The results did not look pretty for the latter.
    The document below goes over their research in more detail and explains why Transformers fall apart when it comes to Time Series Forecasting.
    This video is part of my series Machine Learning Made Simple, where I break down various machine learning and artificial intelligence concepts in a simple manner. The goal of the series is to explain the concepts behind the various ideas, so that you have a theoretical understanding. This will help you take these ideas and implement them in different situations.
    Paper Details
    Are Transformers Effective for Time Series Forecasting?
    Recently, there has been a surge of Transformer-based solutions for the time series forecasting (TSF) task, especially for the challenging long-term TSF problem. Transformer architecture relies on self-attention mechanisms to effectively extract the semantic correlations between paired elements in a long sequence, which is permutation-invariant and anti-ordering to some extent. However, in time series modeling, we are to extract the temporal relations among an ordered set of continuous points. Consequently, whether Transformer-based techniques are the right solutions for long-term time series forecasting is an interesting problem to investigate, despite the performance improvements shown in these studies. In this work, we question the validity of Transformer-based TSF solutions. In the experiments of these studies, the compared (non-Transformer) baselines are mainly autoregressive forecasting solutions, which usually have a poor long-term prediction capability due to inevitable error accumulation effects. In contrast, we use an embarrassingly simple architecture named DLinear that conducts direct multi-step (DMS) forecasting for comparison. DLinear decomposes the time series into a trend and a remainder series and employs two one-layer linear networks to model these two series for the forecasting task. Surprisingly, it outperforms existing complex Transformer-based models in most cases by a large margin. Therefore, we conclude that the relatively higher long-term forecasting accuracy of Transformer-based TSF solutions shown in existing works has little to do with the temporal relation extraction capabilities of the Transformer architecture. Instead, it is mainly due to the non-autoregressive DMS forecasting strategy used in them. We hope this study also advocates revisiting the validity of Transformer-based solutions for other time series analysis tasks (e.g., anomaly detection) in the future.
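    The DLinear architecture described in the abstract is simple enough to sketch in full. The following is a minimal illustration, not the authors' official code — the kernel size, lookback, and horizon values are assumptions for the example. It decomposes a series into a trend (moving average) and a remainder, then fits one linear map per component that predicts the entire horizon directly from the lookback window:

```python
import numpy as np

def moving_average(x, kernel=25):
    # Pad both ends by repeating edge values so the trend has the same length as x.
    pad_l = (kernel - 1) // 2
    pad_r = kernel - 1 - pad_l
    xp = np.concatenate([np.repeat(x[0], pad_l), x, np.repeat(x[-1], pad_r)])
    return np.convolve(xp, np.ones(kernel) / kernel, mode="valid")

def fit_dlinear(series, lookback=96, horizon=24, kernel=25):
    # Decompose into a smooth trend and a remainder (seasonality + noise).
    trend = moving_average(series, kernel)
    remainder = series - trend

    def windows(comp):
        X, Y = [], []
        for t in range(len(comp) - lookback - horizon + 1):
            X.append(comp[t:t + lookback])
            Y.append(comp[t + lookback:t + lookback + horizon])
        return np.asarray(X), np.asarray(Y)

    # One "one-layer linear network" per component, fit here by least squares:
    # each maps a lookback window directly to all horizon steps at once (DMS).
    Xt, Yt = windows(trend)
    Xr, Yr = windows(remainder)
    Wt = np.linalg.lstsq(Xt, Yt, rcond=None)[0]
    Wr = np.linalg.lstsq(Xr, Yr, rcond=None)[0]
    return Wt, Wr, kernel, lookback

def predict_dlinear(series, Wt, Wr, kernel, lookback):
    trend = moving_average(series, kernel)
    remainder = series - trend
    # Direct multi-step: the whole horizon comes out of one matrix product per
    # component, so there is no autoregressive feedback loop to accumulate error.
    return trend[-lookback:] @ Wt + remainder[-lookback:] @ Wr
```

    Because every forecast step is produced directly from the lookback window, errors do not compound step by step the way they do in autoregressive forecasting — which is exactly the paper's explanation for why this simple baseline holds up at long horizons.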
    I know my Videos are very Bootleg. But you know what’s not Bootleg (Tech Made Simple)
    Use this discount to get 20% off for 1 year-
    Using this discount will drop the prices-
    800 INR (10 USD) → 640 INR (8 USD) per month
    8000 INR (100 USD) → 6400 INR (80 USD) per year (533 INR/month)
    Get 20% off for 1 year- codingintervie...
    If any of you would like to work on this topic, feel free to reach out to me. If you’re looking for AI Consultancy, Software Engineering implementation, or more- my company, SVAM, helps clients in many ways: application development, strategy consulting, and staffing. Feel free to reach out and share your needs, and we can work something out.
    That is it for this piece. I appreciate your time. As always, if you’re interested in working with me or checking out my other work, my links will be at the end of this email/post. If you like my writing, I would really appreciate an anonymous testimonial. You can drop it here. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow.
    Reach out to me
    Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
    Small Snippets about Tech, AI and Machine Learning over here
    AI Newsletter- artificialinte...
    My grandma’s favorite Tech Newsletter- codingintervie...
    Check out my other articles on Medium: rb.gy/zn1aiu
    My YouTube: rb.gy/88iwdd
    Reach out to me on LinkedIn. Let’s connect: rb.gy/m5ok2y
    My Instagram: rb.gy/gmvuy9
    My Twitter: / machine01776819

Comments • 7

  • @axe863
    @axe863 11 months ago

    Financial time series have a really complicated structure. Only under extreme financial fragility is there an increase in predictability.

    • @ChocolateMilkCultLeader
      @ChocolateMilkCultLeader 11 months ago

      Great point

    • @axe863
      @axe863 10 months ago

      @DditsMas That is true. I should also add manias, with a caveat. Also... there's a time-varying composition of distinct agents with varying time horizons; risk attitudes/risk classifications; styles; etc. Bouts of predictability occur during financial distress and manias because the mechanisms that create "destructive nonstationarity" no longer exist, and paths/dynamics become more certain due to time-varying financial constraints on the critical agents to enforce "EM" results

    • @axe863
      @axe863 10 months ago

      @DditsMas No. A Quant Developer/Engineer. Maybe in a few years I'll try to improve on it and make it real-time. Let's say I have a V0 product... it's years away

    • @axe863
      @axe863 10 months ago

      @DditsMas The best thing I've seen in the quant fund space without the need for (risky) predictive financial time series modeling is a mix of portfolio diversification with risk scaling (volatility scaling).
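      The volatility-scaling idea in this comment can be sketched simply. This is an illustrative toy, not anything from the commenter — the target volatility, window length, exposure cap, and 252-day annualization are all assumptions. The idea: scale portfolio exposure inversely to recent realized volatility, capped at full exposure, without forecasting returns at all:

```python
import numpy as np

def volatility_scaled_exposure(returns, target_vol=0.10, window=20, cap=1.0):
    # Annualized realized volatility from a trailing window of daily returns
    # (252 trading days per year assumed).
    realized = np.std(returns[-window:], ddof=1) * np.sqrt(252)
    if realized <= 0:
        return cap
    # Hold less when recent volatility is high, more (up to the cap) when low.
    return min(cap, target_vol / realized)
```

      Applied on top of a diversified (e.g. equal-weight) portfolio, this keeps risk roughly constant over time — which is the appeal: it needs no predictive model of the series itself.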

    • @ChocolateMilkCultLeader
      @ChocolateMilkCultLeader 10 months ago

      @DditsMas that's true for a lot of data. Any data that we choose to evaluate will only be a limited representation. That's why it's important to have a lot of diversity in your data collection - it gives you a more complete overview of things