First Pass => D (All of the Above)
Second Pass => B (Reduces computational complexity by focusing on a subset of queries; see the sketch below)
Third Pass => D (10 for original transformer; 6 for Informer)
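For the second answer, here's a minimal sketch of the idea, assuming the paper's sparsity measure M(q_i, K) = max_j(q_i.k_j / sqrt(d)) - mean_j(q_i.k_j / sqrt(d)); the function name is mine, and unlike the real Informer this scores every key instead of sampling a subset first:

import numpy as np

def probsparse_select(Q, K, u):
    """Rank queries by the Informer sparsity measure
    M(q_i, K) = max_j(q_i.k_j / sqrt(d)) - mean_j(q_i.k_j / sqrt(d))
    and keep the top-u 'active' queries; in the paper, the rest just
    fall back to a mean of the values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (L_Q, L_K) scaled dot products
    M = scores.max(axis=1) - scores.mean(axis=1)  # sparsity score per query
    return np.argsort(M)[-u:]                     # indices of the top-u queries

# toy example: 8 queries/keys of dimension 4, attend with only the top 3 queries
rng = np.random.default_rng(0)
Q, K = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
print(probsparse_select(Q, K, u=3))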
Simply amazing, very well explained.
Thank you so much for this clear explanation! I just started my journey in data science, and the papers are a bit demanding for me. This video makes my life much easier!
This is THE BEST explanation I have seen. Great Work.
Thank you for explaining papers related to time series. Would love to see more of your videos on time series!!
Coming up soon
Fascinating. I’m considering different model architectures. I’d be interested in hearing about what advantages transformer-based architecture offer for time series forecasting vs other architectures. I understand what informer offers vs traditional transformer.
Beautiful!
I think the answers are: D, B, D
And I'll do more research because I don't understand how the network is able to adjust the output according to the input.
Thank you, sir!
Ding ding ding. You got full points in quiz time!
And yea ~ glad this sparked more curiosity in you for further research
@CodeEmporium Got only one correct: the last one, on learning and computational complexity :)
As always, great video, looking forward to next video on the code...
This is interesting. Eagerly looking forward to the next episodes ❤
This is interesting
The quizzes are a great idea
This is Great! Keep'em comin!
Good stuff, keep it up!
11:10 I thought the Informer generates an output for each input, which would be the size of the input window given to the encoder, but in the graphic it looks like the orange-colored blocks are the outputs, which is fewer. Is this because the subset of inputs given to the decoder are ground-truth tokens, so the decoder does not have to predict them? That would imply a scenario where the input and output domains are the same.
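If it helps anyone else: my reading of the paper is that the decoder input is the known "start token" slice of the series concatenated with zero placeholders for the horizon, and only the placeholder positions are read off as the forecast. A minimal sketch of that construction, with names and window sizes as illustrative assumptions rather than the paper's code:

import numpy as np

def build_decoder_input(series, token_len, pred_len):
    """Concatenate the last `token_len` observed values (the 'start token')
    with `pred_len` zero placeholders that the decoder fills in one pass."""
    start_token = series[-token_len:]      # known ground-truth slice
    placeholders = np.zeros(pred_len)      # positions to be predicted
    return np.concatenate([start_token, placeholders])

# toy example: 48 observed steps, last 24 as the token, forecast 12 ahead
series = np.sin(np.linspace(0, 6, 48))
dec_in = build_decoder_input(series, token_len=24, pred_len=12)
print(dec_in.shape)  # (36,): only the final 12 positions are the forecast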
The video I was just looking for
Super glad! Thanks for watching
Good video! Well explained. In real life, though, a particular time series will correlate with itself and depend on other time series. Is there any way to take this into account to improve predictions?
So it makes the process faster through ProbSparse attention, distillation, and generative inference, but does it also improve accuracy?
According to the “experiments” section of the paper, it certainly looks like this architecture has the best performance compared to some models (including different transformer architectures)
studies, fitness, trading
I was just thinking about this and you made it... hope you are not reading my mind 😄
I just might be :)
Would historical nutritional data count?
❤
OK, it's all interesting. But how can I use it when time-series data arrive in real time? I cannot batch-process, only handle items one by one. I tried some kind of buffering to collect several items and then process them all together, but I didn't succeed because I couldn't fit that into the common neural-network libraries.
During real-time inference, the model will typically be deployed as part of a service: we get a request, pass it through as a "batch size 1" input, get an output, and return the response. A sliding-window buffer like the sketch below is one way to handle the streaming case.
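A minimal sketch of that pattern, assuming a univariate series and a window of 96 points; the class and the shapes are illustrative, and the trained model itself is assumed to be loaded elsewhere:

import numpy as np

class SlidingWindow:
    """Buffer points arriving one at a time; expose a batch-of-one window
    once enough points have accumulated."""
    def __init__(self, size):
        self.size = size
        self.buf = []

    def push(self, x):
        """Add one point; return True once a full window is available."""
        self.buf.append(x)
        self.buf = self.buf[-self.size:]   # keep only the latest `size` points
        return len(self.buf) == self.size

    def batch(self):
        # shape (1, size, 1): one window, one feature -> "batch size 1"
        return np.array(self.buf, dtype=np.float32).reshape(1, self.size, 1)

# usage: stream points in; each new point after warm-up yields a fresh batch
# that a trained model could consume, e.g. forecast = model(w.batch())
w = SlidingWindow(size=96)
for x in np.sin(np.arange(200) / 10):      # stand-in for a live feed
    if w.push(x):
        batch = w.batch()                  # ready for single-request inference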
How can someone get in touch with you?
Can you please blow up the Llama/Llama 2 architecture and code for us? Eagerly waiting for your LLM videos.
Yep! That’s definitely a future playlist idea
@@CodeEmporium Awesome. Thanks
A
DBD
Honestly, I can't think of any context where I use historical data to inform my decisions other than finance.
Yea. Finance does seem like the biggest and most obvious one to me too.
Answer: D ?
For quiz 1, yes - it was all of them :)
Please provide answers to your quizzes at the end. It's really irritating to see questions left unanswered; how would someone verify their answers? Also, please stop saying "Quiz time".