First of all, thank you very much for giving all this material out for free; it's a real asset in my day-to-day work.
Now the question: at 2:34 you mention that we don't want too many observations in our training set, so that we "have the replication" as we increase the size of the training set. I am not sure I understand that part.
The only sense I could make of it is this: if the data has (for example) a trend change at some time t1 and we are fitting only one trend curve in our model, then we would have no issue until our forecast horizon starts encompassing points after t1. First our forecast error would grow; then points after t1 would become part of the training set, probably helping a bit, but we would still be in an underfitting situation. Keeping the same model, the only way out of this situation would be to fix the number of points in the training set, so that as we shift the start of the training set forward it would eventually pass t1 and forget about the pre-t1 trend present in the data.
Is this somehow related to what you meant? Is there more to it? I am all ears for any kind of enlightenment on this.
If I wasn't clear enough, please let me know and I'll do my best to rephrase.
Thank you!
In time series cross-validation we have expanding training sets. We estimate forecast accuracy based on the test sets. If we have a small initial training set, we can have many test sets, which helps provide better estimates of forecast accuracy. If we have a large initial training set, it is not possible to have so many test sets, and then the forecast accuracy won't be measured so well. By replications, I meant the repeated test sets.
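In case a concrete example helps other readers, here is a minimal sketch of those expanding training sets using the fpp3/fable workflow (aus_production and ETS are just illustrative stand-ins, not the example from the video). A small .init gives many test sets; a large .init gives only a few.

library(fpp3)

# Expanding training sets: the first one has .init observations and each
# later one adds .step more, so a small .init yields many test sets and a
# large .init yields only a few.
beer_cv <- aus_production |>
  filter(year(Quarter) >= 1992, year(Quarter) <= 2007) |>
  stretch_tsibble(.init = 24, .step = 1)

# One model per training set, each forecasting 4 quarters ahead; accuracy()
# then averages the forecast errors over all of the test sets.
beer_cv |>
  model(ETS(Beer)) |>
  forecast(h = 4) |>
  accuracy(aus_production)

The averaging over all those repeated test sets is what makes the accuracy estimate reliable, which is why a smaller initial training set helps.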
@RobJHyndman Thank you very much for your answer; things are very clear now.
How do you do this when your model requires extra variables? If I pass the stretched data into the forecast function, it says that it ignores the h argument. Can it infer the h argument from the stretched data, or is this not doing what I expect? Thanks.
Here is an example: gist.github.com/robjhyndman/a11eba538d26ba60454abd600ea6204d
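For anyone who can't open the gist right now, here is a rough sketch of the general approach, with us_change, TSLM and the Income predictor as hypothetical stand-ins for your own data and model: build the future data for each fold yourself, fill in the extra variables from the full data set, and pass it to forecast() as new_data. The horizon is then taken from new_data, which is why any h argument is ignored when new_data is supplied.

library(fpp3)

# Stretch a training portion of the data; each training set gets an .id.
cv_train <- us_change |>
  filter(year(Quarter) <= 2017) |>
  stretch_tsibble(.init = 150, .step = 10)

# A regression model that needs future values of its predictor.
cv_fit <- cv_train |>
  model(TSLM(Consumption ~ Income))

# Build the test sets by hand: 4 future quarters per training set, with
# the predictor values filled in from the full data set.
cv_future <- new_data(cv_train, 4) |>
  left_join(
    us_change |> as_tibble() |> select(Quarter, Income),
    by = "Quarter"
  )

# No h needed: the forecast horizon is implied by new_data.
cv_fc <- forecast(cv_fit, new_data = cv_future)
accuracy(cv_fc, us_change)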