At 7:50, it looks like 8 degrees of freedom fits the sine curve best. This suggests to me that fewer parameters are better.
5:36 double descent!
Thanks. Why does the model not stretch itself when the d_i are small? And what about the situation where there is no regularization?
When the d_i are small, the model can't fit all the points at the same time, so it doesn't even have the opportunity to stretch itself. For the second question: with plain SGD, weight decay is equivalent to L2 regularisation. SGD with momentum arguably behaves similarly to weight decay, so optimizers that use momentum, such as Adam, implicitly bring some L2-like regularisation even when no explicit regulariser is added.
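To make the weight decay / L2 point concrete, here is a minimal sketch (my own illustration, not from the video) that compares one plain SGD step on a loss with an added L2 penalty (lam/2)*||w||^2 against one SGD step with weight decay. The function names, `lr`, `lam`, and the random weights and gradient are all assumptions chosen just for the demo.

```python
import numpy as np

def sgd_step_l2(w, grad, lr, lam):
    # Gradient of loss + (lam/2) * ||w||^2 is grad + lam * w.
    return w - lr * (grad + lam * w)

def sgd_step_weight_decay(w, grad, lr, lam):
    # Shrink the weights by (1 - lr * lam), then take a plain gradient step.
    return (1.0 - lr * lam) * w - lr * grad

rng = np.random.default_rng(0)
w = rng.normal(size=5)      # hypothetical current weights
grad = rng.normal(size=5)   # hypothetical gradient of the unregularised loss

a = sgd_step_l2(w, grad, lr=0.1, lam=0.01)
b = sgd_step_weight_decay(w, grad, lr=0.1, lam=0.01)
same = np.allclose(a, b)
print(same)  # True
```

The two updates are algebraically the same expression, which is why they match. Note that this identity is specific to plain SGD; with adaptive optimizers such as Adam, adding an L2 penalty and applying weight decay generally give different updates.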