Statistical Learning: 10.7 Interpolation and Double Descent

  • Published: 13 Dec 2024

Comments • 4

  • @HappyBelly10 2 years ago +2

    At 7:50, it looks like 8 degrees of freedom best fit the sine curve. This suggests to me that fewer parameters are better.
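
    A minimal sketch of the effect being discussed, assuming noisy sine training data, a Legendre polynomial feature basis, and minimum-norm least squares via np.linalg.lstsq (the basis, seed, noise level, and df grid are all illustrative choices, not from the video). Among the underparameterized fits, a moderate df tends to do best, while past the interpolation threshold (df = n_train) the minimum-norm fits can improve again:

    ```python
    # Double-descent sketch: fit bases of increasing degrees of freedom (df)
    # to noisy sine data and track test error.
    import numpy as np

    rng = np.random.default_rng(0)
    n_train, noise = 20, 0.3

    x_train = np.sort(rng.uniform(-1, 1, n_train))
    y_train = np.sin(np.pi * x_train) + noise * rng.standard_normal(n_train)
    x_test = np.linspace(-1, 1, 500)
    y_test = np.sin(np.pi * x_test)

    def features(x, d):
        # Legendre basis with d columns (degrees of freedom); well-behaved on [-1, 1].
        return np.polynomial.legendre.legvander(x, d - 1)

    for d in [2, 5, 8, 15, 20, 30, 100]:
        # For d > n_train the system is underdetermined and lstsq returns the
        # minimum-norm interpolating solution -- the regime where test error
        # can fall again after the spike at the interpolation threshold.
        coef, *_ = np.linalg.lstsq(features(x_train, d), y_train, rcond=None)
        test_mse = np.mean((features(x_test, d) @ coef - y_test) ** 2)
        print(f"df = {d:3d}   test MSE = {test_mse:8.3f}")
    ```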

  • @annawilson3824 1 year ago +1

    5:36 double descent!

  • @anas.2k866 1 year ago

    Thanks. Why, when the d_i are small, does the model not stretch itself? And what about the situation where there is no regularization?

    • @yusun5722 6 months ago

      When the d_i are small, the model can't fit all the points at the same time, so it never gets the opportunity to stretch itself. For the second question: weight decay is equivalent to L2 regularisation (for plain SGD), and SGD with momentum behaves similarly to weight decay. Hence optimizers like Adam, which use momentum, implicitly entail an L2 regularisation.
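
      To make the weight-decay claim concrete, here is the standard one-line identity for plain gradient descent (a sketch of the usual argument; eta is the learning rate and lambda the penalty strength, both generic symbols):

      ```latex
      % One step of plain gradient descent on the L2-penalized loss
      % L_reg(w) = L(w) + (lambda/2) * ||w||^2:
      \[
        w_{t+1}
          = w_t - \eta \,\nabla\!\left( L(w_t) + \tfrac{\lambda}{2}\lVert w_t\rVert_2^2 \right)
          = (1 - \eta\lambda)\, w_t - \eta\,\nabla L(w_t),
      \]
      % i.e. the weights are multiplicatively "decayed" by (1 - eta*lambda)
      % before the usual gradient step -- exactly the weight-decay update.
      ```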