Blake Bordelon | Infinite Limits and Scaling Laws for Deep Neural Networks

  • Published: 2 Oct 2024
  • New Technologies in Mathematics Seminar 9/25/2024
    Speaker: Blake Bordelon
    Title: Infinite Limits and Scaling Laws for Deep Neural Networks
    Abstract: Scaling up the size and training horizon of deep learning models has enabled breakthroughs in computer vision and natural language processing. Empirical evidence suggests that these neural network models are described by regular scaling laws, where the performance of finite-parameter models improves as model size increases, eventually approaching a limit described by the performance of an infinite-parameter model. In this talk, we will first examine certain infinite-parameter limits of deep neural networks which preserve representation learning, and then describe how quickly finite models converge to these limits. Using dynamical mean field theory methods, we provide an asymptotic description of the learning dynamics of randomly initialized infinite-width and infinite-depth networks. Next, we will empirically investigate how close the training dynamics of finite networks are to these idealized limits. Lastly, we will provide a theoretical model of neural scaling laws which describes how generalization depends on three computational resources: training time, model size and data quantity. This theory allows analysis of compute-optimal scaling strategies and predicts how model size and training time should be scaled together in terms of spectral properties of the limiting kernel. The theory also predicts how representation learning can improve neural scaling laws in certain regimes. For very hard tasks, the theory predicts that representation learning can approximately double the training-time exponent compared to the static kernel limit.
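
    To make the compute-optimal scaling question concrete, the sketch below is a minimal toy example, not the speaker's model: it assumes a simple additive power-law ansatz for the loss as a function of model size N and training time t, with hypothetical exponents alpha_t and alpha_N standing in for the spectral quantities of the limiting kernel, and finds the model size that minimizes loss at a fixed compute budget C = N * t.

    ```python
    # Toy illustration (assumed ansatz, not the talk's exact theory):
    #   L(N, t) ~ L_inf + a * t**(-alpha_t) + b * N**(-alpha_N),
    # with compute budget C ~ N * t. In the actual theory the exponents
    # would be set by spectral properties of the limiting kernel.
    import numpy as np

    L_inf, a, b = 0.0, 1.0, 1.0      # irreducible loss and prefactors (arbitrary)
    alpha_t, alpha_N = 0.5, 1.0      # hypothetical time / model-size exponents

    def loss(N, t):
        """Additive power-law loss ansatz."""
        return L_inf + a * t**(-alpha_t) + b * N**(-alpha_N)

    def optimal_N(C, grid=np.logspace(1, 8, 2000)):
        """Best model size for a fixed compute budget C = N * t (grid search)."""
        t = C / grid                     # training steps remaining after choosing N
        return grid[np.argmin(loss(grid, t))]

    budgets = np.logspace(6, 12, 7)
    N_star = np.array([optimal_N(C) for C in budgets])

    # Fitted compute-optimal exponent d log N* / d log C versus the analytic
    # value alpha_t / (alpha_t + alpha_N) implied by this ansatz.
    slope = np.polyfit(np.log(budgets), np.log(N_star), 1)[0]
    print(f"fitted N* ~ C^{slope:.3f}, analytic {alpha_t/(alpha_t+alpha_N):.3f}")

    # In this toy, "representation learning doubling the training-time exponent"
    # would amount to replacing alpha_t with 2*alpha_t, which shifts the
    # compute-optimal allocation toward larger models per unit of compute.
    ```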
