Sadhika Malladi: Mathematical Views on Modern Deep Learning Optimization

  • Published: 27 Sep 2023
  • Speaker: Sadhika Malladi, Princeton University
    Date: 28 September 2023
    Abstract: This talk focuses on how rigorous mathematical tools can be used to describe the optimization of large, highly non-convex neural networks. We start by covering how stochastic differential equations (SDEs) provide a rigorous yet flexible model of how deep networks change over the course of training. We then show how these SDEs yield practical insights into scaling training to highly distributed settings while preserving generalization performance. In the second half of the talk, we explore the new deep learning paradigm of pre-training and fine-tuning large language models (LLMs). We show that fine-tuning can be described by a very simple mathematical model, and the resulting insights allow us to develop a highly efficient and performant optimizer for fine-tuning LLMs at scale. The talk will focus on various mathematical tools and the extent to which they can describe modern-day deep learning.
    Seminar series website: sites.google.com/view/m-ml-sy...
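As a rough illustration of what it means for an SDE to model training, here is a toy sketch (not the specific SDEs or scaling rules from the talk). It compares plain SGD with learning rate eta on a 1-D quadratic loss L(x) = a*x^2/2 against the standard SDE approximation dX = -∇L(X) dt + sqrt(eta) * sigma dW, where one SGD step corresponds to a time increment of eta. The quadratic loss, the additive Gaussian gradient noise of std sigma, and all function names here are assumptions made for illustration. For small eta, both processes should settle to a stationary variance close to the Ornstein-Uhlenbeck value eta * sigma^2 / (2a).

```python
import math
import random


def sgd_quadratic(a, sigma, eta, steps, rng):
    """Plain SGD on the assumed loss L(x) = a*x**2/2 with additive Gaussian gradient noise."""
    x, trace = 0.0, []
    for _ in range(steps):
        noisy_grad = a * x + sigma * rng.gauss(0.0, 1.0)  # true gradient + noise
        x -= eta * noisy_grad
        trace.append(x)
    return trace


def sde_euler_maruyama(a, sigma, eta, horizon, dt, rng):
    """Euler-Maruyama for dX = -a X dt + sqrt(eta) * sigma dW, with time t = k * eta."""
    x, trace = 0.0, []
    noise_scale = math.sqrt(eta) * sigma * math.sqrt(dt)
    for _ in range(int(round(horizon / dt))):
        x += -a * x * dt + noise_scale * rng.gauss(0.0, 1.0)
        trace.append(x)
    return trace


def tail_variance(trace, burn_in_frac=0.05):
    """Empirical variance after discarding an initial burn-in segment."""
    tail = trace[int(len(trace) * burn_in_frac):]
    mean = sum(tail) / len(tail)
    return sum((v - mean) ** 2 for v in tail) / len(tail)


def compare(a=1.0, sigma=1.0, eta=0.01, horizon=2000.0, seed=0):
    """Return (SGD variance, SDE variance, predicted OU stationary variance)."""
    rng = random.Random(seed)
    sgd = sgd_quadratic(a, sigma, eta, int(round(horizon / eta)), rng)
    # Use a finer step than eta; with dt == eta, Euler-Maruyama would reproduce SGD exactly.
    sde = sde_euler_maruyama(a, sigma, eta, horizon, eta / 4, rng)
    predicted = eta * sigma**2 / (2 * a)
    return tail_variance(sgd), tail_variance(sde), predicted


if __name__ == "__main__":
    sgd_var, sde_var, predicted = compare()
    print(f"SGD: {sgd_var:.5f}  SDE: {sde_var:.5f}  predicted: {predicted:.5f}")
```

The SDE is simulated with a step finer than eta on purpose. With dt equal to eta, the Euler-Maruyama update collapses to the SGD update itself, so any agreement between the two would be trivial.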
