Second-order Optimization Methods for Machine Learning

  • Published: 22 Oct 2024
  • Abstract:
    First-order optimization methods, particularly stochastic gradient descent, are the primary workhorse in machine learning (ML) owing to their historically low per-iteration costs. Recent theoretical advancements, including rapid convergence in overparameterized settings, implicit regularization, and the emergence of architectural features such as skip connections, have solidified their dominance in ML. Nonetheless, sensitivity to hyperparameter tuning, susceptibility to entrapment near saddle points, slow convergence on rugged landscapes, particularly with smaller networks, and inefficiency in constrained and distributed optimization settings remain significant challenges for these methods.
    Second-order methods, on the other hand, can attain superior convergence rates, overcome non-convexity and ill-conditioning, handle constraints effectively, and exploit parallelism and distributed architectures in novel ways. However, their non-trivial sub-problems and high per-iteration costs continue to limit their widespread use. In light of this, I will provide an overview of ongoing research into efficient, robust, and scalable second-order optimization algorithms for ML. To that end, I will focus on Newton-MR variants, a novel class of Newton-type methods that offer many desirable theoretical and practical properties and have the potential to surpass first-order methods in the next generation of optimization methods for large-scale machine learning.
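    To give a concrete flavor of the kind of iteration the abstract alludes to, the sketch below (an editorial illustration, not the speaker's implementation) applies a Newton-MR-style step to a small logistic-regression problem: the Newton system is solved inexactly with MINRES, which needs only Hessian-vector products and tolerates indefinite or singular Hessians, and the step is accepted via backtracking on the gradient norm. All problem data, tolerances, and helper names here are illustrative assumptions.

    ```python
    # Illustrative sketch of a Newton-MR-style iteration (assumed setup, not the
    # speaker's code): inexact Newton direction from MINRES using only
    # Hessian-vector products, followed by backtracking on the gradient norm.
    import numpy as np
    from scipy.sparse.linalg import LinearOperator, minres

    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 20))               # toy design matrix
    y = rng.integers(0, 2, size=200).astype(float)   # 0/1 labels
    w = np.zeros(20)                                 # parameters

    def gradient(w):
        p = 1.0 / (1.0 + np.exp(-A @ w))             # sigmoid predictions
        return A.T @ (p - y) / len(y)

    def hess_vec(w, v):
        # Hessian-vector product of the logistic loss; no explicit Hessian is formed.
        p = 1.0 / (1.0 + np.exp(-A @ w))
        return A.T @ (p * (1.0 - p) * (A @ v)) / len(y)

    for _ in range(20):
        g = gradient(w)
        if np.linalg.norm(g) < 1e-8:
            break
        # The Hessian is exposed only through matvecs; MINRES handles
        # symmetric indefinite (or singular) systems.
        H = LinearOperator((20, 20), matvec=lambda v: hess_vec(w, v))
        direction, _ = minres(H, -g, maxiter=100)    # inexact Newton direction
        # Backtracking line search on the gradient norm (Newton-MR-style criterion).
        t = 1.0
        while np.linalg.norm(gradient(w + t * direction)) > (1 - 1e-4 * t) * np.linalg.norm(g):
            t *= 0.5
            if t < 1e-8:
                break
        w = w + t * direction
    ```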
    Associate Professor Fred Roosta-Khorasani:
    Fred Roosta is an associate professor in the School of Mathematics and Physics at the University of Queensland (UQ). In addition, he is a chief investigator and a theme leader with the ARC Training Centre for Information Resilience (CIRES). Prior to joining UQ, he was a post-doctoral fellow in the Department of Statistics at the University of California, Berkeley. He obtained his PhD from the University of British Columbia in 2015.
    Fred’s research interests and prior work span several areas of applied mathematics and computer science, including machine learning, numerical optimization, scientific computing, and computational statistics, as well as distributed and high-performance computing. He is broadly interested in the theoretical and algorithmic aspects of solving modern data analysis problems. In 2018, he was awarded the Discovery Early Career Researcher Award (DECRA) by the Australian Research Council for his research on second-order optimization for machine learning.
