Martha White | Advances in Value Estimation in Reinforcement Learning

  • Published: 16 Sep 2024
  • Sponsored by Evolution AI: www.evolution.ai/
    Paper: arxiv.org/abs/...
    Abstract: Temporal difference learning algorithms underlie most approaches in reinforcement learning, for both prediction and control. A well-known issue is that these approaches can diverge under nonlinear function approximation, such as with neural networks, and in the off-policy setting, where data is generated by a different policy than the one being learned. Naturally, there has been a flurry of work towards resolving this issue, primarily through sound gradient-based methods, but many of these approaches have been avoided due to a perception that they are ineffective or hard to use. In this talk, I will discuss a new generalized objective that unifies several previous approaches and facilitates creating easy-to-use algorithms that consistently outperform temporal difference learning approaches in our experiments.
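As a point of reference for the abstract, here is a minimal sketch of the standard semi-gradient TD(0) update it refers to, under linear function approximation. This is not the generalized objective from the talk; the function and parameter names (td0_update, phi_s, alpha, gamma, rho) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not from the talk): semi-gradient TD(0) with a
# linear value estimate v(s) = w @ phi(s). All names and default
# constants here are illustrative assumptions.
def td0_update(w, phi_s, r, phi_s_next, alpha=0.1, gamma=0.99, rho=1.0):
    """One TD(0) step. rho is the importance-sampling ratio
    pi(a|s) / b(a|s) when data comes from a behaviour policy b."""
    delta = r + gamma * (phi_s_next @ w) - (phi_s @ w)  # TD error
    return w + alpha * rho * delta * phi_s              # semi-gradient step

# Example: one update on a 4-dimensional feature vector.
w = np.zeros(4)
w = td0_update(w, np.array([1.0, 0.0, 0.0, 0.0]), 1.0,
               np.array([0.0, 1.0, 0.0, 0.0]))
```

With rho fixed at 1 this is ordinary on-policy TD(0); combining importance ratios (rho != 1) with function approximation is exactly the off-policy regime in which, as the abstract notes, TD methods can diverge, because the update is not the gradient of any objective.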
    Bio: Martha White is an Associate Professor of Computing Science at the University of Alberta and a PI of Amii (the Alberta Machine Intelligence Institute), one of the top machine learning centres in the world. She holds a Canada CIFAR AI Chair and received IEEE's "AI's 10 to Watch: The Future of AI" award in 2020. She has authored more than 50 papers in top journals and conferences. Martha is an associate editor for TPAMI and has served as co-program chair for ICLR and as an area chair for many conferences in AI and ML, including ICML, NeurIPS, AAAI and IJCAI. Her research focuses on developing algorithms for agents continually learning from streams of data, with an emphasis on representation learning and reinforcement learning.
