Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation - ArXiv:2407.181

Academia Accelerated

27 views

Published: 18 Nov 2024
Original paper: arxiv.org/abs/...
Title: Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation
Authors: Jean Seong Bjorn Choe, Jong-Kook Kim
Abstract:
Entropy Regularisation is a widely adopted technique that enhances policy optimisation performance and stability. A notable form of entropy regularisation is augmenting the objective with an entropy term, thereby simultaneously optimising the expected return and the entropy. This framework, known as maximum entropy reinforcement learning (MaxEnt RL), has shown theoretical and empirical successes. However, its practical application in straightforward on-policy actor-critic settings remains surprisingly underexplored. We hypothesise that this is due to the difficulty of managing the entropy reward in practice. This paper proposes a simple method of separating the entropy objective from the MaxEnt RL objective, which facilitates the implementation of MaxEnt RL in on-policy settings. Our empirical evaluations demonstrate that extending Proximal Policy Optimisation (PPO) and Trust Region Policy Optimisation (TRPO) within the MaxEnt framework improves policy optimisation performance in both MuJoCo and Procgen tasks. Additionally, our results highlight MaxEnt RL's capacity to enhance generalisation.
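
The abstract does not spell out how the entropy objective is separated, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "separating" means estimating one advantage from the task rewards and a second advantage from the per-step policy entropies, each with its own value baseline, and then combining them with a temperature. The names `gae`, `maxent_advantages`, `entropy_values` and `tau` are hypothetical and chosen for illustration.

```python
def gae(signals, values, gamma=0.99, lam=0.95):
    """Generalised advantage estimation over one finite episode.

    `signals` holds one scalar per step (rewards or entropies).
    `values` holds len(signals) + 1 baseline estimates; the last
    entry is the bootstrap value for the state after the final step.
    """
    advantages = [0.0] * len(signals)
    running = 0.0
    for t in reversed(range(len(signals))):
        # One-step temporal-difference error for this signal.
        delta = signals[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def maxent_advantages(rewards, entropies, reward_values, entropy_values,
                      tau=0.01, gamma=0.99, lam=0.95):
    """Combine separately estimated reward and entropy advantages.

    Assumption: the entropy term is treated as its own return stream
    with its own baseline (`entropy_values`), weighted by `tau`.
    """
    reward_adv = gae(rewards, reward_values, gamma, lam)
    entropy_adv = gae(entropies, entropy_values, gamma, lam)
    return [a_r + tau * a_h for a_r, a_h in zip(reward_adv, entropy_adv)]
```

Under this assumption, the combined advantage could replace the usual advantage in a PPO or TRPO policy update. Keeping the two streams apart means the entropy term can be scaled or inspected without retraining the reward critic.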
