RLOO: A Cost-Efficient Optimization for Learning from Human Feedback in LLMs

BuzzRobot

3.4K views

Published: Sep 8, 2024

Comments • 1

@BuzzRobot 1 month ago ⁺⁶
Timestamps:
0:00 Introduction
0:33 Background on RLHF (Reinforcement Learning from Human Feedback)
5:45 Back to basics: REINFORCE
8:44 From PPO (Proximal Policy Optimization) to REINFORCE
23:27 Results of the new optimization method, RLOO
32:35 Conclusions
34:05 Q&A
