I just binged this playlist at 1 am. Absolutely worth it. You deserve more views.
PLEASE COME BACK!! You are an amazing teacher!
All of your videos are amazing, please upload more
Please come back, your videos are great!
Helped me a lot, can't wait to see more
Joel, excellent explanation and talk! Thank you!
ok everything makes sense now, thx
Welcome back!
Hope to see more of these videos..
Amazing content! Please keep them coming!
Super helpful - thank you for this series!
🎯 Key Takeaways for quick navigation:
00:00 🤖 Reinforcement learning improves large language models like ChatGPT.
00:25 🃏 Large language models face issues like bias, errors, and quality.
01:11 📊 Training data quality impacts results; removing bad jokes might help.
01:55 🧩 Training on both good and bad jokes improves language models.
02:38 🔄 Language models are policies, reinforcement learning uses policy gradient.
03:08 🎯 Reinforcement Learning from Human Feedback (RLHF) challenges data acquisition.
03:35 🤔 RLHF theory: Language model might already know jokes' boundary.
04:18 🏆 Training a reward network predicts human ratings for model's output.
04:47 🔄 Reward network is a modified language model for predicting ratings.
05:14 📝 Approach: Humans write text, train reward network, refine model with RL.
05:57 ⚖️ Systems convert comparisons to ratings for reward network training.
06:11 😄 RLHF successfully improves language models, including humor.
Made with HARPA AI
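The step at 05:57 above (converting pairwise human comparisons into scalar ratings a reward network can be trained on) can be sketched with a toy Bradley-Terry fit. Everything here is illustrative: the preference data, the number of items, and the learning rate are made up, and real RLHF systems fit a neural reward model over text rather than one scalar per item.

```python
import math

# Hypothetical toy data: pairwise human preferences over four completions.
# Each tuple (w, l) means raters preferred completion w over completion l.
preferences = [(0, 1), (0, 2), (1, 2), (3, 0), (3, 1), (3, 2)]

num_items = 4
rewards = [0.0] * num_items  # one scalar "rating" per completion

# Bradley-Terry model: P(w beats l) = sigmoid(r_w - r_l).
# Gradient ascent on the log-likelihood turns raw comparisons
# into consistent scalar ratings.
lr = 0.1
for _ in range(2000):
    for w, l in preferences:
        p = 1.0 / (1.0 + math.exp(rewards[l] - rewards[w]))
        rewards[w] += lr * (1.0 - p)
        rewards[l] -= lr * (1.0 - p)

# Completion 3 wins every comparison, so it ends up rated highest;
# completion 2 loses every comparison, so it ends up rated lowest.
best = max(range(num_items), key=lambda i: rewards[i])
```

In a full RLHF pipeline these fitted ratings would be the training targets for the reward network, which then scores new model outputs during the reinforcement learning phase.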
Good teaching.
You are the Best
Great content!!
How long does it take to train a reward network? And how reliable would it be?
come back :(
Who is this guy? He made all the complexity so simple with his words. Does anyone know this gentleman's name?