Reinforcement Learning: ChatGPT and RLHF

  • Published: Dec 3, 2024

Comments • 17

  • @EternityUnknown
    5 months ago +8

    I just binged this playlist at 1 am. Absolutely worth it. You deserve more views.

  • @colorblindzebra
    3 months ago +3

    PLEASE COME BACK!! You are an amazing teacher!

  • @Coder.tahsin
    5 months ago +3

    All of your videos are amazing, please upload more

  • @tuulymusic3856
    8 months ago +4

    Please come back, your videos are great!

  • @胡里安-n6m
    7 months ago +1

    Helped me a lot, can't wait to see more

  • @HoverAround
    6 months ago

    Joel, excellent explanation and talk! Thank you!

  • @n45a_
    1 month ago

    ok everything makes sense now, thx

  • @ireoluwaTH
    1 year ago +1

    Welcome back!
    Hope to see more of these videos..

  • @pegasusbupt
    1 year ago +2

    Amazing content! Please keep them coming!

  • @jasonpmorrison
    1 year ago +1

    Super helpful - thank you for this series!

  • @onhazrat
    1 year ago

    🎯 Key Takeaways for quick navigation:
    00:00 🤖 Reinforcement learning improves large language models like ChatGPT.
    00:25 🃏 Large language models face issues like bias, errors, and low-quality output.
    01:11 📊 Training-data quality shapes the results; removing bad jokes from the data might help.
    01:55 🧩 Training on both good and bad jokes improves language models.
    02:38 🔄 A language model is a policy, so reinforcement learning can refine it with policy gradients.
    03:08 🎯 Reinforcement Learning from Human Feedback (RLHF) faces data-acquisition challenges.
    03:35 🤔 RLHF intuition: the language model may already know where the boundary between good and bad jokes lies.
    04:18 🏆 A reward network is trained to predict human ratings of the model's output.
    04:47 🔄 The reward network is a modified language model that predicts ratings instead of tokens.
    05:14 📝 Approach: humans write and rate text, a reward network is trained on it, then the model is refined with RL.
    05:57 ⚖️ Systems convert pairwise comparisons into ratings for reward-network training.
    06:11 😄 RLHF successfully improves language models, including their humor.
    Made with HARPA AI
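
A minimal PyTorch sketch of the reward-network step summarized at 04:18-05:57 above: a language-model-style backbone whose next-token head is replaced by a scalar rating head, trained on pairwise human comparisons with a Bradley-Terry style loss. The GRU backbone, vocabulary size, and random batch below are placeholders (the video does not specify an implementation); a real pipeline would reuse the pretrained language model itself.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """An LM-style backbone with its next-token head swapped for a scalar rating head."""
    def __init__(self, vocab_size: int = 1000, hidden_size: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        # Placeholder backbone; in practice this would be the pretrained transformer LM.
        self.backbone = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.value_head = nn.Linear(hidden_size, 1)  # predicts a human rating for the whole text

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.backbone(self.embed(token_ids))      # (batch, seq_len, hidden)
        return self.value_head(hidden[:, -1, :]).squeeze(-1)  # one scalar reward per sequence

def comparison_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Turn "A was preferred over B" labels into a training signal:
    # push r(chosen) above r(rejected) via -log sigmoid of the margin.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy training step on a fake batch of pairwise comparisons.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
chosen = torch.randint(0, 1000, (8, 16))    # token ids of completions humans preferred
rejected = torch.randint(0, 1000, (8, 16))  # token ids of completions humans rejected
loss = comparison_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()

Once trained, a reward network like this scores whatever the language model generates, and the model is then fine-tuned with a policy-gradient method to increase that score, which is the RL step described at 02:38 and 05:14.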

  • @0xeb-
    1 year ago +1

    Good teaching.

  • @vamsinadh100
    1 year ago +1

    You are the Best

  • @RaulMartinezRME
    1 year ago +1

    Great content!!

  • @0xeb-
    1 year ago +1

    How long does it take to train a reward network? And how reliable would it be?

  • @stayhappy-forever
    7 months ago +2

    come back :(

  • @neo4242002
    5 months ago

    Who is this guy? He made all the complexity so simple with his words. Does anyone know this gentleman's name?