Reinforcement Learning: ChatGPT and RLHF

  • Published: 16 Jun 2024
  • Reinforcement Learning from human feedback, and how it's used to help train large language models like ChatGPT.
    Part 3 of RL from scratch series.
    0:00 - intro
    0:06 - large language models
    0:35 - learning to tell jokes
    1:13 - fine-tuning with better data
    1:26 - positive and negative examples
    2:03 - reinforcement learning for LLMs
    3:00 - labeling fewer examples
    3:56 - reward networks
    5:08 - summing it up
    5:23 - variants
    5:57 - ChatGPT, Bard, Claude, Llama
    6:09 - finally, a good joke!

Comments • 14

  • @tuulymusic3856 · 2 months ago +2

    Please come back, your videos are great!

  • @Coder.tahsin · 4 days ago

    All of your videos are amazing, please upload more

  • @HoverAround · 23 days ago

    Joel, excellent explanation and talk! Thank you!

  • @ireoluwaTH · 10 months ago +1

    Welcome back!
    Hope to see more of these videos..

  • @jasonpmorrison · 8 months ago +1

    Super helpful - thank you for this series!

  • @pegasusbupt · 8 months ago +2

    Amazing content! Please keep them coming!

  • @user-cm5es5kk7j · 1 month ago +1

    help me a lot, can't wait to see more

  • @RaulMartinezRME · 10 months ago +1

    Great content!!

  • @0xeb- · 10 months ago +1

    Good teaching.

  • @neo4242002 · 1 day ago

    Who is this guy? He made all this complexity so simple with his words. Does anyone know this gentleman's name?

  • @vamsinadh100 · 7 months ago +1

    You are the Best

  • @0xeb- · 10 months ago +1

    How long does it take to train a reward network? And how reliable would it be?

  • @stayhappy-forever · 1 month ago +2

    come back :(

  • @onhazrat · 10 months ago

    🎯 Key Takeaways for quick navigation:
    00:00 🤖 Reinforcement learning improves large language models like ChatGPT.
    00:25 🃏 Large language models face issues with bias, errors, and output quality.
    01:11 📊 Training data quality shapes results; removing bad jokes from the data might help.
    01:55 🧩 Training on both good and bad examples improves language models.
    02:38 🔄 A language model can be viewed as a policy, so policy-gradient reinforcement learning applies.
    03:08 🎯 Reinforcement Learning from Human Feedback (RLHF) faces data-acquisition challenges.
    03:35 🤔 RLHF hypothesis: the language model may already encode the boundary between good and bad jokes.
    04:18 🏆 A reward network is trained to predict human ratings of the model's output.
    04:47 🔄 The reward network is a modified language model that outputs a predicted rating.
    05:14 📝 The pipeline: humans rate generated text, a reward network is trained on those ratings, and the model is refined with RL against that reward.
    05:57 ⚖️ Some systems convert pairwise comparisons into ratings for reward-network training.
    06:11 😄 RLHF successfully improves language models, including their humor.
    Made with HARPA AI
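
The reward-network idea summarized above (a model trained from pairwise human comparisons, 04:18 and 05:57) can be sketched in miniature. This is a toy illustration, not the video's actual implementation: the "outputs" are two-number feature vectors, `human_prefers` is a hypothetical stand-in for a human labeler, and the reward network is a single linear unit trained Bradley-Terry style to make the preferred output score higher.

```python
import math
import random

random.seed(0)

# Hypothetical human labeler: secretly prefers outputs with a larger
# first feature (a stand-in for "funnier joke").
def human_prefers(a, b):
    return a if a[0] > b[0] else b

# Toy reward network: one linear unit over a 2-d feature vector.
w = [0.0, 0.0]

def reward(x):
    return w[0] * x[0] + w[1] * x[1]

# Train on pairwise comparisons: maximize the log-likelihood
# sigmoid(reward(winner) - reward(loser)), Bradley-Terry style.
lr = 0.1
for _ in range(2000):
    a = [random.random(), random.random()]
    b = [random.random(), random.random()]
    win = human_prefers(a, b)
    lose = b if win is a else a
    margin = reward(win) - reward(lose)
    p = 1.0 / (1.0 + math.exp(-margin))      # P(winner beats loser)
    g = 1.0 - p                              # d(log p)/d(margin)
    for i in range(2):
        w[i] += lr * g * (win[i] - lose[i])  # gradient ascent step

# After training, the learned reward ranks outputs the way the
# "human" does, so it can score RL rollouts without a labeler.
```

The point of the construction is the one the takeaways make: once comparisons are distilled into a scalar reward, a policy-gradient method can fine-tune the language model against that reward instead of querying humans on every sample.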