10 minutes paper (episode 20); InstructGPT

  • Published: 19 Oct 2024

Comments • 9

  • @vivekpadman5248 1 year ago +2

    The only video one needs to understand how RLHF actually works. Great demonstration, sir, thanks a lot.

  • @ezarbali2713 1 year ago +2

    Great vid!
    Btw: the state-value function takes only the state as input and averages the expected return over the actions available there, weighted by the policy. To obtain the optimal policy from the state-value function, one has to look one step ahead at V(s_next) for each action and pick the action with the maximum expected return.
    The action-value function takes both the state and the action and outputs the expected cumulative reward.
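
A minimal sketch of the distinction made in this comment, assuming a hypothetical two-state MDP invented purely for illustration (the states, actions, rewards, and the names `P`, `GAMMA`, `q_from_v`, `value_iteration`, and `greedy_policy` are not from the video or the paper). It shows that V takes only a state, that Q takes a state and an action, and that recovering a policy from V requires a one-step lookahead over V(s_next).

```python
# Hypothetical toy MDP, made up for illustration only.
# P[state][action] = list of (probability, next_state, reward) transitions.
GAMMA = 0.9  # discount factor (assumed value)

P = {
    "s0": {"left": [(1.0, "s0", 0.0)], "right": [(1.0, "s1", 1.0)]},
    "s1": {"left": [(1.0, "s0", 0.0)], "right": [(1.0, "s1", 2.0)]},
}


def q_from_v(V, s, a):
    """Action-value Q(s, a): expected reward plus discounted V(s_next)."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-10):
    """Compute the optimal state-value function V(s) by repeated backups."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(q_from_v(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


def greedy_policy(V):
    """One-step lookahead: in each state pick the action with the best Q."""
    return {s: max(P[s], key=lambda a: q_from_v(V, s, a)) for s in P}


V = value_iteration()
policy = greedy_policy(V)
print(V)
print(policy)
```

Note that `greedy_policy` needs the transition table `P` to turn V into actions, whereas a learned Q function could be maximized over actions directly; that is the practical difference the comment points at.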

  • @renanmonteirobarbosa8129 11 months ago +2

    I was hoping you would start with Dear Fellow Scholars hahaha

  • @filipelauar2686 1 year ago +1

    Great video, thanks for making it!!

  • @musicalwanderings7380 1 year ago +2

    You need to zoom in on the sections of the paper so we can see things clearly. Just showing an unintelligible font size isn't helping.

  • @amirrezamohammadi 1 year ago +1

    Interesting, Thanks

  • @rotacidni 1 year ago

    Is the key part of InstructGPT its value policy, which takes the prompt and the answers as input?

  • @shahimvedaei242 1 year ago

    awesome