CS 285: Eric Mitchell: Reinforcement Learning from Human Feedback: Algorithms & Applications

  • Published: Feb 1, 2025

Comments • 3

  • @PsiGame-yu7so 4 months ago +2

    This is a severely underviewed video, the best walkthrough and explanation on DPO I've ever seen!

  • @AurobindoTripathy 11 months ago +2

    Good energy, mildly funny, by far the best articulation... can only come from the DPO inventors, Eric Mitchell et al.

  • @SantoshGupta-jn1wn 1 year ago

    Thanks for posting this!