WARP: On the Benefits of Weight Averaged Rewarded Policies

  • Published: Jan 17, 2025

Comments • 7

  • @EngineeredFemale
    6 months ago +1

    Great video. Thank you for covering this! Could you also have a look at the 'Terminator' paper and make a video on that? It looks like it's a new architecture.

    • @gabrielmongaras
      6 months ago +1

      Will take a look at it! Not sure if I'll make a video on it yet, though. I usually do if the model proposes something really interesting or has really good results!

  • @lexer_
    6 months ago +2

    There is a strange disconnect between the kinds of concepts you explain here. Some of them are at an absolute-beginner level, as if the viewer has never coded before and knows nothing about machine learning, but the majority of the video requires at least several months of crash course, if not years of experience in the field, to understand. I think your videos could benefit from a more consistent assumption about the audience's competence. Or maybe approach it in sections for different levels of audience competence?
    I really appreciate these walk-throughs of papers. They make it so much easier for me to concentrate than reading on my own, so I really want anyone who does this to succeed.

    • @gabrielmongaras
      6 months ago +1

      Thanks for the feedback!! Usually I try to keep most paper videos a little more technical, since otherwise I'd find myself explaining things like transformers over and over again, which would make the videos unnecessarily long (they already feel way too long; I've been trying to shorten them). Some videos, like the Stable Diffusion one, I try to make a little more beginner-friendly, assuming knowledge of CNNs, MLPs and basic training. I should probably communicate this better and make a clear distinction between the two, perhaps somewhere in the title. With this video, I aimed for somewhere in the middle, which led to problems.
      I like the idea of approaching it in sections based on knowledge level. For example, I could've explained REINFORCE a little or pointed to a resource about it. Those who already know it could skip that part, while those who don't could watch it or go to the resource. Will think about this some more for future videos!

    • @lexer_
      6 months ago +1

      @gabrielmongaras I personally got annoyed at the very beginner-level explanations early on, which felt like they took forever, and I feel lucky that I stuck it out long enough to make it to the interesting parts later on. For a while I thought that was all there would be in this video: explanations of some very basic concepts. But of course that varies a lot depending on where one is coming from in terms of familiarity with these topics.
      I will definitely be watching for your future videos.

    • @gabrielmongaras
      6 months ago +1

      @lexer_ Got it! I didn't realize the intros were annoying 😅 Thanks again for your feedback, it's really helpful! From now on, I think I will explicitly mention when I am explaining a basic concept, tell viewers to skip it if they already know it, and provide resources instead of explaining concepts I assume most people already know. In this video, I probably could've skipped the RLHF explanation altogether and pointed to one of the many existing explanations of RLHF. I'm going to experiment with this a little in future videos!

    • @codylane2104
      6 months ago

      @gabrielmongaras Yeah, pointing to good basic-level explanatory material is a great idea! There's a ton of material on the net, yet only a fraction of it is really worth studying. 🙂