Reinforcement Learning for LLMs in 2025

  • Published: 10 Feb 2025

Comments •

  • @NaturalMelodicHarmonic
    @NaturalMelodicHarmonic 12 hours ago

    Great video. Definitely think model size is what is hampering meaningful results.

    • @TrelisResearch
      @TrelisResearch  2 hours ago

      Yeah, I'm going to try ablating with bigger models. This is a key question.

  • @thanartchamnanyantarakij9950
    @thanartchamnanyantarakij9950 20 hours ago

    Waiting for this!

  • @TheLokiGT
    @TheLokiGT 17 hours ago

    Very good content, as usual.

  • @biochemcompsci
    @biochemcompsci 15 hours ago

    Top-notch content. Thank you.

  • @gileneusz
    @gileneusz 17 hours ago

    Very good explanation of current RL approaches; a SOTA video on the topic. Ideas for improvement: you could do a presentation giving more insight into RL and fine-tuning. I remember you did something like that in the past, but maybe an updated version covering the DeepSeek approaches. Perhaps show the stages of how to build a model and then fine-tune it so it can reason, in a no-code presentation format.

    • @TrelisResearch
      @TrelisResearch  17 hours ago

      Good points, yeah, I'll aim to do that in the follow-on.

    • @gileneusz
      @gileneusz 17 hours ago

      @@TrelisResearch There was some news that Berkeley researchers recreated the "aha moment" for $30; you could also do a video on that (just sharing ideas for videos, not demanding them lol)

  • @rajaakhil588
    @rajaakhil588 16 hours ago

    Your in-depth content surpasses that of many other YouTubers. A tutorial demonstrating computer-use model training using reinforcement learning and a simulated UI would be highly valuable. Would a GRPO approach be suitable for this image-inclusive data? Finally, to enhance the reasoning process, could we incorporate a "tool_call" tag enabling LLMs to utilize tools during reasoning, rather than solely in the answer phase?

    • @TrelisResearch
      @TrelisResearch  2 hours ago

      That's a cool idea and I'll add it to my list of potential topics.
      Yes, you can add tools. It does make evaluation a bit harder because there can now be stochasticity in the tool, but broadly it's a good idea.
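
A minimal sketch of the tools-during-reasoning idea raised in this thread (not code from the video; the <tool_call>/<tool_result> tags, the `generate_step` callable, and the calculator tool are assumptions for illustration):

```python
import json
import re

# Hypothetical tool registry -- the calculator tool is illustrative only.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def run_reasoning_with_tools(generate_step, prompt, max_turns=8):
    """Alternate between model generation and tool execution.

    `generate_step(text)` is an assumed callable that returns the next chunk
    of the reasoning trace, stopping after it emits </tool_call> or the
    final answer.
    """
    trace = prompt
    for _ in range(max_turns):
        chunk = generate_step(trace)
        trace += chunk
        match = TOOL_CALL_RE.search(chunk)
        if match is None:
            return trace  # no tool call: the model produced its final answer
        call = json.loads(match.group(1))  # e.g. {"name": "calculator", "arguments": "2+2"}
        result = TOOLS[call["name"]](call["arguments"])
        # Feed the tool output back so the model can keep reasoning with it.
        trace += f"\n<tool_result>{result}</tool_result>\n"
    return trace
```

Because the tool output can be stochastic, a GRPO-style reward would typically score only the final answer rather than the exact reasoning trace, which is the evaluation difficulty the reply points to.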

  • @TemporaryForstudy
    @TemporaryForstudy 20 hours ago

    Nice, man, much needed after DeepSeek. I'm going to watch and do the hands-on work. Hey, do you have any job for an AI engineer? Maybe someone in your network does? Please let me know; I'm looking for remote work.

    • @TrelisResearch
      @TrelisResearch  19 hours ago +1

      Check trelis.com for developer collaborations, which are the pathway to joining the Trelis team.

  • @akshayvasisht
    @akshayvasisht 16 hours ago

    Can you please make a video on applying RL to vision LLMs?

  • @Little-bird-told-me
    @Little-bird-told-me 21 hours ago

    Could ORPO’s balance of cross-entropy and odds ratios make it a more stable alternative to PPO-based RLHF? Also, does the beta parameter generalize across models, or does it require fine-tuning?

    • @TrelisResearch
      @TrelisResearch  18 hours ago

      I'll talk more about this in the next video, but the cross-entropy term serves a role similar to the KL divergence in GRPO or PPO (it keeps the model grounded towards the original weights).
      Beta does not generalise all that well in my experience and needs tuning; somewhere between 0.2 and 0.5.
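
For context on that exchange, a sketch of the ORPO objective as given in the ORPO paper (Hong et al., 2024): the supervised cross-entropy on the chosen response plays the grounding role that the KL penalty plays in PPO/GRPO, and the odds-ratio term is weighted by a coefficient (written λ in the paper) that is assumed here to be the beta the comment refers to.

```latex
% ORPO objective: cross-entropy on the chosen response y_w plus a
% beta-weighted odds-ratio term contrasting chosen (y_w) and rejected (y_l).
\mathcal{L}_{\mathrm{ORPO}}
  = \mathbb{E}_{(x,\, y_w,\, y_l)}
    \left[ \mathcal{L}_{\mathrm{SFT}}(x, y_w)
           + \beta \, \mathcal{L}_{\mathrm{OR}}(x, y_w, y_l) \right],
\qquad
\mathcal{L}_{\mathrm{OR}}
  = -\log \sigma\!\left(
      \log \frac{\operatorname{odds}_\theta(y_w \mid x)}
                {\operatorname{odds}_\theta(y_l \mid x)}
    \right),
\qquad
\operatorname{odds}_\theta(y \mid x)
  = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
```

Larger beta pushes the model harder toward preferred over rejected responses relative to the plain cross-entropy anchor, which is consistent with the reply's note that the value needs per-model tuning.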