Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)

  • Published: 5 Oct 2024
  • Science

Comments • 37

  • @mshonle · 12 hours ago · +17

    I tried reading this paper three times but then decided it would have been more optimal if they doubled the number of scientists writing it…

  • @tingtingin · 4 hours ago · +5

    He's alive!

  • @kikijuju4809 · 11 hours ago · +3

    Long time no see

  • @akanjiemmanuel4807 · 13 hours ago · +2

    Interesting paper

  • @existenceisillusion6528 · 23 minutes ago · +1

    Are we sure a* is not a typo that should have been y*?
    Also: best-of-N weighted, beam search, or majority voting?
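
  For reference, those are the answer-selection strategies the paper compares: best-of-N (weighted), beam search, and majority voting. A minimal sketch of weighted best-of-N versus plain majority voting, assuming "samples" is a list of (answer, verifier_score) pairs produced by a learned verifier (the verifier itself is not shown):

      from collections import defaultdict

      # Minimal sketches of two of the paper's answer-selection strategies.
      # Verifier scores are assumed to come from a learned reward model.

      def majority_vote(samples):
          # Pick the most frequent final answer, ignoring verifier scores.
          counts = defaultdict(int)
          for answer, _ in samples:
              counts[answer] += 1
          return max(counts, key=counts.get)

      def weighted_best_of_n(samples):
          # Sum verifier scores across samples that share a final answer,
          # then pick the answer with the highest total score.
          totals = defaultdict(float)
          for answer, score in samples:
              totals[answer] += score
          return max(totals, key=totals.get)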

  • @ChocolateMilkCultLeader · 12 hours ago · +2

    My goat is back

  • @Veptis · 9 hours ago

    I'll have to check the whole video later, but I think IBM had a somewhat similar paper recently, about the learning rate changing based on epoch/mini-batch performance on a benchmark or something. It's called "scheduler" something.

  • @MasamuneX · 13 hours ago · +3

    What if we use Monte Carlo tree search on tree-of-thought LLMs, then keep only the highest-quality outputs, train a new foundation model on that synthetic data, and repeat until ASI? (Sketched after this thread.)

    • @montymemoladi8067 · 12 hours ago · +2

      Sounds like a promising approach, and I think it's reasonably close to what the big labs are planning to do.

    • @AtAtaylor · 11 hours ago

      People have already done this

    • @scoffpickle9655 · 11 hours ago · +1

      Or just use something similar to "Thinker: Learning to Plan and Act" to predict a few tokens ahead, which might increase quality.

    • @Adhil_parammel · 24 minutes ago

      An oracle is required to guide it and reach ASI.
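
  Roughly, the loop proposed in this thread is: spend test-time compute on search, keep the best traces, retrain, repeat (often called expert iteration). A minimal sketch, where generate_with_tree_search, quality_score, and finetune are hypothetical stand-ins, not any lab's actual pipeline:

      # A minimal sketch of the search-then-distill loop suggested above
      # (expert iteration). All helper functions are hypothetical stand-ins.

      def self_improvement_loop(model, prompts, rounds=3, keep_fraction=0.1):
          for _ in range(rounds):
              # 1. Spend test-time compute: tree search over chains of thought.
              outputs = [generate_with_tree_search(model, p) for p in prompts]
              # 2. Keep only the highest-scoring outputs as synthetic data.
              outputs.sort(key=quality_score, reverse=True)
              synthetic_data = outputs[: int(len(outputs) * keep_fraction)]
              # 3. Train the next model on that data, then repeat the loop.
              model = finetune(model, synthetic_data)
          return model

  The open question, as the last reply notes, is where quality_score comes from without an oracle.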

  • @benedictsmith2415 · 1 minute ago

    Equation 1 just serves as a theoretical foundation for the "compute-optimal" concept, but it cannot be directly used for optimization because of:
    Intractability: finding the truly optimal hyperparameters θ*_{q,a*(q)}(N) across all possible prompts and compute budgets would require an exhaustive search.
    Unknown ground truth: in a real-world setting we don't know the ground-truth correct answer y*(q) for an unseen prompt, so directly optimizing the indicator function is impossible.
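
  For reference, Equation 1 as this comment describes it (a reconstruction from the definitions discussed here, where Target(θ, N, q) is the output distribution induced by strategy hyperparameters θ at compute budget N on prompt q, and y*(q) is the ground-truth answer; the a*(q) in the paper's subscript appears to denote that same ground truth, per the typo question above):

      \theta^{*}_{q, y^{*}(q)}(N)
        = \arg\max_{\theta}
          \; \mathbb{E}_{y \sim \mathrm{Target}(\theta, N, q)}
          \left[ \mathbb{1}_{\, y = y^{*}(q)} \right]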

  • @islandfireballkill · 13 hours ago · +4

    Wake up, babe. New Yannic video just dropped.

  • @MinecraftJuiceHD · 3 hours ago

    Isn't beam search done per token? Why does Yannic say that they grade the answers?
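
  In the paper, beam search operates over solution steps rather than tokens: at each step the model proposes several candidate next steps, a process reward model (PRM) grades each partial solution, and only the top-scoring beams survive, which is why whole (partial) answers get graded. A minimal sketch, where sample_next_step, prm_score, and is_final are hypothetical stand-ins for the LLM sampler and the learned verifier:

      # Step-level beam search guided by a process reward model (PRM).
      # Candidates are whole reasoning steps, not individual tokens.

      def beam_search_steps(prompt, beam_width=4, expand=4, max_steps=10):
          beams = [[]]  # each beam is a list of reasoning steps so far
          for _ in range(max_steps):
              candidates = []
              for steps in beams:
                  for _ in range(expand):
                      step = sample_next_step(prompt, steps)  # hypothetical LLM call
                      candidates.append(steps + [step])
              # Grade each partial solution with the PRM; keep the best beams.
              candidates.sort(key=lambda s: prm_score(prompt, s), reverse=True)
              beams = candidates[:beam_width]
              if all(is_final(steps[-1]) for steps in beams):
                  break  # every surviving beam has produced a final answer
          return beams[0]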

  • @KadeemSometimes · 13 hours ago · +1

    Nice

  • @TheAIEpiphany · 5 hours ago

    21:48 What can be unburdened by what has been

  • @gileneusz · 12 hours ago

    he's the best

  • @aa-xn5hc · 4 hours ago

    Please bring the news back!

  • @LysergicKids · 12 hours ago

    It can't be, a new paper that's not 98% marketing wank? Is the world healing, brothers?

  • @csabaczcsomps7655 · 46 minutes ago

    Take it for what you will: when a kid sees you put down one apple and then one more, he will answer that we have 2, so we write 1 + 1 = 2. From then on he treats the notation as always true, without recalling the apple scene. This means some training needs two modules: video, then video-to-notation association. And actually using the notation is probably a third step. My noob opinion.

  • @nineteenfortyeight6762 · 9 hours ago · +2

    Why in the name of all that's holy are we asking an LLM to do arithmetic?? 😭

    • @hunterkudo9832 · 7 hours ago · +1

      Because being able to do arithmetic is a good indicator of being able to reason. We want LLMs to be good reasoners because a lot of tasks in the real world will require LLMs and soon AI agents to reason like a human can.

    • @HUEHUEUHEPony · 3 hours ago · +1

      Because not all of us are interested in roleplay slop

  • @fontenbleau · 12 hours ago · +1

    Python is just a dead-end pathway. One guy on YouTube writes a neural network in low-level assembly and it's 500 times faster than PyTorch on a single CPU core on the same task. We need a full rewrite of networks and models.

    • @scoffpickle9655 · 11 hours ago · +1

      Please tell me who made that. It seems so interesting

    • @scoffpickle9655 · 11 hours ago · +1

      Also, yeah, C or C++ is better for actually useful and fast models; Python is good for modularity and prototyping, but god it is so fucking slow.

    • @biomerl · 11 hours ago · +2

      Wat? 99 percent of training is done on the GPU, which is already C++.

    • @scoffpickle9655 · 10 hours ago

      @biomerl Yeah, sorry, I don't have much knowledge of low-level ML.

    • @kennycommentsofficial · 9 hours ago

      @scoffpickle9655 The easiest starting place is to search YouTube for matrix multiplication with CUDA (basically just C code).
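
  The gap this thread describes is real, but it lives in the interpreter, not in Python-the-frontend: PyTorch already dispatches to compiled GPU/CPU kernels. A quick sketch of the difference on a plain matrix multiply, comparing pure-Python loops against NumPy's compiled BLAS call:

      import time
      import numpy as np

      # The same matrix multiply two ways: interpreted loops vs. a compiled kernel.
      n = 256
      a = np.random.rand(n, n)
      b = np.random.rand(n, n)

      def matmul_python(x, y):
          # Every multiply-add here goes through the Python interpreter.
          out = [[0.0] * n for _ in range(n)]
          for i in range(n):
              for k in range(n):
                  xik = x[i][k]
                  for j in range(n):
                      out[i][j] += xik * y[k][j]
          return out

      t0 = time.perf_counter()
      matmul_python(a.tolist(), b.tolist())
      t1 = time.perf_counter()
      a @ b  # NumPy dispatches this to an optimized BLAS kernel
      t2 = time.perf_counter()
      print(f"pure Python: {t1 - t0:.3f}s  NumPy: {t2 - t1:.5f}s")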

  • @ozordiprince9405 · 13 hours ago

    200 views in 15 minutes. Bro fell off