We're in a spot where a serious person can seriously say "it's SIMPLY the model talking to itself until it solves the problem," and we enthusiasts shrug and move along. What a time to be alive.
But there is so much more to problem-solving than recursive iteration, isn't there? Humans solve problems using hypermodalities. Bodily sensations, sounds, smells, the gut microbiome, and emotional states all impact how we think. Then there are the more or less understood “a-ha!” moments or trial-and-error lucky guesses where intuitive judgment makes the call. We also have subconscious processing during sleep tackling the most difficult problems we are stuck on, accompanied by cerebrospinal fluid flushing over our brain tissue. Then there are hungover days when creativity takes the lead for some (e.g., Hemingway). Good luck trying to introduce a central nervous system depressant like alcohol into an LLM and then getting the best out of it, lol. I can only imagine how difficult it is to capture all these nuances in current or future LLM architectures. It almost seems like we need something else to augment LLMs with.
Very interesting summary, thanks a lot. My intuition is that evaluation/testing is where we can grow; that's the low-hanging fruit.
Stream of Search + Let's Verify Step by Step has looked the most likely to me. It might be that they just put their heads down and worked really hard to solve the collapse problems and optimize generalizability.
Regardless, amazing overview, thanks a bunch for sharing
Such a good overview. Thank you for the insights; quite instructive and accessible.
Thank you so much for such an informative video 🙏🙏.
Thanks for creating this video
I find this ridiculous and remarkably improbable. Did you see the missed space in the example CoT from o1? That matches Sam Altman's laid-back writing style; he's clearly writing all the CoT at test time by hand.
This is fantastic work❤!
For search it is important to search over ideas: not letters or tokens or words or sentences or paragraphs, but ideas. So an LLM needs to be able to output a token that says it has finished laying out an idea, and thus a new idea can begin at that point. If an LLM is constantly interrupted at the lower levels, it can never fully finish the idea. That would also help battle the combinatorial explosion that makes search at lower levels intractable. It's like a human chess player who only considers a few candidate moves vs. a brute-force algorithm that considers millions of moves that lead nowhere.
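To make that concrete, here is a minimal toy sketch of idea-level beam search where the model branches only at an end-of-idea boundary. The `<eoi>` token, `generate_idea()`, and `score()` are hypothetical stand-ins for a trained delimiter, an LLM decoding call, and a learned verifier; this is a sketch of the idea, not anyone's actual system.

```python
import heapq
import random

END_OF_IDEA = " <eoi> "  # hypothetical idea-boundary token

def generate_idea(prefix):
    # stand-in: a real system would decode from the LLM until <eoi> is emitted
    return f"idea-{random.randint(0, 99)}"

def score(candidate):
    # stand-in: a real system would use a learned verifier / value model
    return random.random()

def idea_beam_search(problem, beam_width=4, samples_per_idea=8, max_ideas=5):
    beam = [(0.0, problem)]
    for _ in range(max_ideas):
        expansions = []
        for _, prefix in beam:
            for _ in range(samples_per_idea):
                # branch only at idea boundaries, never mid-idea
                cand = prefix + generate_idea(prefix) + END_OF_IDEA
                expansions.append((score(cand), cand))
        # prune at the idea level; this is what keeps the tree tractable,
        # like the chess player who only considers a few candidate moves
        beam = heapq.nlargest(beam_width, expansions, key=lambda e: e[0])
    _, best = max(beam, key=lambda e: e[0])
    return best

print(idea_beam_search("Problem: ... "))
```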
Agreed. Lots of choices though in how to actually build that. Need steps that cause tangible progress.
That is awesome. It saved me lots of time. I am trying to use some of these techniques for the AIMO Kaggle contest. If anyone is interested drop me a message.
Did he mention that they use reasoning tokens?
Oh no I forgot to mention that! In my notation the reasoning token is how you know to move from z to y. It's kind of implied by the color changing from green to red.
Brilliant!
I think saying it doesn't follow from expert examples is a stretch. They could have helped finetune the CoT mechanism by having people write out their thought processes while solving problems, especially for math and coding. Edit: I see it addressed at 20:30.
Yeah I agree that there are expert examples somewhere in the training procedure. Wanted to emphasize that these play less of a role than I would have assumed before diving into this area (if you believe the OAI comments).
@@DistortedV12 I think to achieve scale, the data has to be generated by the model itself via a step-by-step prompt, and the correctness of the solution has to be easily verified. For example, the AIME problems have an integer solution between 0 and 999. One can then use process and advantage rewards on such a dataset.
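As a minimal sketch of that self-generation loop, assuming a hypothetical sample_cot() stand-in for a step-by-step prompt to the model (the answer check can be exact precisely because AIME answers are integers in 0-999):

```python
import re

def sample_cot(problem):
    # hypothetical stand-in for "think step by step" sampling from the model
    return "... reasoning steps ... Final answer: 42"

def extract_answer(cot):
    m = re.search(r"Final answer:\s*(\d{1,3})\b", cot)
    return int(m.group(1)) if m else None

def collect_verified_data(problems, answers, k=16):
    data = []
    for problem, gold in zip(problems, answers):
        for _ in range(k):
            cot = sample_cot(problem)
            # keep only chains whose final answer checks out exactly;
            # these become positives for reward modeling / RL
            if extract_answer(cot) == gold:
                data.append((problem, cot))
    return data
```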
Goat
Thinking LLMs from Meta, LLM-Berry, and the ARC-AGI paper from MIT on test-time training. Can someone, ideally Noam Brown (or an LLM otherwise), comment on how these are related to what is discussed here?
* Thinking LLMs is quite related. It uses an LLM as the verifier (I was emphasizing automatic verifiers in this talk).
* LLM-Berry is an effort to do an MCTS-style search on existing Llama models without learning.
* The ARC-AGI paper that came out today seems really neat! They do SGD at test time, so it's pretty different from these methods, which only do CoT at test time.
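To make that contrast concrete, here is a rough sketch of what test-time training looks like, under stated assumptions: `augment` and `task_loss` are hypothetical stand-ins the caller supplies, and the dict layout of `test_task` is assumed for illustration; this is not the paper's code.

```python
import copy
import torch

def predict_with_ttt(model, test_task, augment, task_loss, steps=10, lr=1e-3):
    """Fine-tune a copy of the model on augmented views of the test task
    itself, then predict: the rough shape of test-time training, in
    contrast to methods that spend test-time compute only on CoT decoding."""
    tuned = copy.deepcopy(model)                      # leave base weights alone
    opt = torch.optim.SGD(tuned.parameters(), lr=lr)
    for _ in range(steps):
        x, y = augment(test_task)                     # e.g., re-permuted demo pairs
        loss = task_loss(tuned(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return tuned(test_task["query"])                  # assumed dict layout
```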
@@srush_nlp Thank you so much for responding to my questions! Great talk; I liked how you pointed out the core problem so other researchers can focus their efforts.
Test-time compute capability is still constrained by the data used for the RL training, which is harder to curate. You can give a D student an infinite amount of time on an exam and he is certainly not going to get an A.
Depends entirely on the verifier and the test.
But synthetic data can relax this constraint. Just have increasingly capable models create more synthetic data to allow further reinforcement learning, and so on; the loop I mean is sketched below.
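Concretely, that loop is something like expert iteration / STaR-style training. A sketch with hypothetical `model.sample()` / `model.finetune()` APIs; note that, as the replies point out, everything hinges on the verifier:

```python
def self_improvement_loop(model, problems, verifier, rounds=3, k=16):
    # Sketch of the generate -> verify -> retrain loop described above;
    # model.sample() and model.finetune() are hypothetical APIs.
    for _ in range(rounds):
        verified = []
        for p in problems:
            traces = [model.sample(p) for _ in range(k)]
            # the verifier decides which synthetic traces are
            # trustworthy enough to train on
            verified += [(p, t) for t in traces if verifier(p, t)]
        model = model.finetune(verified)  # RL or SFT on verified traces
    return model
```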
@@haiderameer9473 No it can't, as it's still combinatorics at work; D -> A remains a challenge. No amount of recursive repetition in one domain, over even a seemingly infinite window of time, will make you an expert in another domain that you know little about.
Has to be process reward
Yeah, it definitely seems like that is part of the equation. The question is whether that is everything.