I tried reading this paper three times but then decided it would have been more optimal if they doubled the number of scientists writing it…
lol same
They didn't share any code 🔴❌️
He's alive!
Long time no see
Interesting paper
Are we sure a* is not a typo that should have been y*?
Also, what about best-of-N weighted, beam, and majority?
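For anyone else confused by the names: as I understand the paper, "weighted best-of-N" means sampling N solutions, grouping the ones that reach the same final answer, and summing verifier scores within each group, rather than trusting the single highest-scored sample. A tiny sketch of that aggregation (sample_answers and verifier_score are made-up stand-ins, not the paper's code):

```python
# Tiny sketch of weighted best-of-N: sum verifier scores across samples
# that share the same final answer, then return the answer with the
# largest total. sample_answers and verifier_score are hypothetical.
from collections import defaultdict

def weighted_best_of_n(question, n=32):
    totals = defaultdict(float)
    for solution, final_answer in sample_answers(question, n):
        totals[final_answer] += verifier_score(question, solution)
    return max(totals, key=totals.get)
```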
My goat is back
Will have to check the whole video later, but I think IBM had a somewhat similar paper recently, about the learning rate changing based on epoch/mini-batch performance on the benchmark or something. It's called "scheduler" something.
What if we use Monte Carlo tree search on tree-of-thought LLMs, keep only the highest-quality outputs, train a new foundation model on that synthetic data, and repeat until ASI? (There's a rough sketch of this loop after the replies below.)
Sounds like a promising approach, and I think it's reasonably close to what the big labs are planning to do.
People have already done this
Or just use something similar to Thinker: Learning to Plan and Act to kind of predict a few tokens ahead, which might increase quality.
An oracle to guide the search would be required to reach ASI.
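For what it's worth, a rough sketch of the loop this thread is proposing; every function here (tree_search, score_output, finetune) is a hypothetical stand-in, not a real API, and this only shows the generate-filter-retrain cycle, not an actual recipe for ASI:

```python
# Hypothetical sketch of the "search, keep the best, retrain" loop from the
# comment above. tree_search, score_output, and finetune are stand-ins.

def self_improvement_round(model, prompts, keep_fraction=0.1):
    """One round: search for candidate solutions, keep the top-scoring
    slice, and fine-tune the model on that synthetic data."""
    candidates = []
    for q in prompts:
        # Tree search (e.g. MCTS over a tree of thoughts) proposes many
        # candidate outputs per prompt; score_output is whatever quality
        # signal is available, e.g. a reward model or verifier.
        for y in tree_search(model, q, num_rollouts=64):
            candidates.append((q, y, score_output(q, y)))

    # Keep only the highest-scoring fraction as synthetic training data.
    candidates.sort(key=lambda t: t[2], reverse=True)
    kept = candidates[: int(len(candidates) * keep_fraction)]

    # Train the next model generation on the filtered (prompt, output) pairs.
    return finetune(model, [(q, y) for q, y, _ in kept])
```

Whether this keeps improving round over round depends entirely on how reliable the scoring signal is, which is basically what the replies here are debating.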
Equation 1 just serves as a theoretical foundation for the "compute-optimal" concept, but it cannot be used directly for optimization:
Intractability: finding the truly optimal hyperparameters θ*_{q,a*(q)}(N) across all possible prompts and compute budgets would require an exhaustive search.
Unknown ground truth: in a real-world setting we don't know the ground-truth correct answer y*(q) for an unseen prompt, so directly optimizing the indicator function is impossible.
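For readers without the paper open, here is a rough LaTeX reconstruction of what Equation 1 appears to say, pieced together from the notation in this comment (θ = strategy hyperparameters, N = compute budget, q = prompt, y*(q) = ground-truth answer); the a*(q) in the subscript is as printed, which a comment above suspects is a typo for y*(q). Not verbatim, so check the paper for the exact form:

```latex
% Rough reconstruction of Equation 1 (not verbatim from the paper):
% pick the test-time strategy hyperparameters \theta that maximize the
% expected probability of producing the correct answer y^*(q) for prompt q,
% where Target(\theta, N, q) is the output distribution induced by running
% strategy \theta on prompt q with compute budget N.
\theta^{*}_{q, a^{*}(q)}(N)
  = \operatorname*{arg\,max}_{\theta}
    \; \mathbb{E}_{y \sim \mathrm{Target}(\theta, N, q)}
    \left[ \mathbb{1}\left[ y = y^{*}(q) \right] \right]
```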
Wake up, babe. New Yannic video just dropped.
Isn't beam search done per token? Why does Yannic say that they grade the answers?
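As far as I can tell from the video, beam search here operates over whole reasoning steps rather than individual tokens, and a process reward model grades each partial solution, which is why the answers get "graded". A minimal sketch of that idea (propose_steps and prm_score are hypothetical stand-ins):

```python
# Minimal sketch of step-level beam search with a verifier: each beam is a
# partial solution made of whole reasoning steps (not tokens), and a
# process reward model (PRM) grades each partial solution.
# propose_steps and prm_score are hypothetical stand-ins.

def step_level_beam_search(question, beam_width=4, expand=4, max_steps=8):
    beams = [""]  # each beam holds the reasoning steps generated so far
    for _ in range(max_steps):
        scored = []
        for partial in beams:
            # The LLM proposes several candidate next reasoning steps.
            for step in propose_steps(question, partial, n=expand):
                candidate = partial + step
                # The PRM grades the whole partial solution, not one token.
                scored.append((prm_score(question, candidate), candidate))
        # Keep only the top-scoring partial solutions as the next beams.
        scored.sort(key=lambda t: t[0], reverse=True)
        beams = [cand for _, cand in scored[:beam_width]]
    return beams[0]  # best-scored solution after the final round
```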
Nice
21:48 What can be unburdened by what has been
he's the best
Please bring the news back!
It can't be, a new paper that's not 98% marketing wank? Is the world healing, brothers?
I think that's what you want. When a kid sees you put down one apple and then one more, he will answer that we have 2. So we write 1+1=2. From then on he takes the notation as always true without recalling the apple scene. This means some training needs two modules: video first, then video-to-notation association. And actually using the notation is probably a third step. My noob opinion.
Why in the name of all that's holy are we asking an LLM to do arithmetic?? 😭
Because being able to do arithmetic is a good indicator of being able to reason. We want LLMs to be good reasoners because a lot of tasks in the real world will require LLMs and soon AI agents to reason like a human can.
Because not all of us are interested in roleplay slop
Python is just a dead-end pathway. One guy on YouTube writes neural networks in low-level Assembly and it's 500 times faster than PyTorch on 1 CPU core on the same task. We need a full rewrite of networks and models.
Please tell me who made that. It seems so interesting
Also yeah, C or C++ is better for actually useful and fast models. Python is good for modularity and prototyping, but god it is so fucking slow.
What? 99 percent of training is done on the GPU, which is already C++.
@biomerl Yeah sorry, I don't have much knowledge of low-level ML
@scoffpickle9655 The easiest starting place is to search YouTube for matrix multiplication with CUDA (basically just C code).
200 views in 15 minutes. Bro fell off