Improving LLM accuracy with Monte Carlo Tree Search

Trelis Research

Просмотров 13 тыс.

Добавить в
- Мой плейлист
- Посмотреть позже
Поделиться

HTML-код

Размер видео:

Показать панель управления

Автовоспроизведение

Автоповтор

Опубликовано: 6 ноя 2024

Комментарии • 61

@KopikoArepo 4 месяца назад ⁺⁴
Beautiful. Just like us. the more we fail, the better. Explore vs. Exploit. I love humanity. ❤
@9tongagi 2 месяца назад
The Monte Carlo method surely approach the better result of the probabilistic model, but costs are really high. No matter what, good job for the clear explanation👍
@TrelisResearch 2 месяца назад
yeah it's true
@KarlLew 4 месяца назад ⁺¹
Me with Tarot Cards till I get the answer I like. But seriously, MTS seems like a formal way to structure an extended interaction with a user. MTS feels a lot like what I do when I use Google AI Search as I barrage it with a cloud of different prompts when searching for a particular piece of knowledge for which I may not know the conventional terminology. in other words, the internediate answers provide information for prompt refinement. For example, I once started with “nitrogen in soil” and ended up with “soil nitrification”, which was the prompt that gave the knowledge I sought. Thanks for the vid!
@TrelisResearch 4 месяца назад ⁺¹
yeah, I think that's right
@_paixi 4 месяца назад
Fascinating paper and excellent demonstration. Llama3-8B can answer some difficult math and coding problems using this that the top open-source models fail to do with a direct answer. The first thing I noticed was it games the rating response by pretending to run unit tests that pass. Adding to the critique prompt it was a written test and the answerer had no access to a computer to run tests fixed that and it has started solving some easy ARC-AGI tasks I couldn't get proprietary models to solve.
@TrelisResearch 4 месяца назад
Thanks! Yeah seems to help a bit with ARC
@nashvillebrandon 4 месяца назад ⁺³
You're the exact person I was hoping would make a video in this after I read that paper. Could this technique be enhanced even further with retrieval?
@unclecode 4 месяца назад ⁺¹
Fascinating! This morning, I posted on X about MCTS and this paper, and later, RUclips showed me your video. Such a great coincidence. I found the coefficient C in the UCT formula for balancing exploration and exploitation really interesting. I experimented with different settings and even made it random like temperature. The results are intriguing-might share the repo and a video soon.
I wonder what would happen if we built a neural network like MoE but with this MCTS structure and trained it. Would it train while searching and reasoning? Could it generate a model far better at reasoning? What do you think? Anyway, kudos to you-you're right on track and well updated as usual.
@TrelisResearch 4 месяца назад
Yeah that’s interesting regarding training. I’d have to think more deeply.
@waneyvin 4 месяца назад ⁺⁵
great job, it's literally manual reinforcement learning!🤣🤣🤣
@kunalsuri8316 4 месяца назад ⁺⁴
Holly molly!! I was just reading today about how MCTS can be used to improve LLMs. Are you reading minds now?
@marilynlucas5128 4 месяца назад
😂 I’ll tell you. The RUclips algorithm is very spooky. It can almost read your mind. I call it God’s mind.
@miladmirmoghtadaei5038 4 месяца назад
Thanks man. Great intro to MCTS. What I am curious about is why we do a random selection among the first generation and not have it rate that one and select the best answer from the root.
@TrelisResearch 4 месяца назад
That may also work.
You do want your initial seeds to be very random just so you have the tree searching a wide search space. If you just start with one rated answers, all following derivatives will be a bit similar, which limits scope.
@MasamuneX 4 месяца назад ⁺¹
A potental improvement is to have a dynamic child node amount based on the rating also the weight defining exploration vs exploitation could be dynamically set too maybe even by the llm filling in more than just a score
also the backprop of the ratings is cool but there could be some decay so that nodes wayyy up on the tree dont get super locked in if you're doing a tree that is 8 layers deep
@TrelisResearch 4 месяца назад ⁺¹
Good thoughts.
Yeah probably using the rating/assessment is smart because right now that info isn't incorporated into generating responses (it's only used for UCT).
UCT basically decays away from exploration as more experiments are run and focuses more on exploitation. This broadly makes sense, but yes, possibly this should be tuned.
@LairdForet 27 дней назад
Great video - Thanks!
@free_thinker4958 4 месяца назад
Thanks a lot man 😎👏❤️ we would like you to devote a future video to talk about the CLIN paper to build a self improving language agents
@TrelisResearch 4 месяца назад
will take a look, thanks
@hcm9999 Месяц назад
Question: given enough time, can MCTS (Monte Carlo Tree Search) find the best solution?
The problem about MCTS is that it chooses the child node with the highest probability of having a solution.
As long as those probabilities don't change, MCTS will choose the same node, no matter how many iterations you perfom. That means some leaves (terminal nodes) are unreachable.
If the best solution happens to be in an unreachable leaf, MCTS will never find it.
@TrelisResearch Месяц назад ⁺¹
Howdy. Two Qs there.
The score in MCTS combines an explore and exploit portion, so it won’t get stuck . Typically the explore term is set to decay over time. So yes, you could get stuck by the end. You could of course remove or reduce that decay term.
In theory yes mcts could find the solution given infinite time (whether it knows that’s the solution depends on whether the problem is verifiable). Of course, infinite time might be too long!
@ПетрФомин-щ9ж 4 месяца назад ⁺²
Isn't this the Q-Star algorithm we've been dreaming of?
@free_thinker4958 4 месяца назад
It's kinda like that
@tonyppe 4 месяца назад ⁺¹
subscribed, very interesting. good work on explaining it :)
@TrelisResearch 4 месяца назад
Thanks! Appreciate it
@tullyfisher 4 месяца назад ⁺¹
Hey, are you still doing things on patent-me? No content on the page (?)
@TrelisResearch 4 месяца назад ⁺¹
I closed it down actually.
@tullyfisher 4 месяца назад
@@TrelisResearch Thanks very much for the info! :)
@Saurabh5228 4 месяца назад ⁺²
How does it differ from the tree of thoughts prompting?
@TrelisResearch 4 месяца назад ⁺⁷
They are related ideas! Tree of thought typically involves a more deterministic approach to where to go next in the tree, whereas Monte Carlo is based on probabilistic evaluation (with back-propagation in this case of rewards to all parent nodes). It's a bit confusing making clear distinctions between both here because - if temperature is 1 - and you are trying multiple samples, there is a probabilistic element to tree of thought and to the approach taken here.
More on Tree of Thought here: www.promptingguide.ai/techniques/tot
If I had to boil it down, two key elements here are:
1. Backpropagating rewards from each evaluation through the parent nodes (which has the benefit of adding probabilistic information on the strength of the parent nodes)
2. Using UCT, which is a specific approach for balancing exploration versus exploitation.
@aissabakhil1696 4 месяца назад
thank you , can you make an explanation about gguf quantization and how to convert custom multimodal to gguf
@TrelisResearch 4 месяца назад ⁺¹
ooh, multi-modal I haven't looked at, but check out the quantization video from last year in the fine-tuning playlist from Trelis
@aissabakhil1696 4 месяца назад
Thank you
@saurabhkram007 Месяц назад
Did you explain the PromptAgent paper in just half an hour?
@TrelisResearch 29 дней назад
have to say I didn't read that paper but now that I see it, yes, there are similarities. Probably if I read the paper in full there is quite a bit more than what I did in this video
@andrew_moffat 4 месяца назад
yeahhhh perfect explanation, thank you bro
@r.s.e.9846 4 месяца назад
Thanks! How could we improve this with a compiler, search or some form of symbolic reasoning?
@TrelisResearch 4 месяца назад ⁺¹
Yup, will see if I can get a video on that live soon
@scscyou 4 месяца назад
How do we integrate this as part of our AI client, for example when running local server with a web-based UI? Are there any complete, packaged solutions?
@TrelisResearch 4 месяца назад ⁺²
Unless using groq (and even using groq) this is going to be slow for low latency applications. You would need to wrap all of this as its own server and then hit that endpoint, definitely a bit more work
@scscyou 4 месяца назад
@@TrelisResearch Of course, I'm assuming use cases where we want accuracy. For example, to batch multiple questions and then get back in an hour to see the best responses to all of them. Preferably with an Agents workflow, where LLM can talk to itself to iterate over a solution (like a code); and with the ability to invoke external tools (compilation, browser, calculator...). Ollama server (containerized for security & compatibility) could be the best starting point, but what we need is a user-friendly way to use everything at once, like a module. Using Jupyter notebooks to run a custom python code fragment is an impossibly steep curve for many of us who need to use such AI for practical purposes (e.g. I don't really work in Python, but I could definitely invoke a local API algorithmically)
@ghrasko 4 месяца назад
Hi, at 2m13s you say that the essence of Monte Carlo method is that it's programatically changes the prompts (instead of doing it manually). As I see, it is not at all doing it. It is NOT refining the prompt. Am I misunderstanding something?
@TrelisResearch 4 месяца назад
I just depends how you define prompt. I mean prompt as the full input to the LLM. This gets updated because part of the full prompt is the draft answer. You’re correct that the instruction set is pre-defined.
@nathandfox 4 месяца назад
Why didn't the author try using MCTS + GPT4 to see if it can improve even at that level?
@TrelisResearch 4 месяца назад
That's a good suggestion and I don't know why. Perhaps they just wanted to get the paper published and llama 3 showed what they needed. Also, there's the question of what to compare MCTS + GPT4 to? The benchmarks tested are saturated (true, they could have tried something like ARC).
@pooascyrous5722 4 месяца назад
any idea how we can use this idea in finance and trading decisions?
@TrelisResearch 4 месяца назад
Well yeah you can use it as a way to prompt an LLM to make a prediction and refine that prediction. I'll see if I can make a quick vid on that at some point.
@ravenecho2410 4 месяца назад
U mean the early chess algos?
@avwie132 4 месяца назад
In other words: keep juggling until you get proper results
@TrelisResearch 4 месяца назад ⁺¹
Yeah, kind of, for better or worse. Although Monte Carlo is (assuming a good evaluator) much better than random juggling.
@pensiveintrovert4318 4 месяца назад ⁺⁶
You may get a better answer. You can't possibly know if the answer is the best answer. Don't lie to yourself. Even identifying the better answer is not easy unless others are obviously wrong.
@TrelisResearch 4 месяца назад ⁺⁶
Yeah, absolutely agreed, and tried to make that clear in the vid. But accept if it’s not clear enough!
@KCM25NJL 4 месяца назад ⁺¹
While this is technically true, it does in fact make for at least some semblance of hierarchical self-reflection on a longer timeframe than doing simple X-shot CoT prompting..... which is a step closer to system 2 thinking than we have. Of course while this implementation is rudimentary and has it's noted cons, I'm almost certain it's a step in the right direction. I'd personally like to see this used as a method of generating synthetic data in a way that gives at least a statistical improvement(via fine-tuning) in prompt or prompt-chain answers from smaller LLM's.
@TrelisResearch 4 месяца назад
@@KCM25NJL I suppose techniques like SPIN show that this is possible, and probably MCTS is a pareto improvement over that.
All of that said, my gut feel is that - for pre-training:
- language models are perhaps more useful for filtering data (see fineweb edu) than for generating synthetic data, unless you are using a powerful model to train a smaller model.
@KCM25NJL 4 месяца назад
@@TrelisResearch Yes indeed the frontier to smaller model for top down refinement was my thinking. If Llama 3 8B with MCTS can achieve similar math scores as GPT 4o, the biggest highlight for me is not one of capability, but of efficiency.
@goodtothinkwith 4 месяца назад
That’s true for people too. We can’t be disappointed with not getting absolute certainty
@jamesbrown6591 4 месяца назад
Not here for advice, just a sucker for slides with whimsical shaky font
@TrelisResearch 4 месяца назад
Whimsical is best
@ShanyGolan 4 месяца назад
The idea has been done before. So its not new
@TrelisResearch 4 месяца назад
True that Monte Carlo is not new. I'm unsure whether the paper (applying it this way to LLMs) is new or not (I didn't feel I gave a strong opinion on that in the vid). If you have a link of a previous paper doing what this paper does, could you post it here? Cheers

Следующие

Автовоспроизведение

LLM Tool Use - GPT4o-mini, Groq & Llama.cpp