They Mixed Every small LLM Into One LARGE Expert!!!

  • Published: 2 Oct 2024
  • The complementary potential of Large Language Models (LLMs) assumes off-the-shelf LLMs have heterogeneous expertise in a wide range of domains and tasks, so that an ensemble of LLMs can achieve consistently better performance. Existing ensemble methods for LLMs mainly focus on reward model ranking of outputs, which incurs significant computation overhead. To combat this issue, we revisit the complementary potential of LLMs and further elaborate on it by mining latent expertise with off-the-shelf reward models. We propose Zooter, a reward-guided routing method that distills rewards on training queries to train a routing function, which can precisely distribute each query to the LLM with expertise on it. We also integrate a tag-based label enhancement to mitigate noise from uncertainty when using rewards as silver supervision. Zooter is computationally efficient at inference, as it introduces only the minor overhead of a routing function compared with reward model ranking methods. We evaluate Zooter on a comprehensive benchmark collection with 26 subsets covering different domains and tasks. Zooter outperforms the best single model on average and ranks first on 44% of tasks, even surpassing multiple reward model ranking methods. (A minimal sketch of the reward-distillation step follows the links below.)
    🔗 Links 🔗
    arxiv.org/abs/...
    ❤️ If you want to support the channel ❤️
    Support here:
    Patreon - / 1littlecoder
    Ko-Fi - ko-fi.com/1lit...
    🧭 Follow me on 🧭
    Twitter - / 1littlecoder
    Linkedin - / amrrs
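
    For a concrete picture of the "distilling rewards ... to train a routing function" step described in the abstract, here is a minimal PyTorch sketch. It is not the authors' code: the candidate pool, RouterNet, the embedding size, and the random tensors are invented placeholders, and the paper's tag-based label enhancement is omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    CANDIDATE_LLMS = ["model_a", "model_b", "model_c"]  # hypothetical pool

    class RouterNet(nn.Module):
        # Tiny routing function: query embedding -> logits over candidate LLMs.
        def __init__(self, embed_dim, n_models):
            super().__init__()
            self.proj = nn.Linear(embed_dim, n_models)

        def forward(self, query_emb):
            return self.proj(query_emb)

    def distillation_loss(router_logits, reward_scores, temperature=1.0):
        # KL divergence between the router's distribution and the softmaxed
        # scores an off-the-shelf reward model gave each LLM's answer.
        target = F.softmax(reward_scores / temperature, dim=-1)
        return F.kl_div(F.log_softmax(router_logits, dim=-1), target,
                        reduction="batchmean")

    # One toy training step on fake data (4 queries, 128-dim embeddings).
    router = RouterNet(128, len(CANDIDATE_LLMS))
    opt = torch.optim.Adam(router.parameters(), lr=1e-3)
    query_emb = torch.randn(4, 128)                      # placeholder embeddings
    reward_scores = torch.randn(4, len(CANDIDATE_LLMS))  # placeholder rewards
    loss = distillation_loss(router(query_emb), reward_scores)
    loss.backward()
    opt.step()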

Comments • 23

  • @timmygilbert4102
    @timmygilbert4102 4 months ago +4

    Mixture of models, MoM

  • @marcfruchtman9473
    @marcfruchtman9473 4 months ago +2

    Thanks for the video. (I appreciate the amount of effort and time it takes to make these; I left some comments in another spot regarding the paper.)

  • @DamaruM
    @DamaruM 4 months ago +2

    Good video. Just one issue I've been noticing recently: some of your videos start very abruptly, skipping the initial audio. It would be better if you could add a small delay or intro at the beginning.

    • @1littlecoder
      @1littlecoder  4 months ago

      Thank you for the feedback. I'm sorry about that. I'll keep it in mind every time I make a cut before the start.

  • @DannyGerst
    @DannyGerst 4 months ago +1

    Thanks for that mixture of models; I was planning something like that for translation. I was struggling with the routing, and now you're presenting me with the missing piece 🎉

  • @alokyadav-ye2xw
    @alokyadav-ye2xw 4 months ago +1

    What video editing software are you using for the animations?

  • @marcfruchtman9473
    @marcfruchtman9473 4 months ago

    I don't really buy into this type of system. The main reason is that these systems can't possibly work on untrained data, since they have no way to arbitrate what a "correct" answer actually is. It might be fine if the answer is already known, but for your particular query you really don't know whether they got it right or wrong. Sure, it can "guess".
    Now, if the end user submits to some sort of democratic query AI, where the results displayed include the most popular answers as well as the least popular, then it might make more sense. Ultimately, though, as humans we have to look at any answer carefully to check whether it maps to known truths. If an answer doesn't come out as expected, then you probably need to do some serious in-depth analysis and deep querying as to why the answer is not correlating with your prior knowledge. To distill this down: just as when you survey 10 random people you will get different answers, the fact that 2 or 3 of them agree doesn't mean they got it right.
    It might be useful if a query has been done that already matches a known "trained" answer; then the AI can have some confidence that the responder chosen by the query routing received a known correct answer. That would mean all the AI participants would have to submit an "answer" with a confidence level, and probably their own version of the question they answered. But the problem with that is that there are very few well-curated question-answer pairs compared with the entire body of human knowledge available as a corpus.

    • @mickelodiansurname9578
      @mickelodiansurname9578 4 months ago

      Not sure it's as fine-grained as all that... I think the training is on the knowledge of a given set of models. Maybe OpenChat is in there, and they base the ranking on having given it thousands of inputs and measured thousands of outputs, so OpenChat is now a model the Zooter can use... but another model, say DBRX, has no training or meaning to the Zooter because it has no previously trained score. You'd have to wait for them to update the Zooter model to include DBRX as one of your usable models, or, if it's open source, find out what the training methodology is and fine-tune it on DBRX too. Am I making sense here? What it might allow for, though, is creating our own faux MoE-type models from 5, 10, 20, or even 100 models, including image and audio models. Would this let us run just the Zooter locally, with API endpoints to the live models in the cloud? If that's the case it has a serious use case, albeit one with horrendous latency.
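
      That local-router-plus-cloud-endpoints setup could look roughly like the sketch below. Everything here is hypothetical: the endpoint URLs, the response schema, and route_query() are stand-ins for a real trained router and real provider APIs.

      import requests

      MODEL_ENDPOINTS = {  # hypothetical endpoints, not real services
          "openchat": "https://example.com/v1/openchat",
          "coder": "https://example.com/v1/coder",
      }

      def route_query(prompt):
          # Stand-in for the trained router; a real one would embed the
          # prompt and take the argmax of the routing distribution.
          return "coder" if "code" in prompt.lower() else "openchat"

      def ask(prompt):
          model = route_query(prompt)  # one cheap local routing decision...
          resp = requests.post(MODEL_ENDPOINTS[model],
                               json={"prompt": prompt}, timeout=60)
          resp.raise_for_status()
          return resp.json()["text"]   # ...then one remote generation call

      And as the comment notes, a model absent from the router's training (DBRX in the example) would first need the router retrained or fine-tuned before it could be scored and routed to.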

  • @KevinKreger
    @KevinKreger 4 months ago

    It certainly improves results, but it's not very flexible, and it is a lot like calling a tool. I look at it this way: I can give my GPT-4-based LLM agent 'tools', including other LLMs with known strengths, to call as tools. So yes, you have an LLM rather than such a small router, but if you are going to let your LLM have a stab at picking a tool anyway, then you get the routing for free.

    • @mickelodiansurname9578
      @mickelodiansurname9578 4 months ago

      I think this removes the need for GPT-4 to decide, though... and removing GPT-4 as the decider is a big thing: it would drop inference costs a lot!

  • @axe863
    @axe863 4 months ago +4

    Stacked Sparse Ensembling ... we meet again

  • @zaidshaikh010
    @zaidshaikh010 4 months ago

    this one is better👍

  • @reynoldoramas3138
    @reynoldoramas3138 4 months ago +1

    Just to ask: they're training a simple sequence classifier on labeled data, so what is the novelty here?

    • @1littlecoder
      @1littlecoder  4 months ago +1

      I don't think it has any particular novelty except that they show it actually works versus other reward model ranking methods. I could be wrong. I've recently been leaning into routers, and that's how I decided to cover this. (A sketch contrasting the plain classifier with the reward-distilled targets follows this thread.)

    • @reynoldoramas3138
      @reynoldoramas3138 4 months ago

      @1littlecoder OK, thanks. It is a good approach, but sometimes authors just come up with cool names, hahaha. I love your channel; I've been following for more than a year now, awesome. Greetings from Cuba.
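
    To make the exchange above concrete, here is what the plain hard-label baseline the question describes would look like, as a hypothetical sketch (the embeddings and labels are placeholders). Zooter's twist, per the abstract, is only in the target: the soft reward distribution from the earlier sketch replaces the hard winner label.

    import torch
    import torch.nn.functional as F

    router = torch.nn.Linear(128, 3)             # query embedding -> 3 LLMs
    query_emb = torch.randn(4, 128)              # placeholder query embeddings
    best_model_idx = torch.tensor([0, 2, 1, 0])  # hard "winning LLM" labels
    loss = F.cross_entropy(router(query_emb), best_model_idx)
    # Zooter instead distills soft reward scores into the target, so
    # ties and near-ties between models are preserved as signal.
    loss.backward()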

  • @mickelodiansurname9578
    @mickelodiansurname9578 4 months ago

    Okay, so I'm not entirely sure here; let me spit this brain fart out. To my understanding, what this means is that as long as the model you plan to use within the ensemble has been part of the Zooter's ranking training, you can use any assemblage of open-source models you like, preferably choosing them for their skills in a given domain? So I could have a folder with maybe 20 models, and as long as they were included in the Zooter's training, a model 'might' be used depending on the prompt given? Likely, then, different or better-trained Zooters would come along and sit in front of your model assemblage; the models could be prompted individually, but instead you prompt the Zooter, it then decides, and you get your output from whatever it chose. Am I right here?

  • @emmanuelkolawole6720
    @emmanuelkolawole6720 4 months ago

    But they are not talking about inference speed: if there are two models controlling the prompt direction and response ranking, that adds inference delays. Zooter and the ranker are good ideas, but we need to see how they perform in the real world.
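
    For what it's worth, a back-of-the-envelope comparison (with invented relative costs) shows the shape of the trade-off: reward model ranking must generate from every model and score every output, while routing adds only one tiny forward pass before a single generation.

    N_MODELS = 4       # invented numbers, just to show the shape of it
    GEN_COST = 1.0     # relative cost of one full generation
    REWARD_COST = 0.3  # relative cost of scoring one output with a reward model
    ROUTE_COST = 0.01  # relative cost of one small router forward pass

    ranking_cost = N_MODELS * (GEN_COST + REWARD_COST)  # generate + score all
    routing_cost = ROUTE_COST + GEN_COST                # route once, generate once
    print(ranking_cost, routing_cost)                   # 5.2 vs 1.01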

  • @joser100
    @joser100 4 months ago

    Is Kraken based on this method (using Zooter, I mean), or something else altogether?

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 4 months ago

    As a suggestion, could you do a video where you elaborate on the Zooter method?

  • @paulogabrielcayres9245
    @paulogabrielcayres9245 4 months ago

    I need a tutorial on multi-video summarization; could you help me?

    • @joaops4165
      @joaops4165 4 months ago

      It seems Google's model can summarize videos. Search for "Summarize a video file with audio with Gemini 1.5 Pro"; there are some example code snippets, and it doesn't look very difficult.