These small models are not only good for low-memory situations but also for cases where you want multiple models running at once. Work is being done on running 405B in small memory configurations by loading and unloading layers as needed, so you can run the more advanced model much more slowly while these small models handle routing and interactivity at the same time. All of this could be done locally in situations where you don't want to send the data being processed (like personal information) off the device.
Very good point about multiple models, totally agree.
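To make the layer loading and unloading idea concrete, here is a minimal toy sketch in Python. It is not how any real 405B runtime works: `LAYER_STORE`, `load_layer`, and `run_streamed` are hypothetical names made up for illustration, and each "layer" is just a scale and a bias standing in for weights that would really live on disk. The only point is that a single layer is resident at a time, which trades speed for memory.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for weights on disk: layer name -> (scale, bias).
LAYER_STORE: Dict[str, Tuple[float, float]] = {
    "layer_0": (2.0, 1.0),
    "layer_1": (0.5, -1.0),
    "layer_2": (3.0, 0.0),
}

Layer = Callable[[List[float]], List[float]]


def load_layer(name: str) -> Layer:
    """Pretend to read one layer from slow storage and return it as a function."""
    scale, bias = LAYER_STORE[name]
    return lambda xs: [scale * x + bias for x in xs]


def run_streamed(
    inputs: List[float], layer_names: List[str], max_resident: int = 1
) -> Tuple[List[float], int]:
    """Run the layers in order, keeping at most `max_resident` loaded at once.

    Returns the final activations and the peak number of resident layers.
    """
    resident: Dict[str, Layer] = {}
    peak = 0
    activations = list(inputs)
    for name in layer_names:
        # Unload the oldest layers until there is room for the next one.
        while len(resident) >= max_resident:
            resident.pop(next(iter(resident)))
        resident[name] = load_layer(name)
        peak = max(peak, len(resident))
        activations = resident[name](activations)
    return activations, peak


if __name__ == "__main__":
    out, peak = run_streamed([1.0, 2.0], ["layer_0", "layer_1", "layer_2"])
    print(out, peak)
```

In a real setup the load step would dominate the runtime, which is why the big model ends up much slower and the small models stay loaded for the interactive parts.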
Yes please, a video on fine-tuning these models would be awesome. Videos showing the tiny models running on edge devices and/or in the browser would be super cool as well.
This is great news! Can't wait to start using the lightweight models.
Thank you for this quick update Sam! BTW, "QWen" should probably be pronounced as "qian wen" in original Chinese with the hidden meaning of "capable of answering to thousands of questions". 😀
Lol, I tried to pronounce it like their devrel guy does. Is there audio somewhere where I can hear it?
Hey Sam, Great video 👌
Will be waiting for a video on fine-tuning the 1B model for JSON in and out.
Yeah, that's a good use case.
No intro, no music, right to the point. Amazing work, Sam. I would like to know your opinion about unsloth?
I love unsloth. It's a simple but good way for people to do LoRAs.
Nice video! It would be awesome if you can make a video of how to fine tune these small models.
11B and 90B make sense because that's the 8B and 70B text models plus about 3B and 20B of vision parameters respectively? That's what I would guess right off the bat.
Is there a multimodal LLM that can be fine-tuned for sentiment analysis from text, image, video, and audio?
Another obvious use case for the mini models is moderation. APIs like OpenAI require you to make a moderation call before making the inference call, which means two round trips to the server before you get any content you can show to the user. If you can do moderation on device, then you only need one round trip, making your realtime chats appear faster to the user.
Moderation, routing, summarization = mini models for the win.
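The round-trip argument can be checked with a bit of arithmetic. Below is a rough Python sketch; the function names and every number in it (80 ms network round trip, 40 ms server-side moderation, 300 ms inference, 25 ms on-device moderation) are made-up assumptions for illustration, not measurements of any real API.

```python
def remote_moderation_latency_ms(rtt_ms: int, moderation_ms: int, inference_ms: int) -> int:
    """Two sequential server calls: moderation first, then inference."""
    moderation_call = rtt_ms + moderation_ms
    inference_call = rtt_ms + inference_ms
    return moderation_call + inference_call


def local_moderation_latency_ms(rtt_ms: int, local_moderation_ms: int, inference_ms: int) -> int:
    """Moderation runs on device, so only the inference call crosses the network."""
    return local_moderation_ms + rtt_ms + inference_ms


if __name__ == "__main__":
    # All values are assumed for illustration only.
    remote = remote_moderation_latency_ms(rtt_ms=80, moderation_ms=40, inference_ms=300)
    local = local_moderation_latency_ms(rtt_ms=80, local_moderation_ms=25, inference_ms=300)
    print(f"remote moderation: {remote} ms, on-device moderation: {local} ms")
```

The saving is one network round trip plus whatever the on-device model gains or loses against the server-side moderation time, so it only pays off if the mini model is fast enough on the device.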
A video on how to finetune these small models would be great! By the way, being from Denmark I always test these models in Danish as well as in English. Llama 3.2 3B is by far the best small, multilingual model I have tested - far better than Gemma 2 2B!
They all kinda fail in Polish :D but well, in English it's quite nice.
Ohh, that is super interesting to know. I wonder whether Danish is one of the 8-9 prioritized languages or whether it is just getting better at European languages in general.
@@samwitteveenai It appears it doesn't understand some language rules, or I am using models that are too small. I tried o1-mini:latest / DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored.i1-Q4_K_M.gguf:latest / Qwen2.5-14B_Uncencored-Q6_K_L.gguf:latest. I asked each of them to write me a 4-verse poem in Polish about "Bocian" (stork). It does create some correct lines, but in the middle it mixes in wrong words here and there, and most of the time it doesn't make sense as a story of sorts. Here is o1 mini:
"Bocian wysoki, z wody unosi się swobodnie,
Czerwone dzióbki biały kaptur trzyma.
Lecąc lecieli nad pól i lasów brzegi,
Piekne słońce oświetla mu skrzydła jak diamenty."
"Lecąc lecieli" sounds bad :) It's like "Flying they flew ...", the same word repeated. However, I think this one is quite good compared to the other outputs, 3/4 actually.
@@samwitteveenai I feel the need to clarify that its abilities are, of course, nowhere near what they are in English. But it is the first small language model I have tried that is able to produce a Danish summary of a Danish text which is mostly correct and coherent. It does still suffer from making up words (I think it sometimes confuses Danish with Swedish and Norwegian), but Gemma 2 and other models are much worse in this regard.
Also, its knowledge regarding Denmark is very limited, as you would expect for such a small model, I suppose. If, for example, I ask it to list the last 5 prime ministers of Denmark, it only knows the current one and hallucinates the rest. When asking it to list the last 5 governors of any US state, I find that it typically gets 4-5 right.
I looked up both these languages and they aren't among Meta's main multilingual priority languages. A friend I spoke to pointed out that there aren't huge numbers of Facebook users there, so that might be a reason. Meta themselves benefit from all the data they have for training etc., and I think that also shapes some of their training priorities.
It kind of makes me sad that Meta trained Llama 2 on audio and pictures, made it so it can output audio and pictures, and then nerfed the model by removing the decoders for “safety” reasons. They released it even though Llama 3 was already out, and now they are using the Llama 3 version of that model in their app, where you can talk to it as if it were GPT-4 Omni.
Can you train a model on a new computer language?
1B model is fast 😀👍
How much vram do you need?