Llama 3.2 goes Multimodal and to the Edge

Sam Witteveen

9K views

Published: Nov 15, 2024

Comments • 24

@toadlguy 1 month ago ⁺⁵
These small models are not only good for low memory situations but also where you can have multiple models run at once. Work is being done where you can run 405B by loading and unloading layers (epochs) in small memory configurations to run more advanced models much slower and run these small models for routing and interactivity at the same time. All this could be done locally in situations where you don’t want to send the data it is working with (like personal information) off the device.
@samwitteveenai 1 month ago
Very good point about multiple models, totally agree.
@comfixit 1 month ago
Yes please, a video on fine-tuning these models would be awesome. Also, videos showing the tiny models running on edge devices and/or in the browser would be super cool as well.
@i_accept_all_cookies 1 month ago
This is great news! Can't wait to start using the lightweight models.
@chenqu773 1 month ago
Thank you for this quick update, Sam! BTW, "QWen" should probably be pronounced "qian wen", as in the original Chinese, with the hidden meaning of "capable of answering thousands of questions". 😀
@samwitteveenai 1 month ago
lol, I tried to pronounce it like their devrel guy does. Is there audio somewhere I can hear it?
@aminzarei1557 1 month ago
Hey Sam, great video 👌
Will be waiting for fine-tuning the 1B for JSON in and out.
@samwitteveenai 1 month ago
Yeah, that's a good use case.
@ibrahimhalouane8130 1 month ago
No intro, no music, right to the point. Amazing work, Sam. I'd like to know your opinion of Unsloth?
@samwitteveenai 1 month ago ⁺¹
I love Unsloth. It's a simple but good way for people to do LoRAs.
@autoflujo 1 month ago
Nice video! It would be awesome if you could make a video on how to fine-tune these small models.
@SirajFlorida 1 month ago ⁺³
11B and 90B make sense because it's 3B and 20B of vision parameters on top of the 8B and 70B text models, respectively? That's what I would guess right off the bat.
@IsmailIfakir 1 month ago
Is there a multimodal LLM that can be fine-tuned for sentiment analysis from text, image, video, and audio?
@IvarDaigon 1 month ago
Another obvious use case for the mini models is moderation. APIs like OpenAI's require you to make a moderation call before making the inference call, which means two round trips to the server before you get any content you can show to the user. If you can do moderation on device, then you only need one round trip, making your realtime chats appear faster to the user.
Moderation, routing, summarization = mini models for the win.
@jmspat14b 1 month ago ⁺¹
A video on how to finetune these small models would be great! By the way, being from Denmark I always test these models in Danish as well as in English. Llama 3.2 3B is by far the best small, multilingual model I have tested - far better than Gemma 2 2B!
@pozytywniezakrecony151 1 month ago
They all kinda fail in Polish :D but well, in English it's quite nice.
@samwitteveenai 1 month ago
Ohh, that is super interesting to know. I wonder whether Danish is one of the 8-9 prioritized languages, or whether it is just getting better at European languages in general.
@pozytywniezakrecony151 1 month ago
@samwitteveenai It appears it doesn't understand some language rules, or I am using models that are too small. I tried o1-mini:latest / DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored.i1-Q4_K_M.gguf:latest / Qwen2.5-14B_Uncencored-Q6_K_L.gguf:latest. I asked all of them to write me 4-verse poems in Polish about "Bocian". Each creates some correct lines, but in the middle it mixes in wrong words here and there, and most of the time the result doesn't make sense as a story of any sort. Here is o1-mini:
"Bocian wysoki, z wody unosi się swobodnie,
Czerwone dzióbki biały kaptur trzyma.
Lecąc lecieli nad pól i lasów brzegi,
Piekne słońce oświetla mu skrzydła jak diamenty."
"Lecąc lecieli" sounds bad :) It's like "Flying they flew ...", the same word repeated. However, I think this one is quite good compared to the other outputs, 3/4 actually.
@jmspat14b 1 month ago
@samwitteveenai I feel the need to clarify that its abilities in Danish are, of course, nowhere near what they are in English. But it is the first small language model I have tried that is able to produce a Danish summary of a Danish text which is mostly correct and coherent. It does still suffer from making up words (I think it sometimes confuses Danish with Swedish and Norwegian), but Gemma 2 and other models are much worse in this regard.
Also, its knowledge regarding Denmark is very limited, as you would expect for such a small model, I suppose. If, for example, I ask it to list the last 5 prime ministers of Denmark, it only knows the current one and hallucinates the rest. When asking it to list the last 5 governors of any US state, I find that it typically gets 4-5 right.
@samwitteveenai 1 month ago ⁺²
I looked up both of these languages and they aren't among Meta's main multilingual priority languages. A friend I spoke to pointed out that there aren't huge numbers of Facebook users there, so that might be a reason. Meta themselves are benefiting from all the data they have for training etc. I think that also shapes some of their training priorities.
@Nick_With_A_Stick 1 month ago ⁺¹
It kind of makes me sad that Meta trained Llama 2 on audio and pictures and made it so it can output audio and pictures, and then nerfed the model by removing the decoders for "safety" reasons. And they released it even though Llama 3 was already out, and now they are using the Llama 3 version of that model in their app, where you can talk to it as if it were GPT-4 Omni.
@nosuchthing8 22 days ago
Can you train a model on a new computer language?
@proflead 1 month ago
1B model is fast 😀👍
@nosuchthing8 22 days ago
How much VRAM do you need?
