I love small and awesome models

Matt Williams

8K views

Published: Sep 27, 2024
As one of the original Ollama team members, I'm excited to dive into the latest update and share my hands-on experience with you. In this video, I'll cover the key features of Llama 3.2, from its improved model sizes to its enhanced tool use capabilities.
I'll take a closer look at the smaller text models (1B, 3B) and demonstrate how they can be used for tasks like summarization, generating creative content, and answering questions.
From setting up the environment variables to testing the limits of these powerful models, I'll share my thoughts on the strengths and weaknesses of each model. You'll also get a glimpse into how I actually use models to create new content in my writing workflow.
Whether you're a seasoned user or just starting out with Ollama, this video is for you! So sit back, relax, and let's explore the capabilities of Llama 3.2 together!
Be sure to sign up to my monthly newsletter at technovangelis...
You can find the Technovangelist discord at: / discord
The Ollama discord is at / discord
(they have a pretty url because they are paying at least $100 per month for Discord. You help get more viewers to this channel and I can afford that too.)
Join this channel to get access to perks:
/ @technovangelist
Or if you prefer there is also a Patreon: / technovangelist

Comments • 77

@BORCHLEO 18 hours ago ⁺²⁰
You are underrated, Matt! They didn't sponsor you because they just wanted people who spew hype. You go into such detail! Your content should be #1 for any Ollama tutorial.
@BirdManPhil 12 hours ago ⁺³
I've been using Llama 3.1 8B on my 4050 laptop very comfortably for AI-assisted tasks in Obsidian, and I can't wait to see if these smaller 3B models are a better fit. You get a sub from me. I'm all aboard the self-hosted train, next stop AI station, let's gooo
@JeromeBoivin-tx7fm 10 hours ago ⁺⁴
Thank you Matt for your videos. I was not aware of the hardcoded context window in Ollama; it may explain why I was so confused by models claiming to have a large one. Why is that? I'd expect Ollama to adapt to the capabilities of the model it's running! Do I really need to manually create a custom model file each time just to benefit from the model's native context size? Have you already posted a video answering these questions? Thank you so much and keep up the good work! Cheers from France!
@manuelbradovent3562 8 hours ago
@JeromeBoivin-tx7fm Also interested in the context question, and in whether the prompt, end token, etc. were also added in the model file.
@technovangelist 2 hours ago ⁺²
Context takes a lot of memory. And it's hard to put rails around it so it doesn't fully crash the machine. I've had the machine reboot when it takes too much. And lots of folks have tiny GPUs, so we got lots of support requests. So it went to a blanket 2k unless you specify the size. But since it's so easy for most devs to create that file, and since Ollama is intended as a dev tool first, it seemed like a good decision.
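To illustrate "unless you specify the size": a minimal Python sketch, assuming a local Ollama server at its default address and the `num_ctx` option accepted by the `/api/generate` endpoint. The helper names, the model tag, and the 8192 value are illustrative assumptions, not something shown in the video.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local address


def modelfile_with_context(base_model: str, num_ctx: int) -> str:
    """Return Modelfile text that pins a larger context window."""
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


def generate_request(model: str, prompt: str, num_ctx: int) -> dict:
    """Build a request body that overrides the context size for one call."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def send(body: dict) -> str:
    """POST the body to a running Ollama server and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Printing only; send() is not called, so no server is needed to run this.
print(modelfile_with_context("llama3.2:3b", 8192))
print(json.dumps(generate_request("llama3.2:3b", "Hello", 8192), indent=2))
```

The Modelfile text can be saved to a file and registered with `ollama create my-llama-8k -f Modelfile` (the name `my-llama-8k` is hypothetical); the request body does the same thing for a single call without creating a new model.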
@g.s.3389 9 hours ago ⁺²
Just a question: what is the best model I can use with Ollama to support me in Python programming?
@Aarifshah-A 12 hours ago ⁺³
Lol the ending 😂😂😂
@jazzejeff1 14 hours ago ⁺²
Your channel's so nice I wish I could sub twice. Keep up the great work.
@solyarisoftware 1 hour ago ⁺¹
Hi Matt, I upvoted as usual. Two notes:
1. Ollama hardware resource calculations (proposal for a new Ollama video): In this video, you thankfully show how easy it is to set the context length in the model file, bypassing Ollama's default. How does the context length influence the RAM usage of the host? In general, it would be great to dedicate a video to hardware resource calculations based on model size, quantization, context size, and possibly other macro parameters. It would also be helpful to discuss how the CPU, and especially the GPU, can improve latency (especially in a multi-user environment).
2. You mention "your" function call method. I know you've already done a video on this topic, but since it's very useful in practice, maybe you could create a new video with code examples (Python is welcome).
Other viewers: If you agree, please upvote my comment. Community thoughts are welcome!
Thanks again,
Giorgio
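On the question of how context length influences RAM: a rough sketch of the usual key/value cache estimate for a transformer, which grows linearly with context length. The layer count, KV-head count, and head dimension below are assumed values for a 3B-class model, and the estimate ignores model weights, quantized caches, and runtime buffers, so treat it as a back-of-the-envelope figure rather than what Ollama actually allocates.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int = 2) -> int:
    """Estimate KV cache size: one key and one value vector per layer, per token."""
    keys_and_values = 2
    return (keys_and_values * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_value)


# Assumed architecture numbers for illustration only (not taken from the video).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 28, 8, 128

for ctx in (2048, 8192, 131072):
    size = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx)
    print(f"context {ctx:>6}: {size / 2**20:,.0f} MiB")
```

Under these assumptions, a 2k context costs about 224 MiB of 16-bit cache, while a 128k context costs about 14 GiB, which is consistent with Matt's point that context takes a lot of memory.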
@iamarto 17 hours ago ⁺²
Whoever took the sponsorship from Meta, I don't think they asked for it. But in case you haven't noticed, they have more subscribers than you.
@technovangelist 17 hours ago ⁺²
Some have 1/3 the number of subs compared to me. So that’s not it.
@martijnveenman 17 hours ago ⁺¹
Amazing video, thank you. Is Companion the only AI plugin you use in Obsidian? Looking forward to seeing more practical AI applications in Obsidian.
@GundamExia88 17 hours ago ⁺¹
Ha! That's how I feel when people ask which number is bigger, 8.8 or 8.21! It depends on the context! And that's what I noticed when testing the models: most people only run a prompt once. The models don't always get the answer right the first time; sometimes it takes a second try, etc. Great video.
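The "it depends on the context" point can be made concrete with a small Python sketch: the same two strings compare differently as decimal numbers and as dotted version numbers. The function names are illustrative.

```python
from decimal import Decimal


def larger_as_decimal(a: str, b: str) -> str:
    """Compare the strings as decimal numbers (8.8 means 8.80)."""
    return a if Decimal(a) > Decimal(b) else b


def larger_as_version(a: str, b: str) -> str:
    """Compare the strings as dotted versions (8.21 means minor version 21)."""
    def key(s: str) -> tuple:
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


print(larger_as_decimal("8.8", "8.21"))   # 8.8
print(larger_as_version("8.8", "8.21"))   # 8.21
```

Both answers are defensible, which is why a prompt that does not say which reading is meant can reasonably get either one.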
@Cingku 13 hours ago
Could you explain what the generation completion hotkey does in the Companion plugin for Obsidian? When I use the Companion, it automatically generates text, completes it, and streams the response. So, in what situation would I need to use this hotkey? I'd appreciate it if you could clarify this because I was confused by this.
@userou-ig1ze 12 hours ago ⁺¹
Thanks for the great content. What's missing in Ollama is support for vision models like florence2 and sam2. If it had a nice API for that, one that could be used with curl or so... dreams.
A Raspberry Pi with vision models must be so incredibly overpowered, I prefer not thinking about it too much
@technovangelist 11 hours ago
Raspberry Pi overpowered???? Way underpowered is more accurate, especially considering the cost of them. Physical size is the big benefit these days. But Florence2 looks like an older model that didn't get much love. Some of the other vision models on Ollama got a lot more coverage. And I hadn't heard of sam2 either. Neither architecture is supported, so it would require a lot of work to get them working.
@johang1293 18 hours ago ⁺¹
Good stuff
@TheHummChannel 6 hours ago
This channel really deserves way more exposure! Love the content and the host! Keep up the good work, thanks
@kshabana_YT 18 hours ago ⁺²
Why did you quit Ollama 😢😢😢
@technovangelist 17 hours ago ⁺¹
Are you asking about quitting the app? Or why I left the company? That second thing is not something for this comment thread.
@emmanuelgoldstein3682 17 hours ago ⁺²
@technovangelist Due to your hesitance on commenting, we'll just assume they were having Diddy parties until you clear it up
@starlord7526 11 hours ago
@emmanuelgoldstein3682 did you just say diddy party brah? jajajajaja
@kshabana_YT 4 hours ago
Company
@kshabana_YT 17 hours ago ⁺¹
I tried to run Llama 3.2 1B on a Samsung S20 Plus. Error: no suitable llama servers found. And I am running ollama serve
@Psychopatz 10 hours ago
Just use Layla Lite, then import the model. Yep, it's a hassle getting your llama.cpp to work
@kshabana_YT 5 hours ago
I don't know what you are talking about
@arkemiffo 1 hour ago
Just tried the 3.2:3b. I said hello and got a reply blazingly fast, so I asked if it was on meth or something. Got the standard "I'm just a model, I can't human", so I said I was just surprised to see such fast answers on a local model. And this is where things got confused.
Apparently, Llama3.2:3b thinks it's working off a cloud service. It refused the notion that I'm running this locally.
Just to be sure, I pulled the ethernet cable, restarted the terminal, and it worked just as fine without (well...duh).
I just find it fascinating that the model itself almost recoils at the notion of being local.
@theyashwanthsai29 9 hours ago
I'd ask you to test LLMs on some complex tools (even something as simple as a file-create tool fails on the 3B model). I assume that if I give it a proper function description, it might not fail. Need to experiment
@tecnopadre 8 hours ago
First, thank you as always, Matt. A question: I've been testing 3B since launch on my laptop, which has an NPU, with WebUI on a server and Ollama on the laptop. The thing is, Ollama is not taking advantage of the NPU; the 3B model is using the small GPU and the CPU. RAG (first with the WebUI interface and then with Flowise) gives me good results. I'm trying to find out how to activate the NPU on my laptop so Ollama can use it. It would be great. I think LLM Studio does it? Also, when testing with large files (>15MB), the embedding from Ollama on my computer, again with WebUI and Flowise, fails. The computer can't handle it. It would be great to see you do it with files that are closer to real company files. I think the 3B model is great. The last test I did was using it with Groq, and of course that's where I can really test it 🙂
@KonstantinBykov-o3k 7 hours ago
I've tested the 3B-parameter model and am very impressed with its speed and accuracy. I use it to improve descriptions in a catalog. And it looks like we don't have to pay OpenAI anymore. Thank you for your video!
@ClearSight2022 4 hours ago
Hi Matt very clear presentation yet again. I also really enjoy your polished style, so I subscribed even though I do so rarely. Cheers !
@alexandrep4913 6 hours ago
There is an awesome video on YouTube talking about this specific model and how censored it is. I wouldn't be surprised if people find the older model to be more capable.
@antoniomonteiro3698 14 hours ago
llama 3.2 1B Q4:
To count the number of 'F's and 'R's, I'll go through the word "strawberry" carefully.
Here's the breakdown:
* 'F': 1
* 'R': 2
So, there are 1 'F' and 2 'R's in the word "strawberry".
or:
please create a java program to count the number of 'F's and 'R's in the word 'strawberry'
public class Main {
    public static void main(String[] args) {
        // Define the word
        String word = "strawberry";
        // Initialize counters for 'F's and 'R's
        int countF = 0;
        int countR = 0;
        // Loop through each character in the word
        for (char c : word.toLowerCase().toCharArray()) {
            // Check if the character is 'F' or 'R'
            if (c == 'f' || c == 'r') {
                // Increment the counter
                if (c == 'f') {
                    countF++;
                } else {
                    countR++;
                }
            }
        }
        // Print the results
        System.out.println("Number of 'F's: " + countF);
        System.out.println("Number of 'R's: " + countR);
    }
}
output:
Number of 'F's: 0
Number of 'R's: 3
sorry, they left me home alone...
@PeterHagen 8 hours ago
Llama 3.1 & 3.2 are unfortunately very poor in Dutch language usage
@merefield2585 3 hours ago
Hey Matt, thanks for a great video - do you keep the code featured in your videos in public repos?
@ChristophBackhaus 7 hours ago
I want you to count the number of r's in Strawberry.
To do so I want you to go Letter by letter and every time you find one r I want you to count up
Gets it right every time...
@modoulaminceesay9211 5 hours ago
All things local AI and I just subscribed that’s what I need
@PriNovaFX 4 hours ago
What if you set the temperature to 0? Do the tool function tests succeed more often?
@protovici1476 3 hours ago
The vision portion isn't too great.
@mduthwala439 10 hours ago
Well explained, especially the 1B
@harrykekgmail 12 hours ago
interesting video. thank you
@TheDiverJim 16 hours ago
Love the breath holding tangent!
@yacahumax1431 4 hours ago
Ollama makes it so easy
@chrisBruner 1 hour ago
Good video
@utvikler-no 11 hours ago ⁺¹
Thanks
@technovangelist 11 hours ago ⁺¹
What??? You are too kind... a member AND a tip. Thanks so much.
@utvikler-no 11 hours ago
@technovangelist I just love the simple yet comprehensive way you explain the subjects. Keep up the good work❤
@omercelebi2012 10 hours ago
Man you forgot your cup!
@stasoline 14 hours ago
Cool video!
@UnwalledGarden 18 hours ago
Awww yeah!
@TLabsLLC-AI-Development 16 hours ago
Meta Matt!
@Jason-ju7df 2 hours ago
Microsoft GRIN MoE: A Gradient-Informed Mixture of Experts MoE Model 6.6b
Ranks better
@technovangelist 1 hour ago
In benchmarks, or in real tests? One is useful; the other has zero real value.
@SlykeThePhoxenix 10 hours ago
There are 4 killers in the room. Since when does dying make you not a killer?
@technovangelist 10 hours ago
Good point.
@sskohli79 8 hours ago
Hey Matt, nice video. But I don’t think it’s as impressive as you put it. I am sure the llama3.1’s performance was comparable
@technovangelist 3 hours ago ⁺¹
It wasn't available in 1B and 3B sizes.
@dna100 17 hours ago
Lovin' the channel. 👍👍It'll be great once Ollama supports vision
@technovangelist 17 hours ago ⁺²
Ollama does support vision today. Support for the Llama 3.2 vision models should arrive very soon.
@megairrational 6 hours ago
Great content. Could you briefly describe the machine you use for this task? You mentioned 3 seconds…
@technovangelist 3 hours ago
I usually do and forgot this time. M1 Max MacBook Pro with 64GB. A machine you can get for about 1500 USD today.
@AlexanderYudin 11 hours ago
Which hardware setup do you have?
@technovangelist 10 hours ago
I'm on an M1 Max MacBook Pro with 64GB RAM
@zhouyangbo4498 13 hours ago
ollama run llama3.2:1b
Error: llama runner process has terminated: signal: abort trap error:done_getting_tensors: wrong number of tensors; expected 147, got 146
any idea about this error?
@technovangelist 13 hours ago
You need to update Ollama. You should always update whenever there is a new version.
@zhouyangbo4498 9 hours ago
OK, I will try it. Maybe it is a GFW issue, thanks.
@aiamfree 15 hours ago
When is Ollama getting the vision models? Anyone know?
@technovangelist 14 hours ago ⁺²
The team is working on it.
@aiamfree 13 hours ago
@technovangelist awesome, thanks Team!
@ivanalberquilla9953 8 hours ago
Thank you for the video. What is the tool you use for writing?
@technovangelist 3 hours ago ⁺¹
Obsidian. And the plugin for it was Companion.
@ivanalberquilla9953 2 hours ago
Thanks!
@changeagent228 18 hours ago ⁺¹
First test I did was "what number is larger 9.9 or 9.11?" and it insisted 9.11 was bigger. When is 2.3 out?
@xevil21 3 hours ago
It's amazing how such a small model is smarter than you?
