I love small and awesome models
- Published: Sep 27, 2024
- As one of the original Ollama team members, I'm excited to dive into the latest update and share my hands-on experience with you. In this video, I'll cover the key features of Llama 3.2, from its improved model sizes to its enhanced tool use capabilities.
I'll take a closer look at the smaller text models (1B, 3B) and demonstrate how they can be used for tasks like summarization, generating creative content, and answering questions.
From setting up the environment variables to testing the limits of these powerful models, I'll share my thoughts on the strengths and weaknesses of each model. You'll also get a glimpse into how I actually use models to create new content in my writing workflow.
Whether you're a seasoned user or just starting out with Ollama, this video is for you! So sit back, relax, and let's explore the capabilities of Llama 3.2 together!
Be sure to sign up to my monthly newsletter at technovangelis...
You can find the Technovangelist discord at: / discord
The Ollama discord is at / discord
(They have a pretty URL because they pay at least $100 per month for Discord. If you help get more viewers to this channel, I can afford that too.)
Join this channel to get access to perks:
/ @technovangelist
Or if you prefer there is also a Patreon: / technovangelist
You are underrated, Matt! They didn't sponsor you because they just wanted the people who are spewing hype. You go into such detail! Your content should be #1 for any Ollama tutorial.
I've been using Llama 3.1 8B on my 4050 laptop very comfortably for AI-assisted tasks in Obsidian, and I can't wait to see if these smaller 3B models are a better fit. You get a sub from me. I'm all aboard the self-hosted train, next stop AI station, let's gooo!
Thank you, Matt, for your videos. I wasn't aware of the hardcoded context window in Ollama; it may explain why I was so confused by models that claim to have a large one. Why is that? I’d expect Ollama to adapt to the capabilities of the model it’s running! Do I really need to manually create a custom model file each time just to get the model's native context size? Have you already posted a video answering these questions? Thank you so much and keep up the good work! Cheers from France!
@JeromeBoivin-tx7fm I'm also interested in the context question, and in whether the prompt template, end token, etc. also need to be added to the model file.
Context takes a lot of memory, and it’s hard to put rails around it so it doesn’t crash the machine outright. I’ve had the machine reboot when it takes too much. And lots of folks have tiny GPUs, so we got lots of support requests. So it went to a blanket 2K-token default unless you specify the size. But since it’s so easy for most devs to create that file, and since Ollama is intended as a dev tool first, it seemed like a good decision.
Just a question: what is the best model I can use with Ollama to support me in Python programming?
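Besides a model file (a `FROM llama3.2:3b` line plus `PARAMETER num_ctx 8192`, then `ollama create`), the context size can also be raised per request through the `options` object of Ollama's REST API. A minimal sketch, assuming Ollama's default address `http://localhost:11434` and an arbitrary example size of 8192 tokens; the class and helper names are hypothetical, and the request is only built and printed, not sent:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class NumCtxRequest {
    // Escape backslashes and double quotes so the text is a valid JSON string
    // (a simplification: control characters such as newlines are not handled).
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Body for POST /api/generate that overrides the context window for this
    // one request via options.num_ctx instead of relying on the default.
    static String buildGenerateBody(String model, String prompt, int numCtx) {
        return "{\"model\":" + jsonString(model)
             + ",\"prompt\":" + jsonString(prompt)
             + ",\"stream\":false"
             + ",\"options\":{\"num_ctx\":" + numCtx + "}}";
    }

    public static void main(String[] args) {
        String body = buildGenerateBody("llama3.2:3b", "Summarize this note.", 8192);
        // Assumed local endpoint; send it with java.net.http.HttpClient if a server is running.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:11434/api/generate"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

The model-file route bakes the setting into a new model name that every client picks up, while the per-request option only affects callers that remember to pass it.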
Lol the ending 😂😂😂
Your channel's so nice, I wish I could sub twice. Keep up the great work.
Hi Matt, I upvoted as usual. Two notes:
Ollama HW resources calculations (proposal for a new Ollama video): In this video, you thankfully show how easy it is to set the context length in the model file, bypassing Ollama's default. How does the context length influence the RAM usage of the host? In general, it would be great to dedicate a video to hardware resource calculations based on model size, quantization, context size, and possibly other macro parameters. It would also be helpful to discuss how CPU, and especially GPU, can improve latency times (especially in a multi-user environment).
You mention "your" function call method. I know you've already done a video on this topic, but since it's very useful in practice, maybe you could create a new video with code examples (Python is welcome).
Other viewers: If you agree, please upvote my comment. Community thoughts are welcome!
Thanks again,
Giorgio
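On the question of how context length affects RAM: a large part of the growth comes from the KV cache, which stores a key and a value vector per layer for every token in the window, so it scales linearly with `num_ctx`. A rough sketch of that arithmetic, assuming a standard transformer with grouped-query attention and an fp16 cache; the layer and head counts below are assumed values roughly shaped like a 3B Llama-style model, so check the actual model's config. Weights and compute buffers come on top of this, and Ollama's real allocation may differ:

```java
public class KvCacheEstimate {
    // Estimated KV-cache size in bytes: 2 tensors (K and V) per layer,
    // each holding ctx x kvHeads x headDim elements.
    static long kvCacheBytes(int layers, int kvHeads, int headDim, int ctx, int bytesPerElement) {
        return 2L * layers * ctx * kvHeads * headDim * bytesPerElement;
    }

    public static void main(String[] args) {
        // Assumed model shape; replace with the real values from the model's config.
        int layers = 28, kvHeads = 8, headDim = 128, fp16Bytes = 2;
        for (int ctx : new int[] {2048, 8192, 32768, 131072}) {
            long bytes = kvCacheBytes(layers, kvHeads, headDim, ctx, fp16Bytes);
            System.out.printf("num_ctx=%6d -> %d MiB%n", ctx, bytes / (1024 * 1024));
        }
    }
}
```

With these assumed numbers, going from a 2K to a 128K window multiplies the cache by 64, which is why a blanket small default protects machines with little memory.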
Whoever took the sponsorship from Meta, I don't think they asked for it. But in case you haven't noticed, they have more subscribers than you.
Some have a third as many subs as I do, so that’s not it.
Amazing video, thank you. Is Companion the only AI plugin you use in Obsidian? Looking forward to seeing more practical AI applications in Obsidian.
Ha! I felt the same way when people ask which number is bigger, 8.8 or 8.21! It depends on the context! And here's what I noticed when testing the models: most people only run a prompt once. The models don't always give the right answer the first time; sometimes they get it on the second try, and so on. Great video.
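The "it depends on the context" point can be made concrete: read as decimal numbers, 8.8 is larger than 8.21, but read as version numbers (like 3.2 vs 3.11), 8.21 comes after 8.8. A small sketch of both readings; the class and method names are illustrative, and the version comparison assumes plain dot-separated integers with no suffixes like "-beta":

```java
import java.math.BigDecimal;

public class WhichIsBigger {
    // As decimals: 8.8 means 8.80, which is larger than 8.21.
    static int compareAsDecimals(String a, String b) {
        return new BigDecimal(a).compareTo(new BigDecimal(b));
    }

    // As versions: each dot-separated part is an integer, so minor
    // version 21 comes after minor version 8. Missing parts count as 0.
    static int compareAsVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("As decimals, 8.8 > 8.21? " + (compareAsDecimals("8.8", "8.21") > 0)); // prints true
        System.out.println("As versions, 8.8 > 8.21? " + (compareAsVersions("8.8", "8.21") > 0)); // prints false
    }
}
```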
Could you explain what the generation completion hotkey does in the Companion plugin for Obsidian? When I use the Companion, it automatically generates text, completes it, and streams the response. So, in what situation would I need to use this hotkey? I'd appreciate it if you could clarify this because I was confused by this.
Thanks for the great content. What's missing in Ollama is support for vision models like Florence2 and SAM2. If it had a nice API for that, one you could use with curl or similar... dreams.
A Raspberry Pi with vision models must be so incredibly overpowered, I prefer not to think about it too much.
Raspberry Pi overpowered???? Way underpowered is more accurate, especially considering what they cost. Physical size is the big benefit these days. But Florence2 looks like an older model that didn't get much love; some of the other vision models on Ollama got a lot more coverage. And I hadn't heard of SAM2 either. Neither architecture is supported, so getting them working would require a lot of work.
Good stuff
This channel really deserves way more exposure! Love the content and the host! Keep up the good work, thanks.
Why did you quit Ollama? 😢😢😢
Are you asking about quitting the app? Or why I left the company? That second thing is not something for this comment thread.
@@technovangelist Due to your hesitance on commenting, we'll just assume they were having Diddy parties until you clear it up
@@emmanuelgoldstein3682 did you just say diddy party brah? jajajajaja
Company
I tried to run Llama 3.2 1B on a Samsung S20 Plus and got "Error: no suitable llama servers found", even though I am running ollama serve.
Just use Layla Lite, then import the model. Yep, it's a hassle getting llama.cpp to work.
I don't know what you're talking about.
Just tried the 3.2:3b. I said hello and got a reply blazingly fast, so I asked if it was on meth or something. Got the standard "I'm just a model, I can't human", so I said I was just surprised to see such fast answers on a local model. And this is where things got confused.
Apparently, Llama3.2:3b thinks it's running on a cloud service. It rejected the notion that I'm running it locally.
Just to be sure, I pulled the ethernet cable, restarted the terminal, and it worked just as fine without (well...duh).
I just find it fascinating that the model itself almost recoils at the notion of being local.
I'd ask you to test LLMs on some complex tools (even a tool as simple as file creation fails on the 3B model). I assume that if I give it a proper function description, it might not fail. I need to experiment.
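To experiment with the "proper function description" idea, a tool can be passed to Ollama's `/api/chat` endpoint in the `tools` field as a JSON-schema style definition. A minimal sketch, assuming a hypothetical `create_file` tool whose name, parameters, and wording are purely illustrative (Java 15+ for the text block); it only builds and prints the request body, and whether a clearer description actually helps a 3B model is exactly what would need testing:

```java
public class CreateFileTool {
    // Hypothetical tool definition: a specific description plus typed,
    // documented, required parameters, in the shape the "tools" field accepts.
    static final String TOOL = """
        {
          "type": "function",
          "function": {
            "name": "create_file",
            "description": "Create a new text file at the given path and write the given content into it.",
            "parameters": {
              "type": "object",
              "properties": {
                "path": {"type": "string", "description": "Relative path of the file to create, e.g. notes/todo.md"},
                "content": {"type": "string", "description": "Full text to write into the file"}
              },
              "required": ["path", "content"]
            }
          }
        }""";

    // Wrap the tool and one user message in a non-streaming chat request body.
    static String chatBody(String model, String userMessage) {
        // Minimal JSON escaping for backslashes and quotes only.
        String msg = userMessage.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"model\": \"" + model + "\", \"stream\": false, "
             + "\"messages\": [{\"role\": \"user\", \"content\": \"" + msg + "\"}], "
             + "\"tools\": [" + TOOL + "]}";
    }

    public static void main(String[] args) {
        System.out.println(chatBody("llama3.2:3b", "Create notes/todo.md containing \"buy milk\""));
    }
}
```

If the model decides to use the tool, the reply should carry a tool call with the chosen arguments rather than plain text; the code that actually creates the file stays on the caller's side.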
First, thank you as always, Matt. A question: I've been testing 3B since launch on my laptop, which has an NPU, with WebUI on a server and Ollama on the laptop. The thing is, Ollama isn't taking advantage of the NPU; with the 3B model it uses the small GPU and the CPU. RAG gives me good results (first with the WebUI interface and then with Flowise). I'm trying to find out how to activate my laptop's NPU so Ollama can use it. That would be great. I think LM Studio does it? I've also tested large files (>15 MB): the embedding step in Ollama on my computer, again with WebUI and Flowise, fails because the computer can't handle it. It would be great to see you do this with files that are closer to real company files. I think the 3B model is great. The last test I did was using it with Groq, and of course, that's where I can really test it 🙂
I’ve tested the 3B-parameter model and I'm very impressed with it, both its speed and its accuracy. I use it to improve descriptions in a catalog. It looks like we don’t have to pay OpenAI anymore. Thank you for your video!
Hi Matt, a very clear presentation yet again. I also really enjoy your polished style, so I subscribed, even though I rarely do. Cheers!
There is an awesome video on YouTube about this specific model and how censored it is. I wouldn't be surprised if people find the older model to be more capable.
llama 3.2 1B Q4:
To count the number of 'F's and 'R's, I'll go through the word "strawberry" carefully.
Here's the breakdown:
* 'F': 1
* 'R': 2
So, there are 1 'F' and 2 'R's in the word "strawberry".
or:
please create a java program to count the number of 'F's and 'R's in the word 'strawberry'
public class Main {
    public static void main(String[] args) {
        // Define the word
        String word = "strawberry";
        // Initialize counters for 'F's and 'R's
        int countF = 0;
        int countR = 0;
        // Loop through each character in the word
        for (char c : word.toLowerCase().toCharArray()) {
            // Check if the character is 'F' or 'R'
            if (c == 'f' || c == 'r') {
                // Increment the counter
                if (c == 'f') {
                    countF++;
                } else {
                    countR++;
                }
            }
        }
        // Print the results
        System.out.println("Number of 'F's: " + countF);
        System.out.println("Number of 'R's: " + countR);
    }
}
output:
Number of 'F's: 0
Number of 'R's: 3
sorry, they left me home alone...
Llama 3.1 & 3.2 are unfortunately very poor in Dutch language usage
Hey Matt, thanks for a great video - do you keep the code featured in your videos in public repos?
I want you to count the number of r's in Strawberry.
To do so I want you to go Letter by letter and every time you find one r I want you to count up
Gets it right every time...
All things local AI! I just subscribed; that’s what I need.
What if you set the temperature to 0? Does the tool-calling test succeed more often?
The vision portion isn't too great.
Well explained especially the 1B
interesting video. thank you
Love the breath holding tangent!
ollama makes it so easy
Good video
Thanks
What??? You are too kind... a member AND a tip. Thanks so much.
@@technovangelist I just love the simple yet comprehensive way you explain the subjects. Keep up the good work❤
Man you forgot your cup!
Cool video!
Awww yeah!
Meta Matt!
Microsoft GRIN MoE: A Gradient-Informed Mixture of Experts MoE Model 6.6b
Ranks better
In benchmarks? Or in real tests? One is useful; the other has zero real value.
There's 4 killers in the room. Since when does dying make you not a killer?
Good point.
Hey Matt, nice video. But I don’t think it’s as impressive as you put it. I am sure the llama3.1’s performance was comparable
It wasn’t available as a 1B or 3B model.
Lovin' the channel. 👍👍It'll be great once Ollama supports vision
Ollama does support vision today. Llama 3.2 vision support should arrive very soon.
Great content. Could you briefly describe the machine you use for this task? You mentioned 3 seconds…
I usually do, and I forgot this time. M1 Max MacBook Pro with 64GB. A machine you can get for about $1,500 today.
Which hardware setup do you have?
I'm on an M1 Max MacBook Pro with 64GB RAM.
ollama run llama3.2:1b
Error: llama runner process has terminated: signal: abort trap error:done_getting_tensors: wrong number of tensors; expected 147, got 146
Any idea about this error?
You need to update ollama. You should always update whenever there is a new version.
OK, I'll try it. Maybe it's a GFW issue. Thanks.
When is Ollama getting the vision models? Does anyone know?
The team is working on it.
@@technovangelist awesome, thanks Team!
Thank you for the video. What is the tool you use for writing?
Obsidian. And the plugin for it was Companion.
Thanks!
First test I did was "what number is larger 9.9 or 9.11?" and it insisted 9.11 was bigger. When is 2.3 out?
It's amazing how such a small model is smarter than you?