6:10 A video on the world of benchmarks is so necessary.
After watching this video, I can't stop singing "Das Model" from Kraftwerk. Thanks, Matt; this course is awesome.
Hi Matt, great content.
I loved it this way, with the subtitled videos.
Thanks!
Loved the hints to choose the best model for the problem you want to solve
Thanks for these, Matt. Super useful. I hope you'll continue through to Open WebUI and its more advanced features.
Thanks for taking time to make these videos!
Thanks for this new course.
Thanks for this and all of your videos.
How much is “a lot of extra memory”?
Would 32GB of RAM be enough, or do I need 128GB of RAM on a new M4 MacBook?
Llama3.1 runs just fine in 32GB of RAM.
It depends on the size of the model, the maximum context size, and how much of that context you are actually using. There isn't a great calculator for it either.
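For a rough sense of why all three matter, here is a back-of-envelope sketch. The architecture numbers (32 layers, 8 KV heads, 128-dim heads) are the published Llama 3.1 8B figures; the bytes-per-weight and fp16 KV cache are assumptions, and real usage adds runtime overhead on top.

```python
# Back-of-envelope memory estimate: quantized weights + KV cache.
# Constants and bytes-per-weight are assumptions, not an exact calculator.

def estimate_gib(params_billion, bytes_per_weight,
                 n_layers, n_kv_heads, head_dim, num_ctx, kv_bytes=2):
    weights = params_billion * 1e9 * bytes_per_weight                     # quantized weights
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * num_ctx * kv_bytes  # K and V, fp16
    return (weights + kv_cache) / 1024**3

# Llama 3.1 8B-ish at ~0.6 bytes/weight (q4-class quant):
print(round(estimate_gib(8, 0.6, 32, 8, 128, 2_048), 1))    # ~4.7 GiB at a 2k context
print(round(estimate_gib(8, 0.6, 32, 8, 128, 32_768), 1))   # ~8.5 GiB at 32k
print(round(estimate_gib(8, 0.6, 32, 8, 128, 131_072), 1))  # ~20.5 GiB at the full 128k
```

By that estimate the 8B model itself fits easily in 32GB; it's a very large num_ctx that pushes you toward needing a lot more.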
Hey, Matt. This is a spot-on topic in a highly desirable and necessary course. Thank you. Just one question: you mentioned being careful when setting the context size because you might run out of memory. Is that CPU or GPU memory? If you have a bit of GPU VRAM, does the main memory get used for more than just what a program would normally use for program storage and temporary data?
Thank you very much, Matt, this is really helpful.
Great stuff. As usual, I'd say. So, other than a 'hit and miss' approach, is there any way you might suggest for hunting down the right model to use with Fabric, for instance?
Definitely not hit and miss. Try a lot and be methodical. Find the best one for you.
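One way to make "try a lot and be methodical" concrete: a small harness that runs the same test prompts through several candidate models and collects the answers for side-by-side comparison. This is a minimal sketch against the local Ollama API; the model names and prompts are placeholders.

```python
# Sketch: run the same prompts through several models via the local Ollama
# API and collect the answers so you can compare them side by side.
import json
import requests

OLLAMA = "http://localhost:11434"
MODELS = ["llama3.1", "mistral", "gemma2"]                    # candidates you have pulled
PROMPTS = ["Summarize: ...", "Extract the dates from: ..."]   # your real test cases

def generate(model, prompt):
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": model, "prompt": prompt, "stream": False})
    r.raise_for_status()
    return r.json()["response"]

results = {m: {p: generate(m, p) for p in PROMPTS} for m in MODELS}
print(json.dumps(results, indent=2))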
TBH I thought it would be a boring, basic subject 😅 boy, was I wrong!
Thanks for the video ❤ keep it up
What would you say is the best model for PDF-to-JSON tasks? :) And is there a way to get the output without line breaks? Greetings
What does "K_L/M/S" etc mean for quantized models? Why are L larger than M for same quantization?
Matt, thanks for your content. Is there an Ollama model that you can use to check for plagiarism? I am creating short articles using ChatGPT. Another question. Is there a command that can interrupt llama3.1 while it’s outputting an answer? /bye doesn’t work.
Ctrl-C will stop it.
I don’t think a model will check for that, but it seems like a good use for RAG. Do a search for similar content, chunk it up along with your comparison article, then do a similarity search. If there are a bunch of chunks very similar to content in any one other article, that would be another piece of evidence pointing to plagiarism. But it might still need some assessment to figure it out for sure.
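For what it's worth, here's a minimal sketch of that similarity-search idea, assuming an embedding model such as nomic-embed-text has been pulled locally; the chunk size and threshold are arbitrary assumptions.

```python
# Sketch only: embed chunks of two articles with a local embedding model,
# then flag chunk pairs whose cosine similarity is suspiciously high.
import requests

OLLAMA = "http://localhost:11434"

def embed(text, model="nomic-embed-text"):
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": model, "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

def chunks(text, size=500):
    return [text[i:i + size] for i in range(0, len(text), size)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def suspicious_pairs(my_article, other_article, threshold=0.9):
    mine = [(c, embed(c)) for c in chunks(my_article)]
    theirs = [(c, embed(c)) for c in chunks(other_article)]
    hits = []
    for m_text, m_vec in mine:
        for o_text, o_vec in theirs:
            score = cosine(m_vec, o_vec)
            if score > threshold:
                hits.append((round(score, 3), m_text[:60], o_text[:60]))
    return hits
```

Lots of near-duplicate chunks against a single source would be the "piece of evidence" described above; it still needs a human read before calling it plagiarism.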
@@technovangelist Matt, I now understand RAG and how you can use it to extend an LLM, but I won't be able to implement your very good idea. But, I see how you think--deep tech. So, what do you think about Grammarly? It will check text, and it's just $12 a month. When I graduated in 1973, they only had mainframes. I worked for Chrysler (MI Tank). And worked with Madonna's father, Tony Ciccone.
I used to use Grammarly until the company I worked at banned it over security issues.
@@technovangelist OMG. I will need to do a search on that. I worry about my solar powered WiFi camera I bought from Amazon and that WiFi power adapter my wife uses to activate our coffee maker in the morning. Thanks.
YEAH!!!!!!!
"If, for example, I have more than one model downloaded, and one is chat, another is multimodal, and another generates images, can I make it so that Ollama chooses which model to use based on a prompt, or does it by default use the one you've chosen with the `ollama run` command?"
It doesn’t do that, but you could build an app that does.
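A hedged sketch of what such an app could look like: ask a small model to classify the prompt, then hand the prompt to whichever model you mapped to that category. The model names and categories here are placeholders; Ollama itself won't do this routing for you.

```python
# Sketch: route a prompt to one of several local models based on a quick
# classification done by a small "router" model.
import requests

OLLAMA = "http://localhost:11434"
ROUTES = {"chat": "llama3.1", "vision": "llava", "code": "deepseek-coder-v2"}

def generate(model, prompt):
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": model, "prompt": prompt, "stream": False})
    r.raise_for_status()
    return r.json()["response"]

def route(prompt):
    question = ("Classify this request as exactly one word: chat, vision, or code.\n\n"
                f"Request: {prompt}")
    label = generate("llama3.1", question).strip().lower()
    return ROUTES.get(label, ROUTES["chat"])  # fall back to the chat model

if __name__ == "__main__":
    user_prompt = "Write a Go function that reverses a slice."
    model = route(user_prompt)
    print(f"Routing to {model}")
    print(generate(model, user_prompt))
```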
@@technovangelist OK, 100 thanks.
How can I download a model in .gguf format locally? My reason is that I am transferring the model to a computer being used remotely in a health facility with no phone or internet network.
You want to download the model from HF and then add it to Ollama? Or you want to download it with Ollama and then transfer it to a different computer? Ollama uses gguf, but I don't understand exactly what you want.
Can I do this in a modelfile?
FROM llama3.1
PARAMETER num_ctx 130000
Or should I set that in the environment instead?
Yup. That goes in the modelfile.
@@technovangelist Hm... so... how do I check whether this custom model is actually using the 130k context rather than the 2k default?
I'm wondering because here is the story:
I was trying the Zed code editor and loaded deepseek-coder-v2. As expected, Zed showed a 2k context length (I believe that is the default for deepseek in Ollama).
Then I did ollama create mydeepseek with max_ctx 130k specified in the modelfile.
Back in Zed, I loaded that mydeepseek and... it still showed a 2k maximum context length.
I rechecked the modelfile and it was still set at 130k.
Scratching my head, I decided to edit Zed's configs.json (or is it settings.json? I forget the file name), and in there I specified that mydeepseek in Ollama should have a maximum of 130k tokens.
Then I reopened Zed and voilà... it was 130k max.
So now I wonder how to check mydeepseek's max context. I believe Zed has a default max of 2k tokens as a global Ollama setting unless the user specifies otherwise, or... my modelfile was mistyped.
The parameter is num_ctx, not max_ctx as you show in this text.
You can also set it in the API. Maybe Zed is overriding what is set in the modelfile.
But the best way to set it for Ollama is in the modelfile. It's not an environment variable thing.
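To check what the custom model actually carries, `ollama show mydeepseek` should list the num_ctx parameter from the modelfile. And here is a hedged sketch of setting the context length per request through the API's options field, which is presumably what Zed's own setting ends up doing and why it overrode the modelfile.

```python
# Sketch: per-request context length via the Ollama API "options" field.
# A client's own setting here can override whatever the modelfile says.
import requests

r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "mydeepseek",           # custom model from `ollama create`
                        "prompt": "Say hello.",
                        "stream": False,
                        "options": {"num_ctx": 131072}})  # context window for this request
r.raise_for_status()
print(r.json()["response"])
```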
Hello sir, can you explain to me how to install CUDA drivers and make Ollama use the GPU for running models?
Follow NVIDIA's instructions.
Ollama will use the GPU automatically if it's supported. If you have a very old GPU it won't work. What GPU do you have?
@@technovangelist I have an NVIDIA GTX 1650 4GB, sir. Thank you very much for responding so quickly. I also have an issue with an antimalware executable running on my Windows laptop; it is consuming a lot of memory. How can I fix that?
Easy. Remove that software and don't do anything silly with your computer.
I don’t see the 1650 being supported. The Ti version is.
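Following up on the "Ollama will use the GPU automatically" point above: `ollama ps` shows whether a loaded model is sitting in VRAM, and the same information is exposed over the API. A minimal sketch; the size_vram field is my understanding of the /api/ps response, so treat it as an assumption.

```python
# Sketch: ask the local Ollama server which models are loaded and how much
# of each sits in VRAM (size_vram near 0 would suggest a CPU fallback).
import requests

r = requests.get("http://localhost:11434/api/ps")
r.raise_for_status()
for m in r.json().get("models", []):
    print(m["name"], "size:", m["size"], "in VRAM:", m.get("size_vram", 0))
```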
It's too bad, we used to be able to filter by newest models including the user submitted ones. It was fun discovering new user models but now there's no way to do that.