6:10 A video on the world of benchmarks is so necessary.
After watching this video, I can't stop singing "Das Model" from Kraftwerk. Thanks, Matt; this course is awesome.
Hi Matt, great content.
I loved it this way, with the subtitled videos.
Thanks!
Loved the hints to choose the best model for the problem you want to solve
Thanks for these, Matt. Super useful. I hope you'll continue through to Open WebUI and its more advanced features.
Thanks for taking time to make these videos!
Thanks for this new course.
Thanks for this and all of your videos.
How much is “a lot of extra memory”?
Would 32GB of RAM be enough, or do I need 128GB of RAM on a new M4 MacBook?
Llama3.1 runs just fine in 32GB of RAM.
It depends on the size of the model, the maximum context size, and how much of that context you are actually using. There isn't a great calculator for it either.
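For a rough sense of why all three matter, here is a back-of-envelope sketch. The architecture numbers (32 layers, 8 KV heads, 128-dim heads) are the published Llama 3.1 8B figures; the bytes-per-weight and fp16 KV cache are assumptions, and real usage adds runtime overhead on top.

```python
# Back-of-envelope memory estimate: quantized weights + KV cache.
# Constants and bytes-per-weight are assumptions, not an exact calculator.

def estimate_gib(params_billion, bytes_per_weight,
                 n_layers, n_kv_heads, head_dim, num_ctx, kv_bytes=2):
    weights = params_billion * 1e9 * bytes_per_weight                     # quantized weights
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * num_ctx * kv_bytes  # K and V, fp16
    return (weights + kv_cache) / 1024**3

# Llama 3.1 8B-ish at ~0.6 bytes/weight (q4-class quant):
print(round(estimate_gib(8, 0.6, 32, 8, 128, 2_048), 1))    # ~4.7 GiB at a 2k context
print(round(estimate_gib(8, 0.6, 32, 8, 128, 32_768), 1))   # ~8.5 GiB at 32k
print(round(estimate_gib(8, 0.6, 32, 8, 128, 131_072), 1))  # ~20.5 GiB at the full 128k
```

By that estimate the 8B model itself fits easily in 32GB; it's a very large num_ctx that pushes you toward needing a lot more.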
Hey, Matt. This is a spot-on topic in a highly desirable and necessary course. Thank you. Just one question: you mentioned being careful when setting the context size because you might run out of memory. Is that CPU or GPU memory? If you have a bit of GPU VRAM, does the main memory get used for more than just what a program would normally use for program storage and temporary data?
Thank you very much, Matt, this is really helpful.
Great stuff. As usual, I'd say. So, other than a 'hit and miss' approach, is there any way you might suggest for hunting down the right model to use with Fabric, for instance?
Definitely not hit and miss. Try a lot and be methodical. Find the best one for you.
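One way to make "try a lot and be methodical" concrete: a small harness that runs the same test prompts through several candidate models and collects the answers for side-by-side comparison. This is a minimal sketch against the local Ollama API; the model names and prompts are placeholders.

```python
# Sketch: run the same prompts through several models via the local Ollama
# API and collect the answers so you can compare them side by side.
import json
import requests

OLLAMA = "http://localhost:11434"
MODELS = ["llama3.1", "mistral", "gemma2"]                    # candidates you have pulled
PROMPTS = ["Summarize: ...", "Extract the dates from: ..."]   # your real test cases

def generate(model, prompt):
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": model, "prompt": prompt, "stream": False})
    r.raise_for_status()
    return r.json()["response"]

results = {m: {p: generate(m, p) for p in PROMPTS} for m in MODELS}
print(json.dumps(results, indent=2))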
TBH I thought it would be a boring, basic subject 😅 boy, was I wrong!
Thanks for the video ❤ keep it up
What would you say is the best model for PDF-to-JSON tasks? :) And is there a way to get the output without line breaks? Greetings
What does "K_L/M/S" etc mean for quantized models? Why are L larger than M for same quantization?
Matt, thanks for your content. Is there an Ollama model that you can use to check for plagiarism? I am creating short articles using ChatGPT. Another question. Is there a command that can interrupt llama3.1 while it’s outputting an answer? /bye doesn’t work.
Ctrl-C will stop it.
I don’t think a model will check for that, but it seems like a good use for RAG. Do a search for similar content, chunk it up along with your comparison article, then do a similarity search. If there are a bunch of chunks very similar to content in any one other article, that would be another piece of evidence pointing to plagiarism. But it might still need some assessment to figure it out for sure.
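For what it's worth, here's a minimal sketch of that similarity-search idea, assuming an embedding model such as nomic-embed-text has been pulled locally; the chunk size and threshold are arbitrary assumptions.

```python
# Sketch only: embed chunks of two articles with a local embedding model,
# then flag chunk pairs whose cosine similarity is suspiciously high.
import requests

OLLAMA = "http://localhost:11434"

def embed(text, model="nomic-embed-text"):
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": model, "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

def chunks(text, size=500):
    return [text[i:i + size] for i in range(0, len(text), size)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def suspicious_pairs(my_article, other_article, threshold=0.9):
    mine = [(c, embed(c)) for c in chunks(my_article)]
    theirs = [(c, embed(c)) for c in chunks(other_article)]
    hits = []
    for m_text, m_vec in mine:
        for o_text, o_vec in theirs:
            score = cosine(m_vec, o_vec)
            if score > threshold:
                hits.append((round(score, 3), m_text[:60], o_text[:60]))
    return hits
```

Lots of near-duplicate chunks against a single source would be the "piece of evidence" described above; it still needs a human read before calling it plagiarism.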
@@technovangelist Matt, I now understand RAG and how you can use it to extend an LLM, but I won't be able to implement your very good idea. But, I see how you think--deep tech. So, what do you think about Grammarly? It will check text, and it's just $12 a month. When I graduated in 1973, they only had mainframes. I worked for Chrysler (MI Tank). And worked with Madonna's father, Tony Ciccone.
I used to use Grammarly until the company I worked at banned it over security issues.
@@technovangelist OMG. I will need to do a search on that. I worry about my solar powered WiFi camera I bought from Amazon and that WiFi power adapter my wife uses to activate our coffee maker in the morning. Thanks.
YEAH!!!!!!!
"If, for example, I have more than one model downloaded, and one is chat, another is multimodal, and another generates images, can I make it so that Ollama chooses which model to use based on a prompt, or does it by default use the one you've chosen with the `ollama run` command?"
It doesn’t do that, but you could build an app that does.
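A hedged sketch of what such an app could look like: ask a small model to classify the prompt, then hand the prompt to whichever model you mapped to that category. The model names and categories here are placeholders; Ollama itself won't do this routing for you.

```python
# Sketch: route a prompt to one of several local models based on a quick
# classification done by a small "router" model.
import requests

OLLAMA = "http://localhost:11434"
ROUTES = {"chat": "llama3.1", "vision": "llava", "code": "deepseek-coder-v2"}

def generate(model, prompt):
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": model, "prompt": prompt, "stream": False})
    r.raise_for_status()
    return r.json()["response"]

def route(prompt):
    question = ("Classify this request as exactly one word: chat, vision, or code.\n\n"
                f"Request: {prompt}")
    label = generate("llama3.1", question).strip().lower()
    return ROUTES.get(label, ROUTES["chat"])  # fall back to the chat model

if __name__ == "__main__":
    user_prompt = "Write a Go function that reverses a slice."
    model = route(user_prompt)
    print(f"Routing to {model}")
    print(generate(model, user_prompt))
```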
@@technovangelist OK, 100 thanks.
How can I download a model in .gguf format locally? My reason is that I am transferring the model to a computer being used remotely in a health facility with no phone or internet network.
You want to download the model from HF and then add it to Ollama? Or you want to download it with Ollama and then transfer it to a different computer? Ollama uses gguf, but I don't understand exactly what you want.
Can I do this in a modelfile?
FROM llama3.1
PARAMETER num_ctx 130000
Or should I set that in the environment instead?
Yup. That goes in the modelfile.
@@technovangelist Hm... so... how do I check whether this custom model is actually using the 130k context rather than the 2k default?
I'm wondering because here is the story:
I was trying the Zed code editor and loaded deepseek-coder-v2. As expected, Zed showed a 2k context length (I believe that is the default for deepseek in Ollama).
Then I did ollama create mydeepseek with max_ctx 130k specified in the modelfile.
Back in Zed, I loaded that mydeepseek and... it still showed a 2k maximum context length.
I rechecked the modelfile and it was still set at 130k.
Scratching my head, I decided to edit Zed's configs.json (or is it settings.json? I forget the file name), and in there I specified that mydeepseek in Ollama should have a maximum of 130k tokens.
Then I reopened Zed and voilà... it was 130k max.
So now I wonder how to check mydeepseek's max context. I believe Zed has a default max of 2k tokens as a global Ollama setting unless the user specifies otherwise, or... my modelfile was mistyped.
The parameter is num_ctx, not max_ctx as you show in this text.
You can also set it in the API. Maybe Zed is overriding what is set in the modelfile.
But the best way to set it for Ollama is in the modelfile. It's not an environment variable thing.
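To check what the custom model actually carries, `ollama show mydeepseek` should list the num_ctx parameter from the modelfile. And here is a hedged sketch of setting the context length per request through the API's options field, which is presumably what Zed's own setting ends up doing and why it overrode the modelfile.

```python
# Sketch: per-request context length via the Ollama API "options" field.
# A client's own setting here can override whatever the modelfile says.
import requests

r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "mydeepseek",           # custom model from `ollama create`
                        "prompt": "Say hello.",
                        "stream": False,
                        "options": {"num_ctx": 131072}})  # context window for this request
r.raise_for_status()
print(r.json()["response"])
```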
Hello sir, can you explain to me how to install CUDA drivers and make Ollama use the GPU for running models?
Follow NVIDIA's instructions.
Ollama will use the GPU automatically if it's supported. If you have a very old GPU it won't work. What GPU do you have?
@@technovangelist I have an NVIDIA GTX 1650 4GB, sir. Thank you very much for responding so quickly. I also have an issue with an antimalware executable running on my Windows laptop; it is consuming a lot of memory. How can I fix that?
Easy. Remove that software and don't do anything silly with your computer.
I don’t see the 1650 being supported. The Ti version is.
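Following up on the "Ollama will use the GPU automatically" point above: `ollama ps` shows whether a loaded model is sitting in VRAM, and the same information is exposed over the API. A minimal sketch; the size_vram field is my understanding of the /api/ps response, so treat it as an assumption.

```python
# Sketch: ask the local Ollama server which models are loaded and how much
# of each sits in VRAM (size_vram near 0 would suggest a CPU fallback).
import requests

r = requests.get("http://localhost:11434/api/ps")
r.raise_for_status()
for m in r.json().get("models", []):
    print(m["name"], "size:", m["size"], "in VRAM:", m.get("size_vram", 0))
```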
It's too bad, we used to be able to filter by newest models including the user submitted ones. It was fun discovering new user models but now there's no way to do that.