Someone posted a comment about a 3090 on Windows. I hate this interface: too easy to delete and no way to get it back. Ask again, whoever you were.
But the reason it was using so much CPU is that you need a GPU with more memory if you want to run a 70B model fast, especially if there is a lot of context.
@@technovangelist Ok, so the 64GB memory requirement for 70B is at the GPU level, not the PC level. The RTX 3090 has only 24GB of GDDR6X memory, which is why it is so sluggish. So I will be shopping for something like an A100 with 80GB of HBM2e memory, and another rig, since I won't be playing Helldivers 2 on that one!! ;-)
I am so glad you came back. So sorry about that. There were some really sketchy comments for ages a while back, so I approve each one, but the approve and delete buttons are close together on the phone. My daughter stepped out for a second and then DEMANDED I start reading Sophie Mouse right away (hopefully the need to read the same story every night goes away soon after she turns 6), and I fat-fingered it.
Hi Matt, are there any drivers for Intel UHD? Ollama seems to use only the CPU.
AMD and Nvidia only.
I literally checked their website 3-4 hours ago to see if they had the Windows version up. It wasn't there; now it is.
I checked this morning lol
Magic 😁
Hi! I come from using textgen webui, how does it compare to Ollama? What parameters like, temp, rep penalty etc does it use? Also, what system prompt does it use?
The system prompt and template come from the model. They are already set (and customizable if you like) in the model itself.
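For anyone who wants to inspect what a model already ships with, Ollama can print the modelfile a model was built from (the model name here is just an example):

```
ollama show llama3 --modelfile
```

That output includes the TEMPLATE and SYSTEM lines, which is what the answer above is referring to.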
Thanks for the great video! Ollama was super easy to get up and running on Windows. I've been testing "ollama run llama3.2" on a few different machines:
My refurbished Lenovo with an Intel i7, 64GB RAM, and 1TB SSD (no GPU) handles it pretty well for only $280!
My old HP ProLiant server with dual quad-core Xeons (no GPU) is a bit faster, but it cost me $10K years ago 😅
My M2 MacBook Pro with 32GB RAM and 1TB SSD absolutely flies through it! Definitely worth the $3.5K price tag for this kind of performance
Thanks for giving me enough time to get hooked on Linux.
I was actually waiting for this, but just like you said, decided to install Linux on all my machines :)
Thanks for updating. I was struggling with WSL and almost gave up, then your vid showed up on my YouTube homepage. XD
Thank you. I’ve been waiting for this!
Thank you so much. I never touched Linux in my life, so this really saved me.
Thanks for bringing the good news! Just installed it yesterday on WSL2, guess I'll reinstall it natively now.
There are directions that work and those that do not; AI is difficult and there is a lot of fake stuff out there. Trusting you is so easy. You are GOLD!
Yay! I have a Windows Gaming Laptop that I have been dying to try Ollama on because of the GPU. This is going to be Soooo much better! Thanks guys!
That's really good news, thanks! :D :D :D
It was about time! XD
Yay! No more WSL2. Thank you.
This is amazing, and THANK YOU! 💌 Do you have to do anything special for Ollama to use the Nvidia drivers?
That weird silence at the end XD
I open RUclips for a nice session of video watching and sir you have not disappointed! The timing was perfection!
Been waiting for this day!!!
Edit: Also for those wondering it is on the main Ollama webpage now too :)
Very nice sharing 👍
congratulations !
This is very cool. I am getting a 10% improvement running native compared to WSL.
Omg awesome thanks for the update
What?! YAY!!! Thank you very much Ollama wanna play 🙂
Hi, I have Llama 3.1 running on an RTX 2060 Super. Computational resource requirements aren't nearly as high as people think with these models; hope this helps someone who thinks they need tens of thousands in equipment :)
It can need a lot if you configure it to use the full context. Straight from Ollama it uses a 2K context, but the model can support up to 128K.
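For reference, the context window can be raised per session from inside the Ollama REPL; a minimal sketch, assuming the model actually supports the larger window (the model name and the 8192 value are just examples):

```
ollama run llama3
>>> /set parameter num_ctx 8192
```

Memory use grows with the context size, which is why the default is kept small.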
This is great, thank you! I do have a question. When I run LM Studio (Windows), I get a URL I can use to call the model via the API from my Python applications. I'm assuming Ollama has the same functionality. How do I set that up, and where do I find the URL to plug into my scripts? Thanks again!
localhost:11434/api/chat or /api/generate. All the docs are in the repo. This is an area where Ollama really shines over the alternatives.
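To sketch what calling that endpoint from a Python script might look like (the model name and prompt are placeholders, and this assumes Ollama is running on its default port):

```python
import json
import urllib.request

# Default Ollama endpoint; /api/chat works similarly with a "messages" list
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks for one complete JSON response instead of chunks
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The generated text comes back in the "response" field
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server and a pulled model):
#   print(ask("llama3", "Why is the sky blue?"))
```

No extra setup is needed: the server starts with the app, so the URL is always the same.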
I have a bunch of models I downloaded under WSL2. Can I migrate them to the Windows install of Ollama? It would be nice to save the time and data of not re-downloading them all.
Does this mean we'll have nuclear fusion this week???!!
I haven't had time to check out Ollama yet; what are the advantages of Ollama over oobabooga? Is it mostly ease of use? Does it have significant features that justify jumping over from oobabooga? I already have hundreds of GBs of models downloaded; can I convert them to your blob format without having to re-download? Those are just my initial questions. I'll be checking it out regardless.
I would say ease of use and power. You can create new models from the weights you already downloaded, but you need to know the prompt and template. That should be available in your current app or wherever you got the models from. There are docs in the repo.
@@technovangelist Awesome, thanks!
Finally
And thanks for reminding me to drink water❤
Hello, any idea how to set keep_alive when running the Windows exe?
you are THE GUY thx
Dumb question, but I have Ollama WebUI on my Synology NAS and Ollama on my PC (GPU). How would one make the NAS WebUI work with my PC????
Does Ollama have an SSE2, etc. requirement? I have a 12-core Xeon with 80GB RAM & 12GB VRAM. Weirdly, the common Python tools often have CPU requirements, but it's mostly on GPU, right?
I've never heard that one come up for Ollama; I don't think so. Ollama has no Python code at all. The first version did, but it was removed quickly because it was way too limiting. There used to be a requirement for AVX, but now it only uses it if it's there.
Great work. Now can you create more channels for the Discord? One is a mess.
Matt, I want to move the model blobs off my C: drive to another drive where I have more free disk space. Does the Windows Ollama support the OLLAMA_MODELS environment variable somehow? And is there a way to confirm that Ollama is detecting my NVIDIA GPU?
Yes, that env var is the way to do it.
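On Windows, one way to set that variable persistently is from a command prompt (the path here is just an example); restart Ollama afterwards so it picks up the change:

```
setx OLLAMA_MODELS "D:\ollama\models"
```

After restarting, new pulls land on the other drive; existing blobs can be moved into that folder manually.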
Hi, Ollama doesn't seem to be utilising my GPU; what could be the cause of this?
This is great, thanks. How do I install it in a different location?
I have a Dell Latitude 3410 with Intel Core i5 10310U 8 core 1.7 GHz, 16 GB RAM, 8 shared with Intel UHD. When running ollama with some of the models like llama3 or qwen, I noticed in the Task Manager that the GPU is hardly being utilized. Is there some way to utilize the GPU and improve the performance of a model? Needless to say, the output is agonizingly slow.
I don't see any mention of an Nvidia or AMD GPU. Without one, no GPU can be used.
Thanks for this. I downloaded mistral-7B-v0.1 from somewhere on the internet. How do I load this model that I have locally on my hard drive?
Watch the video Adding Custom Models to Ollama
ruclips.net/video/0ou51l-MLCo/видео.html
@@technovangelist thank you
Hey, can anyone help me with this: "wsarecv: An existing connection was forcibly closed by the remote host."
So if I do what you said in the video, how can I make it listen to my voice and speak back to me like GPT-4o?
When we download a model, e.g. mistral, where is it stored on our local machine? Since it's available locally, we don't need the internet, right?
Yes. All local after you get it.
What if I already have LLM files downloaded? How can I point Ollama to the folder and use them?
Yes, but often it will be faster just to pull the models. That said, you can create a modelfile for each of the model weights you have, include the system prompt and parameters, and create a new model from that. There are some videos here about creating new Ollama models, or you can refer to the docs on modelfiles.
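A minimal modelfile for existing GGUF weights might look like this; the file path, template, and system prompt below are placeholders, and the right template depends on which model the weights come from:

```
# Modelfile (path and template are examples; match them to your model)
FROM ./mistral-7b-v0.1.Q4_K_M.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
SYSTEM "You are a helpful assistant."
```

Then `ollama create mymodel -f Modelfile` builds the model from the local weights without re-downloading anything.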
If I want to use Ollama as the LLM in a chatbot, would it be useful?
Yes, definitely
Now, how do I get a code assistant in VS Code with Ollama on Windows?
Finally ❤
Thank you
How do I make llama3 uncensored for research purposes?
You can run other unrestricted open-source AI models.
You can't; I just wasted 3 hours trying.
Error: no suitable llama servers found
Sometimes rare edge cases come up. You will probably get a quicker response in the Ollama Discord at discord.gg/ollama.
It was about time lol
Viewer: "You know you can use multiple takes in case you flub a line."
Matt: "What did you just say to me?"
Huh? I don’t understand
Why would I use this over LMStudio?
The reason I have seen mentioned most is that LM Studio is a great place to start, but it's limiting. I haven't spent much time with it because it's too frustrating, but that's the feedback I have seen online. Not sure if that helps.
@@technovangelist plus LMStudio is not open source
Hello, does anyone know how to delete the Ollama models from a Linux distribution on Win 11? I cannot delete the exact Ollama models, and sudo rm (ollama model) only deleted the name, I suppose; the disk space is still the same.
ollama rm modelname will delete the model. If another model uses the same weights file, that will stay, so you would need to delete all the models that use the same weights to see a difference.
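For anyone following along, the commands look like this (the model name is just an example):

```
ollama list        # show installed models and their sizes
ollama rm llama3   # remove the model; shared weight blobs are freed
                   # only when no remaining model references them
```

This is the right way to reclaim space, rather than deleting blob files by hand with rm.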
Hi Matt, can we use Ollama without a GPU? If yes, how?
Yes you can! But it's super slow. A GPU is really needed to make the experience good.
@@technovangelist Yes, you are correct. I checked it on WSL2 with GGUF; it was like 3-5 seconds per token on my machine.
Can I install it on the D: drive?
Thanks for the update on the Windows version. Can Ollama run with CPU only and not GPU? (For example, the GGUF quantized version.) I have 32GB of RAM and 2GB of VRAM on my laptop.
So Ollama uses the quantized GGUF models. That means it will use the GPU when there is a good GPU there, but it will drop down to CPU when there's not.
204MB/s
DAMNNNNNNNNNNNNNNNNNNNN
So it runs in the CLI? Why is that?
You would prefer it not be on Windows?
Finally on windows… because windows sucks ass…
Glad to see this, though already running it in Docker and not sure what the advantage of switching to native is considering I don't have an NVidia GPU.
Without a GPU, native is going to be far faster, because you don't have the multiple levels of abstraction. Docker on Windows is going to be the slowest of the three options.
@@technovangelist cool thanks, will definitely give it a go
Windows Error??:
Is anyone getting an error running a model on Windows? When it finishes pulling, or when I try to run a model, I get:
Error: Post "127.0.0.1:11434/api/chat": read tcp 127.0.0.1:51387->127.0.0.1:11434: wsarecv: An existing connection was forcibly closed by the remote host.
I fully deleted my previous WSL install and can't see any port 11434 conflict.
Any ideas??