I love how I used LM Studio one day before you posted this video 😂
Hi, thanks for the video, it's really great. Since you do Python tutorials as well and I'm really interested in that, could you please make a tutorial on how to run the model from Python? It would be pretty useful for building your own applications or voice interaction.
It would be nice if you could give a demo of how to run custom/fine-tuned open-source models on cloud services (or even non-fine-tuned models).
In other words, explain the big question most companies are facing right now.
Regarding '(or even non fine-tuned models)': he doesn't need to make a separate tutorial for the cloud. You know how to run the model locally; now learn how to run 'any kind of service' in the cloud, then merge those two concepts and you can achieve what you need.
Awesome! How could I run this in VS Code so as to extract the generated response?
I got mine working as a result of this. Thank you.
🙌🏻
Looks cool. Now all I need is a mega-spec machine :-(
Do you have any videos about post-processing LLM outputs and achieving the best accuracy (e.g. math tasks with exactly one correct answer)?
I've just switched to NVIDIA and the model runs so much faster. I definitely can't recommend using this if you don't have an NVIDIA GPU.
The floodgates just got breached. 😂 Kidding, but this is awesome.
I wish you had mentioned ollama-webui as well; it might be more helpful for people with a home lab setup.
You can run most of the models without a GPU; however, it isn't as fast.
Do I need RAM or VRAM to run it locally? How slow would it be?
@@davidmegri It's reasonably fast. Also, you only need about 8 GB of RAM to handle most models. You don't need any VRAM.
Great news! Gonna try it out. Thanks man! @@mak448a
Are you using any GPU? If so, what is the configuration?
It automatically selects the hardware: if there's enough VRAM it runs on the GPU, otherwise on the CPU. Also, you can easily build a web UI for it (e.g. with Gradio).
I'm curious to get an idea of the performance. I tried using privateGPT a few months ago and it wasn't great.
How do I access this LLM locally with Python? How do I use it from a Python script? Any videos on that topic? Great content, by the way.
Thanks for the video. Can you please also show how to uninstall a model? It's eating up a huge amount of space.
Thank you!
I can hear new keyboard sounds 😄
Same keyboard, but tape-modded and with two O-rings per key. Had to make it a bit less loud 😅
@@NeuralNine Very good, I like it 🔥
Thanks for the video! Even with my 16 GB of RAM it runs, just answers slowly, but that's okay anyway.
So Ollama is like LM Studio, right?
Don't forget llama.cpp.
Yes, but it runs in the command line and is open source.
@mak448a Try the ollama-webui repo. Also, you can write your own with the Gradio library (under 50 lines).
@@aminrazeghi2962 Why not Streamlit? Is Gradio better for this specific case?
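For anyone wondering what such a UI looks like, here is a minimal sketch. It assumes Ollama is running locally on its default port (11434) with the mistral model already pulled; the endpoint and field names follow Ollama's REST API as I understand it.

```python
# Minimal Gradio front end that forwards each prompt to a locally running Ollama server.
import requests
import gradio as gr

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(prompt: str) -> str:
    # stream=False asks Ollama for one complete JSON object instead of a token stream.
    payload = {"model": "mistral", "prompt": prompt, "stream": False}
    reply = requests.post(OLLAMA_URL, json=payload, timeout=300)
    reply.raise_for_status()
    return reply.json()["response"]

demo = gr.Interface(fn=ask, inputs="text", outputs="text", title="Local LLM chat")
demo.launch()
```

Either library works; Gradio tends to get picked here simply because a text-in/text-out interface is a one-liner with gr.Interface, while Streamlit re-runs the whole script on every interaction.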
Very useful! Thanks
One month later and there is still no Windows support, and no AMD GPU support either.
I am a Java developer and don't have any idea about this AI stuff. How do you train these systems or models when they're deployed locally? With documents? From whom, and where do you get them? What's the authenticity of those docs? 🤔🤔
Very cool!! Thank you!
Good find!
awesome thanks!!
Hi, I am a beginner in this topic.
Could you advise me what typical PC/laptop configuration would be sufficient for this?
I would not necessarily buy a PC for this specifically. You can also just rent a server for a small amount of money and run Ollama there. But in general, you need enough RAM (at least 16GB I would say) and a decent GPU (preferably NVIDIA due to CUDA).
Is there a way to rent a VPS with something like a T4 GPU and pay per query, so we can use high-end hardware on a query-based pay-as-you-go plan?
If you run it on Windows via WSL, can you use the GPU? Since virtualization doesn't use native devices.
❤
Do I need to have a GPU to run these models?
Got the Windows version; how do I know whether it's the 70B version of Llama 2?
GPU RAM or CPU RAM?
All I heard is
CLICKY CLACK CLACK CLACK CLACK.
Such a noisy keyboard. Might as well use a typewriter.
I searched for how to put Llama on localhost, NOT how to run Llama locally.
Can I fine-tune a model on my local data and then use that model for my desired tasks locally with Ollama?
I would like to learn that too.
The main question is: can these models remember previous prompts and responses? I guess not. That's a big downside.
How do you interact with these models programmatically? For example, from a Java server application?
You can just send POST requests to them (for example via the requests package in Python).
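A minimal sketch of that in Python, assuming Ollama is serving on its default port (11434) and you've pulled llama2; the field names follow Ollama's /api/generate endpoint as far as I know:

```python
# Query a locally running Ollama server over its HTTP API and print the answer.
import requests

payload = {
    "model": "llama2",   # any model you've pulled with `ollama pull`
    "prompt": "Why is the sky blue?",
    "stream": False,     # return one complete JSON object instead of a stream
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["response"])
```

From Java you can do the same with java.net.http.HttpClient, since it's just a plain local REST endpoint. If I remember correctly, the JSON reply also includes a context field you can send back with your next request so the model keeps the conversation going.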
Yeah, if you have $8k worth of hardware you can run the good models very easily, LOL. I wonder how many people can actually install them; no video mentions the actual requirements, but I think it should be around 48 GB of VRAM.
Nacho??? From Better Call Saul???
On mac.
Is an MSI Delta 15 capable enough? I run Linux with the drivers installed.
it should be.
@@umutsen2290 Hmm, how noticeable would the difference be compared to ChatGPT?
dolphin-mixtral runs very slowly on my device (10% GPU usage) and is also censored.
The standard Mistral runs very smoothly, though. Does anybody know why?
There is a phrase you need to give Mixtral before asking it anything; look up the dolphin-mixtral uncensored phrase on Google and you'll find it. Also, if you use LM Studio you can change the GPU usage: there is a parameter you can set to -1 (I forgot the name) and it will use your GPU at 100%.
because it is a large model?
@@aaroldaaroldson708 Nah, my GPU is only at 10 percent usage; that's my main problem. I would like the other 90 percent to be used.
@@jvuyuxcvxhykykyu3653 I see. I think there must be a way to tune it in the Ollama configs.
Yo, I downloaded it and used it already, but now when I try to run it, it downloads the model again.
Did you find a solution?
How do you delete LLMs you've installed using this method? The files are hard to find.
Run each one in a container (Docker).
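A sketch of that approach, assuming the official ollama/ollama image from Docker Hub; the plain ollama list / ollama rm commands at the end should also work without Docker if you just want to free up space:

```sh
# Run Ollama inside a container so all model files live in one named volume.
docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

# Chat with a model inside that container.
docker exec -it ollama ollama run llama2

# Without Docker: see what's installed and remove what you no longer need.
ollama list
ollama rm llama2
```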
Can we run the Ollama 70B model if we add swap memory?
Unlikely; it would be super slow on system RAM, and you need something like 64 GB of VRAM for a model that big, or the same amount of system RAM otherwise.
Windows version is available now.
Could you please share any resources for getting started?
Any video on that?
What kind of resources?
LM Studio is just better; don't do this unless you have specific intentions in mind.
Always use bigger fonts for YouTube; not all of us have eyesight like yours.
Bro looks like those AI video creations for presentations.
Thx.
no information whatsoever
The 64 GB requirement is outrageous.
+++++
Do these guys hate Windows users? JK... maybe not. C'mon with the Windows version already. Ubuntu is great, but I need Windows!!!
Ollama is the easiest to set up, but there's no Windows version yet. There are many other alternatives, such as LM Studio, llama.cpp, llama-cpp-python, and GPT4All (see the llama-cpp-python sketch at the end of this thread).
If you're on Windows, try LM-Studio. You could probably also compile ollama yourself for Windows, although it's probably not going to be the best experience.
I'm restricted to Windows because a lot of the software I use for work requires it. Give WSL a try, though.
Agreed, get WSL.
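For Windows users who can't wait for native Ollama support, here is a minimal llama-cpp-python sketch as mentioned above. It assumes you've already downloaded a GGUF model file; the path below is just a placeholder.

```python
# Load a local GGUF model with llama-cpp-python and generate a short completion.
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)
out = llm("Q: Name three uses of a local LLM. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"].strip())
```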
O! Lama!