I love how I used LM Studio one day before you posted this video 😂
Hi, thanks for the video, it's really great. Since you do Python tutorials as well and I'm really interested in that, could you please make a tutorial on how to run the model from Python? It would be pretty useful for building your own applications or voice interaction.
It would be nice if you could give a demo of how to run custom/fine-tuned open-source models on cloud services (or even non-fine-tuned models).
In other words, explain the big question most companies are facing right now.
Regarding '(or even non fine-tuned models)': he doesn't need to make a separate tutorial for the cloud. You know how to run the model locally; now learn how to run 'any kind of service' in the cloud, then merge those two concepts and you can achieve what you need.
Awesome! How could I run this in VS Code so as to extract the generated response?
I got mine working as a result of this. Thank you.
🙌🏻
Looks cool. Now all I need is a mega-spec machine :-(
Do you have any videos about post-processing LLM outputs and achieving the best accuracy (e.g. math tasks with exactly one correct answer)?
I've just switched to NVIDIA and the model runs so much faster. I definitely can't recommend using this if you don't have an NVIDIA GPU.
The floodgates just got breached. 😂 Kidding, but this is awesome.
I wish you had mentioned ollama-webui as well; it might be more helpful for people with a home lab setup.
You can run most of the models without a GPU; however, it isn't as fast.
Do I need RAM or VRAM to run it locally? How slow would it be?
@@davidmegri It's reasonably fast. Also, you only need about 8 GB of RAM to handle most models. You don't need any VRAM.
Great news! Gonna try it out. Thanks man! @@mak448a
Are you using any GPU? If so, what is the configuration?
It automatically selects the hardware: if there's enough VRAM it runs on the GPU, otherwise on the CPU. Also, you can easily build a web UI for it (e.g. with Gradio).
I'm curious to get an idea of the performance. I tried using privateGPT a few months ago and it wasn't great.
How do I access this LLM locally with Python? How do I use it from a Python script? Any videos on that topic? Great content, by the way.
Thanks for the video. Can you please also show how to uninstall a model? It's eating up a huge amount of space.
Thank you!
I can hear new keyboard sounds 😄
Same keyboard, but tape-modded and with two O-rings per key. Had to make it a bit less loud 😅
@@NeuralNine Very good, I like it 🔥
Thanks for the video! Even with my 16 GB of RAM it runs, just answers slowly, but that's okay anyway.
So Ollama is like LM Studio, right?
Don't forget llama.cpp.
Yes, but it runs in the command line and is open source.
@mak448a Try the ollama-webui repo. Also, you can write your own with the Gradio library (under 50 lines).
@@aminrazeghi2962 Why not Streamlit? Is Gradio better for this specific case?
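For anyone wondering what such a UI looks like, here is a minimal sketch. It assumes Ollama is running locally on its default port (11434) with the mistral model already pulled; the endpoint and field names follow Ollama's REST API as I understand it.

```python
# Minimal Gradio front end that forwards each prompt to a locally running Ollama server.
import requests
import gradio as gr

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(prompt: str) -> str:
    # stream=False asks Ollama for one complete JSON object instead of a token stream.
    payload = {"model": "mistral", "prompt": prompt, "stream": False}
    reply = requests.post(OLLAMA_URL, json=payload, timeout=300)
    reply.raise_for_status()
    return reply.json()["response"]

demo = gr.Interface(fn=ask, inputs="text", outputs="text", title="Local LLM chat")
demo.launch()
```

Either library works; Gradio tends to get picked here simply because a text-in/text-out interface is a one-liner with gr.Interface, while Streamlit re-runs the whole script on every interaction.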
Very useful! Thanks
One month later and there is still no Windows support, and no AMD GPU support either.
I am a Java developer and don't have any idea about this AI stuff. How do you train these systems or models when they're deployed locally? With documents? From whom, and where do you get them? What's the authenticity of those docs? 🤔🤔
Very cool!! Thank you!
Good find!
awesome thanks!!
Hi, I am a beginner in this topic.
Could you advise me what typical PC/laptop configuration would be sufficient for this?
I would not necessarily buy a PC for this specifically. You can also just rent a server for a small amount of money and run Ollama there. But in general, you need enough RAM (at least 16GB I would say) and a decent GPU (preferably NVIDIA due to CUDA).
Is there a way to rent a VPS with something like a T4 GPU and pay per query, so we can use high-end hardware on a query-based pay-as-you-go plan?
If you run it on Windows via WSL, can you use the GPU? Since virtualization doesn't use native devices.
❤
Do I need to have a GPU to run these models?
Got the Windows version; how do I know whether it's the 70B version of Llama 2?
GPU RAM or CPU RAM?
All I heard is
CLICKY CLACK CLACK CLACK CLACK.
Such a noisy keyboard. Might as well use a typewriter.
I searched for how to put Llama on localhost, NOT how to run Llama locally.
Can I fine-tune a model on my local data and then use that model for my desired tasks locally with Ollama?
I would like to learn that too.
The main question is: can these models remember previous prompts and responses? I guess not. That's a big downside.
How do you interact with these models programmatically? For example, from a Java server application?
You can just send POST requests to them (for example via the requests package in Python).
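A minimal sketch of that in Python, assuming Ollama is serving on its default port (11434) and you've pulled llama2; the field names follow Ollama's /api/generate endpoint as far as I know:

```python
# Query a locally running Ollama server over its HTTP API and print the answer.
import requests

payload = {
    "model": "llama2",   # any model you've pulled with `ollama pull`
    "prompt": "Why is the sky blue?",
    "stream": False,     # return one complete JSON object instead of a stream
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["response"])
```

From Java you can do the same with java.net.http.HttpClient, since it's just a plain local REST endpoint. If I remember correctly, the JSON reply also includes a context field you can send back with your next request so the model keeps the conversation going.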
Yeah, if you have $8k worth of hardware you can run the good models very easily, LOL. I wonder how many people can actually install them; no video mentions the actual requirements, but I think it should be around 48 GB of VRAM.
Nacho??? From Better Call Saul???
On mac.
Is an MSI Delta 15 capable enough? I run Linux with the drivers installed.
it should be.
@@umutsen2290 Hmm, how noticeable would the difference be compared to ChatGPT?
dolphin-mixtral runs very slowly on my device (10% GPU usage) and is also censored.
The standard Mistral runs very smoothly, though. Does anybody know why?
There is a phrase you need to give Mixtral before asking it anything; look up the dolphin-mixtral uncensored phrase on Google and you'll find it. Also, if you use LM Studio you can change the GPU usage: there is a parameter you can set to -1 (I forgot the name) and it will use your GPU at 100%.
because it is a large model?
@@aaroldaaroldson708 Nah, my GPU is only at 10 percent usage; that's my main problem. I would like the other 90 percent to be used.
@@jvuyuxcvxhykykyu3653 I see. I think there must be a way to tune it in the Ollama configs.
Yo, I downloaded it and used it already, but now when I try to run it, it downloads the model again.
Did you find a solution?
How do you delete LLMs you've installed using this method? The files are hard to find.
Run each one in a container (Docker).
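A sketch of that approach, assuming the official ollama/ollama image from Docker Hub; the plain ollama list / ollama rm commands at the end should also work without Docker if you just want to free up space:

```sh
# Run Ollama inside a container so all model files live in one named volume.
docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

# Chat with a model inside that container.
docker exec -it ollama ollama run llama2

# Without Docker: see what's installed and remove what you no longer need.
ollama list
ollama rm llama2
```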
Can we run the Ollama 70B model if we add swap memory?
Unlikely; it would be super slow on system RAM, and you need something like 64 GB of VRAM for a model that big, or the same amount of system RAM otherwise.
Windows version is available now.
Could you please share any resources for getting started?
Any video on that?
What kind of resources?
LM Studio is just better; don't do this unless you have specific intentions in mind.
Always use bigger fonts for YouTube; not all of us have eyesight like yours.
Bro looks like those AI video creations for presentations.
Thx.
no information whatsoever
The 64 GB requirement is outrageous.
+++++
Do these guys hate Windows users? JK... maybe not. C'mon with the Windows version already. Ubuntu is great, but I need Windows!!!
Ollama is the easiest to set up, but there's no Windows version yet. There are many other alternatives, such as LM Studio, llama.cpp, llama-cpp-python, and GPT4All (see the llama-cpp-python sketch at the end of this thread).
If you're on Windows, try LM-Studio. You could probably also compile ollama yourself for Windows, although it's probably not going to be the best experience.
I'm restricted to Windows because a lot of the software I use for work requires it. Give WSL a try, though.
Agreed, get WSL.
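For Windows users who can't wait for native Ollama support, here is a minimal llama-cpp-python sketch as mentioned above. It assumes you've already downloaded a GGUF model file; the path below is just a placeholder.

```python
# Load a local GGUF model with llama-cpp-python and generate a short completion.
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)
out = llm("Q: Name three uses of a local LLM. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"].strip())
```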
O! Lama!