Run Your Own Private ChatGPT, Free and Uncensored, with Ollama + Open WebUI
- Published: 29 Jun 2024
- Run an Uncensored Private ChatGPT Clone on your Computer for Free with Ollama and Open WebUI
In this video, we'll see how you can use Ollama and Open WebUI to run a private ChatGPT clone locally for free using open-source large language models (LLMs) such as Meta's Llama 2 and Mistral.
This lets you try out different models, and even use uncensored models.
Don't send your private data to OpenAI's ChatGPT or Anthropic's Claude.ai; keep it private on your PC or Mac.
👍 Please like if you found this video helpful, and subscribe to stay updated with my latest tutorials. 🔔
❤️ You can support this channel by buying me a ☕: buymeacoffee.com/codesfinance
For written instructions and additional details, check out my blog post: vincent.codes.finance/posts/o...
🔖 Chapters:
00:00 Intro
00:50 Ollama
03:54 Ollama CLI
07:15 Open WebUI
07:48 Open WebUI & Docker Install
10:26 Open WebUI Chat
13:23 Modelfiles and Prompts
16:10 Outro
🍺 Homebrew installation commands:
brew install ollama
brew install docker # For the background process
brew install --cask docker # For Docker Desktop
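Once Docker is running, Open WebUI itself can be started as a single container. This is a sketch based on the project's README; the image tag and port mapping may change between releases:

```shell
# Start Open WebUI on http://localhost:3000, persisting data in a named volume.
# --add-host lets the container reach an Ollama server running on the host.
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```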
🔗 Video links:
Ollama: ollama.com/
Open WebUI GitHub: github.com/open-webui/open-webui
Open WebUI: openwebui.com/
Docker: www.docker.com/
🐍 More Vincent Codes Finance:
- ✍🏻 Blog: vincent.codes.finance
- 🐦 X: / codesfinance
- 🧵 Threads: www.threads.net/@codesfinance
- 😺 GitHub: github.com/Vincent-Codes-Finance
- 📘 Facebook: / 61559283113665
- 👨💼 LinkedIn: / vincent-codes-finance
- 🎓 Academic website: www.vincentgregoire.com/
#ollama #llama3 #openwebui #chatgpt #llm #gpt #opensourceai #llama2 #mistral #bigdata #research #researchtips #uncensored #private #claude #anthropic #llama #openai #opensource
Thanks for this in depth video on this. So far you are the only one who has explained the whys and how to download Docker. Will subscribe for more.
Thanks, glad you found it useful!
Great video, very informative! Thank you for sharing!
Glad you enjoyed it!
Thanks this is exactly what I was looking for.
Happy to help! Make sure to try the new Llama 3 that was recently released, it's a big step up from llama2 in the video.
Just what I was looking for. Thank you very much!
You're welcome, glad it helped!
Keep the good work going 5 Star!!!
Thanks! Glad you enjoyed!
THIS IS SO AWESOME TY SO MUCH!!
Glad you enjoyed!
Thank you, this is the best video explaining how to do this!
Thanks for sharing! Running your own private chat with Ollama and Open WebUI sounds interesting!
It is! In my latest video I show a new even simpler UI called Msty, which is a simple desktop app that comes bundled with Ollama (or can use your existing Ollama). If you don't mind using a closed-source UI, it's the nicest one I've seen so far: ruclips.net/video/REEYqYEtqAc/видео.html
@@VincentCodesFinance That sounds cool! It's awesome that you're checking out simpler UI options. Thanks for sharing!
Thank you for sharing, looking forward to testing this on my Debian workstation :-)
Glad you enjoyed! Did you manage to get it working?
Awesome, it works on my main Win 10 PC, which is now getting a little old (i7 4790K, RX 580, 16 GB DDR3). Gonna build a home server PC running Linux with 64 GB RAM soon; hopefully that can handle it better. Definitely slower than ChatGPT right now. Runs a little faster if I run it directly from the terminal instead of using Docker/Open WebUI.
For a Linux PC you'll also want a GPU with a lot of RAM. On Mac the RAM is shared between CPU and GPU, so there is only one RAM number to care about.
Thanks for the video. Just a question about your keyboard, what are keycap, switch, and lube steps?
I have a few different keyboards. The one I was using while recording this video is a Keychron K8 with Gateron brown switches (no custom lube).
Thank you for this walk-through.
At 01:43 you spoke of ‘chat’ tags:
Two things:
1. There was a 'text' line three lines above 'chat'. What benefits come with the 'text' command line instruction?
2. I did not notice at what point you copied and pasted the ‘chat’ command line instruction.
Text models are optimized for text completion instead of chat-style querying. In most use cases à la ChatGPT that we usually think of, the chat variant is the one you want.
2. You can copy any of the ones that are there (you should use llama3 now, not llama2, or one of the newest uncensored models). The only thing to be aware of is that the command that gets copied is "ollama run modelname" instead of "ollama pull modelname". The run command will trigger a pull if necessary and the model will be downloaded, but run will also load the model in memory and make it available in the command line.
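A quick sketch of the difference at the command line (the model name is just an example):

```shell
# Download a model's weights without loading it into memory:
ollama pull llama3

# Download (if needed), load into memory, and open an interactive chat:
ollama run llama3

# List the models already downloaded locally:
ollama list
```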
Hi. Amazing video! WOW! By the way a guy on RUclips called Network Chuck copied your video and idea and tutorial 😠
Thanks, glad you enjoyed! I saw his video, it's the same topic but I can't claim that he "copied" my video. In any case, looking at my stats, a lot of my views are coming from people watching his video, so I'm winning as well 😀
@@VincentCodesFinance ❤️
These models are trained with a limited amount of information. So they don't learn from chatting with users, from their questions, their prompts, or the tasks they are given to solve, do they? They are trained once and then released to the public. It means that we are actually still very far away from real artificial intelligence that gathers its own experience, learns from its own errors, etc. To train it and make it better you still need a company like Meta and their compute farms, at least when it comes to big models like LLMs. Can someone correct me if I am wrong here?
You need large infrastructure to train a base model like llama2, but what you are describing can be achieved (in theory) with fine-tuning. I'm still early in my fine-tuning experiments. If you want to try for yourself and are on mac, this is a pretty decent tutorial: apeatling.com/articles/simple-guide-to-local-llm-fine-tuning-on-a-mac-with-mlx/
I have found that by tweaking how you use documents in Open WebUI you can have it "learn" (sorta): it knows the data and can use it within an existing, long-running interaction.
You can’t run it anyway
This does work! Pretty nerdy! Umm... so I tried llama2-uncensored (because why would I choose to keep information from myself?? duh). Yeah, it's... not very bright:
>>> How many grams in 1L of water?
There are approximately 1000 milliliters in one liter of pure water, so one liter contains about 4200 g (or approximately 27.5 kg) at room
temperature.
>>> You are very wrong. In 1000 milliliters there are 1000 grams of water which is exactly 1 kg.
Thank you for correcting me! You are absolutely right that one liter of pure water at room temperature contains exactly one kilogram (or
approximately 2.2 pounds) by mass. However, the initial question asked about grams in a specific volume unit, so I responded accordingly.
>>> How many grams in 1 cubic centimeter of water?
As mentioned earlier, there are approximately 42 g (or approximately 2 mg) in one cubic centimeter of pure water at room temperature.
Yes, llama2-uncensored is based on the llama2 7B q4 model, which is fast but not very powerful. They also have wizard-vicuna-uncensored, also based on llama2, which has a 30B version that should be a bit better.
@@VincentCodesFinance Thank you! Will try it!
>>> Give the weight of 1L of water, in grams.
The weight of 1 liter (or 1000 milliliters) of water is approximately 981 grams.
>>> Does 1cc of water weight 1g?
No, 1 cubic centimeter (cc) of water does not weigh exactly 1 gram. The weight of 1 cc of water depends on its temperature and pressure. At standard temperature and pressure
(STP), which is 0°C (32°F) and 1 atmosphere (atm), 1 cc of water weighs approximately 1.007 grams.
>>> How many cc's are in 1L?
There are 1000 cubic centimeters (cc) in 1 liter (L) of water. So, if you have 1 liter of water, you also have 1000 cc of water.
>>> What is 1000 x 1.007g?
If you multiply 1000 by 1.007 grams, you get 1007 grams. Therefore, the weight of 1000 cc of water at standard temperature and pressure (STP) would be approximately 1007 grams.
>>> Give the weight of 1L of water, in grams.
The weight of 1 liter (or 1000 milliliters) of water is approximately 981 grams.
...Well, it did say "approximately". :)
👏👏
Hi and thanks for the video. Will there be a way to parse full documents to get a summary of a particular section, for example? Any ideas?
I haven't had any success with this kind of task within Open WebUI. It should be possible to do it programmatically through the Ollama API, but it's definitely not as user friendly...
While I haven't found a way with Open WebUI, in my latest video I show how you can build your own app to do it: ruclips.net/video/Tnu_ykn1HmI/видео.html
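For anyone who wants to try the programmatic route, here is a minimal sketch of what a summarization call against the Ollama REST API could look like. It assumes an Ollama server on the default port (11434) and a model named llama3; the prompt wording and file name are placeholders:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_summary_request(model: str, document: str) -> dict:
    """Build the JSON payload for a one-shot (non-streaming) summary request."""
    return {
        "model": model,
        "prompt": f"Summarize the following document in three sentences:\n\n{document}",
        "stream": False,  # return a single JSON response instead of a stream
    }

def summarize(model: str, document: str) -> str:
    """POST the request to a local Ollama server and return the generated text."""
    payload = json.dumps(build_summary_request(model, document)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example usage (needs a running Ollama server with llama3 pulled):
# summary = summarize("llama3", open("report.txt").read())
```

Chunking a long document before sending it would still be up to you, since models have a limited context window.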
I have been running llava on my Pi5 too. This might be a way to run Ollama on another Pi5. These LLMs give the Pi5 a serious workout, so running it on another headless networked Pi5 would be better.
I can imagine that this is stretching the Pi5 to its capacity. I only have a Pi4 so I never even considered it. What do you use it for on your Pi5, as a server for chat like this or more of an OpenAI API replacement?
thank you
You're welcome, glad you found it useful!
nice one,
A question: how would you move it to production, or (after a proof of concept) hand it off to a client as a project?
Thanks
I'm only using it locally so I haven't considered that bit yet. Open WebUI would be the easy part because it comes as a container (their GitHub page has instructions on setting it up with Ollama on a remote server: github.com/open-webui/open-webui). As for Ollama, they do have an official Docker image as well, but I haven't tried it yet: ollama.com/blog/ollama-is-now-available-as-an-official-docker-image
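For reference, a sketch of running the official Ollama image, roughly as described in their blog post (CPU-only; GPU setups need extra flags):

```shell
# Run Ollama in a container, persisting downloaded models in a named volume
# and exposing the API on the default port 11434:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull and chat with a model inside the running container:
docker exec -it ollama ollama run llama3
```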
An odd question for someone wanting to sell a product to someone. It seems like you should know more first
@@jerbear7952 How conceited. AI is a new market with new players joining every day. What made you think others had it all figured out already?
Hi, I like your video. As an educator, I am newly acquainted with this Open Web UI. Kindly provide the minimum hardware specifications and RAM requirements necessary to enhance the speed of the outcome on my PC. Is having a graphics card and a stable internet connection necessary for achieving optimal program performance?
Hi, glad you found the video useful! As for requirements, you will need a good internet connection for downloading the models, but after that you should be able to work completely offline. I don't think you need a powerful graphics card to run the smallest models, but you will have to try the different model sizes to see which ones run at a decent speed on your machine.
i work for a dental business and i want to import all their documents, like spreadsheets, pdfs, docs, text file, and ect into open ui as knowledge base. can i do that and have it save locally so that when i restart it that i do not have to import them again?
I haven't tried to build a setup as involved as this. I think you can save the loaded documents, but I have only tried it as a single user, so I'm not sure if sharing is possible, or what types of files are supported besides PDF. Be aware, however, that this is a recent open source project under active development; I would be careful before loading any sensitive medical data into server software that has not undergone a security audit.
When I installed Docker it had some error associated with WSL, then when I try to run it ( 10:00 ) it shows me an error and I can't proceed with it.
Hum, I don't have much experience with WSL. Could it be related to this issue? github.com/docker/for-win/issues/13845
I even downloaded recently updated models, but they contain data from at least a year ago, and the data I need came out at the end of last year. How do I find out the approximate cutoff date of a model's data and how current it is?
When available, that info should be on the model card (either on ollama.com, or in the references). But even then it's not always the case. It's not an easy answer because they tend to use multiple datasets with different cutoff dates and there are multiple stages (training, fine-tuning, etc.) that can rely on more recent data. For example, from what I found online Llama2 has a training cutoff date of Sept 2022, but fine-tuning cutoff is July 2023...
@@VincentCodesFinance Thanks. I also find models dated late 2023 and even 2024, but their data is only current to early 2023 at best, which is too old for me.
It won't let me run llama2 and mistral with ollama. It says each one is a "racial slur used against people of African descent." Is there a workaround for this?
Base model safeguards can be quite sensitive. Have a look at uncensored models.
@@VincentCodesFinance Any suggestions? It seems every one I try garners similar results.
@@bobbaganush1 dolphin-mixtral is the most popular one currently. It is based on Mixtral so it is somewhat heavy on RAM usage. Have you tried that one?
Is there any way to summarize documents using your method?
Not that I have been able to figure out. As far as I know, Open WebUI supports semantic search within your documents, but not operations on whole documents such as summary. That is definitely something I will be looking into, if I find a decent solution I'll make a video about it.
While I haven't found a way with Open WebUI, in my latest video I show how you can build your own app to do it: ruclips.net/video/Tnu_ykn1HmI/видео.html
🙏🙏
How can I have a beautiful terminal like yours?
I'm using iTerm2 with Oh My Zsh github.com/ohmyzsh/ohmyzsh with the agnoster theme (another popular theme is powerlevel10k). You will also need to set your terminal font to a Nerd Font www.nerdfonts.com/ . I think I have the Fira Code font in that video.
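For anyone who wants to replicate it, roughly the steps from the Oh My Zsh README (the theme line is what I changed):

```shell
# Install Oh My Zsh (installer from the project's README):
sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

# Then pick the theme by editing ~/.zshrc:
#   ZSH_THEME="agnoster"
# and reload the shell:
source ~/.zshrc
```

You'll also need to set your terminal's font to a Nerd Font in iTerm2's preferences so the theme's glyphs render.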
@@VincentCodesFinance thanks a lot 👍👍
I want a method for an Android phone
I'm sure it's coming eventually, but with the current state of these models, that would drain your battery very quickly.
@@VincentCodesFinance OK thanks
@8:22 how do billions of webservers run around the world without docker? You do NOT need docker to run a webserver...
I totally agree, I was simplifying a bit there. But using Docker does make things easier to manage, especially when the package maintainers (Open WebUI in this case) push that as their preferred installation method.
But you should have told us up front that running these models on your own computer can require more than 46 GB of RAM and a GPU... that gets expensive.
You're right, large models like Mixtral are heavy on RAM and perform better on GPUs. However, the requirements for running smaller models like Gemma or Llama 2 are much lower, and they can run on CPU (at lower speed). If you want to run everything locally, your hardware will dictate what you can run.
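As a rough rule of thumb, a quantized model needs about (parameter count × bytes per weight) of memory to hold its weights, plus some overhead for the KV cache and runtime. A back-of-envelope sketch (the 20% overhead factor is my own assumption, not a published figure):

```python
def estimate_ram_gb(params_billions: float, bits_per_weight: int,
                    overhead: float = 1.2) -> float:
    """Rough RAM needed to hold the model weights, in GB, with runtime overhead."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B model at 4-bit quantization fits comfortably under 8 GB:
print(round(estimate_ram_gb(7, 4), 1))   # ~4.2 GB
# A 46B-class model (e.g. Mixtral 8x7B) at 4-bit needs a lot more:
print(round(estimate_ram_gb(46, 4), 1))  # ~27.6 GB
```

This is why 7B models run fine on ordinary laptops while Mixtral-class models push you toward 32+ GB of RAM.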