Secrets to Self-Hosting Ollama on a Remote Server

  • Published: 28 May 2024
  • 👋 Hey Tech Enthusiasts! Today, I'm thrilled to share a complete guide on self-hosting the Llama 3 language model using Google Cloud! Whether you're using GCP, AWS, or Azure, the concepts remain the same. 🌐💻
    🔧 What You'll Learn:
    Creating a Linux VM: How to set up a virtual machine with GPU support on Google Cloud.
    Installing Ollama: Step-by-step instructions on installing Ollama and running Llama 3 on your VM.
    Remote Access Activation: Tips on how to make your Llama server accessible and secure.
    UI Integration: How to build and integrate a chatbot user interface to interact with your Llama model.
    🎬 In this video, I take you through each step, from VM creation to the exciting part of chatting with your own AI. Don't miss out on learning how to fully control your AI’s environment and keep your data in-house. Perfect for developers and tech enthusiasts looking for hands-on AI deployment experience!
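    💻 For reference, a command-line sketch of a comparable VM setup (the video uses the Cloud Console UI; the zone and machine type below are assumptions, while the Tesla T4, Ubuntu 22.04 LTS, and 100GB disk match what the video uses):
        gcloud compute instances create ollama-server \
            --zone=us-central1-a \
            --machine-type=n1-standard-4 \
            --accelerator=type=nvidia-tesla-t4,count=1 \
            --image-family=ubuntu-2204-lts \
            --image-project=ubuntu-os-cloud \
            --boot-disk-size=100GB \
            --maintenance-policy=TERMINATE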
    👍 Like this video if you find it helpful and subscribe to stay updated with more content on artificial intelligence and technology. Ring that bell for notifications on new uploads!
    🔗 Resources:
    Sponsor a Video: mer.vin/contact/
    Do a Demo of Your Product: mer.vin/contact/
    Patreon: / mervinpraison
    Ko-fi: ko-fi.com/mervinpraison
    Discord: / discord
    Twitter / X : / mervinpraison
    Code: mer.vin/2024/05/ollama-remote...
    📌 Timestamps:
    0:00 - Introduction
    1:32 - Setting Up a Virtual Machine with GPU
    2:01 - Adjusting VM Settings and Storage
    2:17 - Installing NVIDIA Drivers
    4:02 - Installing Ollama
    4:30 - Enabling Remote Access
    6:53 - Final Steps: App Integration and Testing
    #OLlama #Host #RemoteServer #HowToInstallOLlama #InstallingOLlama #PublishOLlama #OLLamaDeploy #DeployOLlama #DeployOLlamaServer #DeployOLlamaGoogleCloud #GCPOllama #GCloudOLlama #GoogleCloudOLlama #OLlamaGoogleCloud #OLlamaAzure #OLlamaAWS #OLlamaLlama3 #Llama3 #Llama3Deploy #Llama3Publish #Llama3GoogleCloud #SelfHostLlama3 #SelfHost #Host #Hosting #HostOLLama #HostLlama #HostingOLLama #OLlamaHosting

Comments • 57

  • @thebluefortproject
    @thebluefortproject 3 days ago

    great video, thanks a lot

  • @AlloMission
    @AlloMission 17 days ago

    Thanks, really amazing! As always, you do a great job.

  • @60pluscrazy
    @60pluscrazy 17 days ago

    Mervin, keep up the good work. Significant videos 🎉🎉🎉

  • @christophercelaya
    @christophercelaya 17 days ago

    I love these types of projects!

  • @rccmhalfar
    @rccmhalfar 17 days ago

    So much for self-hosting! GCloud!

    • @MervinPraison
      @MervinPraison  17 days ago

      I chose Google Cloud as an example. You can follow the same procedure on any computer to set it up.

  • @rccmhalfar
    @rccmhalfar 17 days ago +1

    Still, respect for delivering such content. What I'm after is an automatic Ollama deployment on a cloud provider that bills per minute, where the machine is shut off after the prompt is consumed, so you would only be billed for the minutes used. Would like to see that.

    • @JoeSmith-kn5wo
      @JoeSmith-kn5wo 17 days ago

      I will say automating the deployment is pretty straightforward, but shutting down the server after an API call does not make much sense. You would no longer be able to send API requests to the Ollama server after it is shut down; you would need to manually restart the server that is hosting Ollama.
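
      A rough sketch of the idea discussed above: as @JoeSmith-kn5wo notes, something outside the VM would have to start it again (a scheduler or a small proxy calling the cloud API), but the shutdown half could be a hypothetical watchdog like the one below (not from the video; assumes Ollama's default port 11434 and a cron job running it every minute):

          #!/bin/bash
          # Hypothetical idle-shutdown watchdog: stops the VM after
          # 10 minutes with no established client connections to
          # Ollama's default port 11434.
          IDLE_LIMIT=600
          STAMP=/tmp/ollama-last-active
          [ -f "$STAMP" ] || date +%s > "$STAMP"
          # Refresh the timestamp while a client is connected
          if ss -Htn state established '( sport = :11434 )' | grep -q .; then
              date +%s > "$STAMP"
          fi
          if (( $(date +%s) - $(cat "$STAMP") > IDLE_LIMIT )); then
              sudo shutdown -h now  # a stopped GCE VM stops accruing vCPU/GPU charges
          fi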

  • @jmsdvs
    @jmsdvs 17 days ago

    Great video! Would love a tutorial on installing a server on your home PC to access from anywhere! Thanks again!

    • @MervinPraison
      @MervinPraison  17 days ago +1

      You could just follow the same steps on your home PC to make it the server.
      I chose Google Cloud just as an example.
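
      (For reference, assuming the standard Linux install: Ollama listens only on 127.0.0.1:11434 by default. The Ollama FAQ's way to expose it to other machines is to set OLLAMA_HOST on the systemd service, roughly:)

          sudo systemctl edit ollama.service
          # then add under [Service]:
          #   Environment="OLLAMA_HOST=0.0.0.0"
          sudo systemctl daemon-reload
          sudo systemctl restart ollama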

    • @jmsdvs
      @jmsdvs 17 days ago +1

      @@MervinPraison I figured that was the case. I'm a bit of a noob when it comes to servers, so I appreciate the response!

  • @sophiedelavelle5958
    @sophiedelavelle5958 17 days ago +1

    This is amazing ahah

  • @jeetendrachauhan3236
    @jeetendrachauhan3236 17 days ago

    I did the same experiment yesterday with an AWS EC2 instance with 32 GB of memory (without a GPU), and the output was amazing.

  • @classictablet9149
    @classictablet9149 3 days ago +1

    How many concurrent calls does this accept? Can you please comment on this topic?
    Thanks

  • @ben_stace
    @ben_stace 17 days ago

    Which version of Llama are you running, 8B or 70B? Thank you as well for the great video.

  • @mikew2883
    @mikew2883 17 days ago

    This is great! Have you been able to set up the Ollama Web UI remotely as well?

    • @MervinPraison
      @MervinPraison  17 days ago +2

      Yes, it should be easy. Maybe I'll plan to create a video about Ollama Web UI and a remote setup.
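
      (Not shown in the video, but as a pointer: the Open WebUI project documents a one-line Docker deployment that can point at a remote Ollama server; YOUR_SERVER_IP below is a placeholder:)

          docker run -d -p 3000:8080 \
            -e OLLAMA_BASE_URL=http://YOUR_SERVER_IP:11434 \
            -v open-webui:/app/backend/data \
            --name open-webui ghcr.io/open-webui/open-webui:main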

    • @mikew2883
      @mikew2883 17 days ago

      @@MervinPraison That would be awesome! 👍

  • @farexBaby-ur8ns
    @farexBaby-ur8ns 17 days ago

    I think it's better to buy or build a machine to host the AI server, but I didn't know there was a costlier way to do this via Google Cloud. I was also familiar with using Open WebUI, but had never seen the Chainlit option. So, good value with this vid. Kudos!

    • @MervinPraison
      @MervinPraison  17 days ago

      Yes, you can buy a machine yourself and configure it, but the graphics cards might cost a lot, managing an AI server is tedious, and it is not easily scalable.

  • @ckgonzales16
    @ckgonzales16 11 days ago

    Can I use Firebase instead of Google Cloud?

  • @hasstv9393
    @hasstv9393 17 days ago +1

    Is it possible to make it SaaS-ready?

    • @MervinPraison
      @MervinPraison  16 days ago +1

      This is a starting point. Also implement enough security to make it production-ready.
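
      (One concrete hardening step, sketched on the assumption that the VM carries an "ollama-server" network tag and you have a fixed client IP: scope the firewall rule to that IP rather than 0.0.0.0/0.)

          gcloud compute firewall-rules create allow-ollama \
            --direction=INGRESS --action=ALLOW --rules=tcp:11434 \
            --source-ranges=YOUR_IP/32 --target-tags=ollama-server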

    • @hasstv9393
      @hasstv9393 16 days ago +1

      @@MervinPraison Can you show how to do that, so that we can provide this software as a service?

  • @jiuvk8393
    @jiuvk8393 16 days ago

    Is the "$204" exactly what you have to pay, regardless of how many people use the app per day or per month?

    • @MervinPraison
      @MervinPraison  16 days ago

      Once the number of users increases massively, you might need to increase the spec, which would cost more.
      But this is a good starting point.

    • @BamiCake
      @BamiCake 14 days ago

      So in essence it would cost up to $2,400 a year to self-host an LLM on GCP?

  • @collinsalomon
    @collinsalomon 17 days ago +1

    Amazing video! But you need to turn off your VM!!! :)🤣

    • @MervinPraison
      @MervinPraison  17 days ago

      Thanks for letting me know. I did turn it off after recording the video :)

  • @Epirium
    @Epirium 17 days ago

    Can you make a video on hosting Ollama remotely for free?

  • @phutrinh686
    @phutrinh686 16 days ago

    How much per month to host it? The running costs will kill your wallet. No thanks.

  • @williamwong8424
    @williamwong8424 17 days ago

    Is the API key really just a fake API key? We don't need to find out where to get an API key?

    • @williamwong8424
      @williamwong8424 17 days ago

      And just the base URL will do?

    • @MervinPraison
      @MervinPraison  16 days ago

      Just any key is fine. It's not locked based on the API key.
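
      (To illustrate: Ollama's OpenAI-compatible endpoint accepts any bearer token, so a quick test against the server could look like this, with SERVER_IP as a placeholder:)

          curl http://SERVER_IP:11434/v1/chat/completions \
            -H "Content-Type: application/json" \
            -H "Authorization: Bearer any-key" \
            -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}]}'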

  • @basilbrush7878
    @basilbrush7878 17 days ago

    Nice idea, but surely it's cheaper and more efficient to use Groq?

    • @MervinPraison
      @MervinPraison  17 days ago +1

      Yes, possibly. But some people prefer to have more end-to-end control for more security. This can also be used for testing, for users within a private network, and for more privacy.

  • @impactsoft2928
    @impactsoft2928 17 days ago

    Is Google Cloud free?

    • @MervinPraison
      @MervinPraison  17 days ago

      They provide approximately $300 in credits to get started with Google Cloud.

  • @jobasti
    @jobasti 17 days ago +6

    I really wanted this video to teach non-tech-savvy people how to host their own Ollama instance. I really tried to like this video for what it could be, but sadly the video is what you delivered to us... Honest feedback: this video feels more like a G-Cloud ad with no explanation whatsoever. You leave a lot of unanswered questions:
    Why Google Cloud when the title is "Self-Hosted"? (Self-hosted means YOU host something YOURSELF.)
    Why show $204 a month first, then add options, and then place your camera over the final price?
    Why Ubuntu 22.04 LTS when 23.10 will have newer GPU drivers?
    Why say "copy and paste" 5 times in 15 seconds? Why not just explain what you do, or not say anything?
    Why not give a "security warning" that copy-pasting that Ollama command as root could be dangerous, and to read the script code first?
    Why choose a Tesla T4? How many tokens does it have? Why did you choose this GPU, which makes it so expensive?
    Why did you change from 10GB to 100GB? How many models do you want to host there? How big are the models?
    Where is the information for the user to make an informed and educated decision?
    If you use Chainlit, please explain beforehand what it is.
    Nice security warning around the firewall and network topic, thanks for that!
    I really don't want to come over as an a**hole, but your other videos have been much more detail-oriented and better planned and executed for me as a viewer and a DevOps person. Thanks for the time!

    • @d.d.z.
      @d.d.z. 17 days ago +1

      I'd like to see Mervin's response

    • @MervinPraison
      @MervinPraison  17 days ago +7

      Thanks for your feedback. All your questions are very valid. My original intent was to show beginners how to host Ollama on a remote server.
      I could have just shown the remote server and focused on setting up Ollama, but to show every single step end to end, I had to cover the Google Cloud setup as well.
      Google Cloud is one option; it could be any server. It could even be your own local PC acting as the server; all the other steps remain similar.
      Why Google Cloud when the title is "Self-Hosted"?
      "Self-hosted" generally means running services on servers you control, but it can include cloud servers that you manage, like Google Cloud.
      Why show $204 a month first, then add options, and then place your camera over the final price?
      The initial price shown is the base rate, with added options increasing the cost.
      Why Ubuntu 22.04 LTS when 23.10 will have newer GPU drivers?
      Ubuntu 22.04 LTS is a Long-Term Support release, offering stability and support for 5 years, which is preferable for servers over newer, less tested versions. This is based on my research and some testing, and will surely change in the next few months.
      Why say "copy and paste" multiple times in a short period without further explanation?
      This was to keep the tutorial short and to focus on Ollama rather than on the NVIDIA driver setup. Here is what I copied to install the NVIDIA driver: cloud.google.com/compute/docs/gpus/install-drivers-gpu#secure-boot
      Why no security warning about copying and pasting commands as root?
      This is a valid concern. Running scripts as root can be risky. Always read and understand scripts before executing them, especially with root privileges. (A cautious download-then-read pattern is sketched after this reply.)
      Why choose a Tesla T4 GPU? How many tokens does it have?
      I chose the Tesla T4 because it was among the cheapest and cost the least to produce this video demo. A100s and H100s are the best.
      Why did the storage change from 10GB to 100GB?
      Llama 3 8B takes approx. 5GB for itself, and other tools take approx. another 5GB.
      So I increased it to 100GB to have enough storage if required.
      Where is the information for the user to make an informed and educated decision?
      I understand that the video doesn't explain graphics cards or the detailed NVIDIA driver setup, because when I started recording, my intention was just to show how to set up Ollama on a remote server and integrate it with a local application.
      Yes, I should have explained what Chainlit is, valid point. Thanks for letting me know and for your detailed feedback.
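
      (On the curl-as-root point above: the standard Ollama install is a pipe-to-shell one-liner, and a more cautious pattern is to download the script and read it before running it:)

          curl -fsSL https://ollama.com/install.sh -o install.sh
          less install.sh    # read it before running
          sh install.sh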

    • @jobasti
      @jobasti 15 days ago +1

      @@MervinPraison "Thanks for letting me know and for your detailed feedback." Sure, anytime! Don't get the wrong impression from my many questions; I like what you are doing! Please keep up the good work! I'd just like a bit more detail if that is possible =)