Need a heavy GPU machine? Check out this video on setting up an AWS EC2 GPU instance. If you liked this one, check out my video on setting up a full RAG API with Llama3, Ollama, LangChain and ChromaDB - ruclips.net/video/7VAs22LC7WE/видео.html
Bro ❤ great tutorial. Quick and easy
OMG!!! I freaking love you, I've been struggling with deployment on AWS with Llama and you've made it crystal clear. I'll do anything to support your channel. YOU'RE THE BEST!!!
Thanks for the comments
Cannot wait for part two with LangChain! This video was fantastic
What a simple way to set up an Ollama LLM with GPU support in only a few minutes, thanks!
Brilliant! It's that simple only because you explained it simply :). Thank you!
Thanks, glad you enjoyed it!
I'm not sure if you mentioned it in the video or not, but you need to allow traffic to port 11434 in the AWS security group.
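For anyone doing this from a script instead of the console, here's a rough boto3 sketch of the same change (the security group ID, region, and CIDR are placeholders, not from the video; scope the CIDR down to your own IP if you can):

```python
# Hypothetical example: open TCP port 11434 (Ollama's default port) on an
# EC2 security group using boto3. Group ID, region, and CIDR are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # replace with your security group ID
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 11434,
        "ToPort": 11434,
        "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "Ollama API"}],
    }],
)
```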
Good catch. Thanks
Thank you so much! Your video helped me a lot. I am looking forward to your next video.
Thanks a lot, man! Great video!
Glad you enjoyed it
Excellent. Thank you very much for sharing.
Thanks a lot for the video!!
Question: Is it possible to start the instance only when a request comes in to the server? It could be useful to limit costs.
I think it is feasible with Kubernetes and Docker, but I would enjoy a video about it :) !
Thanks again, very good video.
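Something like this rough boto3 sketch is what I have in mind - start the stopped instance only when a request arrives, then forward it to Ollama (the instance ID, host, and model are placeholders, not from the video; in practice you'd put this behind a small proxy or Lambda and stop the instance again after an idle timeout):

```python
# Rough sketch (not from the video): start a stopped EC2 instance on demand,
# wait for it to come up, then call the Ollama API on it. IDs and hosts are
# placeholders.
import boto3
import requests

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder instance ID
OLLAMA_HOST = "http://your-ec2-public-dns:11434"  # placeholder host

ec2 = boto3.client("ec2", region_name="us-east-1")

def ensure_running() -> None:
    """Start the instance if it is not running and wait until it is."""
    instance = ec2.describe_instances(InstanceIds=[INSTANCE_ID])[
        "Reservations"][0]["Instances"][0]
    if instance["State"]["Name"] != "running":
        ec2.start_instances(InstanceIds=[INSTANCE_ID])
        ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

def ask(prompt: str) -> str:
    """Forward a prompt to Ollama, starting the instance first if needed."""
    ensure_running()
    resp = requests.post(
        f"{OLLAMA_HOST}/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(ask("Why is the sky blue?"))
```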
How do you add Open WebUI to it, and expose Open WebUI so it's accessible from a MacBook browser?
Thank you. This was helpful
Can you also use the Ubuntu 22.04 image and install CUDA etc. yourself? Why use this deep learning image?
I only select this AMI since it already has the other tools I need, like Python.
@@fastandsimpledevelopment If I understand correctly, you can select the base Ubuntu 22.04 image and install everything yourself: the NVIDIA driver, CUDA, TensorFlow, Python, etc.?
The video was awesome and pretty helpful, but can you cover the security point of view too? Anyone with the IP and port number can access it, so how can we avoid that?
How do you make it scalable?
By itself it is not. You need to add a front end like Nginx and then have several Ollama servers running behind it; that is the only way I am aware of today. There are new releases all the time, so keep track of Ollama updates.
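To make the round-robin idea concrete, here is a toy Python sketch of what the front end would be doing across several Ollama servers (the backend URLs are made up; in production Nginx or an AWS load balancer would handle this):

```python
# Toy sketch of the load-balancing idea above: round-robin requests across
# several Ollama servers. Backend URLs are placeholders; a real deployment
# would let Nginx or an ALB do this.
import itertools
import requests

BACKENDS = itertools.cycle([
    "http://10.0.0.11:11434",  # placeholder Ollama server 1
    "http://10.0.0.12:11434",  # placeholder Ollama server 2
])

def generate(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the next Ollama backend in the rotation."""
    backend = next(BACKENDS)
    resp = requests.post(
        f"{backend}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(generate("Hello from a load-balanced Ollama!"))
```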
So Ollama detects and uses the GPU automatically?
Yes, if the OS has support and you have an AMD or NVIDIA GPU installed; the latest version does auto-detect. You can also set it to NOT use the GPU in the Ollama config, but by default it auto-detects.
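For what it's worth, you can also control this per request rather than in the config: the Ollama API accepts a num_gpu option (the number of layers offloaded to the GPU), and setting it to 0 keeps the model on the CPU. A small sketch, assuming Ollama is listening on localhost with the llama3 model pulled:

```python
# Small sketch (assumes Ollama running locally with the llama3 model):
# force a CPU-only generation by setting num_gpu to 0, i.e. offload zero
# layers to the GPU for this one request.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Say hi from the CPU.",
        "stream": False,
        "options": {"num_gpu": 0},  # 0 GPU layers => CPU only
    },
    timeout=300,
)
print(resp.json()["response"])
```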
@@fastandsimpledevelopment It detects only NVIDIA GPUs. I tried AWS g4ad (AMD) and g4dn.xlarge (NVIDIA); only the latter worked. This is FYI.
@@adityanjsg99 Thanks for your input. I have not tried anything other than NVIDIA GPUs. I've finally decided to get a few 4090 boards and see how they run. I'm trying to build an on-prem system since there is no affordable cloud solution. I'll expose the LLM API via ngrok, which is not what I wanted :(
good vid!
thanks buddy