My question is: why would you use RunPod and still pay their rate when you could just throw a Llama 405B model (or whatever model) onto an AWS server and deploy it yourself, only being charged for hosting that AWS server? That would probably be cheaper, and is probably what RunPod is doing anyway.
Generally they are doing the same thing, yes. I made this video on RunPod because it is a little simpler to set up compared to AWS, which can be daunting to some people, but I don't disagree with you. I haven't run the numbers to compare, though! Just be careful: the better the model, the higher the cost.
How do RunPod serverless and pods differ in this use case, e.g. in costs? And how can we minimize our costs, e.g. by stopping the pod after usage?
Well, the idea is that if you don't have a local machine that can run models well (if at all), then depending on the model you need, you can 'rent' a cheap server on this platform. The one in my example costs $0.79 per hour while it's up and running; if I stop it, it says it costs $0.0006 per hour. So the cost of holding onto it until you want to run it again, without actually TERMINATING it, is minimal. I will look into scheduling the servers (if it's possible), so that, like in AWS, you can have one run for a certain amount of time per day.
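To make the stop-vs-terminate economics concrete, here is a back-of-the-envelope sketch using the two rates quoted above ($0.79/hr running, $0.0006/hr stopped); the 2-hours-a-day usage pattern is just an illustrative assumption:

```python
# Rough cost comparison for a pod used a couple of hours a day and
# stopped (not terminated) the rest of the time. Rates from the example
# above -- your pod's rates will differ.
RUN_RATE = 0.79     # $/hour while the pod is running
STOP_RATE = 0.0006  # $/hour while stopped (you only pay to hold it)

hours_running_per_day = 2  # assumed usage pattern
daily_cost = (hours_running_per_day * RUN_RATE
              + (24 - hours_running_per_day) * STOP_RATE)
always_on_cost = 24 * RUN_RATE

print(f"stop when idle: ~${daily_cost:.2f}/day (~${daily_cost * 30:.2f}/month)")
print(f"always on:      ~${always_on_cost:.2f}/day (~${always_on_cost * 30:.2f}/month)")
# stop when idle: ~$1.59/day (~$47.80/month)
# always on:      ~$18.96/day (~$568.80/month)
```

So stopping the pod between sessions is where almost all the savings come from.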
Can you do a new guide for text-generation-webui as well, please? TheBloke doesn't work anymore.
When you close the terminal tab, the model stops running. That is not cool; the endpoint should keep working unless I choose to shut down the pod.
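That usually happens because the server process is a child of the terminal session and gets killed when the tab closes. One common workaround (my assumption about how it was started, not something shown in the video) is to launch it detached in its own session, e.g. via `tmux`/`nohup`, or from Python; the `ollama serve` command below is a placeholder for whatever you actually run:

```python
import os
import subprocess

def start_detached(cmd):
    """Launch cmd in its own session so it survives the terminal closing."""
    log = open("server.log", "ab")  # capture output since there's no terminal
    return subprocess.Popen(
        cmd,
        stdout=log,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # new session: not killed with the shell
    )

# Gate behind an env var so importing this file doesn't spawn anything.
if os.environ.get("START_SERVER"):
    proc = start_detached(["ollama", "serve"])  # placeholder command
    print("detached server PID:", proc.pid)
```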
Since you can run a Python file there on RunPod, I'm assuming you can also serve a Gradio UI from there? Kind of like in your YouTube service video. I really appreciate all of your hard work on your channel. One of my favorite ag-centric channels.
Yes you absolutely should be able to do this! Thank you I appreciate it 👍
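A minimal sketch of what that could look like, assuming Gradio is installed on the pod (the handler, `gr.ChatInterface`, and port 7860 are illustrative choices, and the port has to be one the pod actually exposes):

```python
import os

def respond(message, history):
    """Placeholder handler -- swap in a call to your actual model here."""
    return f"You said: {message}"

# Gate the launch behind an env var so importing this file doesn't block.
if os.environ.get("LAUNCH_UI"):
    import gradio as gr
    # Bind 0.0.0.0 so RunPod's proxy can reach it; 7860 must be exposed
    gr.ChatInterface(respond).launch(server_name="0.0.0.0", server_port=7860)
```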
Hi, what is the difference between this method and using vLLM, which I saw in the RunPod Data Centric video? Which way is better?
Is it possible to host the server here, or is RunPod just used for fine-tuning and training models?
you can absolutely host a server here!
Just found out how and got it working: apparently you need to host it on port 80, but I hadn't selected that option when I made the GPU pod.
Ah gotcha I’m glad you got it figured out 👍
This is the sauce! Thanks you! 🙏🏾
Thank you 🙌
Is it possible to use a model on the server and pass it to the local Ollama, so it can be used in any software locally?
Yeah, so I think if you had an API to retrieve something from the runpod.io LLM and then bring it back locally for anything, then absolutely. You would just need the URL for the RunPod pod to send the request to. Hope that made sense. I do plan on making a video where we build something more 'production'-ready.
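As a sketch of that, assuming the pod is running Ollama, the local side is just an HTTP POST to Ollama's `/api/generate` endpoint; the proxy URL below is a placeholder you'd replace with your own pod's URL or public IP:

```python
import json
import urllib.request

# Placeholder -- substitute your pod's proxy URL or public IP and port.
RUNPOD_URL = "https://YOUR-POD-ID-11434.proxy.runpod.net"

def build_payload(prompt: str, model: str = "llama3") -> bytes:
    """JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask_remote_llm(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to the remote pod and return the model's text reply."""
    req = urllib.request.Request(
        f"{RUNPOD_URL}/api/generate",
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Any local software that can make HTTP requests could call `ask_remote_llm` the same way it would a local Ollama instance.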
Thanks!
You are welcome!