Build Open Source "Perplexity" agent with Llama3 70b & Runpod - Works with Any Hugging Face LLM!
- Published: 16 Jun 2024
- In this video, you'll learn how to build a custom AI agent using the powerful Llama 3 70b model deployed on Runpod using vLLM. This method is also compatible with any Hugging Face LLM, providing flexibility and scalability for your AI projects.
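To give a feel for the end result: once the vLLM server is up, any OpenAI-compatible client can talk to it. A minimal sketch, where the pod URL and model name are placeholders for your own deployment:

```python
# Minimal sketch: querying a vLLM server deployed on Runpod through its
# OpenAI-compatible API. The base_url below is a placeholder; Runpod's
# HTTP proxy format is https://<pod-id>-<port>.proxy.runpod.net.
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-POD-ID-8000.proxy.runpod.net/v1",  # placeholder pod URL
    api_key="not-needed",  # vLLM accepts any key unless started with --api-key
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    messages=[{"role": "user", "content": "Summarize today's top AI news."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Because the interface is OpenAI-compatible, swapping in any other Hugging Face model is just a matter of changing the model name the server was launched with.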
Need to develop some AI? Let's chat: www.brainqub3.com/book-online
Register your interest in the AI Engineering Take-off course: • Building Chatbots with...
Hands-on project (build a basic RAG app): www.educative.io/projects/bui...
Stay updated on AI, Data Science, and Large Language Models by following me on Medium: / johnadeojo
GitHub repo: github.com/john-adeojo/custom...
vLLM blog: blog.vllm.ai/2023/06/20/vllm....
Can You Run it: huggingface.co/meta-llama/Met...
Runpod Template: runpod.io/console/deploy?temp...
Custom agent deep dive: • Build your own Local "...
Chapters
Introduction: 00:00
Inference Server Schema: 01:40
Determine memory requirements: 04:50
Deploying server on Runpod: 07:32
Using the inference server with the agent: 16:30
Demoing the custom agent: 19:35
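As a rough guide to the "Determine memory requirements" chapter: the back-of-the-envelope math is weights = parameters × bytes per parameter, plus headroom for the KV cache. A sketch with illustrative numbers (the 20% headroom figure is an assumption, not from the video):

```python
# Back-of-the-envelope VRAM estimate for serving a model with vLLM.
params_billions = 70     # Llama 3 70b
bytes_per_param = 2      # fp16/bf16; roughly 1 for 8-bit, 0.5 for 4-bit quantization

weights_gb = params_billions * bytes_per_param   # ~140 GB at fp16
total_gb = weights_gb * 1.2                      # ~20% headroom for KV cache (assumption)

print(f"Weights: ~{weights_gb:.0f} GB, suggested VRAM: ~{total_gb:.0f} GB")
# ~140 GB of weights won't fit on a single 80 GB GPU, so you'd shard
# across multiple GPUs using vLLM's --tensor-parallel-size flag.
```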
Thank you for the excellent video. I appreciate all the detailed steps for setting up a vLLM inference server on RunPod. It's a cost-effective alternative to purchasing an expensive PC, which could break the bank.
WORKED as advertised! Well done, John. Thank you.
Good stuff as always, thank you very much.
Great video again. Can't wait for you to try and run the coding models.
Phenomenal
please check "Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models"
Nice video!! But it seems most of your videos on AI agents are around web search.
Thank you very much, great content
You're very welcome!
Wow - big difference between the 8b and 70b models. Do you think the 70b models are good enough for agents?
Hey, amazing content! I was just wondering, you deploy the pods "On-Demand", does that mean you only pay the GPU time you actually needed it? Or does it cost you as long as the pod is running because the GPU is reserved for you or something like that?
Thank you! Regarding your question, the example in this tutorial charges hourly. However, they do also provide a serverless deployment. Going the serverless route means you pay nothing when the GPU is idle. Here's the doc: www.runpod.io/serverless-gpu
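For reference, calling a Runpod serverless endpoint follows their documented runsync pattern; the endpoint ID, API key, and input payload below are placeholders that depend on your own worker:

```python
# Hedged sketch of invoking a Runpod serverless endpoint synchronously.
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Hello from a serverless worker"}},  # payload shape is worker-specific
    timeout=120,
)
print(resp.json())
```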
Nice result here with Llama3 70b fp16.
The whole time I was thinking, "What about Groq?", since inference for the same model appears to be free.
Groq has very low rate limits atm. But yeah, the speed is amazing.
Good stuff! 👍 So would this be considered just as secure as hosting on Azure? I mean would your company data be sequestered in its own virtual machine environment?
Great question. At its core, RunPod is a platform that orchestrates GPU resources. The GPUs themselves are provided by third-party data centers. This is what I pulled from their compliance doc:
“End-to-end Encryption: Data in transit and at rest is encrypted using industry-leading protocols. This ensures that your AI workloads and associated data remain confidential and tamper-proof.”
“Compliance Adherence: Different data centers might have varying compliance certifications. While we ensure that all our partners uphold stringent standards, the specifics of each compliance are directly managed by the respective data center.”
Here’s the doc if you want to read further: www.runpod.io/compliance
Awesome, thanks for the info!.👍
I just found what I was looking for with your link. Here is a list of compliance certifications regarding data security.
List of Certifications
It's vital to understand that while RunPod does not directly hold certifications like SOC 2, ISO 27001, or GDPR, many of our partner data centers do. Here's a quick snapshot of many of the certifications our data centers hold:
ISO 27001
ISO 20000-1
ISO 22301
ISO 14001
HIPAA
NIST
PCI
SOC 1 Type 2
SOC 2 Type 2
SOC 3
HITRUST
GDPR compliant
Excellent :) Thanks. How much GPU do you actually need if you run it yourself rather than through a service?
Same question
Great vid, thanks. Please test the new Microsoft Phi-3 Medium etc. as agents; they might work well, since it's much better than Llama 3 8b.
I'll be doing a series of tests for a variety of open-source models. Phi will be on the list.
@@Data-Centric Awesome, thanks. On a recent video I saw something interesting: the presenter mentioned that the Mistral 7b model makes a great agent for reasons like its architecture and native function calling, I think. I see a new one was just released; apparently as an agent it works better than other popular local Ollama models, though obviously not at the 70b level.
Thanks for the video! How do you find out how much compute time/cost of the queries you run?
For this deployment pattern, pods run 24/7 until you stop them, so your compute cost is charged hourly (quoted when you deploy a pod!). You could work out cost per query from the hourly cost.
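As a rough sketch of that cost-per-query math (both figures below are assumptions for illustration, not Runpod quotes):

```python
# Rough cost-per-query math for an hourly-billed pod.
hourly_rate = 2.00        # USD/hour for the pod (assumed)
queries_per_hour = 500    # measured throughput of your agent (assumed)

cost_per_query = hourly_rate / queries_per_hour
print(f"~${cost_per_query:.4f} per query")  # ~$0.0040 with these numbers
```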
@@Data-Centric So basically it's time-based then, where it starts at the beginning of the request and stops when the request has completed? So it goes off the elapsed time between start and stop?
Is there any way you can add a config option in your GitHub repo for using Runpod serverless? It seems like it could be better cost-wise when doing inference.
I'll look into this!
I'm curious, bro, why you chose Runpod over Lightning AI?
No reason other than I haven't used Lightning AI.
Cost?
Why can't I connect on HTTP port 8000?
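A quick way to check is to hit the server's built-in endpoints through Runpod's HTTP proxy. This sketch assumes port 8000 was added under the pod's exposed HTTP ports and that you substitute your own pod ID; if it fails, the port usually wasn't exposed or the model is still downloading/loading:

```python
# Connectivity check for a vLLM server behind Runpod's HTTP proxy.
import requests

base = "https://YOUR-POD-ID-8000.proxy.runpod.net"  # placeholder pod URL
print(requests.get(f"{base}/health", timeout=30).status_code)   # 200 once the server is ready
print(requests.get(f"{base}/v1/models", timeout=30).json())     # lists the served model
```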
I was wondering if this is really going to be cheaper than, say, using OpenRouter or Together AI for Llama 70b at 80c per million tokens. I've been running thousands of API calls on quite a bit of data and used less than a dollar on the API, so I'm wondering if $2 per hour is going to be cheaper. I guess if you're running agents continuously for hours, they can do unlimited work in that hour, so the GPU rental may be best and the per-token API will cost more, right? I think the only way to know is to test and compare. Also, it's possible that the APIs are quantized more than the Runpod version, so you'd get better results from the on-demand rental. On-demand means you only pay while it's running, right? So you've got to turn it off when done, always? Unsure how these rentals work. I've also been looking at Vast; you can rent and run vLLM there as well, and it's apparently the cheapest, but when I checked, prices for your setup were only about 20c cheaper per hour, not a huge difference, and I think Vast has reliability concerns. Have you looked at it?
Honestly, it's pretty hard to compete with the API costs unless you are saturating the GPUs. GPU rental like Runpod is great for defined tasks (like summarizing 10,000 papers or something like that).
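A rough break-even sketch using the figures from this thread ($2/hr pod, $0.80 per million tokens via an API; treat both as assumptions):

```python
# Break-even: hourly GPU rental vs. per-token API pricing.
pod_cost_per_hour = 2.00          # USD/hour (assumed)
api_cost_per_m_tokens = 0.80      # USD per million tokens (assumed)

breakeven_tokens_per_hour = pod_cost_per_hour / api_cost_per_m_tokens * 1e6
print(f"Break-even: ~{breakeven_tokens_per_hour / 1e6:.1f}M tokens/hour "
      f"(~{breakeven_tokens_per_hour / 3600:.0f} tokens/sec sustained)")
# ~2.5M tokens/hour, i.e. ~694 tokens/sec sustained: the pod only wins
# on cost if you keep it that busy, which matches the point above.
```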
@@robxmccarthy Awesome thanks for confirming!