Build Open Source "Perplexity" agent with Llama3 70b & Runpod - Works with Any Hugging Face LLM!

  • Published: 16 Jun 2024
  • In this video, you'll learn how to build a custom AI agent using the powerful Llama 3 70b model deployed on Runpod with vLLM. This method is also compatible with any Hugging Face LLM, providing flexibility and scalability for your AI projects. A minimal client sketch follows the chapter list below.
    Need to develop some AI? Let's chat: www.brainqub3.com/book-online
    Register your interest in the AI Engineering Take-off course: • Building Chatbots with...
    Hands-on project (build a basic RAG app): www.educative.io/projects/bui...
    Stay updated on AI, Data Science, and Large Language Models by following me on Medium: / johnadeojo
    GitHub repo: github.com/john-adeojo/custom...
    vLLM blog: blog.vllm.ai/2023/06/20/vllm....
    Can You Run it: huggingface.co/meta-llama/Met...
    Runpod Template: runpod.io/console/deploy?temp...
    Custom agent deep dive: • Build your own Local "...
    Chapters
    Introduction: 00:00
    Inference Server Schema: 01:40
    Determine memory requirements: 04:50
    Deploying server on Runpod: 07:32
    Using the inference server with the agent: 16:30
    Demoing the custom agent: 19:35
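    For reference, vLLM exposes an OpenAI-compatible API, so once the pod is up you can query it with the standard openai Python client. A minimal sketch, assuming the pod serves on port 8000 through Runpod's proxy; the pod ID, key, and model ID below are placeholders, not values from the video:

      # Minimal client sketch for a vLLM server on Runpod (assumed URL format and port).
      from openai import OpenAI

      client = OpenAI(
          base_url="https://<your-pod-id>-8000.proxy.runpod.net/v1",  # Runpod proxy to port 8000
          api_key="EMPTY",  # vLLM accepts any key unless one was set at launch
      )

      response = client.chat.completions.create(
          model="meta-llama/Meta-Llama-3-70B-Instruct",  # assumed model ID
          messages=[{"role": "user", "content": "What's the latest in AI research?"}],
          temperature=0.7,
      )
      print(response.choices[0].message.content)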
  • Science

Comments • 36

  • @donconkey1
    @donconkey1 9 days ago

    Thank you for the excellent video. I appreciate all the detailed steps for setting up a vLLM inference server on RunPod. It's a cost-effective alternative to purchasing an expensive PC, which could break the bank.

  • @wadejohnson4542
    @wadejohnson4542 24 days ago

    WORKED as advertised! Well done, John. Thank you.

  • @dierbeats
    @dierbeats 23 days ago

    Good stuff as always, thank you very much.

  • @jeffreypaarhuis8169
    @jeffreypaarhuis8169 20 days ago

    Great video again. Can't wait for you to try and run the coding models.

  • @aimademerich
    @aimademerich 24 days ago

    Phenomenal

  • @gileneusz
    @gileneusz 9 days ago

    please check "Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models"

  • @emko6892
    @emko6892 24 days ago +1

    Nice video!! But it seems most of your videos on AI agents are around web search.

  • @kreddy8621
    @kreddy8621 24 days ago

    Thank you very much, great content

  • @SeattleShelby
    @SeattleShelby 23 days ago

    Wow - big difference between the 8b and 70b models. Do you think the 70b models are good enough for agents?

  • @Schlemma002
    @Schlemma002 24 days ago

    Hey, amazing content! I was just wondering: you deploy the pods "On-Demand", so does that mean you only pay for the GPU time you actually use? Or does it cost you as long as the pod is running, because the GPU is reserved for you or something like that?

    • @Data-Centric
      @Data-Centric  24 days ago

      Thank you! Regarding your question, the example in this tutorial charges hourly. However, they also provide a serverless deployment. Going the serverless route means you pay nothing when the GPU is idle. Here's the doc: www.runpod.io/serverless-gpu

  • @supercurioTube
    @supercurioTube 22 days ago

    Nice result here with Llama3 70b fp16.
    The whole time I was thinking "what about Groq?" though, since the inference for the same model appears to be free.

    • @alchemication
      @alchemication 19 days ago

      Groq has very low rate limits atm. But yeah, the speed is amazing.

  • @mikew2883
    @mikew2883 24 days ago +1

    Good stuff! 👍 So would this be considered just as secure as hosting on Azure? I mean would your company data be sequestered in its own virtual machine environment?

    • @Data-Centric
      @Data-Centric  24 days ago +3

      Great question. At its core, RunPod is a platform that orchestrates GPU resources. The GPUs themselves are provided by third-party data centers. This is what I pulled from their compliance doc:
      “End-to-end Encryption: Data in transit and at rest is encrypted using industry-leading protocols. This ensures that your AI workloads and associated data remain confidential and tamper-proof.”
      “Compliance Adherence: Different data centers might have varying compliance certifications. While we ensure that all our partners uphold stringent standards, the specifics of each compliance are directly managed by the respective data center.”
      Here’s the doc if you want to read further: www.runpod.io/compliance

    • @mikew2883
      @mikew2883 23 days ago

      Awesome, thanks for the info! 👍
      I just found what I was looking for with your link. Here is a list of compliances regarding data security.
      List of Certifications
      It's vital to understand that while RunPod does not directly hold certifications like SOC 2, ISO 27001, or GDPR, many of our partner data centers do. Here's a quick snapshot of many of the certifications our data centers hold:
      ISO 27001
      ISO 20000-1
      ISO 22301
      ISO 14001
      HIPAA
      NIST
      PCI
      SOC 1 Type 2
      SOC 2 Type 2
      SOC 3
      HITRUST
      GDPR compliant

  • @MyrLin8
    @MyrLin8 24 days ago +1

    Excellent :) Thanks. How much GPU do you actually need, other than using a service?

  • @jarad4621
    @jarad4621 24 days ago

    Great vid, thanks. Please test the new Microsoft Phi 3 Medium etc. as agents; they might work well, since it's much better than Llama 8b.

    • @Data-Centric
      @Data-Centric  24 days ago

      I'll be doing a series of tests for a variety of open source models; Phi will be on the list.

    • @jarad4621
      @jarad4621 23 days ago

      @@Data-Centric Awesome, thanks. In a recent video I saw something interesting: the presenter mentioned the Mistral 7b model makes a great agent for reasons like its architecture and native function calling, I think. I see a new one was just released; apparently as an agent it works better than other popular local Ollama models, but obviously not at the 70b level.

  • @RayWrightRayrite
    @RayWrightRayrite 24 days ago

    Thanks for the video! How do you find out the compute time/cost of the queries you run?

    • @Data-Centric
      @Data-Centric  24 days ago +1

      For this deployment pattern, pods run 24/7 until you stop them, so your compute cost is charged hourly (quoted when you deploy a pod!). You could work out a rough cost per query from the hourly rate (a back-of-envelope sketch follows this thread).

    • @RayWrightRayrite
      @RayWrightRayrite 24 days ago

      @@Data-Centric So basically it's time-based then, where you start at the beginning of the request and stop when the request has completed, so it goes off the elapsed time between start and stop?
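    (A back-of-envelope sketch of the cost-per-query arithmetic mentioned above; the $2/hour rate and 30-second average query time are assumptions for illustration, not figures from the video.)

      # Rough cost per query for an hourly-billed pod (all numbers assumed).
      hourly_rate = 2.00         # USD per hour while the pod is running
      avg_query_seconds = 30     # assumed average wall-clock time per query

      queries_per_hour = 3600 / avg_query_seconds    # 120 queries/hour
      cost_per_query = hourly_rate / queries_per_hour
      print(f"~${cost_per_query:.3f} per query")     # ~$0.017

    Note the pod bills whether or not it is serving requests, so this only holds if the pod stays busy; idle hours raise the effective per-query cost.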

  • @figs3284
    @figs3284 24 days ago

    Is there any way you can add a config option in your GitHub repo for using Runpod serverless? It seems like it could be better cost-wise when doing inference.

  • @NoCodeFilmmaker
    @NoCodeFilmmaker 23 days ago

    I'm curious, bro, why you chose Runpod over Lightning AI?

    • @Data-Centric
      @Data-Centric  20 days ago

      No reason other than I haven't used Lightning AI.

  • @watchdog163
    @watchdog163 24 days ago

    Cost?

  • @octadion3274
    @octadion3274 20 days ago

    Why can't I connect to HTTP port 8000?

  • @jarad4621
    @jarad4621 24 days ago

    I was wondering if this is really going to be cheaper than, say, using OpenRouter or Together AI Llama 70b at 80c per million tokens. I've been running thousands of API calls on quite a bit of data and spent less than a dollar on the API, so I'm wondering if $2 per hour is going to be cheaper. I guess if you're running agents continuously for hours, they can do unlimited work in that time, so the GPU rental may be best and the per-token API will cost more, right? I think the only way to know is to test and compare, right?
    Also, it's possible the APIs are quantized more than the Runpod version, so you would get better results from the on-demand rental. On-demand means only when you use it, right, so you have to turn it off when done? Unsure how these rentals work. I've been looking at Vast; you can rent and run vLLM there as well, and it's apparently the cheapest, but when I checked, prices for your setup were only about 20c cheaper per hour, not a huge difference, and I think Vast has reliability concerns. Have you looked at it?

    • @robxmccarthy
      @robxmccarthy 23 days ago +2

      Honestly, it's pretty hard to compete with the API costs unless you are saturating the GPUs. GPU rental like Runpod is great for defined tasks (like summarizing 10,000 papers or something like that).

    • @jarad4621
      @jarad4621 23 days ago

      @@robxmccarthy Awesome, thanks for confirming!