How to Speed Up Large Language Models Using Groq AI Platform

  • Published: 15 Oct 2024

Comments • 9

  • @MarryWills · 5 months ago · +1

    I'm getting real-time responses. Great tutorial.

  • @aiforyounow · 5 months ago

    This is insanely fast. Thanks for this tutorial.

  • @MuhammadAdnan-tq3fx · 5 months ago · +1

    This solution is only for online inference. If I want to run in offline mode, what should I do?

    • @tech_watt · 5 months ago · +1

      LLMs are computationally expensive, so running them locally requires more resources. You can consider running them locally if you have enough computational power to handle their needs.

    • @MuhammadAdnan-tq3fx · 5 months ago

      I have enough compute power; in fact, I have 3 RTX 8000s with 144 GB of total GPU memory. I run a quantized Llama-3 model in offline mode. The model is only 40 GB, but my inference time is too high and I want to reduce it. Is it possible to use Groq offline, or is another option available?

    • @MuhammadAdnan-tq3fx · 5 months ago

      Compute power: 144 GB.

    • @antonvinny · 5 months ago · +2

      @MuhammadAdnan-tq3fx Groq uses their own proprietary hardware (the LPU), not GPUs; the optimization is at the hardware level (see the sketch after this thread).

    • @MuhammadAdnan-tq3fx · 5 months ago · +1

      @antonvinny How can I use Groq offline?
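
For context, the thread above refers to Groq's hosted API: the LPU hardware sits on Groq's servers, so the client code is ordinary online inference. Below is a minimal sketch of such a call, assuming the `groq` Python package is installed, a GROQ_API_KEY environment variable is set, and `llama-3.1-8b-instant` is used as an example model id (the exact model names on offer may differ).

    import os
    from groq import Groq

    # The client talks to Groq's hosted LPUs; there is no offline/local mode for this API.
    client = Groq(api_key=os.environ["GROQ_API_KEY"])

    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # example model id (assumption; check Groq's current model list)
        messages=[
            {"role": "user", "content": "Explain in one sentence why LPU inference is fast."},
        ],
    )

    print(response.choices[0].message.content)

The speed-up comes entirely from the server-side hardware, which is why the replies above point out that Groq cannot be run offline on local GPUs.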