LLM System and Hardware Requirements - Running Large Language Models Locally

  • Published: Sep 10, 2024
  • This is a great, 100% free tool I developed after uploading this video. It lets you choose an LLM and see which GPUs could run it: aifusion.company/gpu-llm
    Min hardware requirements (up to 16B Q4 models, e.g. Llama 3.1 8B)
    • GPU: RTX 3060 12GB VRAM : amzn.to/3M0HvsL
    • CPU: Intel i5 (amzn.to/3WGZtp3) or AMD Ryzen 5 (amzn.to/46IigoC)
    • RAM: 36GB
    • Storage: 1TB SSD : amzn.to/4cBebEd
    Recommended hardware requirements (up to 70B Q8 models, e.g. Llama 3.1 70B)
    • GPU: RTX 4090 24GB VRAM : amzn.to/3AjIHow
    • CPU: Intel i9 (amzn.to/3YCeLxW) or AMD Ryzen 9 (amzn.to/3YIaUiT)
    • RAM: 48GB
    • Storage: 2TB SSD : amzn.to/3YFQ83A
    Professional hardware requirements (405B models and beyond, e.g. Llama 3.1 405B)
    • GPU: stack of A100 or A6000 GPUs : amzn.to/3yojZ5T
    • CPU: enterprise-grade CPUs : amzn.to/3YDgByw , amzn.to/4dEbfY2
    Welcome to our ultimate guide on running Large Language Models (LLMs) locally! In this video, we delve into the essential system and hardware requirements for setting up your own LLM workstation. Whether you’re just starting out or looking to upgrade, we’ve got you covered.
    We’ll explore the importance of GPUs and how VRAM affects your ability to run large models. Learn how different GPUs, from the entry-level RTX 3060 to the high-end RTX 4090, stack up in terms of handling LLMs. Discover how quantization techniques, including FP32, FP16, INT8, and INT4, can optimize performance and memory usage.
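    As a rough sketch of that quantization math, the snippet below estimates the VRAM needed to hold a model's weights at each precision (Python; the bytes-per-parameter values are standard, but the 20% overhead factor is an illustrative assumption, not a figure from the video):

        # Rough VRAM estimate for a model's weights at different precisions.
        BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}
        OVERHEAD = 1.2  # assumed ~20% extra for activations, KV cache, runtime

        def vram_gb(params_billions: float, precision: str) -> float:
            """Estimated VRAM in GB to run a model of the given size."""
            raw = params_billions * 1e9 * BYTES_PER_PARAM[precision]
            return raw * OVERHEAD / 1e9

        for model, size in [("Llama 3.1 8B", 8), ("Llama 3.1 70B", 70)]:
            line = ", ".join(f"{p}: {vram_gb(size, p):.1f} GB" for p in BYTES_PER_PARAM)
            print(f"{model} -> {line}")

    On these rough numbers, an 8B model at INT4 needs about 4.8GB, which fits comfortably on a 12GB RTX 3060, in line with the minimum spec above.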
    We’ll also cover other critical components, such as CPUs, RAM, and storage, and explain their roles in managing LLMs. Get real-world examples of various setups, from budget-friendly options to high-performance configurations for advanced applications.
    By the end of this video, you’ll have a clear understanding of how to build an LLM workstation that fits your needs and budget. Start experimenting with powerful AI models locally and take your projects to the next level!
    Disclaimer: Some of the links in this video/description are affiliate links, which means if you click on one of the product links, I may receive a small commission at no additional cost to you. This helps support the channel and allows me to continue making content like this. Thank you for your support!
    #LargeLanguageModels #LLMs #RunningLLMsLocally #AIHardware #GPURequirements #VRAM #QuantizationTechniques #FP32 #FP16 #INT8 #INT4 #RTX3060 #RTX4060 #RTX3090 #RTX4090 #NVIDIAA100 #AIWorkstation #ModelOptimization #AIModels #Llama3.1 #AISetup #ComputingHardware #HighPerformanceComputing #DataProcessing #MachineLearning #AIResearch #TechGuide #SystemRequirements #RAMforLLMs #StorageforLLMs #NVMeSSD #HDDStorage #AIPerformance #QuantizedModels #AIHardwareSetup #AIPerformanceOptimization #opensource #llama3 #llama #qwen2 #gemma2 #largelanguagemodels #mistralai #mistral #localllm #llm #local #llama3.1 #llama3.1-8b #llama3.1-70b #llama3.1-405b #405b

Comments • 22

  • @YasirJilani-f1j
    @YasirJilani-f1j 9 days ago +2

    Very nice, thanks! Can you do a video on TPUs?

  • @irocz5150
    @irocz5150 12 days ago +1

    Apologies for the lack of knowledge, but why no AMD video cards?

    • @AIFusion-official
      @AIFusion-official  12 days ago +1

      AMD video cards work too. I didn't mention them in the video, but there are some good AMD GPUs for LLM tasks.

    • @jefflane2012
      @jefflane2012 4 days ago +1

      Nvidia GPUs have tensor cores to perform massive simultaneous calculations for AI, plus the CUDA software ecosystem built around them; AMD's compute stack (ROCm) is more general-purpose and less widely supported for AI.

  • @MonsieugarDaddy
    @MonsieugarDaddy 15 days ago +1

    I'm glad I found this video, but I got lost at the first part: GPU memory.
    I know most people refer to the graphics part as a 'dedicated' component,
    yet I'm curious: can an iGPU contribute to this kind of setup?
    The latest Ryzen 880M is quite a 'capable' iGPU.

    • @AIFusion-official
      @AIFusion-official  15 days ago +1

      Thank you for your comment! While the Ryzen 880M is a powerful iGPU for many tasks, running large language models (LLMs) typically requires a dedicated GPU with substantial VRAM (often 8GB to 24GB) due to the heavy memory and computational demands. iGPUs like the 880M share system memory and lack the dedicated resources and bandwidth needed to efficiently handle LLMs, so they wouldn't contribute significantly in this context. For best results, a discrete GPU with sufficient VRAM is essential.

  • @matthiasandreas6549
    @matthiasandreas6549 12 days ago +1

    Hello, do you have the link for the quantization calculation you show in the video? Thank you!

    • @AIFusion-official
      @AIFusion-official  12 days ago

      That's not a tool; it's just something I made with HTML, CSS and JS for the sake of the video. I did build a tool right after this video, where you can choose a large language model and see which GPUs could run it (and how many of them) in FP32, FP16, INT8 and INT4. (You can find the link to this tool in the description.)
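      For anyone curious how such a matcher might work, here is a hypothetical sketch in Python (not the actual source of aifusion.company/gpu-llm; the GPU list and the 20% overhead factor are illustrative assumptions):

        import math

        GPUS_VRAM_GB = {"RTX 3060": 12, "RTX 4090": 24, "A6000": 48, "A100 80GB": 80}

        def cards_needed(params_billions: float, bytes_per_param: float, vram_gb: int) -> int:
            """Cards of a given VRAM size needed to hold the quantized weights."""
            needed_gb = params_billions * bytes_per_param * 1.2  # assumed 20% overhead
            return math.ceil(needed_gb / vram_gb)

        for gpu, vram in GPUS_VRAM_GB.items():
            print(f"Llama 3.1 70B @ INT4 on {gpu}: {cards_needed(70, 0.5, vram)} card(s)")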

    • @matthiasandreas6549
      @matthiasandreas6549 12 days ago

      @AIFusion-official thanks a lot for your answer; I looked but can't find the link 🤔

    • @AIFusion-official
      @AIFusion-official  12 days ago

      You're welcome! Here it is: aifusion.company/gpu-llm. Hope it helps!

    • @matthiasandreas6549
      @matthiasandreas6549 12 days ago +1

      @AIFusion-official thank you so much

    • @AIFusion-official
      @AIFusion-official  12 days ago

      You're welcome!

  • @azkongs
    @azkongs a month ago +1

    Do you know if a model that is 16GB in size could run on a graphics card with 16GB of VRAM?

    • @AIFusion-official
      @AIFusion-official  a month ago

      A model that is 16GB in size might not fit perfectly into a 16GB VRAM graphics card due to additional memory requirements for computations and overhead. While it’s theoretically possible, practical use often requires more VRAM. Techniques like quantization and reducing batch size can help manage memory usage.
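      To put numbers on that overhead, here is a back-of-the-envelope sketch in Python (the layer and head counts match Llama 3.1 8B's published architecture; the context length and FP16 cache precision are illustrative choices):

        # The KV cache grows with context length, on top of the weights themselves.
        layers, kv_heads, head_dim = 32, 8, 128  # Llama 3.1 8B (grouped-query attention)
        bytes_fp16 = 2
        context_tokens = 8192

        kv_per_token = 2 * layers * kv_heads * head_dim * bytes_fp16  # K and V tensors
        kv_cache_gb = kv_per_token * context_tokens / 1e9
        print(f"KV cache at {context_tokens} tokens: {kv_cache_gb:.2f} GB")
        # Add activations and the GPU runtime context on top, and a model whose
        # weights exactly fill the card has no room left to actually run.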

  • @matthiasandreas6549
    @matthiasandreas6549 25 days ago +1

    Hello, and thanks for the video. I currently have the choice between an Nvidia RTX 3060 12GB and an Nvidia RTX 4060 8GB to run 8B LLMs in private testing.
    Which one would be better? I understand that both GPUs will work at Q4 with about 85% accuracy, is that so?
    Thank you for answering.

    • @AIFusion-official
      @AIFusion-official  25 days ago

      I have an RTX 4060, and it works fine. However, I would recommend getting the RTX 3060 because it has more video RAM, which allows it to load larger models. Hope this helps!

    • @matthiasandreas6549
      @matthiasandreas6549 25 days ago

      @AIFusion-official yes, thanks, it helps a lot; your stream was very informative.
      But will the accuracy with 12GB of VRAM be the same, or a bit better?

    • @matthiasandreas6549
      @matthiasandreas6549 25 days ago

      @AIFusion-official, and how much RAM do you have? 32GB, or more?
      I can choose between 32 and 64GB, but there is a big price difference.

    • @AIFusion-official
      @AIFusion-official  24 days ago

      I have 32GB of RAM.

    • @AIFusion-official
      @AIFusion-official  24 days ago

      The accuracy would be the same; the only difference is the speed, specifically how many tokens per second it can generate.
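      For readers who want to measure that speed difference themselves, here is a minimal Python sketch, assuming a local Ollama server on the default port with the llama3.1 model already pulled (both are assumptions about your setup):

        import json
        import urllib.request

        # Ask Ollama for one non-streamed completion and read its timing stats.
        payload = json.dumps({
            "model": "llama3.1",
            "prompt": "Explain VRAM in one paragraph.",
            "stream": False,
        }).encode()
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            result = json.load(resp)

        # eval_count is the number of generated tokens; eval_duration is nanoseconds.
        print(f"{result['eval_count'] / (result['eval_duration'] / 1e9):.1f} tokens/sec")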