Estimate Memory Consumption of LLMs for Inference and Fine-Tuning

  • Published: 29 Sep 2024
  • Join me in this informative video where I dive into estimating the memory consumption of transformer models for both fine-tuning and inference.
    In this video, I'll guide you through a step-by-step process using the latest version of Hugging Face transformers. You'll learn how to input model names, whether from the Hugging Face Hub or a local source, and retrieve the model's architecture to start the estimation (a sketch of this approach follows the description below). Remember, this is a basic approximation without any specific optimizations.
    If you find this video helpful, please like, comment, and subscribe for more insightful content like this.
    Join this channel to get access to perks:
    / @aianytime
    To further support the channel, you can contribute via the following methods:
    Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW
    UPI: sonu1000raw@ybl
    Screengrab: drive.google.c...
    Research Paper: arxiv.org/pdf/...
    GitHub: github.com/AIA...
    Author Credits:
    huggingface.co...
    AI Ketchup
    #llm #ai #genai
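
As a companion to the walkthrough above, here is a minimal sketch of the estimation approach, assuming the usual transformers/accelerate pattern: instantiate the architecture on empty weights, count parameters, and multiply by bytes per parameter. The model ID and the overhead multipliers are illustrative assumptions, not values taken from the video.

```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # any Hub ID or local path

# Build the architecture without allocating real memory for the weights.
config = AutoConfig.from_pretrained(MODEL_NAME)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

num_params = sum(p.numel() for p in model.parameters())
GIB = 1024 ** 3

for dtype, bytes_per_param in [("float32", 4), ("fp16/bf16", 2),
                               ("int8", 1), ("int4", 0.5)]:
    weights = num_params * bytes_per_param / GIB
    # Rule of thumb: inference needs the weights plus roughly 20%
    # overhead for activations and the KV cache.
    print(f"{dtype:>9}: weights ≈ {weights:6.1f} GiB, "
          f"inference ≈ {weights * 1.2:6.1f} GiB")

# Full fine-tuning with Adam in fp32 (weights + gradients + two
# optimizer states) is often estimated at ~16 bytes per parameter.
print(f"full fine-tune (Adam) ≈ {num_params * 16 / GIB:.1f} GiB")
```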

Comments • 23

  • @thepresistence5935 · 3 months ago

    What are the bit-width of the model and the bit-width of the optimizer?

  • @DipeshPaul-x5k · 3 months ago

    Please add the calculation for QLoRA in Python code as well.
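
    For reference, a rough sketch of what such a QLoRA estimate could look like. The breakdown below (NF4 base weights, fp16 adapters, Adam states for the adapter parameters only) is my own assumption, not code from the video:

```python
def qlora_memory_gib(num_params: float, lora_params: float) -> float:
    """Back-of-envelope QLoRA training memory, weights and states only."""
    GIB = 1024 ** 3
    base = num_params * 0.5      # 4-bit (NF4) base weights: ~0.5 byte/param
    adapters = lora_params * 2   # LoRA adapter weights in fp16
    grads = lora_params * 2      # gradients for the adapters only
    optim = lora_params * 8      # Adam: two fp32 states per trainable param
    return (base + adapters + grads + optim) / GIB

# Example: a 7B base model with ~40M trainable LoRA parameters
# (activations and quantization constants excluded).
print(f"{qlora_memory_gib(7e9, 4e7):.1f} GiB")
```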

  • @aqsa4635 · 5 months ago

    Please provide links to the videos you mentioned about reducing memory size.

  • @PhotoshoppersStop · 5 months ago · +1

    Great video, please add a blog post for the same calculation as well ✌✌

    • @AIAnytime · 5 months ago · +2

      The blog is coming. The AI Anytime website is launching next week.

    • @PhotoshoppersStop · 5 months ago

      @AIAnytime That's great. I was asking about 5:36 (the "loother ai"? blog).

  • @andres.yodars · 4 months ago

    I really like the blackboard explanation.

  • @madhukarmukkamula1515 · 4 months ago

    Thanks, good explanation.

  • @PunitPandey · 5 months ago

    Very useful video, Sonu. Keep it up.

  • @rakeshreddy2791 · 5 months ago

    What are the system requirements (RAM and VRAM) to run a 70B model?

    • @MrXxXx420 · 5 months ago

      If running in half precision (float16), then 70B × 2 bytes = 140 GB. You will need at least 140 GB of VRAM to run a 70B model. I'm also ignoring other components that require extra VRAM, e.g. optimizer states.
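
      A quick, illustrative check of this arithmetic (the 70B figure and the precisions are from the comment above; the snippet is just a sketch):

```python
# Weights-only memory for a 70B-parameter model at common precisions.
params = 70e9
for name, bytes_per_param in [("float32", 4), ("float16", 2),
                              ("int8", 1), ("int4", 0.5)]:
    print(f"{name:>7}: {params * bytes_per_param / 1e9:,.0f} GB")
# float16 -> 140 GB, matching the figure above (weights only;
# the KV cache and runtime overhead come on top).
```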

    • @unh0lys0da16 · 1 month ago

      @MrXxXx420 Optimizer states aren't needed for inference.

    • @unh0lys0da16 · 1 month ago

      @MrXxXx420 Furthermore, you need 140 GB of RAM + VRAM combined, so you could do it with 128 GB of RAM and 20 GB of VRAM; however, the more VRAM you have, the faster the inference will be.
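
      For what it's worth, a minimal sketch of how such a RAM+VRAM split can be set up via the transformers/accelerate offloading integration. The model ID and memory caps are illustrative, echoing the comment's example numbers:

```python
import torch
from transformers import AutoModelForCausalLM

# device_map="auto" (requires accelerate) fills GPU 0 up to its cap
# and spills the remaining layers to system RAM.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",               # illustrative 70B checkpoint
    torch_dtype=torch.float16,                 # 2 bytes per parameter
    device_map="auto",
    max_memory={0: "20GiB", "cpu": "128GiB"},  # GPU 0 + CPU RAM caps
)
```

      Layers kept in CPU RAM are streamed to the GPU on demand, which is why more VRAM translates into faster inference.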

  • @doremicocoparis9410 · 5 months ago

    Hey man, this seems very much like the Kaitchup article that was released before your video.

    • @doremicocoparis9410 · 5 months ago

      Weird not to see Kaitchup credited anywhere.

    • @AIAnytime · 5 months ago · +1

      It's on the Hugging Face blog: huggingface.co/blog/Andyrasika/memory-consumption-estimation. But I should give credit, let me add that... Thanks for the tip.

    • @madhukarmukkamula1515 · 4 months ago

      @doremicocoparis9410 Hey, are the articles published on Kaitchup good? Worth subscribing?

    • @madhukarmukkamula1515 · 4 months ago

      Hey @doremicocoparis9410, are the Kaitchup articles good? Worth a paid subscription?

    • @doremicocoparis9410 · 4 months ago

      Yes.

  • @TooyAshy-100 · 5 months ago

    Thank you.

    • @AIAnytime · 5 months ago · +1

      Thank you, sir.