Estimate Memory Consumption of LLMs for Inference and Fine-Tuning
- Published: 29 Sep 2024
- Join me in this informative video where I dive into estimating the memory consumption of transformer models for both fine-tuning and inference.
In this video, I'll guide you through a step-by-step process using the latest version of Hugging Face transformers. You'll learn how to provide a model name, whether from the Hugging Face Hub or a local path, and retrieve the model's architecture to start the estimation. Keep in mind that this is a basic approximation without any specific optimizations.
If you find this video helpful, please like, comment, and subscribe for more insightful content like this.
Join this channel to get access to perks:
/ @aianytime
To further support the channel, you can contribute via the following methods:
Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW
UPI: sonu1000raw@ybl
Screengrab: drive.google.c...
Research Paper: arxiv.org/pdf/...
GitHub: github.com/AIA...
Author Credits:
huggingface.co...
AI Ketchup
#llm #ai #genai
What is the bit-width of a model and the bit-width of an optimizer?
Please put the QLoRA calculation in Python code as well.
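A rough back-of-the-envelope sketch for the QLoRA request above. The assumptions are mine, not from the video: 4-bit quantized base weights at ~0.5 bytes per parameter, LoRA adapters trained in fp16 (2 bytes/param) with fp16 gradients and fp32 Adam states (8 bytes/param); quantization constants, activations, and KV cache are ignored, and GB means decimal gigabytes (1e9 bytes):

```python
def qlora_finetune_memory_gb(base_params: float, lora_params: float) -> float:
    """Very rough QLoRA fine-tuning memory estimate in decimal GB.

    base weights:  4-bit quantized -> 0.5 bytes per parameter
    LoRA adapters: fp16 weights (2) + fp16 grads (2) + fp32 Adam m,v (8)
    """
    base_bytes = base_params * 0.5
    adapter_bytes = lora_params * (2 + 2 + 8)
    return (base_bytes + adapter_bytes) / 1e9

# Example: a 7B base model with ~40M trainable LoRA parameters
print(qlora_finetune_memory_gb(7e9, 40e6))  # -> 3.98
```

The adapter term is tiny compared to the base weights, which is exactly why QLoRA makes fine-tuning large models feasible on a single GPU.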
Please provide the links for the videos that you have mentioned for reducing the memory size
Great video, please add a blog post for calculating the same ✌✌
Blog is coming. AI Anytime website is getting launched next week.
@@AIAnytime That's great. I was asking about the blog mentioned at 5:36 (loother ai?).
I really like the blackboard explanation.
Thanks, Good explanation.
You're welcome!
Very useful video Sonu. Keep it up.
Thank you sir
What are the system requirements (RAM and VRAM) to run a 70B model?
If running in half precision (float16, 2 bytes per parameter), then 70B × 2 bytes = 140 GB. You will need at least 140 GB of VRAM to run a 70B model. I am also ignoring other components that require extra VRAM, e.g., optimizer states.
@@MrXxXx420 optimizers aren't necessary for inference
@@MrXxXx420 Furthermore, you need 140 GB in total across RAM + VRAM, so you could do it with 128 GB RAM and 20 GB VRAM; however, the more of the model that fits in VRAM, the faster the inference will be.
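The rule of thumb in this thread can be wrapped in a small sketch. This is a weights-only approximation (decimal GB = 1e9 bytes) that ignores activations, the KV cache, and framework overhead; the function name and dtype table are mine, not from the video:

```python
# Bytes of storage per parameter for common precisions
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,   # half precision, as in the 70B example above
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

def inference_memory_gb(num_params: float, dtype: str = "float16") -> float:
    """Rough weights-only memory estimate for inference, in decimal GB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# A 70B model in float16: 70e9 params * 2 bytes = 140 GB
print(inference_memory_gb(70e9, "float16"))  # -> 140.0
```

The same function shows why quantization helps: the same 70B model in 4-bit needs only ~35 GB for weights.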
Hey man, this seems very similar to the Kaitchup article that was released before your video.
Weird to not see Kaitchup credited anywhere.
It's on the Hugging Face blog: huggingface.co/blog/Andyrasika/memory-consumption-estimation, but I should give credit. Let me add that... Thanks for the tip.
@@doremicocoparis9410 Hey, are the articles published in Kaitchup good, worth subscribing?
hey @doremicocoparis9410 are Kaitchup articles good, worth paid subscription??
Yes
Thank you,
Thank you sir