Well done mate. Thank you for your thorough and clear explanation.
Thanks for the super Thanks
This series(back to basics) needs a boost. Love the way you explained all fundamentals. Keep them coming
Thanks, This is a motivation 👏🏽
This is such a good video but the number of likes/views don't reflect it. Thank you so much.
Appreciate the kind words man :)
💯 True
So true
Good that you captured the basics but didn't explain ggml and gptq enough!
I knew the number of bits had to do with accuracy and with how powerful the hardware required to run an LLM needed to be, but beyond that I had no idea what it meant. Your explanation was super clear, so thanks.
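For anyone curious what "fewer bits" concretely does to the numbers: here's a toy sketch of symmetric uniform quantization (a simplification; real schemes like GPTQ are cleverer than this), showing the rounding error grow as the bit width shrinks:

```python
import numpy as np

def quantize(weights, bits):
    """Symmetric uniform quantization: round floats onto a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8 bits, 7 for 4 bits
    scale = np.abs(weights).max() / qmax
    q = np.round(weights / scale)        # integers in [-qmax, qmax]
    return q * scale                     # dequantize back to floats

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)

for bits in (8, 4, 2):
    err = np.abs(w - quantize(w, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

Fewer bits means fewer bytes per weight (so less RAM/VRAM needed), at the cost of a coarser grid and therefore more rounding error, which is exactly the accuracy/hardware trade-off.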
Thank you
Wonderful explanation! Keep up the great content!
Great explanation!
Glad you think so!
Excellent and most accurate explanation. Thank you!
Glad it was helpful!
Good work, great explanation. Thanks!
You are welcome!
This was excellent! Thank you!
Glad you enjoyed it! I wanted to try something different from my regular videos to up the editing quality and also push my boundaries. I'm glad you felt good about it :)
Never knew there was a difference, but now I know, thank you!
I was wondering exactly how quantization works this morning. Thank you, such a good video 🎉
Glad it was helpful! ☺️
Thanks mate for the great explanation!
You're welcome
Thanks for the explanation.
Glad it was helpful!
Great explanation of the differences between GPTQ and GGML, thanks once again!
Thanks for this, I've been wondering about this. I'd love some more explanation of webui settings
Glad it was helpful! Noted!
Great and informative video dude! Well done, and I always appreciate your content!
Glad to hear it! Thanks for your support and feedback 🙏🏽
Great explanation! I needed this...lol
Always my go-to channel to understand concepts clearly. Can't thank you enough brother. 🙌
Wow, amazing video, everything was well explained and detailed, thanks!
Thanks, this is a nice video. Can GPTQ models run on the Apple Metal framework? Also, I have seen some GGML models use CPU and GPU together. How is this different from the other approach?
Thank you brother, well understood
You are welcome brother!
Does a GPTQ-quantized model inherit from the nn.Module class in PyTorch? How can I integrate a GPTQ model with my PyTorch code?
Good explanation
Thanks for liking
Do you use Windows? I am completely struggling trying to get Auto-GPT to recognize my CUDA install.
It's time to extend this with QuIP and AWQ
Does it work on DeBERTa models?
Have you planned to do some more videos on GPTQ and GGML, e.g. fine-tuning a quantized model or converting fp16 models to quantized ones?
How can we utilize both GPU and CPU for training a model? Like somehow split the model and store half of it in CPU RAM and the other half in GPU RAM.
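For what it's worth, libraries like Hugging Face Accelerate automate this kind of split with device_map="auto". The underlying idea can be sketched in plain PyTorch like this (toy layer sizes, inference shown; gradients flow across devices the same way during training):

```python
import torch
import torch.nn as nn

# Toy "model": two halves that can live on different devices.
half1 = nn.Linear(16, 16)                     # stays in CPU RAM
device2 = "cuda" if torch.cuda.is_available() else "cpu"
half2 = nn.Linear(16, 4).to(device2)          # goes to GPU RAM if one is available

x = torch.randn(1, 16)
h = half1(x)                                  # compute the first half on CPU
h = h.to(device2)                             # ship the activation to the GPU
y = half2(h)                                  # finish the forward pass there
print(y.shape)
```

The cost of the split is the activation transfer between devices at the boundary, which is why real offloading setups try to keep as many contiguous layers as possible on the GPU.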
Say I want to run inference on a MacBook Air M2, which does have decent GPU cores, although they're not NVIDIA or built for extreme ML use cases. Should I go with a more CPU-focused pipeline or a GPTQ-based GPU-intensive pipeline?
My bad, I forgot to mention GGML is optimized for Apple Silicon as well
are you going to make a discord server?
What do you think would be the use of it ?
@@1littlecoder A Discord server could serve as a hub for your viewers to discuss AI, machine learning, and tech. It'd be a community where like-minded people share insights, ask questions, and interact. Plus, you could directly engage with your audience and host discussions. you make really good quality informative vids :)
Thanks
Enjoyed it, thank you
Glad you enjoyed it, 😌 thanks
Great content as usual! I just needed someone to tell me about those in that simple way
Can you explain how to figure out what settings to use to run models in textui, such as transformers or QLoRA? I usually just end up trying every combination until it works or I give up, and usually there are no instructions on the Hugging Face repo.
Great explanation ❤. Is there a way to contact you?
Thanks 1littlecoder at gmail dot com or the same on Twitter
can we fine tune a quantized model?
Why do they use a dataset for quantizing in AutoGPTQ?
The GPTQ algorithm requires calibrating the quantized weights of the model by running inference on the quantized model over a calibration dataset.
@@1littlecoder Thanks, I was thinking GPTQ would quantize the weights to a particular dataset.
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer, load_quantized_model
import torch

# Load the fp16 model to be quantized
model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Quantize to 4 bits, calibrating against samples from the C4 dataset
quantizer = GPTQQuantizer(bits=4, dataset="c4", block_name_to_quantize="model.decoder.layers", model_seqlen=2048)
quantized_model = quantizer.quantize_model(model, tokenizer)
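As a toy illustration of why a calibration dataset is involved at all (this is not GPTQ's actual algorithm, which updates weights layer by layer to minimize this error; it just shows that the error worth minimizing is measured on realistic activations, not on the weights alone):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64)).astype(np.float32)   # a toy layer's weights
X = rng.normal(size=(128, 64)).astype(np.float32)  # calibration inputs (stand-in for C4 samples)

def quantize(w, bits=4):
    """Naive symmetric round-to-nearest quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

Wq = quantize(W)

# The calibration data lets us measure (and, in GPTQ, minimize) the error the
# quantized weights cause in the layer's *outputs* on realistic inputs.
output_err = np.abs(X @ W.T - X @ Wq.T).mean()
print(f"mean output error on calibration data: {output_err:.4f}")
```

That's why AutoGPTQ asks for a dataset: the rounding decisions are tuned so the quantized layer reproduces the original layer's outputs on data resembling what the model will actually see.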
Shall we connect for some time?
thank you
you are welcome!
Can you please make a vid on GGUF?
Beautiful
Thank you! Cheers!
How about if I have an NVIDIA GPU but it's not large enough to host a 70b model? 😂😂😂
it answered basically nothing though.
Like what ?
@@1littlecoder Well, you described what the process of quantization is, like in a dictionary. You don't list any ways to use it or how it could apply to projects that even you have covered recently. This was a great moment to hat-tip your old 4-bit quantized code from a few months ago (which isn't actually working, but whatever). This did little more than describe what the word is and how it pertains to maths. I like your work, but a worked quantization of an ACTUAL MODEL would be useful. Simply reiterating without adding any value is not answering "UNDERSTANDING AI model Quantization"; it's simply understanding what the WORDS mean.
@@twobob Thanks for the details. What do you mean by a worked quantized model?
@@1littlecoder Get a system that can work with quantization, quantize a previously unquantized model, and use it. That would have been a more practically useful explanation of "understanding" the current contemporary steps required to apply knowledge of quantization. Something tiny would be fine; no one expects you to quantize GPT-4 on your own, but there are lots of very small, well-performing models that could be tried.
@@twobob 👍
GGML ==> GGUF now, which uses CPU + GPU
Imo, neurons should not communicate with every other neuron. Independent threading should lead to faster, more accurate, and more reliable training
That is bullshit
Thanks