Well done mate. Thank you for your thorough and clear explanation.
Thanks for the super Thanks
This series(back to basics) needs a boost. Love the way you explained all fundamentals. Keep them coming
Thanks, This is a motivation 👏🏽
This is such a good video but the number of likes/views don't reflect it. Thank you so much.
Appreciate the kind words man :)
💯 True
So true
Good that you captured the basics but didn't explain ggml and gptq enough!
I knew the number of bits had to do with accuracy and with how powerful the hardware required to run an LLM needed to be, but beyond that I had no idea what it meant. Your explanation was super clear, so thanks.
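For anyone curious what "fewer bits" concretely does to the numbers: here's a toy sketch of symmetric uniform quantization (a simplification; real schemes like GPTQ are cleverer than this), showing the rounding error grow as the bit width shrinks:

```python
import numpy as np

def quantize(weights, bits):
    """Symmetric uniform quantization: round floats onto a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8 bits, 7 for 4 bits
    scale = np.abs(weights).max() / qmax
    q = np.round(weights / scale)        # integers in [-qmax, qmax]
    return q * scale                     # dequantize back to floats

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)

for bits in (8, 4, 2):
    err = np.abs(w - quantize(w, bits)).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

Fewer bits means fewer bytes per weight (so less RAM/VRAM needed), at the cost of a coarser grid and therefore more rounding error, which is exactly the accuracy/hardware trade-off.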
Thank you
Wonderful explanation! Keep up the great content!
Great explanation!
Glad you think so!
Excellent and most accurate explanation. Thank you!
Glad it was helpful!
Good work, great explanation. Thanks!
You are welcome!
This was excellent! Thank you!
Glad you enjoyed it! I wanted to try something different from my regular videos to up the editing quality and also push my boundaries. I'm glad you felt good about it :)
Never knew there was a difference, but now I know, thank you!
I was wondering exactly how quantization works this morning. Thank you, such a good video 🎉
Glad it was helpful! ☺️
Thanks mate for the great explanation!
You're welcome
Thanks for the explanation.
Glad it was helpful!
Great explanation of the differences between GPTQ and GGML, thanks once again!
Thanks for this, I've been wondering about this. I'd love some more explanation of webui settings
Glad it was helpful! Noted!
Great and informative video dude! Well done, and I always appreciate your content!
Glad to hear it! Thanks for your support and feedback 🙏🏽
Great explanation! I needed this...lol
Always my go-to channel to understand concepts clearly. Can't thank you enough brother. 🙌
Wow, amazing video, everything was well explained and detailed, thanks!
Thanks, this is a nice video. Can GPTQ models run on the Apple Metal framework? Also, I have seen some GGML models use CPU and GPU together. How is this different from the other approach?
Thank you brother, well understood
You are welcome brother!
Does a GPTQ-quantized model inherit from the nn.Module class in PyTorch? How can I integrate a GPTQ model with my PyTorch code?
Good explanation
Thanks for liking
Do you use Windows? I am completely struggling trying to get Auto-GPT to recognize my CUDA install.
It's time to extend this with QuIP and AWQ
Does it work on DeBERTa models?
Have you planned to do some more videos on GPTQ and GGML, e.g. fine-tuning a quantized model or converting fp16 models to quantized ones?
How can we utilize both GPU and CPU for training a model? Like somehow split the model and store half of it in CPU RAM and the other half in GPU RAM.
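For what it's worth, libraries like Hugging Face Accelerate automate this kind of split with device_map="auto". The underlying idea can be sketched in plain PyTorch like this (toy layer sizes, inference shown; gradients flow across devices the same way during training):

```python
import torch
import torch.nn as nn

# Toy "model": two halves that can live on different devices.
half1 = nn.Linear(16, 16)                     # stays in CPU RAM
device2 = "cuda" if torch.cuda.is_available() else "cpu"
half2 = nn.Linear(16, 4).to(device2)          # goes to GPU RAM if one is available

x = torch.randn(1, 16)
h = half1(x)                                  # compute the first half on CPU
h = h.to(device2)                             # ship the activation to the GPU
y = half2(h)                                  # finish the forward pass there
print(y.shape)
```

The cost of the split is the activation transfer between devices at the boundary, which is why real offloading setups try to keep as many contiguous layers as possible on the GPU.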
Say I want to run inference on a MacBook Air M2, which does have decent GPU cores, although they're not NVIDIA or built for extreme ML use cases. Should I go with a more CPU-focused pipeline or a GPTQ-based GPU-intensive pipeline?
My bad, I forgot to mention GGML is optimized for Apple Silicon as well
are you going to make a discord server?
What do you think would be the use of it ?
@@1littlecoder A Discord server could serve as a hub for your viewers to discuss AI, machine learning, and tech. It'd be a community where like-minded people share insights, ask questions, and interact. Plus, you could directly engage with your audience and host discussions. you make really good quality informative vids :)
Thanks
Enjoyed it, thank you
Glad you enjoyed it, 😌 thanks
Great content as usual! I just needed someone to tell me about those in that simple way
Can you explain how to figure out what settings to use to run models in textui, such as transformers or QLoRA? I usually just end up trying every combination until it works or I give up, and usually there are no instructions on the Hugging Face repo.
Great explanation ❤. Is there a way to contact you?
Thanks 1littlecoder at gmail dot com or the same on Twitter
can we fine tune a quantized model?
Why do they use a dataset for quantizing in AutoGPTQ?
The GPTQ algorithm requires calibrating the quantized weights of the model by running inference on the quantized model over a calibration dataset.
@@1littlecoder Thanks, I was thinking GPTQ would quantize the weights to a particular dataset.
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer, load_quantized_model
import torch

# Load the fp16 model to be quantized
model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Quantize to 4 bits, calibrating against samples from the C4 dataset
quantizer = GPTQQuantizer(bits=4, dataset="c4", block_name_to_quantize="model.decoder.layers", model_seqlen=2048)
quantized_model = quantizer.quantize_model(model, tokenizer)
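As a toy illustration of why a calibration dataset is involved at all (this is not GPTQ's actual algorithm, which updates weights layer by layer to minimize this error; it just shows that the error worth minimizing is measured on realistic activations, not on the weights alone):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 64)).astype(np.float32)   # a toy layer's weights
X = rng.normal(size=(128, 64)).astype(np.float32)  # calibration inputs (stand-in for C4 samples)

def quantize(w, bits=4):
    """Naive symmetric round-to-nearest quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

Wq = quantize(W)

# The calibration data lets us measure (and, in GPTQ, minimize) the error the
# quantized weights cause in the layer's *outputs* on realistic inputs.
output_err = np.abs(X @ W.T - X @ Wq.T).mean()
print(f"mean output error on calibration data: {output_err:.4f}")
```

That's why AutoGPTQ asks for a dataset: the rounding decisions are tuned so the quantized layer reproduces the original layer's outputs on data resembling what the model will actually see.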
Shall we connect for some time?
thank you
you are welcome!
Can you please make a vid on GGUF?
Beautiful
Thank you! Cheers!
How about if I have an NVIDIA GPU but it's not large enough to host a 70b model? 😂😂😂
it answered basically nothing though.
Like what ?
@@1littlecoder Well, you described what the process of quantization is, like in a dictionary. You don't list any ways to use it or how it could apply to projects that even you have covered recently. This was a great moment to hat-tip your old 4-bit quantized code from a few months ago (which isn't actually working, but whatever). This did little more than describe what the word is and how it pertains to maths. I like your work, but a worked quantization of an ACTUAL MODEL would be useful. Simply reiterating without adding any value is not answering "UNDERSTANDING AI model Quantization"; it's simply understanding what the WORDS mean.
@@twobob Thanks for the details. What do you mean by a worked quantized model?
@@1littlecoder Get a system that can work with quantization, quantize a previously unquantized model, and use it. That would have been a more practically useful explanation of "understanding" the current contemporary steps required to apply knowledge of quantization. Something tiny would be fine; no one expects you to quantize GPT-4 on your own, but there are lots of very small, well-performing models that could be tried.
@@twobob 👍
GGML ==> GGUF now, which uses CPU + GPU
Imo, neurons should not communicate with every other neuron. Independent threading should lead to faster, more accurate, and more reliable training
That is bullshit
Thanks