Understanding: AI Model Quantization, GGML vs GPTQ!

  • Published: 13 Dec 2024

Comments • 79

  • @vamp9225
    @vamp9225 a year ago +4

    Well done mate. Thank you for your thorough and clear explanation.

  • @explorer945
    @explorer945 a year ago +9

    This series (Back to Basics) needs a boost. Love the way you explained all the fundamentals. Keep them coming!

    • @1littlecoder
      @1littlecoder  a year ago +2

      Thanks, this is a motivation 👏🏽

  • @rupjitchakraborty8012
    @rupjitchakraborty8012 a year ago +11

    This is such a good video, but the number of likes/views doesn't reflect it. Thank you so much.

  • @sshandilya
    @sshandilya 2 months ago

    Good that you captured the basics, but you didn't explain GGML and GPTQ enough!

  • @MonkeySimius
    @MonkeySimius a year ago +1

    I knew the number of bits had to do with accuracy and with how powerful the hardware required to run an LLM had to be, but beyond that I had no idea what it meant. Your explanation was super clear, so thanks.

  • @mokanin8894
    @mokanin8894 9 months ago

    Wonderful explanation! Keep up the great content!

  • @martenrauschenberg4831
    @martenrauschenberg4831 8 months ago +1

    Great explanation!

  • @tarun4705
    @tarun4705 a year ago +2

    Excellent and most accurate explanation. Thank you!

  • @fredrik-ekelund
    @fredrik-ekelund a year ago +2

    Good work, great explanation. Thanks!

  • @inplainview1
    @inplainview1 a year ago +2

    This was excellent! Thank you!

    • @1littlecoder
      @1littlecoder  a year ago +1

      Glad you enjoyed it! I wanted to try something different from my regular videos to up the editing quality and also push my boundaries. I'm glad you felt good about it :)

  • @Nerdimo
    @Nerdimo a year ago

    Never knew there was a difference, but now I know. Thank you!

  • @megamehdi89
    @megamehdi89 a year ago +2

    I was wondering exactly how quantization works just this morning. Thank you, such a good video 🎉

  • @Semion.
    @Semion. 9 months ago +1

    Thanks mate for the great explanation!

  • @TheBuzzati
    @TheBuzzati a year ago +1

    Thanks for the explanation.

  • @echofloripa
    @echofloripa a year ago

    Great explanation of the differences between GPTQ and GGML, thanks once again!

  • @shape6093
    @shape6093 a year ago +1

    Thanks for this, I've been wondering about it. I'd love some more explanation on web UI settings.

  • @MaJetiGizzle
    @MaJetiGizzle a year ago +1

    Great and informative video dude! Well done, and I always appreciate your content!

    • @1littlecoder
      @1littlecoder  a year ago +1

      Glad to hear it! Thanks for your support and feedback 🙏🏽

  • @sytekd00d
    @sytekd00d a year ago

    Great explanation! I needed this...lol

  • @vivekraj9333
    @vivekraj9333 a year ago

    Always my go-to channel to understand concepts clearly. Can't thank you enough, brother. 🙌

  • @luis96xd
    @luis96xd a year ago

    Wow, amazing video, everything was well explained and detailed, thanks!

  • @harry892004
    @harry892004 a year ago +2

    Thanks, this is a nice video. Can GPTQ models run on the Apple Metal framework? Also, I have seen some GGML models use the CPU and GPU together. How is this different from the other approach?

  • @mjacfardk
    @mjacfardk a year ago +1

    Thank you, brother, well understood.

  • @aurkom
    @aurkom a year ago

    Does a GPTQ-quantized model inherit from the nn.Module class in PyTorch? How can I integrate a GPTQ model with my PyTorch code? (See the sketch below.)
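
    A minimal sketch of one possible answer, assuming transformers with optimum, auto-gptq, and accelerate installed: a GPTQ checkpoint loaded through Hugging Face transformers comes back as a regular PreTrainedModel, which does subclass torch.nn.Module, so ordinary PyTorch code applies. The repo name below is illustrative.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    model_id = "TheBloke/Llama-2-7B-GPTQ"  # illustrative GPTQ checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # The loaded model is a torch.nn.Module, so it plugs into PyTorch code.
    assert isinstance(model, torch.nn.Module)

    inputs = tokenizer("Quantization is", return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(out[0], skip_special_tokens=True))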

  • @kamal9991999
    @kamal9991999 a year ago +1

    Good explanation

  • @happyday.mjohnson
    @happyday.mjohnson a year ago

    Do you use Windows? I am completely struggling trying to get auto-gpt to recognize my CUDA install.

  • @alx8439
    @alx8439 a year ago

    It's time to extend this with QuIP and AWQ.

  • @-RakeshDhilipB
    @-RakeshDhilipB 4 months ago

    Does it work on DeBERTa models?

  • @im-notai
    @im-notai a year ago

    Have you planned to do some more videos on GPTQ and GGML, e.g., fine-tuning the quantized model or converting FP16 models to quantized models?

  • @kalilinux8682
    @kalilinux8682 a year ago

    How can we utilize both the GPU and CPU for training a model? Like somehow break the model up and store half of it in CPU RAM and the other half in GPU RAM. (See the sketch below.)
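
    A minimal sketch of the inference-side version of this, assuming transformers with the accelerate library: device_map="auto" shards a model across GPU VRAM and CPU RAM, with max_memory capping each device. (Splitting actual training across devices is usually done with frameworks such as DeepSpeed ZeRO-Offload.) The model name and memory limits are illustrative.

    from transformers import AutoModelForCausalLM

    # Fit as many layers as allowed on GPU 0; spill the rest into CPU RAM.
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-6.7b",                      # illustrative model
        device_map="auto",                        # let accelerate place layers
        max_memory={0: "6GiB", "cpu": "24GiB"},   # illustrative caps per device
    )
    print(model.hf_device_map)  # shows which layers landed on GPU vs CPU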

  • @himanshutanwani_
    @himanshutanwani_ a year ago +1

    Say I want to run inference on a MacBook Air M2, which does have decent GPU cores, although they're not NVIDIA or built for extreme ML use cases. Should I go with a more CPU-focused pipeline or a GPTQ-based, GPU-intensive pipeline?

    • @1littlecoder
      @1littlecoder  a year ago +1

      My bad, I forgot to mention GGML is optimized for Apple Silicon as well (see the sketch below).
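
      A minimal sketch of what that looks like in practice, assuming llama-cpp-python built with Metal support: GGML-family (now GGUF) models run through llama.cpp, and n_gpu_layers offloads layers to the Apple Silicon GPU. The file path is illustrative.

      from llama_cpp import Llama

      # n_gpu_layers=-1 offloads all layers to the M-series GPU via Metal.
      llm = Llama(model_path="./llama-2-7b.Q4_K_M.gguf", n_gpu_layers=-1)
      out = llm("Quantization is", max_tokens=32)
      print(out["choices"][0]["text"])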

  • @avkna1830
    @avkna1830 a year ago +1

    Are you going to make a Discord server?

    • @1littlecoder
      @1littlecoder  a year ago

      What do you think would be the use of it?

    • @avkna1830
      @avkna1830 a year ago

      @@1littlecoder A Discord server could serve as a hub for your viewers to discuss AI, machine learning, and tech. It'd be a community where like-minded people share insights, ask questions, and interact. Plus, you could directly engage with your audience and host discussions. You make really good-quality, informative vids :)

  • @debatradas1597
    @debatradas1597 a year ago

    Thanks

  • @JavArButt
    @JavArButt a year ago

    Enjoyed it, thank you

  • @kevinzhu9305
    @kevinzhu9305 a year ago

    Great content as usual! I just needed someone to explain these things to me in such a simple way.

  • @107cdb
    @107cdb a year ago

    Can you explain how to figure out what settings to use to run models in the text UI, such as Transformers or QLoRA? I usually just end up trying every combination until it works or I give up, and there are usually no instructions on the Hugging Face repo.

  • @flipper71100
    @flipper71100 a year ago +1

    Great explanation ❤. Is there a way to contact you?

    • @1littlecoder
      @1littlecoder  a year ago

      Thanks! 1littlecoder at gmail dot com, or the same handle on Twitter.

  • @rewanthnayak2972
    @rewanthnayak2972 a year ago

    Can we fine-tune a quantized model?

  • @chethanningappa
    @chethanningappa a year ago

    Why do they use a dataset for quantizing in AutoGPTQ?

    • @1littlecoder
      @1littlecoder  a year ago

      The GPTQ algorithm requires calibrating the quantized weights of the model by making inferences on the quantized model.

    • @chethanningappa
      @chethanningappa a year ago

      @@1littlecoder Thanks, I was thinking GPTQ would quantize the weights to fit a particular dataset:

      from transformers import AutoModelForCausalLM, AutoTokenizer
      from optimum.gptq import GPTQQuantizer
      import torch

      model_name = "facebook/opt-125m"
      tokenizer = AutoTokenizer.from_pretrained(model_name)
      model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

      # "c4" serves only as a calibration set for measuring quantization error;
      # the weights are not tuned to it.
      quantizer = GPTQQuantizer(bits=4, dataset="c4",
                                block_name_to_quantize="model.decoder.layers",
                                model_seqlen=2048)
      quantized_model = quantizer.quantize_model(model, tokenizer)

      Shall we connect sometime?

  • @anurajms
    @anurajms 11 months ago

    thank you

  • @pushkarbankar
    @pushkarbankar 9 months ago

    Can you please make a video on GGUF?

  • @hamadandrabi554
    @hamadandrabi554 a year ago

    Beautiful

  • @bf2825
    @bf2825 a year ago

    How about if I have an NVIDIA GPU but it's not large enough to host a 70B model? 😂😂😂

  • @twobob
    @twobob a year ago +1

    It answered basically nothing, though.

    • @1littlecoder
      @1littlecoder  a year ago

      Like what?

    • @twobob
      @twobob a year ago

      @@1littlecoder Well, you described what the process of quantization is, like in a dictionary. You don't list any ways to use it or how it could apply to projects that even you have covered recently. This was a great moment to hat-tip your old 4-bit quantized code from a few months ago (which isn't actually working, but w/e). This did little more than describe what the word means and how it pertains to maths. I like your work, but a worked quantization of an ACTUAL MODEL would be useful. Simply reiterating without adding any value is not answering "UNDERSTANDING AI model Quantization"; it's simply understanding what the WORDS mean.

    • @1littlecoder
      @1littlecoder  a year ago +1

      @@twobob Thanks for the details. What do you mean by a worked quantized model?

    • @twobob
      @twobob a year ago

      @@1littlecoder Get a system that can work with quantization, quantize a previously unquantized model, and use it. That would have been a more practically useful explanation of "understanding" the current, contemporary steps required to apply one's knowledge of quantization. Something tiny would be fine; no one expects you to quantize GPT-4 on your own, but there are lots of very small, well-performing models that could be tried.

    • @1littlecoder
      @1littlecoder  a year ago +1

      @@twobob 👍

  • @JG27Korny
    @JG27Korny 10 months ago

    GGML ==> GGUF now, which uses CPU + GPU (see the sketch below).
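
    A minimal sketch of the CPU + GPU split this refers to, assuming llama-cpp-python with a GPU-enabled build: for a GGUF file, n_gpu_layers sets how many transformer layers live in VRAM while the rest stay in CPU RAM. The path and layer count are illustrative.

    from llama_cpp import Llama

    # Offload 20 layers to the GPU; the remaining layers run on the CPU.
    llm = Llama(model_path="./mistral-7b.Q4_K_M.gguf", n_gpu_layers=20)
    out = llm("GGUF replaced GGML because", max_tokens=32)
    print(out["choices"][0]["text"])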

  • @thegreenxeno9430
    @thegreenxeno9430 a year ago

    IMO, neurons should not communicate with every other neuron. Independent threading should lead to faster and more accurate, reliable training.

  • @wsy987
    @wsy987 a year ago

    That is bullshit

  • @narendraparmar1631
    @narendraparmar1631 8 months ago

    Thanks