New Tutorial on LLM Quantization w/ QLoRA, GPTQ and llama.cpp, Llama 2

  • Published: 21 Aug 2024
  • LLM Quantization: GPTQ - AutoGPTQ
    llama.cpp - ggml.c - GGUF - C++
    Comparison with HF Transformers in 4-bit quantization.
    Download Web UI wrappers for your heavily quantized LLM to your local machine (PC, Linux, Apple).
    LLM on Apple hardware, w/ M1, M2 or M3 chip.
    Run inference with your LLMs on your local PC, with heavy quantization applied (see the inference sketch after the links below).
    Plus: 8 Web UIs for GPTQ, llama.cpp or AutoGPTQ, ExLlama or GGUF
    koboldcpp
    oobabooga text-generation-webui
    ctransformers
    lmstudio.ai/
    github.com/mar...
    github.com/gge...
    github.com/rus...
    huggingface.co...
    github.com/Pan...
    cloud.google.c...
    huggingface.co...
    h2o.ai/platfor...
    #quantization
    #ai
    #webui
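    A minimal local-inference sketch, assuming the llama-cpp-python bindings (pip install llama-cpp-python) and an already downloaded 4-bit GGUF checkpoint; the model path below is a placeholder, not a file shipped with the video:

    # Sketch: run a heavily quantized GGUF model locally via llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path to a 4-bit GGUF file
        n_ctx=2048,        # context window size
        n_gpu_layers=-1,   # offload all layers to Metal/CUDA if available; 0 = CPU only
    )

    output = llm(
        "Q: What does 4-bit quantization change about a model? A:",
        max_tokens=128,
        stop=["Q:"],
    )
    print(output["choices"][0]["text"])

    On Apple Silicon (M1/M2/M3), llama.cpp uses its Metal backend, which is why n_gpu_layers can still be set above zero on a MacBook.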

Comments • 22

  • @jacehua7334
    @jacehua7334 11 months ago +1

    Have been busy with work, but it's so great on the weekend to see absolutely great content from you like always!

  • @ctejada-0
    @ctejada-0 11 months ago +2

    Happy to see llama.cpp taking off. Since the beginning of this new wave of AI driven by LLM advancements, I've been rooting for llama.cpp, as it is (in my opinion) the best approach to enable everyone to have their own LLM and to enable a plethora of software solutions (open and closed source) that were never possible before. Thank you for this video focused on it.

    • @code4AI
      @code4AI  11 months ago +1

      Thank you for your comment. Maybe I'll do another video on the latest llama.cpp ...

  • @ViktorFerenczi
    @ViktorFerenczi 11 months ago +2

    Excellent video, as always! Thank you. - It would be nice to have a video comparing AWQ with the quantization methods discussed here.

    • @code4AI
      @code4AI  11 months ago +1

      Activation-aware Weight Quantization (AWQ)? Great idea!

  • @henkhbit5748
    @henkhbit5748 11 months ago

    Great explanation of the different quantization methods. It would be nice if we could compare, for example, Llama 2 7B models in normal, QLoRA 4-bit, GPTQ 4-bit, and GGUF 4-bit formats with different inference questions, with and without RAG...

  • @hoangnam6275
    @hoangnam6275 11 months ago

    You're the best, best content every week

  • @akashkarnatak3014
    @akashkarnatak3014 11 months ago +1

    Okay, so GPTQ is a quantization technique and GGUF is a format to store quantized weights. Can't we quantize a model using the GPTQ algorithm, store it in GGUF format, and run it using llama.cpp?

    • @junzhengge407
      @junzhengge407 5 months ago

      I have the same question😢 need help

  • @ChrisBrock-mh8qq
    @ChrisBrock-mh8qq 6 months ago

    Really Great Videos!

  • @AK-ox3mv
    @AK-ox3mv 5 months ago

    What does the K mean in Q4_K_M?
    What's the difference between q4 and 4-bit? Are they the same thing?

  • @amparoconsuelo9451
    @amparoconsuelo9451 11 months ago

    Can subsequent SFT and RLHF with different, additional, or less content change the character of, improve, or degrade a GPT model?

  • @devyanshrastogi
    @devyanshrastogi 9 months ago +1

    Trust me, after 20 seconds of your intro I was about to skip this video 🤣🤣 The intro was terrific (literally).

  • @spencerfunk6697
    @spencerfunk6697 6 months ago

    need a tutorial on quantizing vision models

  • @yusufkemaldemir9393
    @yusufkemaldemir9393 11 months ago

    Thanks. Does a 4-bit quantized Llama 2 under llama.cpp support backpropagation while running on an M2 MacBook? If yes, would you mind providing a reference notebook?

    • @surajrajendran6528
      @surajrajendran6528 5 months ago

      Quantised models cannot be back-propagated. All training should be done in floating point precision.
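      A minimal sketch of the QLoRA-style workaround from the video title: the 4-bit base weights stay frozen while small LoRA adapters are kept in floating point and receive the gradients. It assumes transformers, peft, and bitsandbytes are installed; the checkpoint name is only an example, and bitsandbytes 4-bit loading generally requires a CUDA GPU, so this does not run on an M2 MacBook as-is:

      # QLoRA-style setup: gradients flow only into the float LoRA adapters,
      # never into the frozen 4-bit base weights.
      import torch
      from transformers import AutoModelForCausalLM, BitsAndBytesConfig
      from peft import LoraConfig, get_peft_model

      bnb_config = BitsAndBytesConfig(
          load_in_4bit=True,
          bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
          bnb_4bit_compute_dtype=torch.bfloat16,
      )

      base = AutoModelForCausalLM.from_pretrained(
          "meta-llama/Llama-2-7b-hf",            # example checkpoint (gated, needs access)
          quantization_config=bnb_config,
          device_map="auto",
      )

      lora_config = LoraConfig(
          r=16, lora_alpha=32,
          target_modules=["q_proj", "v_proj"],   # attach adapters to attention projections
          lora_dropout=0.05, task_type="CAUSAL_LM",
      )
      model = get_peft_model(base, lora_config)
      model.print_trainable_parameters()         # only the adapter weights are trainable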

  • @gileneusz
    @gileneusz 11 months ago

    0:08 oh... so maybe I'll watch your next video, sorry....

    • @code4AI
      @code4AI  11 months ago +1

      You are the lucky one ...

    • @gileneusz
      @gileneusz 11 months ago +1

      @@code4AI no, no that's just my dream 😢

  • @ernestoflores3873
    @ernestoflores3873 3 months ago