Quantize any LLM with GGUF and Llama.cpp

  • Published: 22 Aug 2024
  • In this tutorial, I dive deep into the cutting-edge technique of quantizing Large Language Models (LLMs) using the powerful llama.cpp tool. I'll guide you through the entire process of converting any LLM to GGUF format, a revolutionary approach that enables efficient inference on both CPUs and consumer GPUs. 🖥️💡
    🔍 What You'll Learn:
    The basics of LLM quantization and why it's important for making AI more accessible.
    A step-by-step walkthrough on using llama.cpp for model conversion.
    How to utilize the GGUF format to run your LLMs efficiently on different hardware.
    Exclusive tips on pushing your 4-bit quantized model to Hugging Face, making it available for the global AI community.
    👨‍💻 Who This Is For:
    AI enthusiasts looking to optimize their models for better performance.
    Developers seeking to deploy LLMs on various hardware platforms.
    Anyone curious about the latest advancements in AI model efficiency.
    🛠️ Tools and Platforms Used:
    Google Colab for hands-on coding and model conversion.
    Hugging Face for model sharing and community engagement.
    By the end of this video, you'll be able to optimize your models for enhanced accessibility and performance, regardless of the hardware at your disposal.
    👍 If you find this video helpful, please hit the Like button, as it helps me reach more people with this content. Don't forget to Subscribe for more tutorials like this and hit the Bell icon to get notified every time I upload a new video.
    Let's embark on this exciting journey together and unlock the full potential of AI, making it more efficient and accessible to all. Thank you for watching, and let's innovate together!
    GitHub Repo: github.com/AIA...
    Join this channel to get access to perks:
    / @aianytime
    To further support the channel, you can contribute via the following methods:
    Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW
    UPI: sonu1000raw@ybl
    #ai #llm #generativeai
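
    For readers who want the gist before watching, here is a minimal sketch of the pipeline described above (the paths, the Hugging Face repo id, and the location of the quantize binary are assumptions; recent llama.cpp builds name the tool llama-quantize):

    import os
    from huggingface_hub import HfApi

    # Fetch and build llama.cpp (older checkouts build the quantize binary
    # at ./llama.cpp/quantize; newer ones produce llama-quantize instead).
    os.system("git clone https://github.com/ggerganov/llama.cpp")
    os.system("make -C llama.cpp")
    os.makedirs("quantized_model", exist_ok=True)

    # Step 1: convert the Hugging Face model to a full-precision GGUF file.
    os.system("python llama.cpp/convert-hf-to-gguf.py ./original_model"
              " --outtype f16 --outfile ./quantized_model/FP16.gguf")

    # Step 2: quantize the FP16 GGUF down to 4-bit (Q4_K_M).
    os.system("./llama.cpp/quantize ./quantized_model/FP16.gguf"
              " ./quantized_model/Q4_K_M.gguf q4_k_m")

    # Step 3: push the 4-bit file to Hugging Face (hypothetical repo id;
    # assumes you have already run huggingface-cli login).
    api = HfApi()
    api.create_repo("your-username/your-model-GGUF", exist_ok=True)
    api.upload_file(path_or_fileobj="./quantized_model/Q4_K_M.gguf",
                    path_in_repo="Q4_K_M.gguf",
                    repo_id="your-username/your-model-GGUF")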

Comments • 70

  • @sheikhakbar2067
    @sheikhakbar2067 4 months ago +1

    One of the best videos on quantization, thanks a lot for this great upload and content.

  • @rs832
    @rs832 5 months ago

    @AIAnytime your content is excellent. Thank you very much for taking the time to provide us with content that actually helps advance our AI knowledge and understanding.

    • @AIAnytime
      @AIAnytime  5 months ago

      Glad it was helpful!

  • @DataScienceandAI-doanngoccuong
    @DataScienceandAI-doanngoccuong 3 months ago

    - The process of converting any LLM to GGUF format so we can run it on a CPU.
    - You can do this even on a T4, or even on a CPU machine, because I'm going to take a small LLM and do it with that. (0:02:00)
    - For quantization, you can go as low as two bits; see the sketch below. (0:06:00)
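
    (The 'methods' variable that appears in the code snippets quoted in later comments is presumably a list of llama.cpp quantization presets along these lines, running from 2-bit up to 8-bit:)

    # Hypothetical reconstruction of the notebook's methods list; the names
    # are standard llama.cpp quantization types.
    methods = ["q2_k", "q3_k_m", "q4_k_m", "q5_k_m", "q6_k", "q8_0"]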

  • @user-ff3yn7uq5v
    @user-ff3yn7uq5v 5 months ago +4

    I reached out to you; how come you ignored me?

  • @sigurdmagnusson1533
    @sigurdmagnusson1533 1 month ago

    Thanks a lot, man

  • @cloudsystem3740
    @cloudsystem3740 5 months ago +3

    Thank you! Can I have the Colab file?

    • @AIAnytime
      @AIAnytime  5 months ago

      GitHub repo is in description

  • @rewatiramansingh3534
    @rewatiramansingh3534 2 months ago

    It says "methods is not defined".

  • @adityasharma8714
    @adityasharma8714 2 months ago

    Why did you first convert to FP16 and not directly to Q4_K_M?
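
    (For context: llama.cpp's converter script can only emit a few simple output types directly, while k-quants such as Q4_K_M are produced by the separate quantize tool, which takes a GGUF file as input; hence the two-step flow. A minimal sketch, with paths assumed from the video:)

    import os

    # Step 1: HF checkpoint -> full-precision (FP16) GGUF via the converter.
    os.system("python llama.cpp/convert-hf-to-gguf.py ./original_model"
              " --outtype f16 --outfile ./quantized_model/FP16.gguf")

    # Step 2: FP16 GGUF -> Q4_K_M GGUF via the quantize tool, which only
    # reads GGUF input (so it cannot start from the raw HF checkpoint).
    os.system("./llama.cpp/quantize ./quantized_model/FP16.gguf"
              " ./quantized_model/Q4_K_M.gguf q4_k_m")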

  • @ayushjadia6527
    @ayushjadia6527 1 month ago

    How can I host this model on a llama.cpp server?
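
    (A minimal sketch for this, assuming a llama.cpp build that includes the HTTP server binary, called llama-server in recent builds; it exposes an OpenAI-compatible endpoint on the chosen port:)

    import os

    # Serve the quantized model over HTTP on localhost:8080 (paths assumed).
    # This call blocks for as long as the server is running.
    os.system("./llama.cpp/llama-server -m ./quantized_model/Q4_K_M.gguf"
              " --port 8080")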

  • @sqlsql4304
    @sqlsql4304 2 months ago

    Hi, I have been watching your videos for quite some time now, found them very useful, and appreciate your effort and dedication in making great videos. Just a question on GPUs: now that most laptops and desktops come with AMD graphics cards, will they work well for quantized models? Your inputs, please.

  • @user-rs2pb5ci3v
    @user-rs2pb5ci3v 5 months ago +1

    for m in methods:
        qtype = f"{quantized_path}/{m.upper()}.gguf"
        os.system("./llama.cpp/quantize "+quantized_path+"/FP16.gguf "+qtype+" "+m)
    When I run this, nothing happens :(

    • @ueka24
      @ueka24 4 months ago

      The same happens to me. Can anyone help?

    • @soumyaneelsarkar2901
      @soumyaneelsarkar2901 3 months ago

      Did you figure out this issue?
      I am facing the same issue.

    • @ueka24
      @ueka24 3 months ago

      @@soumyaneelsarkar2901 I don't really know where I went wrong, but when I use his .ipynb file from the GitHub repo it works just fine.

    • @olqub
      @olqub 2 months ago

      @@soumyaneelsarkar2901 @weka5286 Check whether you have a GPU connection established. I was facing the same issue, but connecting to a GPU in Colab (T4) did the trick

    • @rewatiramansingh3534
      @rewatiramansingh3534 2 months ago

      @@soumyaneelsarkar2901 was it solved?
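
      (One likely reason "nothing happens": in Colab, the output of os.system is not shown in the cell, so the command can fail silently. A sketch that surfaces the error, reusing the snippet's variables:)

      import subprocess

      for m in methods:
          qtype = f"{quantized_path}/{m.upper()}.gguf"
          # A missing binary raises FileNotFoundError here instead of
          # failing silently; otherwise the tool's stderr is printed below.
          result = subprocess.run(
              ["./llama.cpp/quantize", f"{quantized_path}/FP16.gguf", qtype, m],
              capture_output=True, text=True)
          print(result.returncode, result.stderr[-500:])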

  • @devangpagare976
    @devangpagare976 5 months ago

    Hey, I tried to use this on starcoder-7B, but I got an error while converting FP16.gguf to our quant type. Can you please help?

  • @prabalkuinkel4893
    @prabalkuinkel4893 1 month ago

    In llama.cpp there is no executable file named quantize. So when we run this code:
    for m in methods:
        qtype = f"{quantized_path}/{m.upper()}.gguf"
        os.system("./llama.cpp/quantize "+quantized_path+"/FP16.gguf "+qtype+" "+m)
    nothing happens, and the Q4_K_M.gguf file is not created. Any solutions?
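
    (A likely cause, assuming a recent llama.cpp checkout: the tool was renamed from quantize to llama-quantize, and CMake builds place binaries under build/bin. A sketch with the adjusted path, reusing the snippet's variables:)

    import subprocess

    for m in methods:
        qtype = f"{quantized_path}/{m.upper()}.gguf"
        # check=True raises immediately instead of failing silently.
        subprocess.run(["./llama.cpp/build/bin/llama-quantize",
                        f"{quantized_path}/FP16.gguf", qtype, m], check=True)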

  • @matthewchung284
    @matthewchung284 5 months ago

    I agree. Excellent.

  • @user-xs7ve3uu7n
    @user-xs7ve3uu7n 5 months ago

    In the last part of the video you said that we have to make changes for other models... Can you specify what we should change? I don't mean the basic things like the model name and such...

  • @Pets_products
    @Pets_products 3 months ago

    Thanks a lot, sir. What if I use a pre-quantized LLM and fine-tune it for a specific purpose? Do we need to quantize again after fine-tuning?

  • @rewatiramansingh3534
    @rewatiramansingh3534 2 months ago

    I want to learn about these topics in depth. Can you tell me the sources, please?

  • @karthikgvs5608
    @karthikgvs5608 4 months ago +1

    I am getting an assertion error while executing (!python llama.cpp/convert-hf-to-gguf.py ./original_model/ --outtype f16 --outfile ./quantized_model/FP16.gguf). Can you please help me with this?
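
    (One common cause of assertion errors in convert-hf-to-gguf.py is a model architecture the script does not support. A quick check of what the checkpoint declares, assuming a standard Hugging Face config.json:)

    import json

    # The converter matches on this architecture string; if it is not one
    # the script knows about, conversion will fail.
    with open("./original_model/config.json") as f:
        print(json.load(f).get("architectures"))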

  • @kidsmania573
    @kidsmania573 2 months ago

    Can I quantize Llama 3 using this on a T4 in Colab?

  • @akjj24
    @akjj24 3 months ago

    Does this notebook work for my fine-tuned Pegasus model?

  • @_miranHorvat
    @_miranHorvat 5 months ago

    You should focus on doing tutorials on how to run everything locally. Does anyone really want to upload their data to these cloud providers?

    • @AIAnytime
      @AIAnytime  5 months ago

      Where do you see cloud providers in this case?

    • @_miranHorvat
      @_miranHorvat 5 months ago

      @@AIAnytime It is hard for an AI noob to follow along without using Google Colab.

  • @i6od
    @i6od 5 months ago +1

    Link?

    • @AIAnytime
      @AIAnytime  5 months ago

      GitHub repo is in description.

  • @mcmarvin7843
    @mcmarvin7843 5 months ago

    Nice one

  • @user-iu4id3eh1x
    @user-iu4id3eh1x 5 months ago +1

    Can I do this on my local CPU machine?

  • @balasrinivas8646
    @balasrinivas8646 5 months ago

    @AIAnytime can you suggest the changes needed to run on a T4 GPU in Colab?

  • @sauravmohanty3946
    @sauravmohanty3946 5 months ago

    The GitHub repo is not opening. Can you check once?

    • @AIAnytime
      @AIAnytime  5 months ago

      I could see the issue: it was private. Made it public now.

  • @abhinav__pm
    @abhinav__pm 5 months ago

    Bro, I want to fine-tune a model for a translation task. However, I encountered a ‘CUDA out of memory’ error. Now, I plan to purchase GPU time on an AWS EC2 instance. How is the payment processed in AWS? They asked for card details when I signed up. Do they automatically process the payment?

    • @dossantos4415
      @dossantos4415 5 months ago

      Are you using LoRA or any PEFT technique for fine-tuning? Because with those you can usually fine-tune your model on a free Colab T4 GPU.

    • @abhinav__pm
      @abhinav__pm 5 months ago

      But LoRA and QLoRA don't give better results on my translation task. @@dossantos4415

    • @abhinav__pm
      @abhinav__pm 5 months ago

      I want to use the p3.8xlarge instance for its 64 GB of GPU memory. However, it has 4 GPUs, each with 16 GB of memory, totaling 64 GB. So, my question is: can I use the entire 64 GB to train my LLM, or is only 16 GB available for training? @@dossantos4415

  • @IIT_YTT
    @IIT_YTT 5 months ago

    Please make a video on how to use LLaMA Factory on a TPU.

    • @AIAnytime
      @AIAnytime  5 months ago

      Llama factory video is on the channel already. TPU not recommended.

    • @IIT_YTT
      @IIT_YTT 5 months ago

      @@AIAnytime Is a TPU not recommended for fine-tuning? If so, what is the reason?

  • @andres.yodars
    @andres.yodars 5 months ago

    awesome

  • @Jeganbaskaran
    @Jeganbaskaran 5 months ago

    When converting the original LLM to GGUF, are we losing any data during compression, and is there any accuracy degradation?

    • @AIAnytime
      @AIAnytime  5 months ago

      Performance tradeoff is common.
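
      (For anyone wanting to quantify that tradeoff, llama.cpp ships a perplexity tool, called llama-perplexity in recent builds. A sketch comparing the FP16 and Q4_K_M files on a text corpus; the corpus path is an assumption:)

      import os

      # Lower perplexity is better; the gap between the two runs is a rough
      # measure of how much quality the 4-bit quantization cost.
      for fname in ["FP16.gguf", "Q4_K_M.gguf"]:
          os.system(f"./llama.cpp/llama-perplexity -m ./quantized_model/{fname}"
                    " -f wikitext-2-raw/wiki.test.raw")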

    • @Jeganbaskaran
      @Jeganbaskaran 5 months ago

      @@AIAnytime Thank you. Waiting for a knowledge graph use case and implementations.

  • @user-cc3ev7de9v
    @user-cc3ev7de9v 5 months ago

    Sir, please provide your code with every video so that we can experiment too.

    • @AIAnytime
      @AIAnytime  5 months ago

      GitHub repo is in description...

  • @shantanugote
    @shantanugote 5 months ago

    Thank you so much! Can you share the Colab file?

    • @AIAnytime
      @AIAnytime  5 months ago +1

      GitHub repo is in description.

  • @parkersettle460
    @parkersettle460 5 months ago

    Hey, how do I get in contact with you?

    • @AIAnytime
      @AIAnytime  5 months ago

      LinkedIn? Sonu Kumar

    • @parkersettle460
      @parkersettle460 5 months ago

      @@AIAnytime I added you; Parker Settle. I am working on a project for a big client and could use your help. Thanks!