QLoRA paper explained (Efficient Finetuning of Quantized LLMs)

  • Published: 20 Jan 2025

Comments • 37

  • @pierluigiurru962
    @pierluigiurru962 8 months ago +2

    Your videos on LoRA finally made the concepts click for me. It was clearly explained! Thank you for the content you make

    • @AIBites
      @AIBites  8 months ago

      Glad it helped. Welcome 😊

  • @haz5248
    @haz5248 11 months ago +1

    That was really well explained, with intuitive diagrams and explanations. Thanks for the video, just subscribed.

    • @AIBites
      @AIBites  11 months ago

      thank you! :)

  • @IgorAherne
    @IgorAherne 5 months ago +3

    Thank you, that's a beautiful explanation!
    One thing I struggle to understand is the term "quantization blocks" at 4:30 - why do we need several of them?
    In my understanding from the video, we consider using 3 blocks of 16 bits to describe a number, which is 48 bits and more expensive than a 32-bit float.
    But couldn't we just use 16*3 = 48 bits per number instead? Using 48 bits (without splitting them) would give us very high precision within the [0,1] range, due to powers of two.
    I did ask GPT, and it responded that there exist a 'Scale Factor' and a 'Zero-Point', which are constants that shift and stretch the distribution at 6:02.
    Although I do understand these might be those quantization constants, I am not entirely sure what the 64 blocks described in the video are (6:52).
    Is it because the rank of the matrix decompositions is 1, with 64 entries in both vectors?
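
A minimal sketch of the blockwise quantization the question above is about, under stated assumptions: as in QLoRA, the flat weight tensor is split into blocks of 64 values, each block gets its own quantization constant (its absmax scale), and each weight is stored as a 4-bit index into 16 levels. The uniform levels here are a simplification of the paper's NF4 levels, and the per-block scales are left in float (the paper compresses them further via double quantization).

```python
import numpy as np

BLOCK_SIZE = 64                                    # one quantization constant per 64 weights
LEVELS = np.linspace(-1.0, 1.0, 16)                # stand-in for the 16 NF4 levels (uniform here)

def quantize_blockwise(w_flat):
    blocks = w_flat.reshape(-1, BLOCK_SIZE)              # split flat weights into blocks of 64
    scales = np.abs(blocks).max(axis=1, keepdims=True)   # absmax scale per block
    normed = blocks / scales                              # each block now lies in [-1, 1]
    idx = np.abs(normed[..., None] - LEVELS).argmin(-1)   # nearest of 16 levels -> 4-bit index
    return idx.astype(np.uint8), scales

def dequantize_blockwise(idx, scales):
    return LEVELS[idx] * scales                            # rescale each block back

w = np.random.randn(8192).astype(np.float32)
idx, scales = quantize_blockwise(w)
w_hat = dequantize_blockwise(idx, scales).ravel()
print("max abs error:", np.abs(w - w_hat).max())
```

So the "64" is a block size rather than a rank: storage per weight is 4 bits plus one scale shared across 64 weights, and an outlier only distorts its own block instead of the whole tensor.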

  • @prathameshdinkar2966
    @prathameshdinkar2966 2 months ago +1

    Nicely explained! Keep the good work going!! 🤗

    • @AIBites
      @AIBites  2 months ago

      Thank you 🙂

    • @AIBites
      @AIBites  2 months ago +1

      Are you interested in more theory or more hands-on, implementation-style videos? Your input would be very valuable 👍

    • @prathameshdinkar2966
      @prathameshdinkar2966 2 months ago

      @@AIBites I'm interested in more videos on understanding the concepts, as implementations are easily available

    • @iamrubel
      @iamrubel 2 months ago

      @@AIBites Yes, we all want that

  • @cacamaricano
    @cacamaricano 10 months ago +2

    Thanks for connecting the dots!

    • @AIBites
      @AIBites  10 months ago

      glad you liked it! :)

  • @vuluu4942
    @vuluu4942 1 year ago +1

    Thank you for the explanation! I find it very helpful!

  • @SudarakaYasindu
    @SudarakaYasindu 5 months ago +1

    Awesome explanation! ❤

    • @AIBites
      @AIBites  4 months ago

      glad you think so and thank you indeed :)

  • @chadyonfire7878
    @chadyonfire7878 16 days ago +1

    Neat explanation

    • @AIBites
      @AIBites  11 days ago

      Glad you think so!

  • @huitangtt
    @huitangtt 8 months ago +1

    Very well explained

    • @AIBites
      @AIBites  8 months ago

      Thanks so much 😊

  • @JaishreeramCoder
    @JaishreeramCoder 9 months ago +1

    amazing explanation

    • @AIBites
      @AIBites  9 months ago

      Glad you think so!

  • @wilfredomartel7781
    @wilfredomartel7781 1 year ago +1

    Waiting to see it.

  • @yeduniya657
    @yeduniya657 1 year ago +2

    Hey. I need your help. I have a curated set of notes and books and I wish to use it to finetune a model. How can it be done?

    • @AIBites
      @AIBites  11 months ago +1

      would you like to see a fine-tuning video on text data? Would that be useful? Do you have any suggestions for a dataset I can show fine-tuning on?

    • @yeduniya657
      @yeduniya657 11 months ago

      @@AIBites Yes, I have a suggestion: fine-tuning a model on my journal, in which I have written about the truth of nonduality and the illusory nature of reality. I am also actively curating books on truth, and would love your help.

    • @haz5248
      @haz5248 11 months ago

      @@AIBites That would be very helpful. There aren't many good videos on fine-tuning out there.

    • @AIBites
      @AIBites  10 months ago

      hope the fine-tuning video was of some help

  • @yayasy1362
    @yayasy1362 5 months ago +3

    I don’t understand why you say that LoRA is fast for inference… in any case you need to forward through the full-rank pretrained weights + the low-rank finetuned weights.

    • @AIBites
      @AIBites  4 months ago

      ah yes. If only we could quantize the weights, we could do better than the pre-trained weights. You are making a fair point here. Awesome, and thank you! :)

    • @yayasy1362
      @yayasy1362 4 months ago

      @@AIBites Yeah, if only we could replace the pretrained Full-Rank weights by the Low-Rank Weights... really nice video and illustrations! Thanks a lot!
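
The exchange above is essentially about merging: with plain LoRA the low-rank update can be folded into the pretrained weight once training is done, so inference costs a single matmul, while in QLoRA the base weights stay quantized and the merge is less straightforward. Below is a hedged NumPy sketch of the merge identity, using the usual LoRA notation (W, A, B, alpha, r); it is illustrative only, not code from the video.

```python
import numpy as np

d_out, d_in, r, alpha = 512, 512, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(scale=0.02, size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(scale=0.02, size=(r, d_in))         # trained low-rank factor
B = rng.normal(scale=0.02, size=(d_out, r))        # trained low-rank factor (zero-init before training)
x = rng.normal(size=d_in)

# Unmerged: full-rank path plus low-rank path (the extra forward cost the comment refers to).
y_unmerged = W @ x + (alpha / r) * (B @ (A @ x))

# Merged for deployment: fold the update into W once; inference is then a single matmul.
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

print(np.allclose(y_unmerged, y_merged))            # True (up to float error)
```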

  • @rahul.vpoojari6553
    @rahul.vpoojari6553 6 months ago +1

    Thank you sire

    • @AIBites
      @AIBites  4 months ago

      my pleasure Rahul! :-)

  • @bharatbhusansau2996
    @bharatbhusansau2996 3 months ago +1

    Bro, your statement from 05:22 is completely wrong and misleading.
    LoRA is used for fine-tuning LLMs when full fine-tuning is not possible. It does so by freezing all model weights and incorporating and training low-rank matrices (A*B) in the attention modules.
    LoRA speeds up training and reduces memory requirements, but it does not provide a speedup during inference. If an LLM is too large to be handled by LoRA due to GPU memory limitations, Quantized LoRA is used to finetune the model. Overall, QLoRA is a more advanced solution for when LoRA alone cannot handle large models for finetuning.

    • @AIBites
      @AIBites  2 months ago

      Thanks for your feedback. I think we are pretty much on the same page. Can you be more specific about what I got wrong? Unfortunately I won't be able to edit the video, but I can at least pin a message to viewers pointing out the errata.
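
The comment above describes the mechanism itself: the pretrained weights are frozen and only the low-rank pair is trained. A minimal PyTorch-style sketch of that setup follows; module and variable names are illustrative, not taken from any particular library, and in QLoRA the frozen base weight would additionally be stored in 4-bit NF4.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():            # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        # frozen full-rank path + trainable low-rank path
        return self.base(x) + self.scaling * ((x @ self.A.T) @ self.B.T)

layer = LoRALinear(nn.Linear(1024, 1024))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total}")           # only A and B require gradients
```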