Low-rank Adaptation of Large Language Models Part 2: Simple Fine-tuning with LoRA

  • Published: Oct 4, 2024

Comments • 81

  • @gnostikas
    @gnostikas 1 year ago +14

    You seem to be the kind of AI expert that I am trying to become. Very impressive.

  • @LordKelvinThomson
    @LordKelvinThomson 1 year ago +8

    At least as good as, and at times better than, every other equivalent tutorial on the subject at this time.

  • @datasciencetoday7127
    @datasciencetoday7127 1 year ago +3

    Mind blown into 3 billion pieces

  • @Umuragewanjye
    @Umuragewanjye 1 year ago +2

    Hi Chris! Thanks for the course. I want to learn more. May God bless you 🤲

  • @television9233
    @television9233 1 year ago +8

    Very Cool
    Hugging Face has done so much of the heavy lifting for us, they are actually amazing.
    Also, when I first heard about LoRA I thought the implementation was complicated (utilizing some efficient SVD or other numerical methods to achieve the decomposition of the full weight update matrix). Turns out it literally just starts with the two smaller matrices and backprop does all the work lol (see the sketch after this thread).

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +1

      Backprop coming to the rescue again!
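
      A minimal sketch of the point in the comment above (illustrative only; this is not the PEFT library's actual implementation, and the class name is made up): the LoRA update is just two small matrices A and B that are initialised directly and trained by backprop - no SVD of the full weight update is ever computed.

      ```python
      import torch
      import torch.nn as nn

      class LoRALinear(nn.Module):
          """Wrap a frozen nn.Linear with a trainable low-rank update (B @ A)."""
          def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
              super().__init__()
              self.base = base
              for p in self.base.parameters():
                  p.requires_grad_(False)  # the pretrained weights stay frozen
              self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
              self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
              self.scale = alpha / r

          def forward(self, x):
              # W x + (B A) x : only A and B receive gradients during fine-tuning
              return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

      # usage: wrap an existing projection, e.g. LoRALinear(nn.Linear(768, 768), r=8)
      ```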

  • @waynesletcher7470
    @waynesletcher7470 11 months ago +1

    Oh, Wise Wizard, I bow before your might. Please, continue to guide me.

  • @РыгорБородулин-ц1е

    Ok, you got me absolutely amused by the results.
    Also, thanks for showing that there's a LoRA library out there: I tried to do it on my own.

  • @afifaniks
    @afifaniks 1 year ago

    Very intuitive! I didn't even yawn throughout the whole video lol. Keep up the good work! :)

  • @andriihorbokon2015
    @andriihorbokon2015 1 year ago +2

    Great video! So much passion, love it.

  • @danraviv7393
    @danraviv7393 1 year ago +2

    Thanks for the video, it was very useful and clear

  • @DreamsAPI
    @DreamsAPI 1 year ago +3

    Subscribed and Thumbs up, appreciate the videos.

  • @MasterBrain182
    @MasterBrain182 1 year ago +1

    Astonishing content Man 🚀

  • @tech-talks-with-ali
    @tech-talks-with-ali 1 year ago +1

    WoW! You are amazing man!

  • @nothing_is_real_0000
    @nothing_is_real_0000 1 year ago +6

    Hi Chris! Really, thank you so much for such a detailed tutorial. Loved every bit of it. In a time of big corporations trying to monopolise the technology, people like you give hope and knowledge to so many others! Really appreciate it. You've made the LoRA tutorial easy to understand.
    Just had a question. I guess you have answered it in some way already, but just wanted to confirm: GPT-2 is somewhat old, so does this method apply to GPT-2 also? I mean, can we use a GPT-2 model instead of BLOOM?

  • @ENJI84
    @ENJI84 1 year ago +2

    Amazing set of videos!
    Can you please give an update on the text-to-SQL model you mentioned? This is very important to me :)

  • @ryanbthiesant2307
    @ryanbthiesant2307 1 year ago +1

    Who, what, where, why, and when. I am grateful for your video. Please can you give a use case, and start your videos with the end in mind.

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      Absolutely! I'll seek to do that going forward!

    • @ryanbthiesant2307
      @ryanbthiesant2307 1 year ago +1

      @@chrisalexiuk Thanks for not taking offence. I have ASD and ADHD. It's super hard to focus without an idea of what you are making and what problem you are trying to solve. Apologies for the directness.

  • @sagardesai1253
    @sagardesai1253 1 year ago +2

    Informative video.
    Can you suggest some GPU compute resources? The aim is to implement the learnings, and I would like to know the cheapest possible option.

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      Lambda Labs has great prices right now, otherwise Colab Pro is an affordable and flexible option.

  • @PCPTMCROSBY
    @PCPTMCROSBY 1 year ago +1

    Trying to get some people interested in product development and modification, but they have a requirement that material can't leave the building. That means no internet: everything has to be done on our machines in house, and we can't share it with Colab or anybody. It would be nice if you did more shows on keeping complete control of material, because there are so many people who are just scared to death of breaches.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +3

    What is meant by a causal language model? I assume it has nothing to do with the separate field of Causal AI.

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +5

      A causal language model is a model that predicts the next token in the series. It only looks at tokens to the "left" (backward) and cannot see future tokens.
      It's confusing because, as you noted, it has nothing to do with Causal AI.
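
      A minimal sketch of that left-to-right behaviour, using GPT-2 purely as a small illustrative checkpoint (not the model from the video):

      ```python
      # Each generation step conditions only on the tokens already produced
      # (the "left" context) and predicts the next token.
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("gpt2")
      model = AutoModelForCausalLM.from_pretrained("gpt2")

      inputs = tokenizer("Low-rank adaptation is", return_tensors="pt")
      outputs = model.generate(**inputs, max_new_tokens=10)
      print(tokenizer.decode(outputs[0]))
      ```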

  • @mchaney2003
    @mchaney2003 1 year ago +1

    What are the ways you mentioned to more efficiently teach a model new knowledge rather than new structures?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      You'd be looking at something like continued pre-training. I perhaps misspoke by saying "more efficient"; I meant to convey that LoRA might not be the best solution for domain-shifting a model - and so there are more *effective* ways to domain-shift.

  • @vita24733
    @vita24733 1 year ago +2

    Hi Chris, about the block of code with `model.gradient_checkpointing_enable()`, which increases the stability of the model: have you made any previous videos where I can read and learn about this? If not, are there any resources you would recommend to understand it?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +1

      Basically, you can think of it this way:
      As we need to represent tinier and tinier numbers, we need a wider and wider exponent range. There are a number of layers which tend toward very tiny numbers, and if we let those layers stay in 4-bit/8-bit it might have some unintended side effects. So, we keep those layers in full precision so as to not encounter those nasty side effects! (A sketch of this preparation step follows this thread.)

    • @vita24733
      @vita24733 1 year ago

      @@chrisalexiuk Ohhh ok, understood. This was by far the clearest explanation about this. Thank you!
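
      A minimal sketch of the preparation step discussed above, assuming an 8-bit BLOOM load as in notebooks of this kind (not necessarily the exact cell from the video):

      ```python
      import torch
      from transformers import AutoModelForCausalLM

      model = AutoModelForCausalLM.from_pretrained(
          "bigscience/bloom-1b7", load_in_8bit=True, device_map="auto"
      )

      for param in model.parameters():
          param.requires_grad = False  # freeze the base model
          if param.ndim == 1:
              # LayerNorm weights/biases hold very small values; keep them in fp32
              param.data = param.data.to(torch.float32)

      model.gradient_checkpointing_enable()  # trade compute for activation memory
      model.enable_input_require_grads()     # lets gradients flow to the adapters added later
      ```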

  • @omercelebi2012
    @omercelebi2012 1 year ago +2

    Thanks for sharing this tutorial. I get 'IndexError: list index out of range' when reading from the hub. I just copied and pasted the code; it happens at the 6th progress bar. Any solution? Model: bloom-1b7

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      Could you share with me your notebook so I can determine what the issue is?

  • @datasciencetoday7127
    @datasciencetoday7127 1 year ago +2

    Hi Chris, can you make a video on this or give me some pointers?
    Scaling with LangChain: how to have multiple sessions with an LLM, meaning how to have a server with the LLM and serve multiple people concurrently. What would the system requirements be to run such a setup? I believe we will need Kubernetes for the scaling.

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      You'll definitely need some kind of load balancing/resource balancing. I'll go over some more granular tips/tricks in a video!

  • @UmarJawad-w4e
    @UmarJawad-w4e 1 year ago +2

    Amazing work. Can you put up something similar for fine-tuning the MPT-7B model?
    I switched the model to MPT-7B but I keep getting this error during training: "TypeError: forward() got an unexpected keyword argument 'inputs_embeds'". I am scratching my head but can't seem to figure out what went wrong.

  • @honglu679
    @honglu679 3 months ago +1

    Thanks for the great video! So what is the better way to teach a model new knowledge, if fine-tuning is somehow only good for structure? Thanks much!

    • @chrisalexiuk
      @chrisalexiuk  1 month ago

      Continued Pre-Training or Domain Adaptive Pre-Training!

  • @shaw5698
    @shaw5698 1 year ago +3

    Sir, is it possible to share the Colab notebook? For extractive QA, how will we evaluate and compare with other models? Like EM and F1: how will we implement those and compare with other BERT or LLM models?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +1

      Yes, sorry, I will be sure to update the description with the Notebook used in the video!

    • @shaw5698
      @shaw5698 1 year ago

      @@chrisalexiuk Thank you, it will be very much appreciated.

    • @prospersteph
      @prospersteph 1 year ago

      @@chrisalexiuk we will appreciate it

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +1

      colab.research.google.com/drive/1GzHdbIarvnRee_Ix9bdhx1a1v0_G_eqo?usp=sharing

  • @Robo-fg3pq
    @Robo-fg3pq 9 months ago +2

    Getting "ValueError: Attempting to unscale FP16 gradients." when running the cell with trainer.train(). Any idea?

    • @shashankjainm5009
      @shashankjainm5009 5 months ago +1

      Even I'm getting the same error for "bloom-1b7". Did your problem get resolved?

    • @Jithendra0001
      @Jithendra0001 3 months ago +1

      @@shashankjainm5009 I am getting the same error. Did you fix it?

  • @sarabolouki
    @sarabolouki 1 year ago +1

    Thank you for the great tutorial! How do we specify that we only want to fine-tune query_key_value while the rest of the weights stay frozen?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      By using the adapter method, you don't need to worry about that! The base model will remain frozen - and you will not train any model layers.
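
      A minimal sketch of how that looks with the PEFT library (the hyperparameter values are illustrative, not necessarily those from the video):

      ```python
      from transformers import AutoModelForCausalLM
      from peft import LoraConfig, get_peft_model

      model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b7")

      config = LoraConfig(
          r=8,                                 # rank of the update matrices
          lora_alpha=16,                       # scaling factor
          target_modules=["query_key_value"],  # BLOOM's fused attention projection
          lora_dropout=0.05,
          bias="none",
          task_type="CAUSAL_LM",
      )

      peft_model = get_peft_model(model, config)
      peft_model.print_trainable_parameters()  # the base weights are frozen automatically
      ```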

  • @98f5
    @98f5 11 months ago +2

    Any chance you can make an example of fine-tuning Code Llama like this?

    • @chrisalexiuk
      @chrisalexiuk  11 months ago +1

      I might, yes!

    • @98f5
      @98f5 11 months ago

      @chrisalexiuk It'd be greatly appreciated. There are almost no implementation docs or examples around for using LoRA 😀

  • @Neuralbench
    @Neuralbench 1 year ago +1

    Hey Chris, awesome video! Thank you for it. Can you please help me out here? I am using your notebook, but when I do model.push_to_hub, adapter_config.json and adapter_model.bin are not being uploaded to the Hugging Face Hub; instead I only see
    1. generation_config.json
    2. pytorch_model.bin
    3. config.json
    What am I doing wrong here?

    • @Neuralbench
      @Neuralbench 1 year ago +1

      I figured out the problem; it was this line after the training:
      model = model.merge_and_unload()

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      Yes! Sorry, Adil!
      We are only pushing the actual *LoRA* weights to the hub - and merging the model back will mean that the entire model is pushed to the hub.
      Great troubleshooting!
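
      A minimal sketch of the difference, continuing from a trained `peft_model` like the one configured earlier in these comments (the repo names are hypothetical):

      ```python
      # Pushing the PeftModel uploads only the small adapter files
      # (adapter_config.json / adapter_model.bin), not the base weights.
      peft_model.push_to_hub("your-username/bloom-lora-adapter")

      # merge_and_unload() folds the LoRA update back into the base weights,
      # so pushing *after* merging uploads the full model instead.
      merged_model = peft_model.merge_and_unload()
      merged_model.push_to_hub("your-username/bloom-merged")
      ```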

  • @kartikpodugu
    @kartikpodugu 1 year ago +1

    Amazing.
    I tried this on my desktop, which has an NVIDIA GeForce 3060, and I was able to run only 6 steps.
    On Windows I wasn't able to run at all, as I am facing some issues with the bitsandbytes library.
    Also, I used bloom-1b7.
    But after doing the whole exercise, I see that the generated output doesn't stop after CONTEXT, QUESTION and ANSWER; it keeps generating text which includes EXAMPLE and so on.
    Though the notebook adds bitsandbytes at the start using "import bitsandbytes as bnb", bnb is not used anywhere.
    So I thought commenting that line out would make my script work on Windows, but no: even without that line, the script I wrote mimicking your Colab notebook didn't work on Windows.
    Can you tell me how the notebook depends on bitsandbytes?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      Bitsandbytes is leveraged behind the scenes by the Hugging Face libraries.
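
      A minimal sketch of where that dependency comes in, assuming the model is loaded in 8-bit as notebooks of this kind typically do:

      ```python
      from transformers import AutoModelForCausalLM

      # bitsandbytes is never called directly here, but transformers imports it
      # under the hood as soon as 8-bit loading is requested - which is also why
      # this path needs a CUDA GPU and has historically been fragile on Windows.
      model = AutoModelForCausalLM.from_pretrained(
          "bigscience/bloom-1b7",
          load_in_8bit=True,
          device_map="auto",
      )
      ```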

  • @РыгорБородулин-ц1е

    Tried this, and it's interesting that the 3b/7b1 BLOOM models perform WORSE on my test questions after this training than bloom-1b1.

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      Hmmmm. That's very interesting!

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      I wonder specifically why, it would be interesting to know!

    • @РыгорБородулин-ц1е
      @РыгорБородулин-ц1е 1 year ago

      @@chrisalexiuk I didn't change the other parameters, though. Maybe rank and batch size should be higher for higher-parameter-count models.

    • @РыгорБородулин-ц1е
      @РыгорБородулин-ц1е 1 year ago

      @@chrisalexiuk Man, it gets weirder now. I tried doing more steps with a smaller learning rate and a smaller batch size on a bigger model. It started adding explanation sections and generating, well, explanations.
      bloom-3b

    • @gagangayari5981
      @gagangayari5981 1 year ago

      @@РыгорБородулин-ц1е What was the learning rate you were using? Is it the same as mentioned in the BLOOM paper? Also, what is the current learning rate?

  • @maxlgemeinderat9202
    @maxlgemeinderat9202 11 months ago +1

    Great video! What would be different if I download the model locally instead of on Colab? Which lines in the code would change?

    • @chrisalexiuk
      @chrisalexiuk  11 months ago

      You should be able to largely recreate this process locally - but you would need to `pip install` a few more dependencies. You can find which by looking at what the colab environment has installed - or using a tool like pipreqs!

  • @alexandria6097
    @alexandria6097 10 months ago

    Do you know how much GPU RAM the meta-llama/Llama-2-70b-chat model would take to fine-tune?

  • @akashdeepsoni
    @akashdeepsoni 1 year ago

    Thanks for explaining the implementation in such an easy way.
    I wanted to play around with this, so I used free-tier Google Colab with a TU-GPU and the smaller "bigscience/bloom-1b7" model. The inference method make_inference(context, question) is giving me the error below. Is this because of using the free-tier GPU, even though training and all the previous steps executed without any issues? It would be great if you could shed some light on this!
    Error:
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

  • @davidromero1373
    @davidromero1373 11 months ago

    Hi, a question: can we use LoRA just to reduce the size of a model and run inference, or do we always have to train it?

    • @chrisalexiuk
      @chrisalexiuk  11 months ago

      LoRA will not reduce the size of the model during inference. It actually adds a very small amount extra - the memory savings during training come from the reduced number of optimizer states.
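
      A minimal sketch of that point (bloom-1b7 and the LoRA settings are assumptions for illustration):

      ```python
      from transformers import AutoModelForCausalLM
      from peft import LoraConfig, get_peft_model

      base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b7")
      base_params = sum(p.numel() for p in base.parameters())

      lora_model = get_peft_model(
          base, LoraConfig(r=8, target_modules=["query_key_value"], task_type="CAUSAL_LM")
      )
      total_params = sum(p.numel() for p in lora_model.parameters())

      # The adapter *adds* a small number of parameters rather than shrinking the model;
      # the training-time savings come from optimizer states existing only for the adapter.
      print(f"Extra parameters added by LoRA: {total_params - base_params:,}")
      lora_model.print_trainable_parameters()
      ```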

  • @chrism315
    @chrism315 1 year ago +1

    The notebook linked doesn't match the one used in the video. Is the notebook in the video available somewhere?
    Thanks, great video!

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      Ah, so sorry! I'll resolve this ASAP.

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      I've updated the link - please let me know if it doesn't resolve your issue!
      Sorry about that!

  • @ilya6889
    @ilya6889 1 year ago

    Please don't scream 😬

  • @ArunKumar-bp5lo
    @ArunKumar-bp5lo 1 year ago

    Facing KeyError: 'h.0.input_layernorm.bias' when downloading from the hub.

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      Hmmm.
      Are you using the base notebook?

    • @ArunKumar-bp5lo
      @ArunKumar-bp5lo 1 year ago

      @@chrisalexiuk Yeah, I just changed the model to 1b7.

    • @chrisalexiuk
      @chrisalexiuk  1 year ago

      Could you try adding `device_map="auto"` to your `.from_pretrained()` method?
      Also, are you using a GPU-enabled instance for the notebook?
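
      A minimal sketch of that suggestion (the adapter repo name is hypothetical):

      ```python
      from transformers import AutoModelForCausalLM
      from peft import PeftModel

      base = AutoModelForCausalLM.from_pretrained(
          "bigscience/bloom-1b7",
          device_map="auto",  # requires accelerate; places the weights on the available GPU
      )
      model = PeftModel.from_pretrained(base, "your-username/bloom-lora-adapter")
      ```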