Fine-tuning LLMs with PEFT and LoRA

  • Published: 4 Jun 2024
  • LoRA Colab: colab.research.google.com/dri...
    Blog Post: huggingface.co/blog/peft
    LoRA Paper: arxiv.org/abs/2106.09685
    In this video I look at how to use PEFT to fine-tune any decoder-style GPT model. This goes through the basics of LoRA fine-tuning and how to upload it to the Hugging Face Hub.
    My Links:
    Twitter - / sam_witteveen
    Linkedin - / samwitteveen
    Github:
    github.com/samwit/langchain-t...
    github.com/samwit/llm-tutorials
    00:00 - Intro
    00:04 - Problems with fine-tuning
    00:48 - Introducing PEFT
    01:11 - PEFT other cool techniques
    01:51 - LoRA Diagram
    03:25 - Hugging Face PEFT Library
    04:06 - Code Walkthrough
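A minimal sketch of the workflow the video walks through (the model name, hyperparameters, and repo id below are illustrative assumptions, not the exact Colab values):

```python
# Sketch: wrap a decoder-style causal LM with a LoRA adapter via PEFT.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "bigscience/bloom-7b1"  # assumed; any decoder-style GPT model works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                    bias="none", task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable

# After training, only the adapter weights need to be uploaded:
# model.push_to_hub("your-username/bloom-7b1-lora")  # hypothetical repo id
```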

Comments • 113

  • @impolitevegan3179
    @impolitevegan3179 1 year ago +25

    This is great. Not many channels on YT do this kind of stuff. Would appreciate more like this: other frameworks like DeepSpeed, useful datasets, training-parameter experiments, etc. So much interesting stuff that isn't covered on YT.

  • @christopheprotat
    @christopheprotat 1 year ago +10

    Perfect balance of theory and hands-on, with a colab attached to most of your videos. Much, much appreciated. I recommend this channel to all people who want to follow this crazy trend of LLM releases. The best path to keep all of us up to date! I learn so much thanks to you, Sam. Thanks a ton. Keep moving forward.

  • @autonomousreviews2521
    @autonomousreviews2521 1 year ago +3

    You continue to make videos on exactly the things I'm trying to understand more deeply! Fantastic! There are a lot of detailed parameters in this video that you could certainly continue to elaborate on for those of us who aren't programmers...yet :) Looking forward to more of your vids!

  • @kennethleung4487
    @kennethleung4487 1 year ago +4

    Awesome! Been waiting for your take on this topic

  • @victarion1571
    @victarion1571 1 year ago

    Sam, thanks for giving your audience their requests! The Alpaca training video you made makes much more sense now

  • @briancase6180
    @briancase6180 1 year ago +18

    So this seems like the basis for a business: offer to train a custom model for product documentation, FAQs, etc. with a specific product or company focus. Cool!

    • @Hypersniper05
      @Hypersniper05 1 year ago

      Or closed-domain semantic search with summarization

    • @handsanitizer2457
      @handsanitizer2457 1 year ago

      @E Marrero can you explain that a bit more? I'm new to the machine learning space

    • @Hypersniper05
      @Hypersniper05 1 year ago +2

      @@handsanitizer2457 It's a bit too much to explain here, but search on YouTube for "openai embeddings" or "embedding searches" and you will get a general idea of how models can be used for search, not only OpenAI's but other open-source models as well. Fine-tuning a model on a closed domain will help it understand your company's data better. You can also fine-tune it to reply back in a certain way, which opens the door to many options. ChatGPT was trained this way, but more on conversational outputs.
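A toy sketch of the embedding-search idea described above, using the open-source sentence-transformers library; the model name and documents are illustrative assumptions:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed small embedding model

docs = ["Our refund policy lasts 30 days.", "Support is available 24/7."]
doc_emb = model.encode(docs, convert_to_tensor=True)  # embed the corpus once

query_emb = model.encode("How long do refunds take?", convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_emb)  # highest cosine score = best match
```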

    • @ArjunKrishnaUserProfile
      @ArjunKrishnaUserProfile 1 year ago

      Does chatbase use this technique? It does the training on website or file data very fast.

    • @Hypersniper05
      @Hypersniper05 1 year ago +2

      @@ArjunKrishnaUserProfile I am pretty sure it doesn't train the model, that would be way more expensive than embedding

  • @nacs
    @nacs 1 year ago +1

    Many have said it but I'll reiterate -- your LLM videos are really great to watch, both the pace and the way you go from high level overviews to the detailed info.
    I also appreciate that it's not just focused on ChatGPT/GPT-4/hosted-models all the time and talks more about local training/finetuning/inferencing.

  • @PattersML
    @PattersML 10 months ago

    Awesome explanation, this is exactly what I was looking for. Thank you!

  • @notanape5415
    @notanape5415 9 months ago

    Thanks for the awesome explanation. Going to binge your videos.

  • @coolmcdude
    @coolmcdude 1 year ago

    I would love to see more videos about this, showing people how we could adapt this to our own projects, and maybe even a video about 4-bit tuning.

  • @saracen9
    @saracen9 1 year ago +1

    Awesome stuff Sam. I'm in the process of using LangChain to build a vector store and - whilst it's fine for now - would be really interested in understanding the best way to then take this and use it to generate a LoRA. Feels like the logical next step.

  • @kaiman99919
    @kaiman99919 11 months ago

    Thank you! It would be great to see more on the data section - everyone always seems to gloss over that part, despite the fact that it is clearly the most important part. I've seen a lot of 20-40 min vids (from different YouTubers) on the configuration that barely mention the actual use of the data.

  • @autonomousreviews2521
    @autonomousreviews2521 1 year ago +3

    I would love a vid covering examples of the differently formatted types of datasets that can be used to train a LoRA and the types of abilities that the different kinds of dataset training will allow - or, put another way - what kinds of behavioral changes in abilities can we use LoRA to fine-tune for in a model, and how do we then know what types of data formatting to use in order to get a chosen outcome. :D

  • @sundarramanp3057
    @sundarramanp3057 1 year ago +2

    Can you create more videos on instruction-prompt-tuning as well, as a further extension to this video? Amazing work!

  • @Secplavory-Wei
    @Secplavory-Wei 1 year ago

    This is really useful, thank you!

  • @quebono100
    @quebono100 1 year ago +1

    Wow, thank you for your work

  • @JonathanYankovich
    @JonathanYankovich 1 year ago +2

    I’d love a quick video like this on how to use checkpoints from PEFT training to do inference.
    When I’m training, I’m never sure how much is too much, and I can save checkpoints automatically easily to resume in case training stops.
    What I need to learn is how to use these checkpoints with the base model to do inference so I can test output quality against several checkpoints.
    Ideally I’d like to be able to do inference on a base model plus checkpoint, and then once I find a good result, merge the checkpoint into the base model so I can use it in production and keep VRAM low. (I am assuming inference on base model + checkpoint will use more vram)
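A minimal sketch of that checkpoint-evaluation step, assuming adapters saved during a PEFT training run; the base model name and checkpoint directory are hypothetical:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "bigscience/bloom-7b1"  # assumed base model
base = AutoModelForCausalLM.from_pretrained(base_name, device_map="auto")
tok = AutoTokenizer.from_pretrained(base_name)

# Attach one saved checkpoint's adapter to the frozen base for a quality check.
model = PeftModel.from_pretrained(base, "outputs/checkpoint-200")  # hypothetical dir
model.eval()

ids = tok("Once upon a time", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**ids, max_new_tokens=30)[0]))
```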

  • @geekyprogrammer4831
    @geekyprogrammer4831 11 months ago

    Very underrated channel. You deserve more viewers and subs.

  • @chavita4321
    @chavita4321 1 year ago

    So badass. Thanks!

  • @abhirj87
    @abhirj87 11 months ago

    Very useful!! Thanks a ton

  • @caiyu538
    @caiyu538 7 months ago

    Great lectures.

  • @wilfredomartel7781
    @wilfredomartel7781 1 year ago

    Excellent!

  • @definty
    @definty 8 months ago

    Hey Sam, thanks for the great informative video as always!
    Do you know of a way to see which neurons get activated during training? I ask because I was thinking of ways to reduce the big models, and the most obvious way I could think of would be to look at which neurons are getting activated during training, especially with Falcon 170B; even 32B is too big for me, and considering I don't need multiple languages I was hoping this would be a good approach to reduce the size of models.
    It would be cool to see a Brain Surgeon type debugger for LLMs. It would be good to run different training datasets through different LLMs to see which neurons get activated and which do not, and ideally have a way to disable them during inference to test and measure the differences in the output.

  • @user-lw1zt4ov3i
    @user-lw1zt4ov3i 7 months ago

    This is a great way to understand how we can fine-tune a text classification task using an LLM. I want to know if there is a method through which we can make the LLM learn from data in JSON format, where there are multiple labels for information retrieval or conversational recommendation tasks.

  • @JonathanYankovich
    @JonathanYankovich 1 year ago +6

    These fine-tuning-related topics are especially relevant to me right now. Currently training llama-30b variants at 4-bit. I’m very interested in how to roll adapters/checkpoints back into base models to keep VRAM usage down during inference (under 24GB)

    • @MridulSharmaMID
      @MridulSharmaMID 1 year ago

      Hi I am also interested. Can we connect via email?

    • @PavanAtGrowexx
      @PavanAtGrowexx 5 months ago

      Hey, I am also facing the same issue. Did you find any update? Could you help me out please?

  • @joshmabry7572
    @joshmabry7572 1 year ago

    This is gold! Thank you!

    • @joshmabry7572
      @joshmabry7572 1 year ago

      I'm looking to train the Wizard-Vicuna models but run into `ValueError: The following `model_kwargs` are not used by the model: ['token_type_ids']`

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      This could be because they have already folded a LoRA in there or the base model setup is different.
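One common workaround sketch, assuming the error comes from the tokenizer emitting token_type_ids that the causal LM's generate call doesn't accept:

```python
# Ask the tokenizer not to produce the unsupported key...
inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
# ...or drop it from an existing encoding before calling the model.
inputs.pop("token_type_ids", None)
outputs = model.generate(**inputs, max_new_tokens=50)
```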

  • @pawe460
    @pawe460 1 year ago

    How does LoRA differ from transfer learning? If I understand correctly, TL means adding additional layers onto a frozen pre-trained network and training them on a new dataset, right?

  • @nguyenanhnguyen7658
    @nguyenanhnguyen7658 11 months ago

    Great quick tutorial. This is good for English-only pretraining/fine-tuning. What about non-English? What steps should we take to (1) extend the vocab, (2) pretrain (with or without LoRA) on a free-form unstructured text corpus, (3) fine-tune with LoRA for each task? Would love to have your tutorial on this road, it would be great. Thanks, Steve.

  • @bookfastorg
    @bookfastorg 1 year ago

    How do you re-train it with additional data? Great video!

  • @rajivmehtapy
    @rajivmehtapy 1 year ago +2

    Very rare to find videos on YouTube covering this topic.

  • @yth2011
    @yth2011 10 months ago

    Thanks a lot~

  • @edd36
    @edd36 1 year ago

    Hey, sorry I'm late to the party. I tried to load my LoRA model, but when I checked the weights, they are the same as the original model's. Is it supposed to do that? I already checked with my after-training model and yes, the weights are different.

  • @clementvanpeuter1742
    @clementvanpeuter1742 1 year ago

    Love It.

  • @selinatian6607
    @selinatian6607 11 months ago

    Great tutorial! With the saved pretrained model, how do we make predictions for classification problems?

    • @samwitteveenai
      @samwitteveenai 11 months ago

      You can do that with a much simpler model like BERT etc. or a T5, or structure the data to do it with the causal LM
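For reference, a hedged sketch of the simpler-model route, with an assumed checkpoint and label count:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed encoder
clf = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tok("This movie was great!", return_tensors="pt")
logits = clf(**inputs).logits  # argmax over the logits gives the predicted class
```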

  • @Aldraz
    @Aldraz 1 year ago

    How many examples are necessary in the dataset for it to learn a certain pattern? With OpenAI you are fine with just 200 examples, which I don't think would work here.

  • @ronyosef3806
    @ronyosef3806 1 year ago

    Hi Sam, thanks for the great video. I have a general question you might know the answer to. If I freeze pre-trained model weights (for example, BERT) and then train a classifier on top of its embeddings, is that called fine-tuning? If the weights are unfrozen, I know this can be called fine-tuning.

    • @samwitteveenai
      @samwitteveenai 1 year ago

      You can freeze some of the weights and tune the top layer etc. and it is fine-tuning, yes.

  • @MariuszWoloszyn
    @MariuszWoloszyn 1 year ago

    LoRA is not adding additional weights. Although it might seem so while training, at inference there are no additional parameters. It acts more like diff and patch (though in vector space).
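A small numerical sketch of that diff-and-patch view; the shapes and alpha/r scaling follow the LoRA paper, the numbers are illustrative:

```python
import numpy as np

d, r = 768, 16
W = np.random.randn(d, d)         # frozen pretrained weight
A = np.random.randn(r, d) * 0.01  # trained LoRA down-projection
B = np.zeros((d, r))              # trained LoRA up-projection (zero-initialized)
alpha = 32

# Merging folds the low-rank "diff" into W: same shape as before,
# so nothing extra is left to store or compute at inference time.
W_merged = W + (alpha / r) * (B @ A)
assert W_merged.shape == W.shape
```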

  • @tawnkramer
    @tawnkramer 1 year ago

    Does anyone know the proper settings for generation with the story model? Mine tends to start OK and then becomes word spew halfway through.

  • @ArunkumarMTamil
    @ArunkumarMTamil 1 month ago

    How does LoRA fine-tuning track changes via the two decomposition matrices? How is the ΔW determined?

  • @theunknown2090
    @theunknown2090 1 year ago

    Hey man, great video. Had a question: do you think a 500M or 1B model could give good results similar to Alpaca? What would be the smallest size at which a model can follow instructions?

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      It's a really interesting question and something I am currently doing research on. 500M is probably too small. At 1.5B things get a bit more interesting. The big challenge with smaller models is you can't expect them to know facts correctly. So you want to use them more as retrieval-generation models. They can do language, but need to have the facts and context fed in at generation time etc.

    • @theunknown2090
      @theunknown2090 1 year ago

      The Cerebras-GPT models are really fast in inference compared to GPT-2 and GPT-Neo; a Cerebras 2.7B's inference speed is almost equal to GPT-2 1.5B and GPT-Neo 1.3B.

  • @ifeanyiidiaye1889
    @ifeanyiidiaye1889 9 months ago

    How do you handle "CUDA out of memory" error in free Colab notebook?

  • @returncode0000
    @returncode0000 10 months ago

    5:23 Is it possible to train this on an Nvidia RTX 4090 FE (24GB VRAM)?

  • @ShlomiSchwartz
    @ShlomiSchwartz 10 months ago

    Hi Sam, thank you for the video. I'm getting
    RuntimeError: expected scalar type Half but found Float running in Colab with GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-a971aa0c-5408-727a-3b72-48b1926b5f66)
    on the training loop. What am I missing?

    • @ShlomiSchwartz
      @ShlomiSchwartz 10 months ago

      It was a GPU issue, switching GPUs fixed it

    • @samwitteveenai
      @samwitteveenai 10 months ago +1

      Yeah, I don't think bitsandbytes fully supports V100 GPUs; I have had issues with it in the past.

  • @tisajokt7676
    @tisajokt7676 1 year ago +1

    If the only difference is in these added-on weights, is it possible to run multiple distinct finetuned models at the same time without duplicating the shared base pretrained model in memory?

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      Yes, this is a trick we are working on for production. You have multiple LoRA weights for different tasks etc. Very much beyond the scope of here though.
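A rough sketch of that multi-adapter setup with the peft library; the repo ids are hypothetical and the exact API may vary by version:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1", device_map="auto")

# One copy of the base weights in memory, several small task adapters on top.
model = PeftModel.from_pretrained(base, "you/lora-task-a", adapter_name="task_a")
model.load_adapter("you/lora-task-b", adapter_name="task_b")

model.set_adapter("task_a")  # route inference through the task-A weights
```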

  • @wilfredomartel7781
    @wilfredomartel7781 1 year ago

    Maybe a tutorial to integrate LangChain with Flan, but accessing a REST API to query data.

  • @richardrgb6086
    @richardrgb6086 10 months ago

    Hello! Can you fine-tune T5?

  • @haticeobuz9081
    @haticeobuz9081 1 year ago +1

    Hi, I have a question for you. When will you be uploading the video about seq2seq models? I would like to see that one as well!

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      Yes I promised this and I will get to it. Will try to do it this week. Please remind me if I don't. Too many new LLMs and cool papers being released :D

    • @haticeobuz9081
      @haticeobuz9081 1 year ago

      @@samwitteveenai Okay, thank you so much.

  • @yashjain6372
    @yashjain6372 10 months ago

    Very informative!!!! Does fine-tuning with QLoRA/LoRA support this kind of dataset? If not, what changes should I make in my output dataset?
    Review (col1):
    Nice cell phone, big screen, plenty of storage. Stylus pen works well.
    Analysis (col2):
    [{“segment”: “Nice cell phone”,“Aspect”: “Cell phone”,“Aspect Category”: “Overall satisfaction”,“sentiment”: “positive”},{“segment”: “big screen”,“Aspect”: “Screen”,“Aspect Category”: “Design”,“sentiment”: “positive”},{“segment”: “plenty of storage”,“Aspect”: “Storage”,“Aspect Category”: “Features”,“sentiment”: “positive”},{“segment”: “Stylus pen works well”,“Aspect”: “Stylus pen”,“Aspect Category”: “Features”,“sentiment”: “positive”}]

  • @JosePablo2008
    @JosePablo2008 9 months ago

    What is the minimum GPU memory needed to run this code?
    I think I need a new GPU to run this on my local machine

  • @Fearfulful
    @Fearfulful 1 year ago

    Can you edit LoRa to LoRA in the title? I was really confused for a second, asking myself what long-range radio has to do with LLMs

  • @ojaskulkarni8138
    @ojaskulkarni8138 16 hours ago

    11:43 max_steps here is 200; how high do we usually set max_steps for proper fine-tuning? Please, someone help me

  • @micbab-vg2mu
    @micbab-vg2mu 1 year ago +1

    In the past, I tried fine-tuning some GPT models, but the results weren't good. Maybe this new technique will give me a better outcome.

    • @samwitteveenai
      @samwitteveenai 1 year ago +3

      Fine-tuning comes down a lot to what you are tuning on and how much etc. LoRA has a lot of advantages and is certainly worth a try.

  • @yth2011
    @yth2011 8 months ago

    What is the difference between LoRA and embeddings?

  • @debashisghosh3133
    @debashisghosh3133 1 year ago

    In the LoraConfig() method, r is not the number of attention heads; instead, it is the rank of the matrices you are decomposing into, going from high rank to low rank. Here the rank is 16.
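For reference, a hedged LoraConfig sketch showing r as the decomposition rank; target_modules is an assumption and depends on the base model:

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,                  # rank of the low-rank update matrices
    lora_alpha=32,         # scaling applied to the update
    target_modules=["query_key_value"],  # assumed module name (model-specific)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```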

  • @ranu9376
    @ranu9376 1 year ago

    Great! Can we merge the PEFT weights with the actual weights and use them for inference? Any downside to it other than the size? Also, wouldn't the weights get tampered with if we save them locally instead and use them for inference?

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      Yes, you can do that. I might show that in a future video. No big downside for most use cases. You can save the LoRA weights locally; when you load them they will load the original weights as well. Not sure what you mean by tampered.
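A short sketch of that merge step, assuming a locally saved adapter; the paths and base model name are hypothetical:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1")
model = PeftModel.from_pretrained(base, "./my-lora-weights")  # loads base + adapter
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("./merged-for-inference")
```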

  • @nayakdonkey
    @nayakdonkey 1 year ago +1

    @samwitteveenai I encounter RuntimeError: expected scalar type Half but found Float while running the training script specified in the colab notebook. Can you please help me with pointers to solve the error? I am running in Colab (GPU 0: Tesla V100-SXM2-16GB)

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      OK, V100s had some problems with the 8-bit part in the past, so it could be that.

    • @nayakdonkey
      @nayakdonkey 1 year ago

      @@samwitteveenai Thanks for the acknowledgement

    • @SubhamKumar-eg1pw
      @SubhamKumar-eg1pw 1 year ago

      @@nayakdonkey Were you able to solve the above RuntimeError? I am facing the same with a V100 machine

  • @user-rw5sk8fv4s
    @user-rw5sk8fv4s 10 months ago

    I have two sample datasets like below:
    1) [{ "en": "Hello, how are you today?", "fr": "Bonjour, comment ça va aujourd'hui ?" },...]
    2) [ { "text": "Ravi is a young man from India who loves panipuri." },... ]
    So how can I fine-tune the above datasets using the Falcon LLM model?
    Please help me

  • @thisurawz
    @thisurawz 4 months ago

    Can you do a video on fine-tuning a multimodal LLM (Video-LLaMA, LLaVA, or CLIP) with a custom multimodal dataset containing images and texts, for relation extraction or a specific task? Can you do it using an open-source multimodal LLM and open multimodal datasets, like Video-LLaMA's, so anyone can further their experiments with the help of your tutorial? Can you also talk about how we can boost the performance of the fine-tuned model using prompt tuning in the same video?

  • @IzittoCh
    @IzittoCh 11 months ago

    Would it be practical to train a small model on a 1660 Super 6GB? I just want to add a personality for a home voice assistant

    • @samwitteveenai
      @samwitteveenai 11 months ago +1

      Probably not train it; you might be able to do some inference with that, but train it on something with more VRAM etc.

  • @shivamkumar-qp1jm
    @shivamkumar-qp1jm 1 year ago

    Can I train any LLM from Hugging Face, like a LLaMA model?

  • @caiyu538
    @caiyu538 8 months ago

    👍

  • @desrucca
    @desrucca 6 months ago +1

    I fine-tuned BART, but the model output was exactly the same as the input IDs. What's possibly wrong?

  • @kutilkol
    @kutilkol 11 months ago

    Awesome video!
    12:10 The loss was not going down though, brother... Try updating the video with the model training converging. This one clearly did not.

  • @biswachat8521
    @biswachat8521 1 year ago

    At 10:19, why did you pass in data['train'] as train_dataset? How is the training process going to know that data['train']['quote'] is the feature and data['train']['prediction'] is the target?

    • @PavanAtGrowexx
      @PavanAtGrowexx 5 months ago

      Did you find any solution? I have the same query

    • @dolby360
      @dolby360 19 days ago

      I also have the same query

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago

    Why is it called causal?

  • @hilmiterzi3847
    @hilmiterzi3847 1 year ago

    Hey Sam, is there a chance I can reach out to you personally?

  • @anhluunguyen2869
    @anhluunguyen2869 10 months ago

    How do you customize your dataset?

    • @samwitteveenai
      @samwitteveenai 10 months ago

      I am planning to make some vids on fine-tuning LLaMA 2, so I will go more into that there. Basically you just want to feed it strings.
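A minimal sketch of that, with hypothetical file and field names; the idea is just to collapse each record into one string and tokenize it:

```python
from datasets import load_dataset

data = load_dataset("json", data_files="my_data.json")  # hypothetical file

def to_text(example):
    # Collapse structured fields into a single training string.
    return {"text": f"### Question: {example['question']}\n### Answer: {example['answer']}"}

data = data.map(to_text)
data = data.map(lambda s: tokenizer(s["text"]), batched=True)  # tokenizer from the base model
```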

  • @hosseinaboutalebi9998
    @hosseinaboutalebi9998 6 months ago

    Hi Sam,
    Can you provide more videos on fine-tuning? Especially with the Mistral-Orca model.
    I like your videos very much. Thanks for sharing them.

    • @samwitteveenai
      @samwitteveenai 6 months ago +1

      Yeah I have been meaning to do this for a while. Next week will do some new ones.

    • @hosseinaboutalebi9998
      @hosseinaboutalebi9998 6 months ago

      ​@@samwitteveenai Thanks so much Sam.

  • @Dygit
    @Dygit 1 year ago

    bitsandbytes seems to have lots of compatibility issues with various CUDA versions, and it outright doesn't support Windows directly

    • @samwitteveenai
      @samwitteveenai 11 months ago

      Yes, they don't support the older GPUs that well either

  • @bilalpenbegullu2851
    @bilalpenbegullu2851 10 months ago

    Finally something real...

  • @limitlesslife7536
    @limitlesslife7536 10 months ago

    Great video! I actually was hitting an error while trying to fine-tune the Dolly 2.0 model:
    RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.
    This was fixed by commenting out model.gradient_checkpointing_enable().
    Do you know why that might be the issue?

    • @samwitteveenai
      @samwitteveenai 10 months ago

      That video is quite old now, I think they have updated the library. I will try to take a look at it at some point. I am currently making some new Fine tuning vids so they should be out within a week.

  • @vortechksm
    @vortechksm 1 year ago

    This should be a seq2seq model, because you are tagging (classifying) text. Actually a sequence-to-tag (sequence classification) model.

  • @ericlawrence9060
    @ericlawrence9060 1 year ago

    LoRa is a low-power wireless data transmission...