Fine-tuning LLMs with PEFT and LoRA

  • Published: 4 Jun 2024
  • LoRA Colab: colab.research.google.com/dri...
    Blog Post: huggingface.co/blog/peft
    LoRA Paper: arxiv.org/abs/2106.09685
    In this video I look at how to use PEFT to fine-tune any decoder-style GPT model. This goes through the basics of LoRA fine-tuning and how to upload it to the Hugging Face Hub.
    My Links:
    Twitter - / sam_witteveen
    Linkedin - / samwitteveen
    Github:
    github.com/samwit/langchain-t...
    github.com/samwit/llm-tutorials
    00:00 - Intro
    00:04 - Problems with fine-tuning
    00:48 - Introducing PEFT
    01:11 - PEFT other cool techniques
    01:51 - LoRA Diagram
    03:25 - Hugging Face PEFT Library
    04:06 - Code Walkthrough
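A minimal sketch of the workflow the video walks through (the model name, hyperparameters, and repo id below are illustrative assumptions, not the exact Colab values):

```python
# Sketch: wrap a decoder-style causal LM with a LoRA adapter via PEFT.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "bigscience/bloom-7b1"  # assumed; any decoder-style GPT model works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                    bias="none", task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable

# After training, only the adapter weights need to be uploaded:
# model.push_to_hub("your-username/bloom-7b1-lora")  # hypothetical repo id
```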

Comments • 113

  • @impolitevegan3179
    @impolitevegan3179 1 year ago +25

    This is great. Not many channels on YT do this kind of stuff. Would appreciate more like this: other frameworks like DeepSpeed, useful datasets, training-parameter experiments, etc. So much interesting stuff that isn't covered on YT.

  • @christopheprotat
    @christopheprotat 1 year ago +10

    Perfect balance of theory and hands-on, with a colab attached to most of your videos. Much, much appreciated. I recommend this channel to all people who want to follow this crazy trend of LLM releases. The best path to keep all of us up to date! I learn so much thanks to you, Sam. Thanks a ton. Keep moving forward.

  • @autonomousreviews2521
    @autonomousreviews2521 1 year ago +3

    You continue to make videos on exactly the things I'm trying to understand more deeply! Fantastic! There are a lot of detailed parameters in this video that you could certainly continue to elaborate on for those of us who aren't programmers...yet :) Looking forward to more of your vids!

  • @kennethleung4487
    @kennethleung4487 1 year ago +4

    Awesome! Been waiting for your take on this topic

  • @victarion1571
    @victarion1571 1 year ago

    Sam, thanks for giving your audience their requests! The Alpaca training video you made makes much more sense now

  • @briancase6180
    @briancase6180 1 year ago +18

    So this seems like the basis for a business: offer to train a custom model for product documentation, FAQs, etc. with a specific product or company focus. Cool!

    • @Hypersniper05
      @Hypersniper05 1 year ago

      Or closed-domain semantic search with summarization

    • @handsanitizer2457
      @handsanitizer2457 1 year ago

      @E Marrero can you explain that a bit more? I'm new to the machine learning space

    • @Hypersniper05
      @Hypersniper05 1 year ago +2

      @@handsanitizer2457 It's a bit too much to explain here, but search on YouTube for "openai embeddings" or "embedding searches" and you will get a general idea of how models can be used for search, not only OpenAI's but other open-source models as well. Fine-tuning a model on a closed domain will help it understand your company's data better. You can also fine-tune it to reply back in a certain way, which opens the door to many options. ChatGPT was trained this way, but more on conversational outputs.
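A toy sketch of the embedding-search idea described above, using the open-source sentence-transformers library; the model name and documents are illustrative assumptions:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed small embedding model

docs = ["Our refund policy lasts 30 days.", "Support is available 24/7."]
doc_emb = model.encode(docs, convert_to_tensor=True)  # embed the corpus once

query_emb = model.encode("How long do refunds take?", convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_emb)  # highest cosine score = best match
```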

    • @ArjunKrishnaUserProfile
      @ArjunKrishnaUserProfile 1 year ago

      Does chatbase use this technique? It does the training on website or file data very fast.

    • @Hypersniper05
      @Hypersniper05 1 year ago +2

      @@ArjunKrishnaUserProfile I am pretty sure it doesn't train the model, that would be way more expensive than embedding

  • @nacs
    @nacs 1 year ago +1

    Many have said it but I'll reiterate -- your LLM videos are really great to watch, both the pace and the way you go from high level overviews to the detailed info.
    I also appreciate that it's not just focused on ChatGPT/GPT-4/hosted-models all the time and talks more about local training/finetuning/inferencing.

  • @PattersML
    @PattersML 10 months ago

    Awesome explanation, this is exactly what I was looking for. Thank you!

  • @notanape5415
    @notanape5415 9 months ago

    Thanks for the awesome explanation. Going to binge your videos.

  • @coolmcdude
    @coolmcdude 1 year ago

    I would love to see more videos about this, showing people how we could adapt this to our own projects, and maybe even a video about 4-bit tuning.

  • @saracen9
    @saracen9 1 year ago +1

    Awesome stuff Sam. I'm in the process of using LangChain to build a vector store and - whilst it's fine for now - would be really interested in understanding the best way to then take this and use it to generate a LoRA. Feels like the logical next step.

  • @kaiman99919
    @kaiman99919 11 months ago

    Thank you! It would be great to see more on the data section - everyone always seems to gloss over that part, despite the fact that it is clearly the most important part. I've seen a lot of 20-40 min vids (from different YouTubers) on the configuration that barely mention the actual use of the data.

  • @autonomousreviews2521
    @autonomousreviews2521 1 year ago +3

    I would love a vid covering examples of the differently formatted types of datasets that can be used to train a LoRA and the types of abilities that the different kinds of dataset training will allow - or, put another way - what kinds of behavioral changes in abilities can we use LoRA to fine-tune for in a model, and how do we then know what types of data formatting to use in order to get a chosen outcome. :D

  • @sundarramanp3057
    @sundarramanp3057 1 year ago +2

    Can you create more videos on instruction-prompt-tuning as well, as a further extension to this video? Amazing work!

  • @Secplavory-Wei
    @Secplavory-Wei 1 year ago

    This is really useful, thank you!

  • @quebono100
    @quebono100 1 year ago +1

    Wow, thank you for your work

  • @JonathanYankovich
    @JonathanYankovich 1 year ago +2

    I’d love a quick video like this on how to use checkpoints from PEFT training to do inference.
    When I’m training, I’m never sure how much is too much, and I can save checkpoints automatically easily to resume in case training stops.
    What I need to learn is how to use these checkpoints with the base model to do inference so I can test output quality against several checkpoints.
    Ideally I’d like to be able to do inference on a base model plus checkpoint, and then once I find a good result, merge the checkpoint into the base model so I can use it in production and keep VRAM low. (I am assuming inference on base model + checkpoint will use more vram)
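A minimal sketch of that checkpoint-evaluation step, assuming adapters saved during a PEFT training run; the base model name and checkpoint directory are hypothetical:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "bigscience/bloom-7b1"  # assumed base model
base = AutoModelForCausalLM.from_pretrained(base_name, device_map="auto")
tok = AutoTokenizer.from_pretrained(base_name)

# Attach one saved checkpoint's adapter to the frozen base for a quality check.
model = PeftModel.from_pretrained(base, "outputs/checkpoint-200")  # hypothetical dir
model.eval()

ids = tok("Once upon a time", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**ids, max_new_tokens=30)[0]))
```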

  • @geekyprogrammer4831
    @geekyprogrammer4831 11 months ago

    Very underrated channel. You deserve more viewers and subs.

  • @chavita4321
    @chavita4321 1 year ago

    So badass. Thanks!

  • @abhirj87
    @abhirj87 11 months ago

    Very useful!! Thanks a ton

  • @caiyu538
    @caiyu538 7 months ago

    Great lectures.

  • @wilfredomartel7781
    @wilfredomartel7781 1 year ago

    Excellent!

  • @definty
    @definty 8 months ago

    Hey Sam, thanks for the great informative video as always!
    Do you know of a way to see which neurons get activated during training? I ask because I was thinking of ways to reduce the big models, and the most obvious way I could think of would be to look at which neurons are getting activated during training, especially with Falcon 170B; even 32B is too big for me, and considering I don't need multiple languages I was hoping this would be a good approach to reduce the size of models.
    It would be cool to see a Brain Surgeon type debugger for LLMs. It would be good to run different training datasets through different LLMs to see which neurons get activated and which do not, and ideally have a way to disable them during inference to test and measure the differences in the output.

  • @user-lw1zt4ov3i
    @user-lw1zt4ov3i 7 months ago

    This is a great way to understand how we can fine-tune a text classification task using an LLM. I want to know if there is a method through which we can make the LLM learn from data in JSON format, where there are multiple labels for information retrieval or conversational recommendation tasks.

  • @JonathanYankovich
    @JonathanYankovich 1 year ago +6

    These fine-tuning-related topics are especially relevant to me right now. Currently training llama-30b variants at 4-bit. I’m very interested in how to roll adapters/checkpoints back into base models to keep VRAM usage down during inference (under 24GB)

    • @MridulSharmaMID
      @MridulSharmaMID 1 year ago

      Hi I am also interested. Can we connect via email?

    • @PavanAtGrowexx
      @PavanAtGrowexx 5 months ago

      Hey, I am also facing the same issue. Did you find any update? Could you help me out please?

  • @joshmabry7572
    @joshmabry7572 1 year ago

    This is gold! Thank you!

    • @joshmabry7572
      @joshmabry7572 1 year ago

      I'm looking to train the Wizard-Vicuna models but run into `ValueError: The following `model_kwargs` are not used by the model: ['token_type_ids']`

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      This could be because they have already folded a LoRA in there or the base model setup is different.
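One common workaround sketch, assuming the error comes from the tokenizer emitting token_type_ids that the causal LM's generate call doesn't accept:

```python
# Ask the tokenizer not to produce the unsupported key...
inputs = tokenizer(prompt, return_tensors="pt", return_token_type_ids=False)
# ...or drop it from an existing encoding before calling the model.
inputs.pop("token_type_ids", None)
outputs = model.generate(**inputs, max_new_tokens=50)
```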

  • @pawe460
    @pawe460 1 year ago

    How does LoRA differ from transfer learning? If I understand correctly, TL means adding additional layers onto a frozen pre-trained network and training them on a new dataset, right?

  • @nguyenanhnguyen7658
    @nguyenanhnguyen7658 11 months ago

    Great quick tutorial. This is good for English-only pretraining/fine-tuning. What about non-English? What steps should we take to (1) extend the vocab, (2) pretrain (with or without LoRA) on a free-form unstructured text corpus, (3) fine-tune with LoRA for each task? Would love to have your tutorial on this road, it would be great. Thanks, Steve.

  • @bookfastorg
    @bookfastorg 1 year ago

    How do you re-train it with additional data? Great video!

  • @rajivmehtapy
    @rajivmehtapy 1 year ago +2

    Very rare to find videos on YouTube covering this topic.

  • @yth2011
    @yth2011 10 months ago

    Thanks a lot~

  • @edd36
    @edd36 1 year ago

    Hey, sorry I'm late to the party. I tried to load my LoRA model, but when I checked the weights, they are the same as the original model's. Is it supposed to do that? I already checked with my after-training model and yes, the weights are different.

  • @clementvanpeuter1742
    @clementvanpeuter1742 1 year ago

    Love It.

  • @selinatian6607
    @selinatian6607 11 months ago

    Great tutorial! With the saved pretrained model, how do we make predictions for classification problems?

    • @samwitteveenai
      @samwitteveenai 11 months ago

      You can do that with a much simpler model like BERT etc. or a T5, or structure the data to do it with the causal LM
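For reference, a hedged sketch of the simpler-model route, with an assumed checkpoint and label count:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed encoder
clf = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tok("This movie was great!", return_tensors="pt")
logits = clf(**inputs).logits  # argmax over the logits gives the predicted class
```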

  • @Aldraz
    @Aldraz 1 year ago

    How many examples are necessary in the dataset for it to learn a certain pattern? With OpenAI you are fine with just 200 examples, which I don't think would work here.

  • @ronyosef3806
    @ronyosef3806 1 year ago

    Hi Sam, thanks for the great video. I have a general question you might know the answer to. If I freeze pre-trained model weights (for example, BERT) and then train a classifier on top of its embeddings, is that called fine-tuning? If the weights are unfrozen, I know this can be called fine-tuning.

    • @samwitteveenai
      @samwitteveenai 1 year ago

      You can freeze some of the weights and tune the top layer etc. and it is fine-tuning, yes.

  • @MariuszWoloszyn
    @MariuszWoloszyn 1 year ago

    LoRA is not adding additional weights. Although it might seem so while training, at inference there are no additional parameters. It acts more like diff and patch (though in vector space).
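A small numerical sketch of that diff-and-patch view; the shapes and alpha/r scaling follow the LoRA paper, the numbers are illustrative:

```python
import numpy as np

d, r = 768, 16
W = np.random.randn(d, d)         # frozen pretrained weight
A = np.random.randn(r, d) * 0.01  # trained LoRA down-projection
B = np.zeros((d, r))              # trained LoRA up-projection (zero-initialized)
alpha = 32

# Merging folds the low-rank "diff" into W: same shape as before,
# so nothing extra is left to store or compute at inference time.
W_merged = W + (alpha / r) * (B @ A)
assert W_merged.shape == W.shape
```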

  • @tawnkramer
    @tawnkramer 1 year ago

    Does anyone know the proper settings for generation with the story model? Mine tends to start OK and then becomes word spew halfway through.

  • @ArunkumarMTamil
    @ArunkumarMTamil 1 month ago

    How does LoRA fine-tuning track changes via the two decomposition matrices? How is the ΔW determined?

  • @theunknown2090
    @theunknown2090 1 year ago

    Hey man, great video. Had a question: do you think a 500M or 1B model could give good results similar to Alpaca? What would be the smallest size at which a model can follow instructions?

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      It's a really interesting question and something I am currently doing research on. 500M is probably too small. At 1.5B things get a bit more interesting. The big challenge with smaller models is you can't expect them to know facts correctly. So you want to use them more as retrieval-generation models. They can do language, but need to have the facts and context fed in at generation time etc.

    • @theunknown2090
      @theunknown2090 1 year ago

      The Cerebras-GPT models are really fast in inference compared to GPT-2 and GPT-Neo; a Cerebras 2.7B's inference speed is almost equal to GPT-2 1.5B and GPT-Neo 1.3B.

  • @ifeanyiidiaye1889
    @ifeanyiidiaye1889 9 months ago

    How do you handle "CUDA out of memory" error in free Colab notebook?

  • @returncode0000
    @returncode0000 10 months ago

    5:23 Is it possible to train this on an Nvidia RTX 4090 FE (24GB VRAM)?

  • @ShlomiSchwartz
    @ShlomiSchwartz 10 months ago

    Hi Sam, thank you for the video. I'm getting
    RuntimeError: expected scalar type Half but found Float running in Colab with GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-a971aa0c-5408-727a-3b72-48b1926b5f66)
    on the training loop. What am I missing?

    • @ShlomiSchwartz
      @ShlomiSchwartz 10 months ago

      It was a GPU issue, switching GPUs fixed it

    • @samwitteveenai
      @samwitteveenai 10 months ago +1

      Yeah, I don't think bitsandbytes fully supports V100 GPUs; I have had issues with it in the past.

  • @tisajokt7676
    @tisajokt7676 1 year ago +1

    If the only difference is in these added-on weights, is it possible to run multiple distinct finetuned models at the same time without duplicating the shared base pretrained model in memory?

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      Yes, this is a trick we are working on for production. You have multiple LoRA weights for different tasks etc. Very much beyond the scope of here though.
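A rough sketch of that multi-adapter setup with the peft library; the repo ids are hypothetical and the exact API may vary by version:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1", device_map="auto")

# One copy of the base weights in memory, several small task adapters on top.
model = PeftModel.from_pretrained(base, "you/lora-task-a", adapter_name="task_a")
model.load_adapter("you/lora-task-b", adapter_name="task_b")

model.set_adapter("task_a")  # route inference through the task-A weights
```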

  • @wilfredomartel7781
    @wilfredomartel7781 1 year ago

    Maybe a tutorial to integrate LangChain with Flan, but accessing a REST API to query data.

  • @richardrgb6086
    @richardrgb6086 10 months ago

    Hello! Can you fine-tune T5?

  • @haticeobuz9081
    @haticeobuz9081 1 year ago +1

    Hi, I have a question for you. When will you be uploading the video about seq2seq models? I would like to see that one as well!

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      Yes I promised this and I will get to it. Will try to do it this week. Please remind me if I don't. Too many new LLMs and cool papers being released :D

    • @haticeobuz9081
      @haticeobuz9081 1 year ago

      @@samwitteveenai Okay, thank you so much.

  • @yashjain6372
    @yashjain6372 10 months ago

    Very informative!!!! Does fine-tuning with QLoRA/LoRA support this kind of dataset? If not, what changes should I make in my output dataset?
    Review (col1):
    Nice cell phone, big screen, plenty of storage. Stylus pen works well.
    Analysis (col2):
    [{“segment”: “Nice cell phone”,“Aspect”: “Cell phone”,“Aspect Category”: “Overall satisfaction”,“sentiment”: “positive”},{“segment”: “big screen”,“Aspect”: “Screen”,“Aspect Category”: “Design”,“sentiment”: “positive”},{“segment”: “plenty of storage”,“Aspect”: “Storage”,“Aspect Category”: “Features”,“sentiment”: “positive”},{“segment”: “Stylus pen works well”,“Aspect”: “Stylus pen”,“Aspect Category”: “Features”,“sentiment”: “positive”}]

  • @JosePablo2008
    @JosePablo2008 9 months ago

    What is the minimum GPU memory needed to run this code?
    I think I need a new GPU to run this on my local machine

  • @Fearfulful
    @Fearfulful 1 year ago

    Can you edit LoRa to LoRA in the title? I was really confused for a second, asking myself what long-range radio has to do with LLMs

  • @ojaskulkarni8138
    @ojaskulkarni8138 16 hours ago

    11:43 max_steps here is 200; how high do we usually set max_steps for proper fine-tuning? Please, someone help me

  • @micbab-vg2mu
    @micbab-vg2mu 1 year ago +1

    In the past, I tried fine-tuning some GPT models, but the results weren't good. Maybe this new technique will give me a better outcome.

    • @samwitteveenai
      @samwitteveenai 1 year ago +3

      Fine-tuning comes down a lot to what you are tuning on and how much etc. LoRA has a lot of advantages and is certainly worth a try.

  • @yth2011
    @yth2011 8 months ago

    What is the difference between LoRA and embeddings?

  • @debashisghosh3133
    @debashisghosh3133 1 year ago

    In the LoraConfig() method, r is not the number of attention heads; instead, it is the rank of the matrices you are decomposing into, going from high rank to low rank. Here the rank is 16.
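For reference, a hedged LoraConfig sketch showing r as the decomposition rank; target_modules is an assumption and depends on the base model:

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,                  # rank of the low-rank update matrices
    lora_alpha=32,         # scaling applied to the update
    target_modules=["query_key_value"],  # assumed module name (model-specific)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```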

  • @ranu9376
    @ranu9376 1 year ago

    Great! Can we merge the PEFT weights with the actual weights and use them for inference? Any downside to it other than the size? Also, wouldn't the weights get tampered with if we save them locally instead and use them for inference?

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      Yes, you can do that. I might show that in a future video. No big downside for most use cases. You can save the LoRA weights locally; when you load them they will load the original weights as well. Not sure what you mean by tampered.
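A short sketch of that merge step, assuming a locally saved adapter; the paths and base model name are hypothetical:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-7b1")
model = PeftModel.from_pretrained(base, "./my-lora-weights")  # loads base + adapter
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("./merged-for-inference")
```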

  • @nayakdonkey
    @nayakdonkey 1 year ago +1

    @samwitteveenai I encounter RuntimeError: expected scalar type Half but found Float while running the training script specified in the colab notebook. Can you please help me with pointers to solve the error? I am running in Colab (GPU 0: Tesla V100-SXM2-16GB)

    • @samwitteveenai
      @samwitteveenai 1 year ago +1

      OK, V100s had some problems with the 8-bit part in the past, so it could be that.

    • @nayakdonkey
      @nayakdonkey 1 year ago

      @@samwitteveenai Thanks for the acknowledgement

    • @SubhamKumar-eg1pw
      @SubhamKumar-eg1pw 1 year ago

      @@nayakdonkey Were you able to solve the above RuntimeError? I am facing the same with a V100 machine

  • @user-rw5sk8fv4s
    @user-rw5sk8fv4s 10 months ago

    I have two sample datasets like below:
    1) [{ "en": "Hello, how are you today?", "fr": "Bonjour, comment ça va aujourd'hui ?" },...]
    2) [ { "text": "Ravi is a young man from India who loves panipuri." },... ]
    So how can I fine-tune the above datasets using the Falcon LLM model?
    Please help me

  • @thisurawz
    @thisurawz 4 months ago

    Can you do a video on fine-tuning a multimodal LLM (Video-LLaMA, LLaVA, or CLIP) with a custom multimodal dataset containing images and texts, for relation extraction or a specific task? Can you do it using an open-source multimodal LLM and open multimodal datasets, like Video-LLaMA's, so anyone can further their experiments with the help of your tutorial? Can you also talk about how we can boost the performance of the fine-tuned model using prompt tuning in the same video?

  • @IzittoCh
    @IzittoCh 11 months ago

    Would it be practical to train a small model on a 1660 Super 6GB? I just want to add a personality for a home voice assistant

    • @samwitteveenai
      @samwitteveenai 11 months ago +1

      Probably not train it; you might be able to do some inference with that, but train it on something with more VRAM etc.

  • @shivamkumar-qp1jm
    @shivamkumar-qp1jm 1 year ago

    Can I train any LLM from Hugging Face, like a LLaMA model?

  • @caiyu538
    @caiyu538 8 months ago

    👍

  • @desrucca
    @desrucca 6 months ago +1

    I fine-tuned BART, but the model output was exactly the same as the input IDs. What's possibly wrong?

  • @kutilkol
    @kutilkol 11 months ago

    Awesome video!
    12:10 The loss was not going down though, brother... Try updating the video with the model training converging. This one clearly did not.

  • @biswachat8521
    @biswachat8521 1 year ago

    At 10:19, why did you pass in data['train'] as train_dataset? How is the training process going to know that data['train']['quote'] is the feature and data['train']['prediction'] is the target?

    • @PavanAtGrowexx
      @PavanAtGrowexx 5 months ago

      Did you find any solution? I have the same query

    • @dolby360
      @dolby360 19 days ago

      I also have the same query

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago

    Why is it called causal?

  • @hilmiterzi3847
    @hilmiterzi3847 1 year ago

    Hey Sam, is there a chance I can reach out to you personally?

  • @anhluunguyen2869
    @anhluunguyen2869 10 months ago

    How do you customize your dataset?

    • @samwitteveenai
      @samwitteveenai 10 months ago

      I am planning to make some vids on fine-tuning LLaMA 2, so I will go more into that there. Basically you just want to feed it strings.
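A minimal sketch of that, with hypothetical file and field names; the idea is just to collapse each record into one string and tokenize it:

```python
from datasets import load_dataset

data = load_dataset("json", data_files="my_data.json")  # hypothetical file

def to_text(example):
    # Collapse structured fields into a single training string.
    return {"text": f"### Question: {example['question']}\n### Answer: {example['answer']}"}

data = data.map(to_text)
data = data.map(lambda s: tokenizer(s["text"]), batched=True)  # tokenizer from the base model
```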

  • @hosseinaboutalebi9998
    @hosseinaboutalebi9998 6 months ago

    Hi Sam,
    Can you provide more videos on fine-tuning? Especially with the Mistral-Orca model.
    I like your videos very much. Thanks for sharing them.

    • @samwitteveenai
      @samwitteveenai 6 months ago +1

      Yeah I have been meaning to do this for a while. Next week will do some new ones.

    • @hosseinaboutalebi9998
      @hosseinaboutalebi9998 6 months ago

      ​@@samwitteveenai Thanks so much Sam.

  • @Dygit
    @Dygit 1 year ago

    bitsandbytes seems to have lots of compatibility issues with various CUDA versions, and it outright doesn't support Windows directly

    • @samwitteveenai
      @samwitteveenai 11 months ago

      Yes, they don't support the older GPUs that well either

  • @bilalpenbegullu2851
    @bilalpenbegullu2851 10 months ago

    Finally something real...

  • @limitlesslife7536
    @limitlesslife7536 10 months ago

    Great video! I actually was hitting an error while trying to fine-tune the Dolly 2.0 model:
    RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the forward function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple checkpoint functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases yet.
    This was fixed by commenting out model.gradient_checkpointing_enable().
    Do you know why that might be the issue?

    • @samwitteveenai
      @samwitteveenai 10 months ago

      That video is quite old now, I think they have updated the library. I will try to take a look at it at some point. I am currently making some new Fine tuning vids so they should be out within a week.

  • @vortechksm
    @vortechksm 1 year ago

    This should be a seq2seq model, because you are tagging (classifying) text. Actually a sequence-to-tag (sequence classification) model.

  • @ericlawrence9060
    @ericlawrence9060 1 year ago

    LoRa is a low-power wireless data transmission...