Fine-tuning LLM with QLoRA on Single GPU: Training Falcon-7b on ChatBot Support FAQ Dataset

  • Published: 20 Aug 2024

Comments • 124

  • @venelin_valkov
    @venelin_valkov  A year ago +11

    Full text tutorial (requires MLExpert Pro): www.mlexpert.io/prompt-engineering/fine-tuning-llm-on-custom-dataset-with-qlora

    • @ko-Daegu
      @ko-Daegu A year ago +1

      Is this way of fine-tuning for Falcon only, or for any open-source model? Also, is it possible to fine-tune a model to pick up a new language? Like, it was never trained on French and now it can answer French questions?

    • @mariocuezzo8027
      @mariocuezzo8027 A year ago +1

      @@ko-Daegu I wanna know this too!

  • @sithlordi5170
    @sithlordi5170 11 months ago

    Wow, finally a working guide on how to fine-tune LLMs. Thank you very much 🙏

  • @dataflex4440
    @dataflex4440 A year ago +5

    Please make a video on how to increase inference speed; that is the major problem everyone is facing.

  • @LifeTravelerAmmu
    @LifeTravelerAmmu A year ago +18

    Hello Venelin, can you please provide the Colab notebook (falcon-qlora-fine-tuning.ipynb)... if possible?

  • @thevitorialima
    @thevitorialima A year ago +1

    I just subscribed!! Your tutorials are straightforward and to the point. Love your content. Keep up the amazing work! 🙌 ✨✨✨

  • @tptodorov123
    @tptodorov123 A year ago +1

    Bravo, Venelin!

  • @IchSan-jx5eg
    @IchSan-jx5eg A year ago +1

    Hello, great video so far. Let me ask some questions here:
    1. What should I do if my training loss does not decrease consistently (sometimes up, sometimes down)?
    2. How do I use multiple GPUs? I always get OOM if I use Falcon-40B, so I rented 2 GPUs from a cloud provider. Unfortunately, it ran on just 1 GPU.
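
    For the multi-GPU question above: one common approach (not shown in the video) is to let Accelerate shard the model across all visible GPUs with device_map="auto". A minimal sketch, assuming transformers, accelerate, and bitsandbytes are installed; the model id is just an example:

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
        )

        # device_map="auto" lets Accelerate split the layers across every visible
        # GPU, so a model that OOMs on one card can be sharded over two.
        model = AutoModelForCausalLM.from_pretrained(
            "tiiuae/falcon-40b",
            quantization_config=bnb_config,
            device_map="auto",
            trust_remote_code=True,
        )
        tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b")

    Note that this shards the model for a single process; true multi-GPU data-parallel training (DDP) is a separate setup.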

  • @meenalpatidar9405
    @meenalpatidar9405 A year ago +2

    Can someone please share the code that was used in this tutorial?

  • @user-ew8ld1cy4d
    @user-ew8ld1cy4d A year ago +4

    I watch all of your videos, they are wonderful. This one is BY FAR my favorite. I know it must have taken a lot of time, but THANK YOU so much for doing it! It is so thorough. Can we do the same thing with MPT-7B?

    • @venelin_valkov
      @venelin_valkov  A year ago +1

      I would guess the training process can be similar for MPT-7B, but I can't be sure. Try it and let me know.
      Thank you for watching!

    • @user-ew8ld1cy4d
      @user-ew8ld1cy4d A year ago +2

      @@venelin_valkov I will try and let you know!

    • @tadificilaxalogin
      @tadificilaxalogin A year ago +1

      @@user-ew8ld1cy4d Did it work? :D

    • @user-ew8ld1cy4d
      @user-ew8ld1cy4d A year ago +1

      @@tadificilaxalogin Idk what I'm doing wrong here, but I have tried to reply to this 4 times and after a day or so it gets removed... It does not work with MPT-7B.

    • @tadificilaxalogin
      @tadificilaxalogin A year ago

      @@user-ew8ld1cy4d Thanks!! I have had progress with Falcon-40B and RedPajama. Unfortunately, it seems to be difficult to use this algorithm with more than one GPU. Have you set your prompt style for training? I am doing these tests now.

  • @maidacundo3471
    @maidacundo3471 A year ago +3

    When adding new special tokens, shouldn't you add those tokens to the tokenizer, resize the embedding layer of the model, and fine-tune it? I think this should help the model during training, but it also increases the number of trainable parameters.
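
    The recipe the commenter describes is standard. A minimal sketch, assuming the transformers library; the two chat markers are hypothetical stand-ins (the actual tokens were stripped from the comment by the page renderer):

        from transformers import AutoModelForCausalLM, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
        model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)

        # Register the markers as special tokens so the tokenizer never splits them,
        # then grow the embedding matrix so each one gets a trainable vector.
        tokenizer.add_special_tokens({"additional_special_tokens": ["<human>:", "<assistant>:"]})
        model.resize_token_embeddings(len(tokenizer))

    With LoRA alone the new embedding rows stay frozen, so the embedding layer would also need to be marked trainable (e.g. via modules_to_save in LoraConfig) for the new tokens to learn anything.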

  • @ggximenez
    @ggximenez A year ago +2

    Does anyone know how to fine-tune a QLoRA on top of another LoRA for a specific model? There is a LoRA that fine-tunes the original Llama model with a translated and cleaned version of the Alpaca dataset for Brazilian Portuguese. I would like to fine-tune another LoRA on top of that.
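
    One way to do this (a sketch, not a tested recipe) is to merge the existing adapter into the base weights first, then train a fresh adapter on the merged model. Both model ids below are placeholders:

        import torch
        from peft import PeftModel, LoraConfig, get_peft_model
        from transformers import AutoModelForCausalLM

        # 1) Load the base model in fp16 (merging is not supported on 4-bit weights)
        #    and bake the existing Portuguese-Alpaca LoRA into it.
        base = AutoModelForCausalLM.from_pretrained("base-llama-id", torch_dtype=torch.float16)
        model = PeftModel.from_pretrained(base, "existing-lora-id")
        model = model.merge_and_unload()

        # 2) Attach a fresh, trainable adapter on top of the merged weights.
        new_adapter = LoraConfig(
            r=16, lora_alpha=32, lora_dropout=0.05,
            target_modules=["q_proj", "v_proj"],  # typical Llama attention projections
            task_type="CAUSAL_LM",
        )
        model = get_peft_model(model, new_adapter)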

  • @Jeong5499
    @Jeong5499 A year ago +1

    My model generates multiple redundant answers, e.g. : xxxx : xxxx : xxxx : xxxx. How do I solve it?
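
    Repetition like this can often be damped at generation time. A minimal sketch, assuming a transformers tokenizer is already loaded; the values are starting points, not tuned:

        from transformers import GenerationConfig

        generation_config = GenerationConfig(
            max_new_tokens=200,
            do_sample=True,
            temperature=0.7,
            repetition_penalty=1.2,   # down-weight tokens that were already generated
            no_repeat_ngram_size=3,   # forbid any 3-gram from appearing twice
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

    If the model never emits EOS at all, the training data likely needs an explicit EOS token appended to each example; see the sketch further down, after the comment about the model not knowing when to stop.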

  • @PavPetukhov
    @PavPetukhov A year ago +1

    Wow, thanks a lot for the video!

  • @quachhengtony7651
    @quachhengtony7651 A year ago +2

    Is the model multilingual? Can I fine-tune it in another language?

  • @user-do1gu7hw4s
    @user-do1gu7hw4s A year ago +4

    For the tokenizer, I think we should set padding_side="left", because it is a causal LLM. What do you think?
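
    Left padding matters mainly for batched generation with decoder-only models; for training with a causal-LM collator, right padding is usually fine because the padded positions are masked out of the loss. A minimal sketch of the inference-time setting:

        from transformers import AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
        tokenizer.pad_token = tokenizer.eos_token
        # Pad on the left so the real prompt tokens sit at the end of the
        # sequence, immediately before the positions being generated.
        tokenizer.padding_side = "left"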

  • @priyabnsl
    @priyabnsl A year ago +3

    Please share the notebook

  • @henkhbit5748
    @henkhbit5748 A year ago +3

    Great video, and very interesting if you want to fine-tune with your own dataset 👍 A pity that the response took a long time… any idea how to make it faster?

  • @bolarinwarahmonismail8248
    @bolarinwarahmonismail8248 A year ago +2

    Does it work without the high-RAM runtime? I'm using the free version.

  • @chanderbalaji3539
    @chanderbalaji3539 A year ago +1

    I followed the code above and got the following output:
    return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)
    RuntimeError: The size of tensor a (24) must match the size of tensor b (19) at non-singleton dimension 1
    Kindly help a newbie. The only change I made was commenting out device_map="auto" when loading the base model, as I have dual GPUs and it was throwing an error with 8-bit.

  • @LinPure
    @LinPure A year ago +1

    I'm facing this error: mat1 and mat2 shapes cannot be multiplied (26x4544 and 1x10614784) while running this code block:
        with torch.inference_mode():
            outputs = model.generate(
                input_ids=encoding.input_ids,
                attention_mask=encoding.attention_mask,
                generation_config=generation_config,
            )
    Does anyone have any ideas how I could solve this? I'm not sure if the problem was caused by using 'prepare_model_for_int8_training' instead of 'prepare_model_for_kbit_training', since I got a "cannot import name 'prepare_model_for_kbit_training' from 'peft'" error even on the latest version of the peft library.

  • @ghezalahmad
    @ghezalahmad A year ago +1

    Thank you so much

  • @thisurawz
    @thisurawz 7 months ago

    Can you do a video on fine-tuning a multimodal LLM (Video-LLaMA, LLaVA, or CLIP) with a custom multimodal dataset containing images and text, for relation extraction or another specific task? Could you use an open-source multimodal LLM and open multimodal datasets, so anyone can build on the tutorial for their own experiments? Could you also cover how to boost the performance of the fine-tuned model using prompt tuning in the same video?

  • @shivamkapoor7634
    @shivamkapoor7634 A year ago +1

    How do I deploy this chatbot model after pushing it to Hugging Face? I'm talking about the QLoRA fine-tuned model.

    • @venelin_valkov
      @venelin_valkov  A year ago +1

      I made a video on this topic: ruclips.net/video/HI3cYN0c9ZU/видео.html
      Thank you for watching!

  • @sumitmamoria
    @sumitmamoria A year ago +1

    Why is the inference consistently slow? Do we know how to speed it up?

  • @user-ty5fu1or6u
    @user-ty5fu1or6u A year ago +1

    It generates the answer and then adds more questions and answers until the max token limit is reached. What am I doing wrong? How does the model know when to stop? I checked the generation config and both padding and EOS are set.
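
    A frequent cause: the fine-tuning examples never end with the EOS token, so the model never learns to stop. A hedged sketch of the fix inside the tokenization step; the prompt markers and column names are hypothetical:

        def generate_and_tokenize_prompt(example):
            # Appending EOS to every training example teaches the model to emit
            # it when the answer is finished, so generate() can stop early.
            full_prompt = (
                f"<human>: {example['question']}\n"
                f"<assistant>: {example['answer']}{tokenizer.eos_token}"
            )
            return tokenizer(full_prompt, truncation=True, max_length=512)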

  • @brijeshkaran5369
    @brijeshkaran5369 A year ago +1

    You're the best 💯 Thanks a lot for the video! Can you please upload a video implementing this tutorial using the LangChain framework? 🥺

    • @venelin_valkov
      @venelin_valkov  A year ago +2

      You mean use the trained model with LangChain?
      Thank you for watching!

    • @brijeshkaran5369
      @brijeshkaran5369 A year ago

      @@venelin_valkov Yes, an "end to end" implementation would be useful for the community 🙂

  • @user-yy1vp4md9t
    @user-yy1vp4md9t A year ago +1

    Tried the example, stuck on the training part with the error IndexError: Invalid key: 78 is out of bounds for size 0. Has anyone faced something similar?
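
    This error (which several commenters below hit as well) usually means the Trainer is indexing into a dataset with zero usable columns: by default it drops every column the model's forward() does not accept, and if the data was never tokenized into input_ids, nothing is left. A sketch of the usual check, with hypothetical column names:

        # Make sure the mapped dataset actually contains input_ids / attention_mask
        # *before* it reaches the Trainer.
        tokenized = data["train"].map(
            lambda example: tokenizer(example["text"], truncation=True, max_length=512)
        )
        print(tokenized)  # should list input_ids and attention_mask among the columns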

  • @ikjb8561
    @ikjb8561 A year ago +3

    Great video. Would the response times be faster with a better GPU?

  • @minhducha8574
    @minhducha8574 9 months ago

    How do we compute metrics for this model? When I add compute_metrics to the trainer, it errors out. Can you please add a compute_metrics example?
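
    For causal-LM fine-tuning, the simplest metric is perplexity derived from the evaluation loss; a token-level compute_metrics often fails or OOMs because the Trainer accumulates all logits in memory. A minimal sketch, assuming a trainer with an eval dataset:

        import math

        eval_results = trainer.evaluate()
        print(f"perplexity = {math.exp(eval_results['eval_loss']):.2f}")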

  • @subhamchoudhary4091
    @subhamchoudhary4091 A year ago

    I loaded the trained model and it downloaded the whole model again. When I tried generating text according to my use-case with the trained weights, it didn't provide the correct result.

  • @josephtsangko3558
    @josephtsangko3558 8 months ago

    Really nice! Thanks for the clarity of the explanation! I wonder, what is the loss function's input here? What is being compared? Is this self-supervised? So opaque!

  • @pvlr1788
    @pvlr1788 A year ago +2

    I don't get why inference is so slow.
    It should be at least as fast as the training. It's true that each "generate" means the model does inference multiple times, does beam search, etc... but the same thing happens when you train the model. What am I missing?

    • @Timotheeee1
      @Timotheeee1 A year ago

      when you train the model, it gets trained on every token in the text batch at once (it outputs logits at every step)

    • @pvlr1788
      @pvlr1788 A year ago

      @@Timotheeee1 OK, I see. You mean that during training the model DOES NOT do beam search. Am I right?
      It just tries to minimize the cross-entropy loss on the next token. I guess beam search is not even differentiable...

  • @AnimeOtakuArt
    @AnimeOtakuArt A year ago +1

    Can you make a QLoRA tutorial for a text-summarization task on Falcon-7B? That would be very helpful. Cheers 🍻🍻

  • @Mohith7548
    @Mohith7548 A year ago +2

    I get this error. Any idea on how to resolve it:
    RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
    Parameter at index 63 has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.

    • @TheKizoch
      @TheKizoch A year ago

      I get this same error. Could you resolve it?
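
    The "marked ready twice" failure above typically comes from combining gradient checkpointing's re-entrant backward with DistributedDataParallel. Two hedged workarounds (newer transformers versions only; not verified against the exact setup above):

        from transformers import TrainingArguments

        # Non-reentrant checkpointing avoids the duplicate autograd hooks.
        model.gradient_checkpointing_enable(
            gradient_checkpointing_kwargs={"use_reentrant": False}
        )

        training_args = TrainingArguments(
            output_dir="out",
            ddp_find_unused_parameters=False,  # DDP + checkpointing often needs this
        )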

  • @user-cg8ee8tm8v
    @user-cg8ee8tm8v A year ago +1

    Thanks for the great video. Can we merge the adapter.bin back into its original model? Can you make a video on it?
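
    Yes, PEFT can merge a trained adapter into the base weights. A minimal sketch; the adapter path is a placeholder, and merging must happen on non-quantized (e.g. fp16) weights:

        import torch
        from peft import PeftModel
        from transformers import AutoModelForCausalLM

        base = AutoModelForCausalLM.from_pretrained(
            "tiiuae/falcon-7b", torch_dtype=torch.float16, trust_remote_code=True
        )
        model = PeftModel.from_pretrained(base, "path/to/adapter")  # dir containing adapter_model.bin
        merged = model.merge_and_unload()  # bake the LoRA deltas into the weights
        merged.save_pretrained("falcon-7b-merged")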

  • @user-jy2tu4qb2p
    @user-jy2tu4qb2p 9 months ago

    Can we train the model with context (Question: "", Context: "", Answer: "")? So the model will answer from the context, like RAG???
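
    This is a common pattern: fine-tune on (question, context, answer) triples so the model learns to answer from the supplied context, then pair it with a retriever at inference time. A sketch of one possible prompt template; the field names are hypothetical:

        def format_example(example):
            # The model is trained to read the context and produce the answer.
            return (
                f"Question: {example['question']}\n"
                f"Context: {example['context']}\n"
                f"Answer: {example['answer']}{tokenizer.eos_token}"
            )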

  • @yusufkemaldemir9393
    @yusufkemaldemir9393 A year ago +2

    Some of the recently published/released models are not working on M2 macOS. Any idea if you could make this feasible for an M2 Max Mac? Thanks

    • @venelin_valkov
      @venelin_valkov  A year ago

      No idea at the moment, there is still no paper with details on the model. You might try the "quickstart" with the transformers library here: huggingface.co/tiiuae/falcon-7b-instruct

  • @gokhanersoz5239
    @gokhanersoz5239 A year ago +2

    "IndexError: Invalid key: 78 is out of bounds for size 0". Do you have this error? I tried everything but couldn't solve it @venelin_valkov

  • @cryptojointer
    @cryptojointer A year ago

    What does bnb_4bit_use_double_quant=True do? Tried searching for answers, coming up with nothing! lol
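
    Double quantization is described in the QLoRA paper: the per-block quantization constants are themselves quantized, saving roughly 0.4 bits per parameter. In context, a sketch of a typical 4-bit config (not necessarily the video's exact values):

        import torch
        from transformers import BitsAndBytesConfig

        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            # Quantize the quantization constants themselves:
            # roughly 0.4 bits per parameter saved.
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )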

  • @sharathgilla4412
    @sharathgilla4412 A year ago

    Can someone help me out?
    My issue is that I am trying to fine-tune Dolly v2 using the above method, but I'm getting the same output it was giving before fine-tuning in the video; I'm not getting a single response as output.
    If anyone faced this issue and fixed it, please let me know. Do I need to change any config or model?
    Suggestions are welcome!
    Thanks

  • @prakaashsukhwal1984
    @prakaashsukhwal1984 A year ago +1

    Great video, Venelin... thanks for sharing! Will you be sharing any such training video with dialogue datasets for contextual conversations?

    • @venelin_valkov
      @venelin_valkov  A year ago +1

      Do you have a dataset in mind?
      Thanks for watching!

    • @prakaashsukhwal1984
      @prakaashsukhwal1984 A year ago

      @@venelin_valkov Somehow I am unable to paste the URLs of the datasets (tried multiple times :( ).. I have shared a suggestive list in this Google Doc. Thanks again for the wonderful set of videos.
      docs.google.com/document/d/1wqCKudZnx0XMsJ8J2n1wfOpG68M9chP_8-zeaU7s53g/edit?usp=sharing

    • @prakaashsukhwal1984
      @prakaashsukhwal1984 A year ago

      @@venelin_valkov Do you think any of the above datasets are useful? :)

  • @ashioyajotham
    @ashioyajotham A year ago +1

    Thank you so much! Just curious, can it run on a free Colab?

  • @safihaider6715
    @safihaider6715 A year ago

    I am getting an error while executing trainer.run() saying: "can't copy out of meta tensor, no data!"

  • @aimaven
    @aimaven A year ago +1

    How do we add our own data? Just change the link in the Jupyter notebook?

    • @kaihaoliu7869
      @kaihaoliu7869 A year ago

      Can you share the link to your notebook?

  • @mariocuezzo8027
    @mariocuezzo8027 A year ago +1

    Excellent video! I need to configure and train a local GPT to chat with a SQL database. Which is the better option for fine-tuning with a single GPU for that?

  • @weystrom
    @weystrom A year ago +3

    How much VRAM did you end up using?

    • @venelin_valkov
      @venelin_valkov  A year ago +1

      The Google Colab showed 6.9GB VRAM and 4.6GB RAM, during the training (with parameters shown in the video). Not sure how accurate it is, though.
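
    For anyone who wants a more precise number than the Colab resource monitor, PyTorch tracks peak allocations directly. A minimal sketch, assuming a trainer is already set up:

        import torch

        torch.cuda.reset_peak_memory_stats()
        trainer.train()
        print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GiB")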

  • @nourghaliaabassi931
    @nourghaliaabassi931 A year ago +1

    Is the notebook available?

  • @riyajatar6859
    @riyajatar6859 9 months ago

    If it's an assistant model, shouldn't it respond only when a human asks it a question?
    Here it generates the questions and answers on its own.

  • @user-rw5sk8fv4s
    @user-rw5sk8fv4s A year ago

    I have two sample datasets like below:
    1) [{ "en": "Hello, how are you today?", "fr": "Bonjour, comment ça va aujourd'hui ?" },...]
    2) [ { "text": "Ravi is a young man from India who loves panipuri." },... ]
    So how can I fine-tune on the above datasets using a Falcon LLM model?
    Please help me

  • @SAVONASOTTERRANEASEGRETA
    @SAVONASOTTERRANEASEGRETA A year ago +1

    Hello, since you are very good, can you explain two simple things to me? 1) Why do assistants find less than half of what they have in the file? Example: search for Julius Caesar (it is stored 1000 times, but they only find it 10-20 times). 2) Are there any GGML models specialized in history? Thanks, Claudio

  • @amnasherafal
    @amnasherafal A year ago +1

    Nice video, Venelin Valkov. I wanted to ask: if I have an input size of 4k+ tokens, can I train it on a single GPU?

  • @joaoalmeida4380
    @joaoalmeida4380 A year ago +1

    Hi, thank you for the video! If I want a small model like Falcon-7B, or another model like T5, to make bots for QA or FAQ, but I need to use and tune it for my own language, e.g. Portuguese or Spanish, what's your suggestion? Because I don't think I need a large multilingual model for this 😅

  • @lifeofcode
    @lifeofcode A year ago +1

    I was getting an error from the trainer, "paged_adamw_8bit is not a valid optimizer name", even though I used the same git URLs with commit short hashes as shown in the video for the pip install command. I ended up having to clone and install transformers from source to get a version of the library with the "paged_adamw_8bit" option.

    • @venelin_valkov
      @venelin_valkov  A year ago +1

      Strange, just reran the notebook (without changes) and training started as usual.

    • @lifeofcode
      @lifeofcode A year ago

      I must have messed up my pip install commands somehow, though I'm not sure how, since I was able to find the commit hash in the GitHub logs. Still, pip gave a "did not find branch or tag 'e03a9cc', assuming revision or ref" error. Luckily I was able to get past it and everything worked beautifully, thank you!

  • @amparoconsuelo9451
    @amparoconsuelo9451 11 months ago

    Can subsequent SFT and RLHF with different, additional, or fewer contents change the character of, improve, or degrade a GPT model? Can you modify a GPT model?

  • @AIwithParissan
    @AIwithParissan 9 months ago

    Many thanks! Could we have the Colab link or file?

  • @d_b_
    @d_b_ A year ago +1

    Fantastic tutorial.
    Does the training data need to be in question/answer format? Would this work if the data was instead a single large block of text, not as structured?
    Do the models need to be on the Hugging Face servers for inference?

    • @enggm.alimirzashortclipswh6010
      @enggm.alimirzashortclipswh6010 A year ago

      Never fine-tune your model on raw data; however, you can do pre-training on raw text.

    • @d_b_
      @d_b_ A year ago

      @@enggm.alimirzashortclipswh6010 So there's no concept of something like "unsupervised fine-tuning"? If I wanted to adapt an LLM on emails I've sent so it sounds more like me, I wouldn't want to train from scratch, would I?

    • @yashjain6372
      @yashjain6372 A year ago

      @d_b
      @enggm.alimirzashortclipswh6010 How do I fine-tune if the data looks like this?
      Review (col1):
      Nice cell phone, big screen, plenty of storage. Stylus pen works well.
      Analysis (col2):
      [{"segment": "Nice cell phone", "Aspect": "Cell phone", "Aspect Category": "Overall satisfaction", "sentiment": "positive"}, {"segment": "big screen", "Aspect": "Screen", "Aspect Category": "Design", "sentiment": "positive"}, {"segment": "plenty of storage", "Aspect": "Storage", "Aspect Category": "Features", "sentiment": "positive"}, {"segment": "Stylus pen works well", "Aspect": "Stylus pen", "Aspect Category": "Features", "sentiment": "positive"}]

  • @sathvikreddy4807
    @sathvikreddy4807 A year ago

    Hey there,
    How do I create a generative AI chatbot with my own data?
    Let's say I have data about a company and I want to create a "ChatGPT"-like thing which can answer the questions I have related to that data.
    I have dug through the internet today and found:
    1) Data collection
    2) Data preprocessing
    3) Selecting a pre-trained model (because that is easier than creating one)
    4) Fine-tuning the model
    5) Iteration
    This is my understanding as of now.
    So basically, how do I preprocess the data?
    Do I have to learn NLP for that?

  • @sherryhp10
    @sherryhp10 A year ago +1

    wow wow wow man

  • @flaviovitoriano2429
    @flaviovitoriano2429 A year ago

    Can anyone help me, please? I get the following error in the training part: IndexError: Invalid key: 78 is out of bounds for size 0

  • @Ryan-yj4sd
    @Ryan-yj4sd A year ago

    Deploying this model as an API endpoint on Hugging Face currently fails. Do you know how to fix it?
    RuntimeError(f\"weight {tensor_name} does not exist\")
    RuntimeError: weight transformer.word_embeddings.weight does not exist
    "},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}

  • @alyssonmach
    @alyssonmach A year ago

    A question a little out of the context of the video... are deep learning models used as much as classical machine learning models on tabular data?

  • @georgetarida5653
    @georgetarida5653 A year ago +2

    Does the custom dataset need to be in English, or can it be in any language?

    • @venelin_valkov
      @venelin_valkov  A year ago +3

      The Common Crawl dataset (used for this model) contains 40+ languages, so you should be able to use different languages. I haven't tried it myself, though. More info here: commoncrawl.org/
      That being said, their dataset "RefinedWeb" contains primarily English: huggingface.co/datasets/tiiuae/falcon-refinedweb

  • @AI_ML_DL_LLM
    @AI_ML_DL_LLM A year ago +1

    Thanks for the video. The masked language modeling (MLM) flag is set to False, so how is the model fine-tuned?

    • @venelin_valkov
      @venelin_valkov  A year ago +2

      Using "just" language modelling (predicting the next token). More info here: paperswithcode.com/task/language-modelling

  • @MattJonesYT
    @MattJonesYT A year ago +2

    With CUDA you can launch many threads at the same time for a single kernel to solve a problem. Is there a way to do something similar with GPT models? I asked ChatGPT and it basically said the limiting factor would be the memory needed: each thread might take up about 0.5 GB. So for instance, if you have 4 GB of free GPU RAM after loading the model, you should in theory be able to run 8 queries through the GPU at a time. How would that be done with a local GPT?

    • @pvlr1788
      @pvlr1788 A year ago +2

      As far as I know, if you have free GPU memory, you simply do batched inference; I guess some kind of CUDA multithreading takes place there. You can see that the training batch size is 1. I guess a bigger batch would cause a GPU OOM error.

    • @MattJonesYT
      @MattJonesYT A year ago

      @@pvlr1788 Thank you!!! "Batched inference" is exactly the term I was looking for. I see there are scripts for getting that working on various GPT models so it is correct.

    • @pvlr1788
      @pvlr1788 A year ago

      @@MattJonesYT It should work for every model, as long as you have enough CUDA memory. In the case of a 7B model, you probably need a top-tier GPU to run inference with a batch bigger than 1.
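
    For reference, batched inference with transformers is just tokenizing a list of prompts with padding and calling generate() once. A minimal sketch, assuming model and tokenizer are already loaded; the prompts are made-up examples:

        import torch

        prompts = [
            "How do I reset my password?",
            "What payment methods do you accept?",
        ]

        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.padding_side = "left"  # pad prompts on the left for decoder-only models
        batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

        with torch.inference_mode():
            outputs = model.generate(**batch, max_new_tokens=100)

        for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
            print(text)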

  • @odev6764
    @odev6764 A year ago +1

    I followed your video, but I'm struggling with repeated answers. The only modification I made was not pushing the model to Hugging Face after training, and it keeps repeating the end text. I tried changing the dataset to a larger one I have in Portuguese and setting max_steps=5000, but same issue. Could you give me a tip to avoid this repetition, like you showed in the inference before training?

    • @zorbat5
      @zorbat5 A year ago

      You should fine-tune it, so use less data. It is pretrained with a huge amount of data.

    • @zorbat5
      @zorbat5 A year ago

      Other than that, it's playing around with different parameters. Try to learn how the parameters affect the behaviour. If it doesn't give you the desired result, go back to the plain downloaded model and train it again.
      You'll discover a lot of funny behaviour of the AI with different settings. Also, the parameters are sensitive, so keep that in mind. Don't change too much; take it slow.

    • @pawancreation2311
      @pawancreation2311 A year ago

      Hi, I've been struggling with the same issue for 2 days. I used the Falcon sharded version and fine-tuned it with a 2000-example custom QA dataset I built. The answer comes out like this:
      : How JP Morgan help me?
      : JP Morgan helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. (the same sentence keeps repeating)
      Can you please suggest what to do? As you can clearly see, the text is repeating. Please help me 🙏

    • @pawancreation2311
      @pawancreation2311 A year ago

      @@zorbat5 Please help me, what can I do? 😭

    • @zorbat5
      @zorbat5 A year ago

      @@pawancreation2311 Play around, learn what everything does, and get a feel for how the AI reacts to certain parameters or fine-tunes. Also, read some books about machine learning to get a better understanding.

  • @user-nk6ey7kg5e
    @user-nk6ey7kg5e A year ago +1

    Great video, Venelin. I tried to implement QLoRA using your code, but I am getting this error: "RuntimeError: unscale_() has already been called on this optimizer since the last update()."

    • @LifeTravelerAmmu
      @LifeTravelerAmmu A year ago

      Where did you get the code? ... Are you typing it manually?

    • @kaihaoliu7869
      @kaihaoliu7869 A year ago

      I have that too, how did you solve it?

    • @IchSan-jx5eg
      @IchSan-jx5eg A year ago

      @@kaihaoliu7869 I had to install transformers==4.30.1 instead of the newest dev transformers to get rid of the error.

  • @gokhanersoz5239
    @gokhanersoz5239 A year ago +1

    Is a T4 enough for training?

    • @venelin_valkov
      @venelin_valkov  A year ago +1

      The QLoRA adapter is trained using a T4, yes!

    • @oncelscu8089
      @oncelscu8089 A year ago

      Bro, I'm new to this LLM stuff, could you help me out if I ask a few questions?

    • @gokhanersoz5239
      @gokhanersoz5239 A year ago

      @@oncelscu8089 Of course

  • @Purulence-bw7nt
    @Purulence-bw7nt A year ago +1

    Hi bro. Amazing tutorial. I am getting this error:
    "ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True'
    'truncation=True' to have batched tensors with the same length. Perhaps your features (`question` in this case)
    have excessive nesting (inputs type `list` where type `int` is expected)."
    I tried the suggested fixes from Hugging Face and GitHub but can't solve the issue. Any idea how to fix it?

    • @user-ty5fu1or6u
      @user-ty5fu1or6u A year ago

      I think you need to revisit the data preparation part again; you need to json.dumps data['questions'], which is the excess nesting layer.

    • @Purulence-bw7nt
      @Purulence-bw7nt A year ago

      @@user-ty5fu1or6u Thanks for replying. I am following the code line by line. I have tried it on the same dataset he is using. Still getting the same error. Any idea?

    • @gokhanersoz5239
      @gokhanersoz5239 A year ago

      @@Purulence-bw7nt Did you solve the problem?

    • @Purulence-bw7nt
      @Purulence-bw7nt A year ago

      @@gokhanersoz5239 No, I couldn't solve it. Have you solved it?

    • @gokhanersoz5239
      @gokhanersoz5239 A year ago

      @@Purulence-bw7nt No, I couldn't solve it. I did the 8-bit version for OPT without using the same 4-bit method. However, with the newly released updates there have been changes and different errors occur; OPT does not work in the code I write.
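
    Following the json.dumps hint in this thread: the "excessive nesting" error appears when a column holds lists of dicts that the tensor-creation step cannot flatten. A sketch of the preprocessing fix; the file name is a placeholder and the column name matches the error message:

        import json
        from datasets import load_dataset

        data = load_dataset("json", data_files="dataset.json")

        def flatten(example):
            # Serialize the nested list of dicts under "questions" into a plain
            # string, removing the nesting that breaks tensor creation later.
            example["questions"] = json.dumps(example["questions"])
            return example

        data = data.map(flatten)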

  • @user-pu2xj7tj8i
    @user-pu2xj7tj8i A year ago

    A pity you don't make videos in Russian...
