Thank you for this. I am sick of seeing clickbait or low-quality content when I'm trying to self-educate about LLMs and get some good knowledge.
That is exactly what I was waiting for!! Thank you so much for the insight!
I’m glad it was helpful :D
Just keep these coming. Tons of helpful information.
This was AWESOME!!!! Thank you, thank you, thank you. This was a total mystery to me and you managed to clear it up completely. I truly thank you for this video, have a blessed year!
I’m glad it was helpful :D
Hey thanks for sharing your time and knowledge with us!
Thank you! I appreciate you watching it!
Nice. Will check it out over the weekend.
I am very hopeful for local open-source AI this year in particular. Having LoRAs fine-tuned to certain characters would be incredible rather than just using the old prompt method.
Was thinking the same. You could put a deep backstory in a LoRA without slowing down generation.
Or put conversations in a LoRA so the character won't forget things, if that's possible.
Just found your channel. Excellent Content - Another sub for you sir!
I can't make this work. I created a file with just one instruction and the training passes successfully, but when asked in the chat, the model continues to respond incorrectly. Is there a specific model that works? I realized that you need to apply the created LoRA to the model, but you didn't do this in the video. Would you make another, more detailed video? :)
I love your content, very tangible, but still ambitious with theory, thanks a lot!
Thank you, I’m glad it’s helpful!
Thanks for also explaining the math "basics" in a nutshell. Unfortunately I've forgotten just about everything I learned some 20 years ago when I studied, because I never had to use it. Who would have thought that one day I could actually make use of it? But it's as one of my teachers once said: use it or lose it.
I hope it was helpful!
I'd love to see a video on building these data sets. I am currently working on an AI safety and morality project, and this is super helpful.
This is a very interesting topic, especially when we talk about open source data. For example, is the GPL viral for LLMs or not? Great idea too!
Any useful insight regarding building datasets? I'm struggling with that same question at the moment.
Yes please!!
@@redbaron3555 So I'm pretty sure they did this already.
Awesome tutorial!
I think this video was very informative! It just boils down to math which is kind of cool
Really helpful and informative. Thank you! Could you talk about using these for query, key, and value matrices in attention? Also, explanations of where in particular it can be used in models that aren’t transformers would be interesting 😊
That is coming this week! It’s incredibly fascinating! Thanks for the comment and I’m glad you enjoyed it!
Didn't know I wanted Kurt Cobain to teach me ML but I'm here for it!
I have never seen the resemblance lol, but I'm told I look like him all the time. Especially when I have long hair.
@@AemonAlgiz you most definitely look like Kurt cobain
Great work, you are doing a great job at helping the community with your videos. I had one question or point of confusion: at the start, in the architecture diagram, the talk was about fine-tuning with LoRA on historical medical appeals, however during the hands-on part the example was generating questions and answers on some other book dataset. I wasn't able to relate the two.
My AI keeps repeating words: "Okay, okay, okay, okay, okay, okay, okay, okay"... This is somehow caused by training; which setting should I reduce to fix this?
Okay, okay, okay, okay, okay, okay, okay, okay
Just subscribed. Would love a video on training and fine-tuning to clone characters or yourself.
Great video! Do expand on the parameters section. We are all out of RAM.
After pressing "Start LoRA Training" I get "ValueError: Target modules ['q_proj', 'v_proj'] not found in the base model. Please check the target modules and try again."
Can you give examples of training data and results?
that would help a lot
If you check the description they’re already there!
Do you know how to fine-tune a model (like Mixtral) with exported Telegram messages (.json)?
Thank you so much.
I have a question though.
I have three kinds of datasets: one is the "rules dataset", another is the "symbols and their meanings dataset", and the last is a dataset with example inputs and outputs.
How do I go about fine-tuning my model based on these datasets?
Thanks 🙏
This was great. I would love to see you do this in python code at a simple level to set these parameters instead of the GUI
What models does this work with? Every time I try to start training I get an error: KeyError: 'mistral', yet I don't have any Mistral models installed and am trying to use it with HuggingFaceH4_zephyr-7b-beta.
Are you running some type of app during the latter half of the video? It's the first time I've seen such a screen.
It's called Oobabooga! The details for it are in the description. Thanks for the comment and hope to see you again!
Does anyone else have this problem: oobabooga can't see raw text files in the datasets folder?
What was the model you were using to train this? I've tried a number of them so far with no luck.
Did you ever find models to use which trained properly? I still have not found one, every model is something other than the types that web-ui wants.
Thank you.
Do you think that if I have a very small training set with something like "hey, what's your name" / "my name is X", with only 3 entries in the training/validation set, that's going to be enough? I tried it on TinyLlama and prompted the model without success; it did not give a coherent reply for my training data.
Same for other models like 7B LLaMA, loaded in 4-bit precision :/ regardless of how many epochs I configure or how I adjust the alpha/rank.
Can I download and use the fine-tuned model in other projects in place of the original model? Could you provide instructions on how to do that?
Thanks for the informative video.
One question: why are the audio and video of all your new videos not in sync? Even in the shorts it lags by 200ms or more.
This was due to how I was adjusting my audio; I was accidentally introducing a small lag between the audio and video. I'm still learning how to do it correctly, though I now know how to not introduce the lag! :)
very well explained
Thank you!
Very cool video! We want more about fine-tuning and maybe self-instruct! 😅
Thanks for the comment! I’m working on a more detailed video on fine-tuning examples which will be out later this week :)
@@AemonAlgiz great! I can't wait for it! Already subscribed and turned notifications on! Will you also cover hardware and free/cheap options to fine-tune these smaller models?
Indeed! I was going to cover different configurations of LoRAs, context lengths, and such.
There is so little information online about LoRAs for LLMs. I have downloaded and made my own LoRAs for Stable Diffusion, and I have used multiple LLM models in oobabooga, but nobody is talking about LoRAs in oobabooga. This is the first video I've found. Can you please make more and explain specific use cases and examples of them in practical use? Could I teach WizardLM to role-play like Pygmalion by training a LoRA on roleplay context? Can I teach it facts about my life and ask it to recall information about me afterwards? Can I embed a personality into it more strongly than the default character process in oobabooga? I haven't seen anything done with a LoRA, so idk...
Hey Glen! Thanks for the comment!
LoRAs in LLMs are the same as in Stable Diffusion in that you're trying to influence the model's behavior. So, you could definitely teach it facts about yourself or have it do role-play, though the caveat is that you're influencing the completion probabilities.
There was a great article on someone using LoRAs to get LLaMA, if I remember correctly, to impersonate his friends. He did this by downloading all of his Facebook messages and using them as a training set. The model became a pretty convincing stand-in for his friends and could even impersonate specific individuals.
I am going to be doing a video on validation so I will sneak in some additional information on how to perform different fine tunes!
@@AemonAlgiz Thanks, I would really appreciate that!
Could you make a video going through an example from start to finish? For example, expanding on a base model by creating a LoRA with specific knowledge that it can only know from the trained LoRA (e.g. the contents of a certain book that you feed it) and having it answer questions about the knowledge that you added. A lot of people are searching for this information, especially now that models have come on the market that can be used commercially (e.g. MPT-7B). Easy 10,000+ subscribers if you were to make such a video where you, for example, take MPT-7B or another commercially usable model, show how to make the dataset and instructions, feed it, I don't know, scripts of every Simpsons episode, and then ask the model to make a list of all the characters featured in the episode "Marge Vs The Monorail".
I am! Today's video is on positional encoding, but this weekend's video is on how we prepare and use our own datasets :)
This ^^
@@A.D.A.M.411 I'm also interested in this. 100% agree we need a noob-friendly vid that takes us by the hand and shows the full process, so we can try it with the same data and compare it to our approach.
Can you please share the model and model loader that you have used for the training?
I am using LLaMA 13B with llama.cpp and it is not working...
I also can't get this thing to work. Most likely it’s broken and no one wants to fix it. And this documentation... "Its goal is to become the AUTOMATIC1111"? Ha ha ha...
Do you have any videos on how to use already existing LoRAs? I'm running 7-10B EXL2 models, but I have no idea how to use LoRAs.
You are an Angel. Thank You~!
I’m glad this was helpful :)!
It really isn't feasible to reformat for Q&A if you have a large dataset or multiple files. Are there tools to help prepare large custom datasets?
It really is an "it depends." If you have structured data, like XML/JSON, it becomes much easier. If you have flat text, it becomes much more challenging and you may have to find creative solutions.
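As a rough illustration of the structured-data case, here is a minimal Python sketch; the file name records.json and the "question"/"answer" keys are hypothetical placeholders for whatever your source actually contains:

import json

# Hypothetical input: a list of records like {"question": "...", "answer": "..."}
with open("records.json", "r", encoding="utf-8") as f:
    records = json.load(f)

# Map each record onto the instruction/output shape used for training.
pairs = [{"instruction": rec["question"], "output": rec["answer"]} for rec in records]

with open("train.json", "w", encoding="utf-8") as f:
    json.dump(pairs, f, ensure_ascii=False, indent=2)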
This is the explanation I was looking for, thanks for that. I see oobabooga has its own LoRA loader, but I can't find anywhere how to load these LoRAs using torch. Is it possible for you to explain how to load these LoRAs with the model, to use it with other libraries like langchain?
Just one thing!!!! WOW! Thanks from France.
I get "can only concatenate list (not "Tensor") to list" and "Error None"
It's interesting... I'm still a bit in the dark as to what goal is expected from those inputs and what it does to the final results. Is the goal to style the answers in a certain way (which is what the formatted input seems to focus on), or is the goal to supplement content (a correct Turing biography)? If content is involved, can we train a model to follow fictional lore, get characters to express behaviors that appear "normal" to their world, or redirect behavior (if someone speaks ill of the king, characters will always respond negatively, or react according to political alignment)? And if it's possible, is a large explanatory text like the Turing biography a valid input format to do so?
Can you suggest any guide to fine-tune a model with domain knowledge and personality, like the Samantha LLM model? Is the best approach to fine-tune for tone and then use embeddings to influence the response itself?
Should the formatted validation file be in the same format at the formatted training file?
Yes, I’ll upload an example to the git. Thanks for the comment :)
@@AemonAlgiz great vid :) I've been waiting for someone to go over this! Please don't forget to upload that validation file!
Thanks for catching that, it was added :)
How can we see the progress of training?
I don't have a GPU in my laptop. Can I train on my Ryzen 7 CPU only?
This would be difficult due to speed, since to fine-tune you have to dequantize the weights back to floating point. CPUs will be very slow, though in theory it should work.
I am wondering how you store the new model, and how to reuse it every time?
The LoRA is in the lora folder, and can be added to models to influence them
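A minimal sketch of reusing the saved adapter outside the web UI with the Hugging Face PEFT library (this also answers the earlier question about torch/langchain); the base model name and adapter path below are placeholders, not necessarily what was used in the video:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "decapoda-research/llama-7b-hf"   # placeholder base model
adapter = "loras/my-lora"                # placeholder path to the trained LoRA

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, adapter)  # attaches the LoRA weights

inputs = tokenizer("Who was Alan Turing?", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))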
What model do you use? I am having so many issues, as ooba does not support 4-bit LoRA training and I can't seem to get any 8-bit model running on it. Got 24GB of VRAM, so I'm open to many options. Would really like to use Ooba.
I usually stick with LLaMA 13/30B :D
@@AemonAlgiz I used LLaMA 13B: llama-2-13b-ensemble-v6.Q8_0.gguf and TheBloke_Llama-2-7B-LoRA-Assemble-GPTQ. In both cases, I am seeing an error in the tokenizer. Can you please share the model you have used for training?
How do I upload the training data?
How long should training take? My first attempt stated that it would take several days, so I assume I did something wrong.
Training can take a long time, depending on a multitude of factors including your hardware. Can you give me some detail on your setup?
Depending on your hardware and dataset, several days could be feasible. What is your hardware and how large is your dataset?
@@AemonAlgiz I have an RTX 3060 with 64GB of RAM. I was using a text file that is about 14MB in size.
For that volume of text with a 3060, that's pretty reasonable. For scale, OpenAI had to spend several months with 10,000 A100s to train ChatGPT.
@@AemonAlgiz 😱🤯
Let's say I train some LoRAs and I'm using Model A. Later I'm not happy with Model A, so I switch over to another model, B. Will the LoRAs be gibberish to the new model? Next question: let's say I want to do a LoRA on a text file dealing primarily with how to program a VCR. It's a lot of boring information, but anyone having read this document would know how to program a VCR. Does this LoRA training make the model able to answer questions pertaining to programming a VCR? I know this may be a duh question... It's just that there's so little out there on what LoRAs are that I feel I'd need a degree in this stuff to figure it out sometimes. Anyhow, thanks if you can answer these or set me straight.
LoRAs are VERY broad and can be used for a multitude of use cases, and training a language model for Q/A on a VCR is absolutely possible. As for switching LoRAs between models, I imagine if you're switching between the same model type (LLaMA, Vicuna, etc…) and not trying to switch between different model types, it should be fine. Though I'm just making an educated guess, so I wouldn't quote me on it.
It is being reported that the EU is trying to ban LoRA and open source generative models. What would you recommend a researcher do right now to secure the assets required to continue research before these draconian rules go into place and it is impossible to gain access to them?
It's unfortunate that these regulations are going to serve only to hurt the open source community and research, but that's the entire point. OpenAI and the other mega corporations want to have their moat, and the only way to get it is by pricing everyone else out through regulations.
For the moment, I would look at how LLaMA open source repositories apply the LoRAs, since it's not very difficult to do from a technical perspective. That way, we can retain the knowledge of how it's done.
Thanks for the comment!
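For reference, with the PEFT library the attach-and-merge step usually looks something like this; a hedged sketch, assuming a LLaMA-style base model and a locally saved adapter (both paths are placeholders):

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")  # placeholder
model = PeftModel.from_pretrained(base, "loras/my-lora")                      # placeholder adapter path

# Fold the low-rank update into the base weights and save a standalone model.
merged = model.merge_and_unload()
merged.save_pretrained("llama-7b-merged")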
@@AemonAlgiz Good point. I guess I was more considering, for example, getting Oobabooga set up and downloading the models and such before they get taken offline in one way or another. I'm trying to understand: if I get it set up and running today with LoRA capability, is there any way it could be retroactively disabled? Like, does it call home anywhere during inference, or is it completely offline once everything is set up? Theoretically, even if they were to "ban" this stuff later, I would already have all the elements to continue doing my own research and improve the model via fine-tuning/LoRA for personal use.
Hey Avi! The models don’t do any kind of phone home, no. Also, LoRA’s can be attached and detached at your whim!
I wish you had just shown what model you were using! I get different errors depending on what model I load, but I've never found a single one that can successfully launch training. I want to post a GitHub issue, but there are different errors.
Thanks for the tutorial.
Any hints on how much text is needed for successful training? Is there a risk of overtraining with too much data? A suggestion: maybe talk a little more about the datasets and what to look for in them? Thanks.
That’s a difficult question and it depends on what LoRA dimension you want to use and how much data you have. If you could tell me what you’re looking to train that would help!
@@AemonAlgiz I'm working on trying to create a fine-tuned model on LLaMA 13B GGML that's more for neurophysiology, any thoughts?
How do I train it to speak another language? I'm trying to train it to speak Sanskrit.
This may not be possible, at least for Sanskrit proper. I would be surprised if the tokens the network requires to learn Sanskrit are in the tokenizer. You may want to check and see whether the Sanskrit tokens are valid tokens, or else map them onto other tokens that are in the tokenizer…
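A quick way to check is to run a Sanskrit sample through the model's tokenizer and see whether it comes back as sensible pieces or as a long run of byte-fallback/unknown tokens; a minimal sketch (the model name is just a placeholder):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("decapoda-research/llama-7b-hf")  # placeholder model

sample = "धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः"  # opening line of the Bhagavad Gita
tokens = tok.tokenize(sample)
print(len(tokens), tokens)
# Dozens of tiny byte-level pieces (or lots of unknown tokens) per word suggests the
# tokenizer has little real coverage of the script, so a LoRA alone won't help much.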
@@AemonAlgiz Not sure how to do that. I'm using Vicuna 7B, which already knows some Hindi and Sanskrit. I have various sized datasets: Itihasa.txt is about 47MB and sanskrit_corpus.txt is 630MB. I've trained on Itihasa before and it does seem to teach it, but it isn't very good. sanskrit_corpus is too big for me to figure out how to train it.
It takes so long to train Itihasa on CPU that it's hard to experiment, so lately I've been playing with ooba's DeepSpeed, xformers, and auto-devices options to use the GPU and CPU together.
@@AemonAlgiz I asked chatgpt3 to translate and I now understand the tokenizer might not work for Sanskrit. Still, if I can get ooba working again, I'll try.
Great video! I'm having an issue: the web UI doesn't detect my training file in the dropdown. Did that ever happen to you? Thanks!
Oh no! The video ended before telling us what model you were using! None of the 4-bit models I've tried work for it, apparently. I tried the monkey patch thing, but I still get errors. If you could name at least one model that works with LoRA training, that would be awesome, thanks! Apparently it's not designed for 4-bit models... but all of the models are 4-bit models... sooooo yeah, not sure what model to use for training. I have 12 GB of VRAM.
Hey there! I was using this one:
huggingface.co/decapoda-research/llama-7b-hf
12 gigs of VRAM is going to be pretty tight, but you could try training with that. If that fails, Lambdalabs (not sponsored) is giving away free training time to open source models.
Good vid, thanks!
Thank you for watching!
Thanks Aemon, that was a brilliant video.
Can Claude or the OpenAI GPT models be fine-tuned with LoRA or is it just LLaMa based models that have an interface for this?
I believe the format of the training data set has changed, but they have not updated it in the docs. From what I understand, it should now be like this: {'output': '...', 'instruction': '...'}
Are you sure? Have you tested it? Because I'm dealing with this issue at the moment.
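For anyone comparing, a minimal example of what such an alpaca-style dataset entry might look like; the exact keys depend on which format template you select in the training tab, so treat this as an assumption to verify against your own install:

[
  {"instruction": "Who was Alan Turing?",
   "output": "Alan Turing was a British mathematician and a pioneer of computer science."},
  {"instruction": "What is a LoRA?",
   "output": "A low-rank adapter used to fine-tune a large language model cheaply."}
]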
how can chats be trained
What if you have 500 PDFs you want to use as data?
Do you think 12GB of VRAM could be enough?
That would be cutting it pretty close, though if you keep the LoRA dimension and the context window low, you should be fine
Where are you? Are you okay? Why don't we get new videos?
thank you
do you have a discord channel?
I do! TheBloke's Discord, you should join us!
@@AemonAlgiz can you please share an invite link?
It would be great if we could just give it a PDF and tell it to learn. We would have our own personally trained AI for special applications. Same as ChatPDF, but on steroids.
This should be doable with the right pipeline. You could create a service that extracts the text from PDFs (OCR if they're scanned) and then use it for training.
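A minimal sketch of the extraction step, assuming digitally generated (not scanned) PDFs and the pypdf library; scanned documents would need an OCR pass (e.g. Tesseract) instead:

from pypdf import PdfReader  # pip install pypdf

def pdf_to_text(path: str) -> str:
    # Concatenate the extractable text from every page of a PDF.
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

# Hypothetical usage: dump a PDF into a raw text file for the training tab.
with open("book.txt", "w", encoding="utf-8") as f:
    f.write(pdf_to_text("book.pdf"))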
localGPT can be used to train by providing PDFs.
Did you not have to first load a base model?
I feel like generative AI is filled with a bunch of people pretending to really understand the concepts and applications but not showing proper application... It's the crypto explosion all over again...
Your audio is slightly out of sync with your image. It makes it hard to follow along; when you speak, the movements of your lips are delayed. This takes attention away from the content and puts it on the out-of-sync issue.
Otherwise good presentation.
Please put Chinese subtitles on this video, thank you
The fact that you can create a big matrix by multiplying two smaller matrices has nothing to do with LoRAs. The LoRA is not a clever compression method that compresses a big matrix down into two smaller matrices. There is no point at which the matrix multiplication in the LoRA layers produces a big matrix which can then be used for further calculations. Also, the LoRA layers have more than just two matrices. A rank 2 LoRA (rank 2 = 2 matrices) would be very weak and incapable.
This is not true, LoRA matrices are defined as rank decomposition matrices.
Where your weight matrix, W, has dimensions d x k and your rank decompositions, A and B, have dimensions d x r and r x k. The adjusted network output for an input x, is equal to:
Wx + ABx. If the weights matrix and LoRA decomposition dimensions didn’t match, this wouldn’t make sense.
Otherwise, you would be pushing gradients to matrices with the same number of weights as your network, which would not open the door to less powerful hardware. Most of the fine-tuned models you've used have very low dimensions, such as 32/64, so no, it's also not true that low rank is incapable.
Source (page 4, equation 3):
arxiv.org/pdf/2106.09685.pdf
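To make the parameter savings concrete, here is a small numpy sketch using the dimensions above; d = k = 4096 roughly matches one LLaMA-7B attention projection, and r = 8 is purely an illustrative choice:

import numpy as np

d, k, r = 4096, 4096, 8

W = np.zeros((d, k))        # frozen pretrained weight, d x k
A = np.random.randn(d, r)   # trainable rank-decomposition factor, d x r
B = np.zeros((r, k))        # trainable factor, r x k (zero-init so AB starts as a no-op)

x = np.random.randn(k)
h = W @ x + A @ (B @ x)     # adjusted output: Wx + ABx

print(W.size)               # 16,777,216 frozen parameters
print(A.size + B.size)      # 65,536 trainable parameters, about 0.4% of W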
Nice video. I have a problem: I am getting an error at the end of the LoRA training, at the saving phase I get:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 68.00 MiB (GPU 1; 22.02 GiB total capacity; 20.06 GiB already allocated; 61.19 MiB free; 20.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I am using 4x A10 GPUs, each with 24GB of VRAM.
Any pointers?
The error is unfortunately what it says on the tin, you’re running out of GPU memory. Fortunately, QLoRA will help resolve the majority of these issues.
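If the fragmentation variant of this error comes back, the setting the message refers to is exposed through an environment variable; a hedged example, with 128 MiB only as a common starting value rather than a recommendation from the video:

import os
# Must be set before torch initializes CUDA (i.e. before importing torch).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
import torch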
@@AemonAlgiz Still, I have 96GB of VRAM across all 4 GPUs, which should be more than enough to train a LoRA. I solved it in the end; the problem was starting Oobabooga with the --auto-devices parameter. When I dropped that, it started working like a charm :)
Ah, thanks for letting me know! I try not to assume that people have setups like that, but I’ll add this to my toolbox!
Mind blown, great vid!