This tutorial goes through my new process for producing LoRAs. It should give you tips and tricks for creating your own. Civit.ai model: civitai.com/mo...
It would be nice to see a video about training a style with LoRA.
I second this. Most examples are of people.
I would love to see your methods when doing textual inversion training
Excellent tutorial. Keep it up!!!
That's a big pack of new information that I learned from this video with your explanations. Thanks a lot!
From my understanding, when captioning the images to be trained, the captions should include everything that's not the character itself if you want to be able to change it later. So if you included "white dress", SD would understand that the dress is not an integral part of the character.
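As an illustration of that captioning idea, here is a minimal sketch (the unique token `ohwx`, the captions, and the file names are hypothetical examples, not from the video) that writes Kohya-style sidecar caption files: the character stays a single unique token, while changeable things like clothing and background go into the caption.

```python
from pathlib import Path

# Hypothetical captions, one per training image, same base name as the image.
# "ohwx" stands for the character; everything after it (dress, pose, background)
# is what we want to remain changeable at generation time.
captions = {
    "img_001.png": "ohwx, white dress, standing, plain background",
    "img_002.png": "ohwx, red jacket, sitting on a bench, outdoors",
}

out_dir = Path("dataset")
out_dir.mkdir(exist_ok=True)
for image_name, caption in captions.items():
    # Kohya reads a .txt file with the same stem as the image file.
    (out_dir / Path(image_name).with_suffix(".txt")).write_text(caption)

print((out_dir / "img_001.txt").read_text())
```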
Would it be possible to have some tooltips on fields and buttons? It would make it easier to get started with this GUI. If everything you mentioned about the options in this video were in tooltips, it would be way more useful - but still, a big thank you for this video!
I love your training style. You’re so calm and seem so patient; I super appreciate that!! Can I ask two questions? My main question: if you have multiple people/characters you want in an image, should they each have their own LoRA, or can they all be in one? And if all in one, is there a limit to how many characters can be trained in one LoRA? Second question: the rule of 4 for the training images ... is this for a reason? I’m training a model now ... hadn’t heard that rule before. :) Thank you!!
I believe the rule of 4 is so that the batch size divides the number of training images evenly. If he had 48 images, he could have used 8 images per batch.
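To make that divisibility point concrete, a quick sketch (the numbers are just the ones from this thread, not settings from the video):

```python
# With 48 images and a batch size of 8, every epoch consists of full batches,
# so no ragged partial batch is left over at the end of the epoch.
images = 48
batch_size = 8
full_batches, leftover = divmod(images, batch_size)
print(full_batches, leftover)  # 6 full batches, 0 leftover images

# A batch size that does not divide the image count leaves a partial batch:
print(divmod(50, 8))  # (6, 2) -> one ragged batch of only 2 images
```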
Thanks bro, wow, I learned a lot of things. I subscribed to your channel!!!
Please make a small video on setting up TensorBoard. UPDATE: I figured it out. It is installed by default.
Thanks for this... went from 2hrs/model to 20mins
How do I do this in the most current version of Kohya SS GUI? I tried to change the optimizer to DAdaptation, but I can't get it to train with these settings. I have an RTX 4070 Ti, if that means anything.
Hi Bernard, I am super impressed with your workflow. My use case is fashion photography. I have one fashion-model subject and 10 different dresses. I need to output fidelity of the fashion model's face/hair/appearance + outfit. So 10 LoRAs, one for each outfit? My intention is to integrate the photo dataset batches for each outfit session (say 50 images of each outfit the fashion model is wearing, shot in a studio on a grey seamless background) into a realistic street environment from the v1.5 model. Is this correct thinking?
This TensorBoard looks interesting. How can we activate it? Any descriptions for that?
Have you tried to train on the v2.1 model? I'm having a problem getting any meaningful result from a trained 2.1 LoRA (it trains fine, without errors, but does not output anything like the source images). I checked the v2 and v-parameterization options and chose the 2.1 runwayml model with the same training parameters that were used to train a working 1.5 LoRA. Can't find any information on this issue; maybe you could help or give some suggestions.
I get some small elements like earrings and a certain type of hat (but I need to specify that it is a person, otherwise it just spits out flowers - the keyword was lilio), but not the likeness of the person whose photos the 2.1 LoRA was trained on.
How do your results compare when using keywords in your captioning vs. descriptive short-form sentences?
Thanks for sharing! It's very helpful. One question: when using a LoRA, can SD still generate things from the original base model, such as a sword?
How do I use fine-tuning? Can you come up with a tutorial?
Following the tutorial, when starting training, "Random crop instead of center crop" gives me an error. I don't know what it could be due to. If I disable it, it works.
I already solved it by disabling cache latents.
Hi Bernard, are you thinking of putting kohya on Colab?
Unless it uses some DLLs which cannot be run on Colab. And my guess is that it does.
Can you do a tutorial on saved training states?
Does anyone know how to train the text encoder separately and save the state, so that we can continue the actual training later?
19:43 Where is the repeat number (50) calculated from?
If I have ~100 training images, what number should be used?
It is an approximation. It takes about 100 steps per image for good results, so it is a constant independent of the number of images.
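Following that rule of thumb, a quick sketch of the arithmetic (treating "~100 steps per image" as the target, and assuming a batch size of 1 and a single epoch for simplicity - these numbers are illustrative, not settings from the video):

```python
# Rule of thumb from this thread: aim for ~100 training steps per image.
steps_per_image = 100
num_images = 100
epochs = 1
batch_size = 1

# In Kohya, total steps = images * repeats * epochs / batch_size,
# so the per-folder repeat number that hits the target is:
repeats = steps_per_image * batch_size // epochs
total_steps = num_images * repeats * epochs // batch_size
print(repeats, total_steps)  # 100 repeats -> 10000 total steps
```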
@@DM-dy6vn thank you!
@@annonymoususer7672 30:11 You gotta check this part, where he uses TensorBoard to see how the training is proceeding. You should expect the loss to get smaller and smaller; if the loss starts to increase instead, the model can get worse. This is how the number of steps can be estimated. If you do intermediate saves during the training, you will be able to pick the best one, even if the network eventually deteriorates after finishing all preset steps.
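That "pick the best intermediate save" idea can be sketched like this (the step/loss numbers are made up for illustration; in practice you would read the smoothed loss off the TensorBoard curve):

```python
# Hypothetical (step, smoothed loss) pairs, one per intermediate save.
checkpoints = [
    (1000, 0.142),
    (2000, 0.118),
    (3000, 0.101),  # loss bottoms out here...
    (4000, 0.109),  # ...then starts creeping back up (overtraining)
    (5000, 0.121),
]

# Keep the save with the lowest smoothed loss, even though training
# continued past it for all the preset steps.
best_step, best_loss = min(checkpoints, key=lambda c: c[1])
print(best_step, best_loss)  # 3000 0.101
```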
@@DM-dy6vn Can we do the training in multiple stages? Say it will take 3-10 hours (on less powerful GPUs with less VRAM); could we do 2 hours per day?
@@Endangereds I guess, yes. Look for "LoRA network weights" in the training parameters tab.
Great content!
Using the LR-Free branch. I'm still wondering whether batching introduces some averaging over the images in a batch.
How did you get it installed?
@@Endangereds You have to clone the LR-Free branch into a separate folder (git clone --branch LR-Free ), then follow the same installation steps.
@@DM-dy6vn thank you so much. 😃👍
Great TUT!! Which branch did you use in this video: LR-Free, dadaptation, or DAdaptation?
Back then it was LR-Free, but now you can just use the master branch. DAdaptation is supported in the main code now as an optimizer option.
@@BernardMaltais Thank you !!!
How do I know if my LoRA model is overtrained (or undertrained)? When generating images, I have to set the CFG scale super low, like between 2 and 3, to get good results. But then it is difficult to change the composition of the images, and they all look similar. Does that mean the LoRA was trained too much?
If that is the case, then it is most certainly overtrained. Mine are still flexible at CFG 10 or above...
I've tried to make LoRAs for 2 weeks now. I give up. The faces always look like a different person. I followed multiple guides and used high-quality pictures, but the results are never right. I trained on the base SD 1.5 model with 10, 15, or 20 images and tried different numbers of steps. I've wasted so many hours and don't know what the problem is.
Do you have a discord?
Why this way of captioning, though? It seems like the opposite of what everyone else does.
The goal here is to caption only the subject token and any other things you may or may not want to control via a prompt. It works very well. There are many captioning strategies. In the end you use what works for you and what you need for your model. There is no black and white when it comes to captions.
I think that if you caption it as "white woman, blonde hair", you now bring every white woman in the model into it, whereas a unique string of letters can only mean one thing in the model. Then you just need to avoid over-fitting.