1000% FASTER Stable Diffusion in ONE STEP!
- Published: 23 Sep 2024
- Up to 10x Faster automatic1111 and ComfyUI Stable Diffusion after just downloading this LCM Lora.
Download LCM Lora huggingface.co...
Blog post huggingface.co...
Prompt styles for Stable diffusion a1111 & Vlad/SD.Next: / sebs-hilis-79649068
ComfyUI workflow for 1.5 models: / comfyui-1-5-86145057
ComfyUI Workflow for SDXL: / comfyui-workflow-86104919
Get early access to videos and help me, support me on Patreon / sebastiankamph
Chat with me in our community discord: / discord
My Weekly AI Art Challenges • Let's AI Paint - Weekl...
My Stable diffusion workflow to Perfect Images • Revealing my Workflow ...
ControlNet tutorial and install guide • NEW ControlNet for Sta...
Famous Scenes Remade by ControlNet AI • Famous Scenes Remade b...
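For anyone who prefers scripting this outside the UIs, the Hugging Face blog post linked above covers the same LCM-LoRA approach via the diffusers library. Below is a minimal sketch; the model and LoRA repo ids are assumptions based on the commonly published Hugging Face repos, and actually running it requires diffusers, torch, and a CUDA GPU:

```python
def build_lcm_pipeline(model_id: str = "runwayml/stable-diffusion-v1-5",
                       lora_id: str = "latent-consistency/lcm-lora-sdv1-5"):
    # Heavy imports are kept local so the recommended settings below
    # can be inspected without a GPU or diffusers installed.
    import torch
    from diffusers import DiffusionPipeline, LCMScheduler

    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe.to("cuda")
    # LCM needs its own scheduler in addition to the LoRA weights.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(lora_id)
    return pipe

# Settings the video recommends: very few steps, very low CFG.
LCM_STEPS = 8
LCM_CFG = 1.5

# Example usage (requires diffusers, torch, and a CUDA GPU):
#   pipe = build_lcm_pipeline()
#   image = pipe("cinematic portrait photo",
#                num_inference_steps=LCM_STEPS,
#                guidance_scale=LCM_CFG).images[0]
#   image.save("lcm_sample.png")
```

The low `guidance_scale` is the key trade-off mentioned throughout the comments: it is what makes 4-8 steps possible, but it also weakens negative prompts.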
As others have mentioned, not using this LCM at full strength helps if you are having issues with messy/distorted images. I'm getting pretty good results with setting the LCM at 0.5 with 16 steps. Still really fast, but with better looking generations. Also, I recommend trying this if you are having issues with the LCM while using models and lora's that are trained on a particular subject.
One issue is it makes animatediff not work well since animatediff usually needs more steps like 25-30 to get some good motion. Just wanted to put that out there, it does work with animatediff though.
how do you change the strength, if it's not in the prompt?
@@alsoeris In automatic1111, when the LCM LoRA is used in the prompt, it would look something like this (the angle-bracket tags were stripped by the page; the name is whatever you saved the LoRA file as):
<lora:your-lcm-lora-filename:0.5> for half strength,
<lora:your-lcm-lora-filename:1> for full strength,
<lora:your-lcm-lora-filename:0.2> for 20% strength,
etc.
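Since that tag is just text interpolated into the prompt, here is a throwaway helper illustrating the `<lora:name:weight>` convention A1111 uses; the filename shown is hypothetical:

```python
def lora_tag(name: str, weight: float = 1.0) -> str:
    """Build an A1111-style LoRA prompt tag, e.g. <lora:lcm-lora-sdv1-5:0.5>."""
    return f"<lora:{name}:{weight:g}>"

# Half, full, and 20% strength for a hypothetical filename:
print(lora_tag("lcm-lora-sdv1-5", 0.5))  # <lora:lcm-lora-sdv1-5:0.5>
print(lora_tag("lcm-lora-sdv1-5"))       # <lora:lcm-lora-sdv1-5:1>
print(lora_tag("lcm-lora-sdv1-5", 0.2))  # <lora:lcm-lora-sdv1-5:0.2>
```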
You've made my day, no more waiting 30 mins on my potato pc for a generation. Thank you so much
i installed but must have done something wrong as the quality seems poorer... back to the drawing board lol
it actually works very well, to create small samples then upscale them img2img, even SDXL is quick.
Interesting approach!
Sebastian, I really like your videos and your simple way of explaining things. Could you create a tutorial, or recommend a video, for Stable Diffusion or ComfyUI on how to insert a generated object into other scenes? That is, generate the same element in different scenes. For example, I generated the design of a new bottle and the prompt gave me a perfect result; after that I want to create an image of this same bottle in a scene with different angles or different poses (like a new photo of someone holding the bottle of juice, for example). It would be very interesting to have this type of video.
Sometimes you give us quite the gems from the industry. Your research and sharing the knowledge is highly appreciated.
Thank you kindly! 🌟😊
I am also using an RTX 4090 setup and I gotta say that I don't see much of a speed difference; however, finding out about the comparison capabilities made it so much better for choosing what model to use based on what I wanted to create. Thank you for the info
It may also be noted I was doing about 80 sampling steps at an upscale value of 2.3
@@joppemontezinos2092 You're supposed to use 4 to 10 sampling steps AND cfg 1 to 3. It's very fast and yields good results; it's honestly a godsend for mass-producing images. You can make 100+ images SO FAST that you can just pick the best one and hi-res that with a better config to get the absolute best of the best results.
SAME
Tip for experimentation: use it like a regular lora and play with the weight. Some custom models that give horrible colors at 1, will actually work better at 0.7.
Great tip!
I've discovered the same. Also increasing steps to hone in on the right quality. Maybe not 1000% increase, but 500% is still pretty good :) Even going all the way down to .1 will allow some to work much better and still get the speed increase.
yes, I'd feel more comfortable using the standard lora syntax instead of this black box method from the dropdown. same with my saved styles. Anyone know how to see them again and not just the tabs to add them? (please don't mention styles.csv that's where I edit them).
It doesn't appear under the regular lora network for me. I can just choose it from the dropdown menu
Thanks for the tip, I've seen my Anime models get really dim after applying LCM.
I made the same grid as in the video with 8 sampling steps for 2 cases: 1) with this LoRA and 2) withOUT it / None.
The time to generate is basically the same (actually 10 seconds faster without this LoRA), so the speed depends on the sampling steps rather than the LoRA.
Quality depends on the sampler, but there are some VERY good results without this LoRA at all for the same sampling steps.
I can't see much difference in either speed or quality if the right sampler is used.
The point of using this lora & sampler is that you can achieve results in 8 steps that would otherwise need 25 or more steps with other samplers. For the best quality, I'd recommend the Comfy route using the LCM sampler together with that Lora, as a1111 with another sampler is more of a half-measure atm.
@@sebastiankamph let's be honest, nobody uses lcm if they are looking for the best quality. The only people using lcm are the ones with old PCs who want to have some fun poking at a couple of 512x512, still-unusable images.
On any high end graphic card, 8 steps vs 25 steps is only 1 second difference, no matter the model or sampler used, so something like the lcm makes no sense to professional users.
I've tried this with my ancient gpu gtx 970😂, generating 512x768, cfg 7, 30 steps image usually takes 42 seconds. With LCM it takes only 7 seconds, the result is comparatively good 👍
You should be able to do it in 4-8 steps with LCM; my 3090 can make a 512x512 image in 0.25 seconds
980ti user here. I feel your pain.
I have tried a GTX 1080 Ti generating 768x768, cfg 8, 30 steps: with LCM or not, the same result, 30 sec. ((((
@@jibcot8541 which 100% looks like trash and is totally unusable. Not sure what's up with people wanting to brag about being able to generate some tiny (512x512px) low-quality images in a second.
I'm wondering what would happen if I use this with Vega 8. Hope it helps.
This helped a lot, i dont want to wait 1 hour to generate one image 😅
I am a big fan of you. Thanks for sharing knowledge in easy to follow language while everything is explained within the details not like other radio just repeating information that sometimes is not fully useful. Your stuff is good. Got my like and sub and a long time follower. I am one of you as AI researcher. Thanks very much.
So nice of you!
@@sebastiankamph Thanks man. I told it from heart and I benefited couple time due to your video. Good job while sharing info like a champ.
Duuuude, ive been using the sdxl one for a few days and it is a gamechanger, didnt know there was one for 1.5, awesome!
Sweet! How have you been liking it for SDXL?
@@sebastiankamph It's been amazing honestly, an order of magnitude faster on my 1080, going from 20+ mins with hires fix to about 1.5-3 mins using lcm. I was trying it out with 1.5 yesterday and it's great too, went from about 3 mins to just 30 secs. It honestly makes the experience much more enjoyable for me, being able to see this kind of improvement.
My not so "potato PC" and my impatience thank you very much, I am your fan. I already passed the information on to my brother; I'm sure he will be happy too.
Thanks for sharing!
Damn, this is the best SD video this year, I can't believe how fast you can work with it now! Nvidia can toss their TensorRT extension in the trash!
Just started a week ago and I've been loving it. Switching to Comfy
Doesn't work
The SDXL lora does not seem to work for me. My RTX 3060 with 12GB VRAM gets 100% loaded and freezes the whole system for several seconds on each iteration. The resulting images are usually a jumble of pixels. The SD1.5 lora, however, does seem to somewhat accelerate SD1.5-trained models.
I found the picture quality is worse ONLY when applied to custom SDXL models; when applied to vanilla SDXL or SDXL SSD-1B, it's roughly on par in quality yet SUPER FAST!!! (Tested on ComfyUI, LCM SSD-1B, LCM sampler, 8 steps.)
Useful info, thanks! Unfortunately, in my case, I'm often on custom checkpoints, but the methodology could be instrumental in making future iterations faster. 👏🤩
@@taiconan8857 yah, surely it's doable for helping animated diff, that needs many frames to generate.
@@marhensa OH! I HADN'T EVEN CONSIDERED THAT YET! You're totally right! I'ma definitely need to revisit this when I'm at that stage. 👌😲
Dunno if it's bc I'm on AMD windows system and on the DirectML branch of A1111, but it doesn't seem like I have any improvement in speeds with this LoRA and even with reduced weight to 0.5 still seems like all it does is reduce generation quality. Oh well. Thanks for sharing Seb, still love your content!
Edit: Finally got it to work; my generations went from 38 sec/image with hires fix and ADetailer inpainting all the way down to 12 sec/image... The only downside is that the quality is worse than I'd prefer, most likely because the required low CFG scale basically ignores negative prompts and embeddings.
LCM's are what we call Rice Crispy Treats in Australia. Used to love when Mum put them in my lunch box for school 🤣
FYI: install the animatediff extension in A1111, this will automatically install the LCM sampler.
Any chance you can show how the live webcam setup can be done?
Thanks!
For the quickest answer, I'd guide you towards my Discord and ask kiksu himself.
I'm using the DirectML version because I have an AMD and I have to use my CPU and It's PAINFULLY slow. Will this help with that? Or is it only for those using GPUs?
I actually have a really decent GPU (RX 5700 XT) but I sadly can't use it since SD hardly supports AMD.
Did u try it? I have rx 7800 xt and have the same problem. Looking for options to improve rendering performance. AMD released a video with a tutorial but I haven't tried that yet.
@@LinkL337 I have not; I just sucked it up and am using the painfully slow CPU way lol. I spent 7+ hours trying all types of things and nothing worked. I literally have to use my CPU, it seems.
Update: I wasn't able to get it to work, then found a post on Reddit which suggested deleting the "cache.json" file in the webui directory. I renamed mine to cache2.json (just in case) and sure enough the Lora tab was showing ssd-1b in it and noticed speed improvements. Must be a bug of some sort as the cache.json file showed up again and everything seems to be working
Happy you got it working!
Thanks, but it doesn't make it faster... it's the same speed... 3-4 secs SDXL with or without the lora in between... any ideas why? I have an old RTX 3090, 8g
Thanks for the mega grid comparison - most of the comparisons so far are probably using the DPM 2M Karras, long time best performer, and seemingly terrible with LCM. I'll let the community do a few more evaluations with sampler and CFG before switching over.
Wow so instead of creating a hundred images every day that nobody cares about I can create 10,000 images a day that nobody cares about, fantastic!!!
Why are you so salty over this? It's a tool that some people use in their workflow. 😂
I already did the --xformers edit, can I still use this Lora or would the quality of images be affected?
In A1111 I don't see any difference in speed, the results are just worse
Do you know how to make it faster for pony diffusion? I dont think this works for pony models
hey, do you think we can get more than 0.7 frames per second if you render only 500x500 with a 4090 as hardware?
hey :) did the KSampler change with the last update? I get errors on all my AnimateDiff workflows since I updated all of ComfyUI.
Error occurred when executing KSampler:
local variable 'motion_module' referenced before assignment
Hmmmmm, good question 🤔
me, too!
today i did another UPDATE ALL in comfy-ui, and now animatediff is working fine again :)
@@andreassteinbrecher458 yes,'UPDATE ALL' is the key
I'm confused trying to get this working with SSD-1B. I downloaded it, put it in the correct folder, and renamed it, and it shows in the add-to-network prompt dropdown, but so far I notice no improvements and quality seems poor. I keep seeing something about diffusers but not sure what that is all about. Going back to the drawing board lol
Thanks for the research, will try it out !! :)
Hope you like it!
I've got a 4090 as well, and I cannot reproduce your results in A1111. Will keep trying.
I am running with sdp memory optimization. Similar speed increase as xformers.
Thank you! It works nicely, both in A1111 and Comfy. But I have a rookie question. I can't save the Comfy workflow explained in the video with the Lora loader node installed. If I save it as a .JSON file or PNG image it does not reload...
After my first generation, the following generations are much slower. Any idea why this happens and how to avoid it?
Vanlandic can't even see the files. They won't show up in the list after dropping them in the folder and restarting.
Is there also a way to enhance performance for image2image generations?
I selected the Lora, adjusted the steps and the CFG Scale but the render time is still the same if not worse. Please help :'D
This is amazing!!! Thanks!
Glad you like it! 😊🌟
my god! it works !!!!thankyou !!
Enjoy!
your link for civitai is no longer working
At a cfg scale of 2, how well does it adhere to complicated prompts?
I get that it's amazing for AnimateDiff or real-time applications, but is the quality good enough to replace workflows for image generation?
Probably less than usual. But try shorter prompts and weight them more.
thanks but mine is still very very very slow... what else can i do?
Thanks as always! I have an off-topic question, is there any way to make StableDiffusion not show people but only clothes? I put no human, no girl, etc. in the negative prompt and it still shows people.
I have a 3060 12GB GPU; I was getting VRAM errors with this workflow on XL, and the process was rerouted to the CPU: 50-70 seconds. So I suspected my VRAM was being squatted by orphan processes. Rebooted and it's now working the way you describe. Thanks.
i don't have the option to add the lora setting to the UI
Great video. What I don't get is why the CFG needs to be so low?
Not working for SDXL. Always bad quality. Should it also be 8 steps / CFG scale 1 for SDXL?
Works great for me with the LCM sampler. Not well without it.
@@sebastiankamph Okay then, now I know. I'm on A1111; there's no patch with the LCM sampler yet, but at least 1.5 works with Euler A.
@@ADZIOO install the animatediff extension, this will automatically install the LCM sampler.
Hi! Thanks for the lessons, they are great!!!
I cant set the sample steps below 20... Am I missing something?
you mentioned AnimateDiff? how can you use LCM with AnimateDiff? great video, btw
How much faster is it really? A comparison would be nice. Also, could this be used with the new TensorRT?
Lmao this crashes the crap out of my amd card. I have a 7800xt and it steals all of my vram immediately which forces me to restart
And how do you manage the weight of the lora in that upper menu? If you add the lora to the prompt field, you can manage it as <lora:name:weight>.
How can I use it with Fooocus?
can this work on SD 3 ?
Cool video. Thank you
How did you get that Ui?
I assumed this video would be about the RT in A1111. What's going on with that, is it out yet? I've been on a break from AI since March.
Ok, first run of the video and I'm very confused: what is the one step that makes it 1000% faster??? Downloading "1" file?? You started downloading several files, so I'm lost...
hello sebastian love your videos
can you also make video on how to use 2 character loras in image to image generation without inpainting ?
thank you
will this work with Intel Arc???
Does this trick only work with the dreamshaper model or would it work on any models?
I'm kinda new, but isn't it a problem if I have to use this LoRA? I mean, I can only use one LoRA at a time, right? And if I'm using this one, it means I can't use another, which sort of defeats the purpose...
You can use as many LoRAs at a time as you like, there could possibly be a limit that i'm not aware of, but I know for sure you can use at least 4 or 5 at a time
Amazing!!! Thank you :)
You're very welcome! 🌟😊
Hmm... why is it working for you and not for a lot of us in automatic1111?
* Downloaded and renamed both Loras and put them into their Lora directory
* Enabled sd_lora in User-Interface Option in main UI
* Reloaded UI
* Updated complete automatic1111 with all extensions
* Restarted automatic1111 (ORIGINAL)
* lcm Loras do NOT appear in the Lora Tab Gallery, Only in the unusable dropdown list if you have a lot of Loras
* Tried all my Models AND Samplers for 1.5. and XL, all with really bad results with 8 sampling steps
My options in the main UI (like the "Add network to prompt" dropdown) are shown in the left column under CFG scale, seed, etc.
Are you using a different version of automatic1111, or is there something else that has to be enabled that a lot of us maybe don't have?
I have also very bad results.
Tested LCM on Stable Diffusion - it seems that img2img LCM and vid2vid have an error - TypeError: slice indices must be integers or None or have an __index__ method
Oh and @sebastiankamph... I almost always laugh at your jokes, even if my wife hates when I tell them to her. Said the facial hair one to her yesterday because I DON'T like facial hair and she knows that! :)
Hah, I love it! Keep spreading the dad jokes for everyone to enjoy 😊🌟
Is it possible on Apple Silicon M2? I tried, but have some issues
I have an M1 MacBook Air with 16GB unified memory; I wonder how fast it would be on it
HAHAHA 😅
I ::: honestly ::: look forward to the Dad jokes 🤣
Even if I don't have time to watch the entire video when I initially see it, I will watch until the joke and then come back later 😆👏🏾
Hah, glad to hear it! And great that you're coming back too 😅😁
How do you make a combined image like yours, with all generation results in one table with methods and samplers?
Xyz plot in script at bottom. Can see my settings in video
does having A1111 installed on a HDD or SSD matter?
absolutely - ssd is way faster
Thanks bro!
You bet!
This is WILD! This ecosystem continues to boggle the mind. There's certainly some amount of "too good to be true" in here, such as the lora not playing nice with a lot of samplers, but cool nonetheless.
Btw, a couple of things I would have liked to see discussed: how this performs with common current settings (i.e. higher steps ~20 / CFG ~5), and on other models, even just SD1.5/SDXL-based ones. Even if it was just 15-30 seconds showing a good model vs a bad model that you've found. Of course, there's always the whole "try it in your workflow to see how it is for you"; it would just be nice to know if I can expect this to work outside of vanilla SD.
can you use it with turbo xl?
Still have to experiment with this more, but wow... zoom! A 960x640 usually takes at least 1.5 minutes (GTX 1080); this is done in seconds. Not quite happy with the detail yet, however. But great for a quick try of a prompt until I do more tweaking.
Will this work on Apple silicon like M1?
Actually, Apple M1 saw the biggest speed improvements (10x). I haven't tested it myself, but the claims seem to be solid.
How do I make the interface like yours? At the top, where you select the model/checkpoint, you have two more dropdowns to the right called SD_VAE and Add Network to Prompt. If somebody other than the video creator has the answer, feel free to reply
Watch the video, he shows you how...
01:38 - 01:57
I am super confused, when I go to download the LCM model for SDXL, are we downloading the "pytorch_lora_weights.safetensors" file? I did that and used it as LORA, it is stuck! I am using a RTX 4090.
Yes! One for 1.5 and one for SDXL. Rename so you know which is which. Put in Loras folder
But does this work with sdxl?
i would like to find a way to use that with deforum and control net. does anybody have an idea how to make it work in automatic1111?
Is this LoRA affecting the outcome of the artwork's look or style, other than speeding it up? If this changes quality for the worse, I would not see the point of using it, because SD is pretty fast as it is.
I am seeing a lot of ComfyUI and Automatic1111. Is there an advantage to using one over the other? Is one better at "A" and another at "B"?
It's a very different philosophy. I would recommend automatic1111 for beginners and also for flexibility. ComfyUI, in my opinion, is more specialized, but you don't have as much creative power (inpainting, for instance, is quite annoying to set up). I tried ComfyUI and I'm back to automatic1111; it gives me the best results (also I kinda lost my node setup for ComfyUI and it's a pain to redo).
@@jonathaningram8157 thank you! I also have been using automatic 1111 atm, but saw so many videos for ComfyUI so I thought i'd ask. thanks for the response!
It's even slower for me and looks much, much worse using XL
will this work on mac m1?
Can this be done with animation? AnimateDiff or video-to-video? Not sure I am setting it up right, in Comfy.
Yes!
@@sebastiankamph I tried it too but I only get "weight" errors and noise. The creator of AnimateDiff seems to be working on a fix, not sure why some people claim it works for them?
@@elowine I used it just a few hours ago and worked ok. Not amazing, but ok.
@@sebastiankamph Ah nice, thanks for checking. Maybe an issue with certain GPU's
Tried this with SDXL with no good results. SD v1.5 worked great though. Any ideas? I was using sd_xl_base_1.0.safetensors [31e35c80fc] with the lcm-lora-sdxl on a Mac M1, if that makes any difference.
figured it out. I forgot to turn up the resolution :D lol
Is this working with img2img?
I don't see why not 😊
Does the LCM model work only with SDXL, and not SD 1.5 based models?
This is available for both. Works best in Comfy atm.
@@sebastiankamph Thank you. I look forward to trying it out. Still haven't taken the plunge on comfy yet. I really need to take some time and get it set up.
Not optimized for a1111 yet. I'm using a custom checkpoint, a1111, 1.5, same settings as in the video. I'm on a 1080 Ti; the generation speeds are faster, but the image quality is worse.
mine is not even generating any pic
Im not seeing where to get the LCM scheduler for comfy, can someone point me in the right direction?
It comes automatically when you update comfy
@@sebastiankamph Found it, thank you!
how to add cinematic styles file?
I am confused. my pictures look worse using this :-(
Make sure to use the LCM sampler in Comfy for best results.
@@sebastiankamph I used Auto1111. I did Put the 1.5 lora in the lora folder, loaded a 1.5 model, added the lora to the prompt and set the steps to 8 with euler. Result looks worse than without the lora.
I did not use the lora dropdown like you did. Is this a must?
Not at all. Just an easy way of using it. But it limits the use of weights.@@metanulski
@@sebastiankamph thanks. will try again today. :-)
Thanks, but it's not working on Mac M2 :(
Newbie question - would this speed up image generation on my Nvidia GTX 1060 gaming laptop with 4GB VRAM?
Yup, should speed up image generation on any platform so long as you make use of the lower steps "requirement" for a decent image.
@@user-jw9kg5rt4d That's awesome. Thanks!
When is the LCM sampler gonna be in A1111?
I have no idea, hopefully soon 😅
Install the animatediff extension in A1111, this will automatically install the LCM sampler.
SDXL doesn't work, not sure why.
Probably need the latest pips. Will test again later.
You need LCM sampler for that.
Need to try this on my GTX 1060. Yesterday, with xformers and medvram, it took 30 minutes to do a single image with SDXL and no refiner.
Let me know what speed improvements you get 😊