The FREE Prompt styles I use here:
www.patreon.com/posts/sebs-hilis-79649068
Don't ignore negative prompts.
If you have a very specific pose you want, use it as a base in img-to-img, even if the style is completely different. I sometimes even do some rough editing in PS beforehand. Then work with a denoise of 0.2 to 0.4 and you're golden.
Great comment Eduardo! 🌟
This is what I always do. I really like using historical leaders, like that one portrait of Charles XIV John or even Victoria Louise of Prussia, because their poses are really excellent.
Use a depth mask to maintain the composition/pose while changing everything else
It's amazing how fast this technology moves. Three months ago it was indeed quite a problem. Then depth masks came, which helped a little but still weren't a perfect solution. Now with the new ControlNet extension you get perfect poses every single time.
On the Denoising strength, when I'm trying to find just the right generation of an image, I like to do an X/Y prompt grid with CFG and Denoising. Denoising goes from 0 (no change) to 1 (craziness), and CFG goes from 1 to 30, from wild creativity to "do exactly as I typed!".
So, I do something like CFG "30, 25, 20, 15, 10, 5, 0" and Denoising "0, 0.15, 0.35, 0.5, 0.65, 0.85, 1". That's a 7x7 grid, but I've gone larger too. When the grid completes, you get the upper right as almost the exact original image, and that guy in the lower right is coming straight out of a Lovecraft novel.
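For anyone who wants to script a grid like this outside the web UI, here is a rough sketch of the same CFG-by-denoising sweep using the Hugging Face diffusers img2img pipeline. The model name, input file and prompt are placeholders, and the extreme values (CFG 0, denoising 0) are clamped to what the library accepts; the commenter is using A1111's built-in X/Y plot script, not this code.

```python
# Rough equivalent of an X/Y grid over CFG scale and denoising strength.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("base.png").resize((512, 512))   # hypothetical starting image
prompt = "portrait of a knight, oil painting"             # placeholder prompt

cfg_values = [30, 25, 20, 15, 10, 5, 1]                   # 1 is the practical CFG minimum
strength_values = [0.05, 0.15, 0.35, 0.5, 0.65, 0.85, 1.0]  # 0 would change nothing, so start at 0.05

for cfg in cfg_values:
    for strength in strength_values:
        image = pipe(
            prompt=prompt,
            image=init_image,
            guidance_scale=cfg,        # "CFG" in the UI
            strength=strength,         # "Denoising strength" in the UI
            generator=torch.Generator("cuda").manual_seed(42),  # fixed seed so cells are comparable
        ).images[0]
        image.save(f"grid_cfg{cfg}_str{strength}.png")
```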
Here's a good one - If you want something that is proving difficult with prompting, photo-bash it first then use the prompt and the photo-bash together via img2img, rather than prompt only and txt2img.
The photo-bash doesn't have to look pretty (no need for color correction, seamless joins, etc.) because img2img will fix those, assuming you give it a high enough denoising strength (I start at 0.75).
That's actually a REALLY great idea! Thanks! 😁
photo-bash?
I think the biggest mistake people make when new to AI art is not realizing what an iterative process it is to get a good final result.
Don't obsess too much over trying to generate a perfect result initially in txt2img (it will never happen). txt2img is really only for dialing in your prompt and getting the composition of your image correct so you can get a good base seed to work from.
Refining details, increasing quality, and getting your perfect result will come from working that base seed in img2img and inpainting. If you master inpainting, there is nothing you can't change or control in your image. I cannot emphasize that enough: learn inpainting!
For example, wanna change the eyes or facial expression? Mask what you want to change in inpaint (use Only Masked, not Whole Picture when inpainting the face). Keep the quality portions of your prompt, but remove anything not related to what you want to change (you don't need things like "extra limbs" in your negative prompt when changing eyes, etc.), and replace those things with new prompt items that relate specifically to what you want to change (for eyes you could add things like "highly detailed eyes, looking left, rich hazel color, subtle eyeliner" to the BEGINNING of your prompt). Render batches with the CFG unchanged but the steps maxed out to 150. Boom, now you not only have amazing quality eyes, but you can choose exactly which ones you want from the batches.
Obviously there are a lot more iterative workflow steps to getting that perfect image (like using the X/Y/Z plot tool in txt2img to dial in the optimal steps and CFG for your prompt), but this is how you need to think when rendering AI images. It's an i t e r a t i v e p r o c e s s!
TL;DR: txt2img is for dialing in prompting, CFG and steps to get a good base seed to work from. img2img and inpainting are for refining and changing the details and increasing image quality. AI art is an iterative process - don't fall into the trap of playing endless rounds of Russian Roulette to get a good result! Get a good base seed to work from, then work it and refine it. 😎
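As an illustration only, here is a minimal sketch of that targeted inpainting pass using the diffusers inpainting pipeline. The commenter works in the A1111 web UI; its "Only Masked" mode roughly corresponds to cropping around the mask before inpainting, which is skipped here, and the model and file names are placeholders.

```python
# Sketch of "inpaint just the eyes with a targeted prompt" with diffusers.
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

base = load_image("base_seed_image.png").resize((512, 512))   # the image being refined
mask = load_image("eye_mask.png").resize((512, 512))          # white = area to repaint

# Targeted prompt: the change goes first, quality terms stay, everything unrelated is dropped.
prompt = "highly detailed eyes, looking left, rich hazel color, subtle eyeliner, masterpiece, best quality"
negative = "blurry, lowres, bad anatomy"   # no need for "extra limbs" etc. when only the eyes change

images = pipe(
    prompt=prompt,
    negative_prompt=negative,
    image=base,
    mask_image=mask,
    num_inference_steps=150,   # steps maxed out for the inpaint pass
    guidance_scale=7.5,        # CFG left unchanged
    num_images_per_prompt=4,   # render a small batch and pick the best candidate
).images

for i, img in enumerate(images):
    img.save(f"eyes_candidate_{i}.png")
```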
this was really well explained, thanks so much
@@NeroZenith Thanks for the thanks! Glad someone got something out of it. 😁
@@thebrokenglasskids5196 Thanks from me too.
The most annoying thing is, when the denoising is too high, it still draws a cat even when you put "cat" in the negative prompt.
It gives the AI more creative freedom, for sure 😅
If you set that denoiser to 9+ you're in the wild west! Hold on to your hat, god knows what'll happen 😂
Thought: You can get some good stuff at 512, but you're still going to have trouble with eyes and facial details. 570-640 is usually where I work, and most of the time it's not too much for my six-year-old graphics card. Also, restore faces only works on realistic stuff. If you're making cartoony stuff, it'll break your image as easily as fix it. That's why you want to work at a slightly higher resolution.
I find that being good at inpainting is far more important for getting good facial details, such as eyes, than upping the initial resolution when generating base images.
Staying at 512 for initial batch generation allows more batches and images faster. Pick the one you want to work from as a base and send it to inpaint.
Mask what you want to change (make sure to always select Only Masked instead of Whole Picture when inpainting faces, or the results will not be as good). The key here is to not use your original image prompt, but rather change the prompt to exactly whatever it is you want to change in the face. So for eyes, I would simplify the original prompt to keep anything in it related to quality, remove anything that has nothing to do with eyes, and then add something like "highly detailed eyes, looking left, vivid hazel color, subtle eyeliner" to the beginning of the prompt. Same for the negative prompt: simplify it by keeping terms that relate to details and removing anything that has nothing to do with eyes (extra limbs, poorly drawn hands, poorly drawn feet, etc.).
This is a much more effective way of changing any facial feature, facial expression, etc., as you get far more control over what is changed, how it is changed, and the quality of the changes through the specific prompts for the masked area and the denoising strength. Plus you can generate the changes in batches and select the very best one.
Specifically on eyes, you can get consistently amazing quality by doing all of the above, not changing the CFG, but maxing out the steps to 150 when you generate your inpainted batches. It works regardless of style once you master this method. 😎
@@thebrokenglasskids5196 Yeah, that original comment is pretty ancient. We've come a long way since last year. Also try adetailer. Take care
Funny you should mention a poker-playing friend with a prosthetic arm; I have one as well.
His luck just hasn't been good lately. First he loses his arm, then last time he played, when the pot was $10,000, his draw was so bad he had to replace his entire hand.
I actually took a second with the prosthetic joke because I was expecting a "bad hand" joke and got the difficulty dealing joke 😅🙃
Another common mistake is to close the main window when you meant to close the folder of currently rendering images. I’ve never made that mistake, of course, but this could help others. 💁♂️
I've also heard of this mistake, that only other people do 🌟
Pin the tab, then there is no close button
I would like to know how to properly close SD. Do I just close the cmd window, or what?
@@pranshumittal3374 I always close the main window and then the cmd window.
I can't tell you how happy your bonus tip made me. I was literally just wondering today if this was possible and then happened to catch your video. Turns out it was enabled by default and I can pull up all my best images again. Thank you!
Thank you 4 the dad jokes 😁✌️
You're very welcome! And a huge thank you for the support, your DNA is made of diamonds, you absolute gem you! 💎💎💎
Henlooo!
Watched a couple of your videos on Stable Diffusion setup, basics, etc. I've got a free weekend coming ahead, and I want to get into and fiddle around with it a little bit. So, a "Thank You" in advance for your tutorials.
Thank you. The best tip is to start at 512x512 and then upscale your favorite images.
Happy to hear you found something valuable to you! 🌟
My main problem is that upscaling with any kind of upscaling tool doesn't give you as much detail as upscaling in SD with img2img - but then I'm very limited in size due to my bad GPU.
I'm gonna go ahead and clone the repo and take this thing for a spin. How do ya think my RTX-3080Ti will do with the algorithm? And is it worth paying for cloud compute and hosting to run my instances remotely, or is running it locally a decent and productive enough experience? I'm a software engineer, so I don't mind tinkering around or building extra utilities and components to automate things and make it more pleasant to work with ...
Also, I noticed that there don't seem to be many good choices out there other than automatic1111's web-ui front-end. Why a browser? It seems to me that a headless back-end with a native desktop front-end UI and better features would be a vastly superior user experience, and I could write some additional tools and utilities to make it easier to work with. If anyone has any ideas/suggestions for a better front-end UI, improved features and user experience, etc., please let me know ... I'm strongly considering making a free / open source project for an improved toolset and front-end layer, to address many of the things that seem rough around the edges from my first impressions. Any integrations, shortcuts, automations, etc. that people think would be valuable would be great to hear about too. I really wanna have some fun with this soon, as well as do some things to make it better, faster, more accessible, etc. I have some ideas for performance optimizations and possibly improving the model/algorithm itself, but I will need to spend some time studying the existing ones and experimenting.
Your 3080 Ti will run perfectly fine. I run on a 3080 myself. In regard to the browser question, it's where we are at the moment. The model has only been open source for a few months. Anyone building a superior UI will surely take market share.
@@sebastiankamph I think we can both agree that a _4090_ would run better and is necessary and justified ... the children may have to tighten their belts a bit for Christmas ... LOL, kidding, but I already planned on getting one, now I have a great idea for what to do with the 3080Ti afterward also :D
I hit the ground doing some heavy research, found _very few_ projects implementing better UI and management tools for this other than a couple work-in-progress things ... outside of that, there's basically just a slew of websites that want to charge you a fee to use it and dole out little bits and pieces of functionality at insane costs ... that's definitely something that needs to change, I know _plenty_ other ways as a developer to make money that doesn't involve extortion or creating artificial roadblocks to create monetized gateways and pay-walls, lol. There are also a _lot_ of unexplored optimizations and enhancements that could be made to this, which I think is largely due to web developers who don't really know what they're doing just reading the setup instructions, putting it on some servers with cloud compute rental and building a simple UI and authentication + monetization layer around it. That's neither how this tech is supposed to be used _nor_ the best way to employ it. I'm not a super AI-expert, my expertise is more in graphics and engineering (mostly 3D real-time) and native engines, applications, games, etc. It seems like people are using _very_ compute-intensive (i.e., _expensive_ and inefficient) methods of upscaling, denoising, filtering, etc ... they probably should have checked with graphics engineers first, haha. I'm getting a lot of great ideas and "opportunist vibes" from this thing ... I've still got more research and tinkering to do though, and need to talk to some "hardcore" users about what could be done differently to make this better ...
@@GameDevNerd Be careful, 4090 may melt down and set your house on fire 😂
@brainswashedthisways Nonsense, you just need a cold-fusion or zero-point quantum reactor to power it, which can be obtained from any local intergalactic computer store for the low price of 2,300 Blemkflarkz. Insulated wires, cables and plugs? Ha, so primitive, so inefficient ... if you use Earth technologies to power a 4090, you've made a critical error ...
Very true about the prompt differences between SD and MJ. SD has the potential to give you much more of what you want, but you've got to learn how to speak to the machine ;)
Exactly how I feel! 🌟
I've heard interesting discussions around this too, in relation to things like ChatGPT. Whether someone is impressed or not often comes down to if someone intuitively knows how to ask in a way that the AI understands.
Ok, noob question here. Do you know how to prevent elements from combining? For example, if you want a cat and a dog, not a cat mixed with a dog. Thanks.
I feel like I have mastered stable diffusion. I have done it all and my images are far superior to most that I see. I understand how to engineer every part of the picture at every spot in the prompt and there is a hierarchy even without weights
I spend a lot of time going through lots of seed configurations, cfg scales, clip steps, and denoising to sometimes get a closer result to my prompt and sometimes to let AI surprise me. I like to do a large batch and sometimes single out the ones that I really like and then run them back in hi-res
😂😂😂 👏
4:53 wait... Can you explain to me what I have to do in order to upscale the resolution? Pls (🙏🏻)
8. Don't keep sitting at your desk for Stable Diffusion. You need better inspiration from outside. Your output results are determined by your emotional intelligence and a better understanding of life and the people around you. Don't lose touch with people, pets, arts, nature and music if you're seeking better AI creativity.
Or creativity in general, I'd say.
Observing the best is good for improving your work, but having a meaningful life helps you most when it comes to finding your own direction.
What's with those video clips you show during the video (e.g. 2:33)? Were they generated with Stable Diffusion?
Nice tunes, good advice condensed into a short video. Subbed.
What a banger Sebastian! Great video input and also big step to more professional videos! Love the style :)!
Thank you, glad you liked it and happy to hear feedback on the new style! 😊🌟
9. Forgetting to switch back to your regular checkpoint after changing to an inpainting checkpoint for that finishing touch, and wondering why the next 50 images you generate look nothing like that crisp one you made earlier.
Hey, Sebastian! I just installed 1111 last week and am now diving into SD. Mainly for Deforum Animating, but definitely generating and tweaking images, too.
I'd say the biggest noob mistake I've made so far is not watching this and your deep dive SD videos yet...
:) Thanks for all your work and info sharing!!
Thank you kindly for the nice words! Welcome to the world of SD 🌟🤩
I just liked and subscribed at the first dad joke. Thanks for the content. It's really useful.
Great video input and also big step to more professional videos!
Thank you for your content!
What is the music at 4:06?
I like your jokes sometimes... they give me a chilled-out start. Good work, Sebastian.
Getting used to Sebastian's dad jokes. Funny stuff!
🥰
Agree with all of your points besides "stick with a low resolution." While 512x512 does generally result in a consistently cohesive and attractive image, the "Highres. fix" works incredibly well with 1024x1024. Waaaayy sharper and more detailed than 512x512, and looks even better than a 4x upscaled 512x512 image.
With the "Highres. fix" I leave "Denoising strength" at 0.7, and set the "Firstpass width/height" to 512. Takes around ~8GB VRAM for 1024x1024. It's by far my favorite feature in the WebUI.
Where is the Hires.fix? Is it a setting or do you download it? I just started using WebUI, thanks!
@@NathanLorenzana It's a checkbox on the txt2img tab, in the same area as Restore Faces.
You're right, highres.fix has been improving a lot. I think it depends if you have the hardware and time to use it.
You can also upscale a 512x512 image to 1024x1024 with img2img using the same prompt. Use around 0.2 to 0.3 denoising. This is pretty much what high res fix does. This does alter the image a bit, but adds in the fine detail where upscalers fail.
This is useful because you can generate 512x512 way faster than 1024x1024. Once you find an image with the right layout, 'upscale' it using img2img.
After that, you can use the regular upscale to make a 4096x4096 image.
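A rough sketch of that two-pass approach with diffusers, assuming a standard SD 1.5 checkpoint (the commenter does the same thing in the A1111 web UI; the model name and prompt are placeholders):

```python
# Two-pass idea: generate fast at 512x512, then resize and run low-denoise
# img2img at 1024x1024 to add fine detail.
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

model = "runwayml/stable-diffusion-v1-5"   # placeholder model
prompt = "ancient castle on a cliff, sunset, highly detailed"

txt2img = StableDiffusionPipeline.from_pretrained(model, torch_dtype=torch.float16).to("cuda")
small = txt2img(prompt, width=512, height=512).images[0]

# Reuse the loaded components for the img2img pass instead of loading twice.
img2img = StableDiffusionImg2ImgPipeline(**txt2img.components)
upscaled = img2img(
    prompt=prompt,                      # same prompt as the first pass
    image=small.resize((1024, 1024)),   # naive resize; img2img fills in the detail
    strength=0.25,                      # roughly the 0.2-0.3 denoising suggested above
    guidance_scale=7.5,
).images[0]
upscaled.save("upscaled_1024.png")
```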
@@KadayiPolokov Got it! Thanks!!
Biggest mistake is garbage words. If a model isn't trained on a word, that word winds up tokenizing to effectively random noise that will warp your image in unpredictable ways.
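You can check how a word tokenizes with the CLIP tokenizer that SD 1.x text encoders are built on. A made-up or rare word splits into several sub-tokens instead of mapping to something the model saw often in training, which is one way to spot this; a small sketch, with the tokenizer repo name taken from the standard CLIP release:

```python
# Inspect how the CLIP tokenizer (used by SD 1.x text encoders) splits words.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for word in ["castle", "hyperdetailomatic"]:   # the second word is made up
    print(word, "->", tokenizer.tokenize(word))
# A common word maps to a single known token; the made-up word breaks into fragments.
```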
The Dad jokes were the best part :) ---- Btw, Face restore often blurs out the image a little.. makes it too perfect. I tell people to be ok with a crappy face, then go in and inpaint it using "only masked" -- and this will build you out a high-res face, with all the skin texture, etc....
What's your best tip to avoid double heads, apart from keeping the resolution low? Negative prompts that work for you?
I've used "two persons", "two heads" and such; what are yours?
I don't mess around too much with long negative prompts, just use some basic words. But I iterate a lot in img2img when I have a base composition that I want with the right number of heads. You can also check the highres-fix box when you're working outside of 512x512.
Generally, using "highrez fix" with square resolution works best.
@@Reverie_Blaze cool thank you
@@sebastiankamph thank you!
he is looking up as if he is thinking and then reads it from the screen, haha I luv this guy
He's on to us! Run!
Now I'm thinking of a process with ChatGPT generating dad jokes, matching them with images using Stable Diffusion, and putting the joke text in speech balloons... 🤣
5:48 I love this music, who's the artist? Thanks for the helpful video bro
Thanks for the video Sebastian, definitely gonna improve my renders.
btw, do you have any idea how to make a lyric video for a song? I had trouble displaying words so idk if there's like something specific I have to do with my prompt or something like that.
Thank you
You're most welcome! Are you referring to SD actually creating the text for you? If so, no, it's terrible at text. I'd slap those on afterwards in a video editing software.
Excellent tips! Thanks.
My fav joke was taking out the spider! Thank you :) Sub'd!
When upscaling in img2img, lower the denoising strength, otherwise your picture will consist of multiple tiles.
Spider joke, and my big mistake (started Yesterday) is not putting in the negative prompts. I got a lot of stuff I didn't want that way.
Always wondered what codeformer is when booting automatic1111 :). Have you tried "aesthetic gradients" (an auto1111 plugin) where you can slap LAION beautifiers over your creations? I didn't see much of a change in my tests. It's another of these over-sensitive knobs, but promising in idea and claim. Also: those gradient files are crazy small.
I haven't tried that particular plugin/extension. I'll be sure to check it out, thanks for the tip! 🌟
Web Designer! That was too good to be a dad joke!
My biggest mistake? Forgetting to reset my seed to random. That was an infuriating 10 minutes...
Hah, but at least you learned something 😊
@@sebastiankamph oh you bet I did.
Keep up the great work Sebastian ;)
Thank you for the helpful video. Appreciate it! Anyone know where the "codeformer" setting is, please? Do you have to download it as a separate module? Thank you very much.
It's included in automatic1111. Check settings tab under face restoration.
@@sebastiankamph Thank you very much! Do you find codeformer better than GFPGAN?
Play around with models, merge them, test them. An easy way to add a lot of flair.
Thanks for this good information. I have a question about automatically updating the web UI with the git pull command in the webui batch file. It always worked, but now I get an error message when it executes: "error: The following untracked working tree files would be overwritten by merge: javascript/generationParams.js modules/script_loading.py. Please move or remove them before you merge."
Anybody know what this means and what I should do? Thanks!
I think you can remove that git pull in the file for now. I did too, sometimes updates come too fast and things break. So might as well leave as is and update when necessary. And if you keep getting that error in the future and need to update, do a fresh install. And thanks for the kind words 😊
@@sebastiankamph Thank you and you are welcome. I have learned a lot about stable diffusion from your good videos and tutorials. Keep up the good work 😊
Which apps do you use to make your thumbnails?
Specifically, taking my sketches and finishing the art with flat colors and inking.
Loving these videos (and hearing your lovely Swedish accent makes me homesick for Sweden...I lived there for nearly 20 years). Any tips about how to avoid heads being cropped off? Or how to build out an image if you like it but half the head is missing? So far that is by far my biggest annoyance.
Thank you for the kind words! Now I don't understand why you would ever leave Sweden, but I do admit, it's kinda cold here right now. Heads being cropped off is a regular annoyance for everyone. I would recommend working more with img2img. Find a composition that you like in txt2img, and then move on to img2img. You could also outpaint, but that's more finicky tbh. Seeing as you've practiced drawing I'd even recommend starting from your own sketches and use those in img2img. That way you'll also have the heads like you want them.
Wish i could live there (i have family there but 3 generations back). Starting to feel a little crazy living in this country 😢😅😢
Have had some luck with strong negative prompts like “cut off head” etc.
@@sebastiankamph I had to leave for luuuuuuuv...but I get SO homesick for Sweden, even after 15 years. Oh well. Thanks for the tips! Seriously, cut off stuff is the bane of my existence. I've had mixed success with outpainting, so I'll give img2img a try and see how it goes. LOL that you know about my drawing experiment! I have to say, it was a year well spent. Thanks again for your response...you rock!
@@LoriLinstruth You can also check out high res fix for expanding. Like start with 512x512 and then a higher resolution to get more stuff in there. Oh, I sometimes have a quick look at whether commenters have content on their own YouTube channel. I'm curious by nature.
Not sure if you kept with SD or not, but make sure you check out openpose with ControlNet if you haven't. Should prevent heads from ever being chopped off again.
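For reference, a minimal sketch of that openpose-plus-ControlNet idea with diffusers (the commenter means the A1111 ControlNet extension; this version assumes the separate controlnet_aux package, and the repo names are the commonly published ones, so treat the whole thing as an assumption-laden illustration):

```python
# Pose-conditioned generation: extract a pose skeleton from a reference photo,
# then let ControlNet hold that pose while the prompt controls everything else.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import OpenposeDetector   # separate pip package

openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

pose_image = openpose(load_image("reference_pose.png"))   # hypothetical reference photo
result = pipe("a knight in a forest, full body", image=pose_image).images[0]
result.save("posed.png")   # head and framing follow the reference skeleton
```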
how do you actually paste the copied generation data?
Friend, first of all, congratulations on the video. By the way, is it possible to pause a model's training and continue it on another day?
Thanks
Could you give me tips for making good eyes, teeth, nails, fingers, etc.?
"Nice guy, he's a web designer." ROFL!!!!
😂
I have a bigger problem: I can't generate images at all, and even using a prompt doesn't work. I'm using a ROG Flow X13 with a 1060 GPU. Any other solution?
best tip for upscaling? please :)
SD does upscaling pretty well. Check my ultimate guide for some upscaling comparisons.
you had me at the dad jokes❤
Glad I could be of service! 😘🌟
I had to think twice about that web designer (lack of sleep perhaps) XD win!
😅
Very nice!! 🤩 But use Stable Diffusion images for your B-roll! 😁
I tend to use a mix, but best would be for everything to be AI, even videos! :)
I like his B-roll 🎉 he has nice taste in stock clips 😂
Is restore faces useful for anime?
Why is it that, when you use a specific seed and generate a batch of maybe 10 images from it, they all look different? Shouldn't they all look the same?
I think when you generate a batch of 10 it automatically uses a different variation seed for every one of those images
@@androsforever500 oh yeah that makes sense
If you are using the automatic1111 web UI, then it actually changes the seed. If you enter seed 1 and generate 4 images, then image 1 will have seed 1, image 2 seed 2, etc. It always adds 1.
You can check this yourself: click on, say, image 7, use the button to retrieve the seed, and compare it to the seeds of images 6 and 8; they should always be 1 apart from each other.
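To reproduce that per-image seed numbering outside the web UI, you can pass one RNG generator per image, each seeded with the base seed plus its index. A minimal diffusers sketch, with the model and prompt as placeholders:

```python
# "seed, seed+1, seed+2, ..." for a batch: one generator per image.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base_seed = 1
batch_size = 4
generators = [torch.Generator("cuda").manual_seed(base_seed + i) for i in range(batch_size)]

images = pipe(
    prompt=["a red fox in the snow"] * batch_size,   # placeholder prompt, repeated per image
    generator=generators,                            # one generator per image in the batch
).images

for i, img in enumerate(images):
    img.save(f"seed_{base_seed + i}.png")            # image i corresponds to seed base_seed + i
```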
How do you have soo low subs
You deserve way more
Thank you for the kind words, it warms my heart! 😊 You're a real 💎
Come for SD, stay for the jokes😁
Glad you liked them! 😊🌟
5:10
😆 🤣 😂
And multiple legs and toes on certain animals too!
So many weird images, right!? 😅
your spider joke secured the subscribe
Hah, glad to hear it :)
You guys seem to know how all of this works, so let me ask. Sometimes I see people taking a picture of themselves into img2img and remaking it in the style of Arcane or something similar. This requires the arcane.ckpt file, which means you don't have the style transfer setting that you get with the 1.5 inpaint.ckpt, which means that your resulting picture is in the style of Arcane but the composition is not close to your start image at all.
Corridor Crew did this in a recent movie; how? When I try, I need the 1.5 inpaint.ckpt to get a matching image but the arcane.ckpt file for the actual style, which changes the composition drastically... what am I missing here?
You can merge models, but you can also adjust the denoising strength in img2img. I'd try merging a trained Dreambooth model on you together with arcane.
Also, I think there is an arcane LoRA now, so should be able to use that on top of an inpainting capable model, if I understand how it works correctly.
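For the merging route, here is a bare-bones sketch of what a weighted-sum checkpoint merge does (roughly what A1111's Checkpoint Merger tab does in weighted-sum mode). The file names are placeholders, and it assumes both checkpoints share the same architecture and store their weights under a "state_dict" key:

```python
# Weighted-sum merge of two SD .ckpt files.
import torch

alpha = 0.5   # 0.0 = all model A, 1.0 = all model B

a = torch.load("dreambooth_me.ckpt", map_location="cpu")["state_dict"]
b = torch.load("arcane.ckpt", map_location="cpu")["state_dict"]

merged = {}
for key, tensor_a in a.items():
    if key in b and b[key].shape == tensor_a.shape:
        merged[key] = (1.0 - alpha) * tensor_a + alpha * b[key]   # interpolate matching weights
    else:
        merged[key] = tensor_a   # keep A's weights for keys B doesn't have

torch.save({"state_dict": merged}, "merged_model.ckpt")
```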
i dont get what is automatic 111 ?? is this another name for stable diff??
Automatic1111 is currently the #1 Stable diffusion graphical user interface.
#1 Mistake: Thinking you NEED to use 512x512 resolution. Nope. SD has zero problems with, say, 1024x2048 or my current most-used resolution, 384x2048. That works just fine with 1.5 and 2.0.
I wouldn't say "zero" problems. There certainly are issues as different resolutions and ratios will influence composition. For example, make the image too tall and you might find your characters growing another waist from their thighs, things like that.
But yeah, no need to limit yourself to 512x512, just need to up that batch number.
@@tomaszwota1465 Thinking about how my comment is now 8 months old and how little SD has improved in that time :D You are correct that it makes weird human anatomy. SDXL has the same problem. Midjourney 5.2 does that rarely, if ever. But I rarely do humans for real work. For landscapes, SD did a great job with big resolutions like 2024x1024.
I always seem to get horrible body distortions when I've included some detail in the prompt that the model doesn't appear to have 'awareness' of. I could have a prompt that's making some decent images (still very new to this), but then I change an adjective the model doesn't seem to know, and it either alters the image in a way that avoids having to incorporate that detail (slacker!), or it's limbs, limbs, limbs 😬 BTW, I'm physically incapable of telling dad jokes... I have no children 🥁😄
Excellent
I'm not sure how much it matters, but when I specify "irises" instead of "eyes", the eyes maybe look slightly better?
Cool tip!
Is it just me or is IMG-IMG literally black magic? I can't figure it out. I've tried using it SO many times and can never get any good results. Currently using Anything V3
Sure is, and works great. Check some basics on img2img here: ruclips.net/video/DHaL56P6f5M/видео.html
Using models like Waifu Diffusion, Anywhere v3, etc. which are trained on Danbooru tags has made sure that I never prompt like a human whenever making waifus. But that first mistake is definitely something I should keep in mind using SD 1.5 and other models, that's for sure. The rest were also definitely something I can learn from. Thanks for all the useful advice!
anything v3
Glad you liked it, thanks for the support! 🌟
Imagine Iiving in a world where you ргіḓəḟսꞁꞁꭚ talk about your іпԀῦІƺҽлт мᾲᴤtսꭇβάϮtіѳח habits on RUclips, while ргꬲէӟոժіпᶃ to be a סתгмἇ׀ person without any ꝓѕꭚꞇԩσΙоϧίсᶐӏ enwarpments. It sure is a new generation of Internet ꞇꞕᴉլԀꭋἔꞃ we have in our woꭇId today. And it's also ẜꝺꭇьіԀɗɘὴ to spеаk սp against іt, because that's ԁιςᴐᵲίպḯꭒӑϯσꝵꭚ, so it has to become normal now.
@@yesyoucan4619 Writing like that is a crime. Who taught you to write like that? EW!!
@@yesyoucan4619 Imagine being so insecure you have to attack strangers online for having fun with drawing pretty girls, something humanity did since before antiquity, and being so paranoid from your persecution complex you mask your words like that.
Thanks!
You're very welcome 😊
Second dad joke was the best.
It was a mistake to compliment him on the dad gags.
I've ventured down a path I cannot stray from now.
Has anyone found a solution to the AI's shortcomings with multiple actors in a given prompt? For example: a dog sitting on top of a camel which is licking a cat that is chewing on a mouse. I know the AI is somewhat in its infancy, but it bewilders me why it cannot parse that level of description and interpret it when it draws the final image. Anyone? I know I may be expecting too much, but given that it can produce a dog sitting on top of a camel wearing a sombrero in the desert at night, why couldn't it?
Check my latest videos on ControlNet. That fixes that. Wasn't possible previously.
That level of spatial awareness is still definitely tricky for the AIs in general. MidJourney V5 may or may not be a bit better if they manage to get a better text engine into it (they said they weren't sure on the time), and perhaps DeepFloyd though that isn't totally clear (and who knows when that actually comes out). Like Sebastian said, ControlNet can help, especially the posing and new MultiDiffusion region stuff.
The jokes are so terrible they end up making me laugh :D
Great, eh? 😊
What is your opinion on Playground AI?
It doesn't look different from every other online Stable diffusion ui out there.
DO NOT use "restore faces" if you want to generate anime/ cartoon style images.
Otherwise, SD tries to humanize the faces, which then ends in alien-like results
Definitely the spider one
Glad you liked it! 🌟😊
Do not turn on face restoration when generating art or any non-photorealistic style; try to do without it until the very end. That checkbox (yes, I know it can be configured) greatly affects the overall style and often completely destroys it. I'm saying you should try to achieve your result without it, and you will get a much cooler and more consistent output!
Just make it save both before and after pic, and you can choose.
you forgot to use the "no cross eyes" prompt
I have had pretty bad results with many of the batches of images I've had SD render at and above 768 X 768.
Then use the native resolution, 512x512
@@devnull_
Yes, of course. The larger res is experimentation. Some of it renders nice results, but generally that's a relatively small percentage.
Though it's important for people reading to know that SD 2.1 is natively 768x768, so that resolution is very appropriate with a model/mix based on it (like Illuminati Diffusion).
So yeah... I forgot about restore faces. Dang.
Those faaaaaaaaaans 🌀🌀
I fundamentally disagree. I generate the image straight from txt2img and upscale. Very few times have I had to use img2img to fix a crop. I use img2img to clean up archival photos. Running the same image 20 times is a waste of time, and you will probably end up ruining the things you liked about it in the first place. A 1000x1200 txt2img resolution means you don't have to do much to it.
Anyone got a spare graphics card? Haha, with my Radeon R9 290X it takes about 20 minutes for a 512x512 with 10 sampling steps xD
I guess a short Midjourney prompt looks lazy.
More details, more specifications. I would use the same prompt from SD in Midjourney, because Midjourney prompting is more about letting the AI decide instead of the user... like Wombo Dream. I don't like short prompts!
OMG! 🤣A 🕸Web designer 🕷
How the hell do you make actions? Like people doing things: pulling, walking, holding... Most of the time it just makes people standing there. If I want someone picking something up or throwing something, there are just random hands or arms in a strange place.
It just seems to make people standing there, not doing things!
ControlNet extension is your friend here
He’s a web crawler
Would have been a better end to the punchline
Thanks for the videos
web developer
How do you salute a sailor, that uses AI...?
AI AI Capten...!
Oh...Video is great, Thank You...! Looking forward, to get my own SD and rewatch you.
Learning in Playground now...I just can't use their SD inpaint proper...Is It Me...?
PS: I think I will settle for the spider...most in tune.
Hah, great with another one! Thank you for the kind words. And I absolutely think you should jump on SD. Looking forward to seeing the pixel light stuff 🌟
Please include (for absolute beginners) in the title. Thanks.
I fatfingered my mouse and clicked into the middle of the video to hear "anime waifu".
I'm scared
🤣
last joke the funniest 😆🤣😂
🌟🌟🌟
Great video. But using "Restore Faces" is a trap. Believe me, do not use it. It changes so much of the face, turning a great model into meh most of the time.
Just make sure you write a good prompt, using a good model and you will never have to use Restore Faces.
You're right! But this was good info back when this video was released. Nowadays I inpaint to improve faces.
You forgot commenting for the algorithm.
I sure did, but you did it anyway, so goldstar to you my friend! 🌟
If you save txt2img as JPG, the comment field of the file will have all your settings.
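A small sketch of reading those embedded settings back with Pillow. A1111 stores them in a "parameters" text chunk for PNGs and in EXIF for JPEGs; only the PNG case is shown here, and the file name is a placeholder:

```python
# Read the generation settings that the A1111 web UI embeds in saved images.
from PIL import Image

img = Image.open("00001-1234567890.png")            # hypothetical output file
params = getattr(img, "text", {}).get("parameters")  # PNG text chunk written by the UI
if params:
    print(params)   # prompt, negative prompt, steps, sampler, CFG, seed, size, model hash...
else:
    print("No embedded parameters found (JPEGs keep them in EXIF instead).")
```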
04:06