How to use Aesthetic Gradients: Stable Diffusion Tutorial
- Published: 3 Jun 2024
- A new paper, "Personalizing Text-to-Image Generation via Aesthetic Gradients", was published which allows the training of a special "aesthetic embedding" that lets the user specify more clearly what they want from any existing Stable Diffusion model. In this tutorial we walk through how to train an aesthetic embedding and use it to generate images.
Discord: / discord
00:00 - Summary
01:07 - Paper Explanation
06:51 - Webui Installation
10:25 - Aesthetic Gradients Installation
11:45 - Using Pretrained Embeddings
21:50 - Training New Embedding
29:40 - Comparing Embeddings
34:39 - Experiment Outcomes
------- Requirements -------
python 3.10
Local Nvidia GPU
CUDA 11.3+
------- Links -------
Aesthetic Gradients Paper: arxiv.org/abs/2209.12330
GitHub Desktop: desktop.github.com/
Webui: github.com/AUTOMATIC1111/stab...
Huggingface Stable Diffusion 1.4 Model: huggingface.co/CompVis/stable...
Aesthetic Gradients Extension: github.com/AUTOMATIC1111/stab...
Premade Aesthetic Embeddings: github.com/vicgalle/stable-di...
CLIP search of LAION: rom1504.github.io/clip-retrie...
Code for downloading LAION images: github.com/Lewington-pitsos/c...
Script to download LAION: github.com/Lewington-pitsos/c...
Webui thread on embeddings: github.com/AUTOMATIC1111/stab...
Another webui thread on embeddings: github.com/AUTOMATIC1111/stab...
Github Repo for original implementation: github.com/vicgalle/stable-di...
Useful Blog Post: metaphysic.ai/custom-styles-i...
Github Repo containing the portrait embeddings: github.com/Lewington-pitsos/c...
------- Misc -------
ffmpeg commands to extract images from videos:
ffmpeg resize and crop video: ffmpeg -i ghib.mp4 -c:a copy -filter:v "scale=960:512,crop=iw-448:ih-0" smol-ghib.mp4
ffmpeg extract images: ffmpeg -i smol-ghib.mp4 -r 0.3 -f image2 image-%4d.jpeg
------- Music -------
Music from freetousemusic.com
‘Late Morning’ by ‘LuKremBo’: • (no copyright music) c...
‘Daily’ by ‘LuKremBo’: • (no copyright music) c...
‘Marshmallow’ by ‘LuKremBo’: • lukrembo - marshmallow...
‘Travel’ by ‘LuKremBo’: • lukrembo - travel (roy...
‘Sunset’ by ‘LuKremBo’: • (no copyright music) j...
‘Biscuit’ by ‘LuKremBo’: • (no copyright music) l...
‘Sunflower’ by ‘LuKremBo’: • lukrembo - sunflower (...
‘Chocolate’ by ‘LuKremBo’: • (no copyright music) l...
‘Branch’ by ‘LuKremBo’: • (no copyright music) c...
‘Rose’ by ‘LuKremBo’: • lukrembo - rose (royal...
‘Butter’ by LuKremBo: • lukrembo - butter (roy...
‘Onion’ by LuKremBo: • (no copyright music) l...
‘Animal Friends’ by LuKremBo: • lukrembo - animal frie...
‘Snow’ by LuKremBo: • lukrembo - snow (royal...
‘Affogato’ by LuKremBo: • lukrembo - affogato (r...
Many thanks to LuKremBo
#stablediffusion #aiart #tutorials #techtutorials #promptcrafting #install #installation #researchpaper
The latest code from automatic1111 doesn't require you to call it model.ckpt, it gives you a drop down to select from all .ckpt files in that directory. So you can have 1.4 and 1.5 etc and just select between them.
Thaaaaank you, I was suspecting that might be the case but hadn't had time to test it myself
I FINALLY got automatic1111 installed and noticed that it works well with the inpaint/1.5 ckpt and it's great. I wasn't sure if it made a difference between that ckpt and the one in the instructions
25:45 Just in case you didn't know, two ways to avoid having to type "cd (folder url)" :
1) Shift + right-click inside the folder in File Explorer --> Open PowerShell window here.
2) Replace the path in the File Explorer address bar with "cmd" and press Enter.
Thanks for the tutorial.
Oh damn, that second one is kind of cool thanks dude!
The way I see it, it feels like the aesthetic embedding creation basically does a CLIP interrogation on the set of images you provide it.
So if you give it 1000 "closeup photograph of beautiful woman" images, it is the same as 20 "closeup photograph of beautiful woman" images. It gets the gist.
And that's all the aesthetic does to mesh with the text embedding. It takes a (100% weight "hamburger") text and can't reconcile it with a (5% "painting of a cabin on a sunset horizon", 3% "painting of a sailboat in rough waves"...).
But it obviously works very well when it sees the text "portrait of a woman" and an aesthetic of beautiful women. So it says, oh okay, you want a portrait of a woman, but you want that woman to be beautiful
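The intuition in the comment above can be sketched in a few lines. This is a toy model of the commenter's mental picture (an aesthetic embedding as roughly the normalized mean of the reference images' CLIP embeddings), not the extension's actual code; the 2-D vectors stand in for real 512/768-dim CLIP embeddings:

```python
from math import sqrt

def normalize(v):
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def aesthetic_embedding(image_embeddings):
    # Unit-normalize each image embedding, average them, renormalize --
    # a toy model of "averaging CLIP image embeddings".
    unit = [normalize(v) for v in image_embeddings]
    dim = len(unit[0])
    mean = [sum(v[i] for v in unit) / len(unit) for i in range(dim)]
    return normalize(mean)

# 20 copies and 1000 copies of the same image embedding give the same
# result, which is the commenter's point about redundant images.
print(aesthetic_embedding([[3.0, 0.0]] * 20))    # [1.0, 0.0]
print(aesthetic_embedding([[3.0, 0.0]] * 1000))  # [1.0, 0.0]
```

Under this picture, near-duplicate images add nothing to the average, which matches the observation later in the thread that a 20-image embedding performed about as well as a 5000-image one.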
Loved this tutorial, but your background sound drove me crazy!
Please: Don't even think about background sound! We're not interested in that disturbing noise while listening to a topic where you have to concentrate.
Again: Great video. I'm already hooked and am ordering parts for my new "numbercruncher" because of these new AI opportunities, also open for us.
And I will definitely try this tomorrow! It's soo cool ;-)
Awesome that you're sharing your failed experiments. You just saved me tons of time!
That's why im here lol, to serve the hive mind
Great video, thank you. I tried aesthetic images for a while the past few days and was trying to use the prompt from the embedding etc. You helped me understand it a lot more, thank you.
if I can save anyone from suffering the same despair and turmoil as I did then it's all worth it
This was absolutely fantastic. added the aesthetic model and i'm FINALLY getting outputs like i wanted which look more like midjourney.
that's sick dude, pls link if you happen to upload any examples
That's awesome, did you create embeddings from images in MJ?
@@LM-zj7xp i only got to use midjourney for a bit but i loved the style. With stable 1.5 and the sac8 aesthetic, i can actually get faces to look like faces with all the engagements. Its just so good and a lot faster than what I've been able to use until now
Thank you so much for this one! 😊
Thank you! You solved my long running main problem!
Thanks, well explained and investigated 😁
thanks for the great video! very informative
You are wonderful to listen to 👏👏👏
Great video! Very helpful ☺️ 👍
Amazing work!
Thanks, very helpful.
This is the same thing that happened to me during the beta 2nd wave, when I typed murca and open shutter on a highway. I had both a woman and the highway and lights put together, and it was interesting looking. Using those weights appears to be what is happening in both cases.
I've actually had success using it for a specific person's face. I trained an embedding on cartoonized pictures of my friend and it was able to make an embedding that generates faces strikingly similar to my friend, so now I can make cartoon avatars for my friend. If you think about it, the model should already know about different facial features, different nose shapes, eye angles, jawlines, etc. So it shouldn't be hard for it to combine those features to generate a specific face. I trained it for 15 hours on 20 images.
Thanks a lot man
Very helpful! Liked and subbed. Looking forward to the follow-up. I wondered whether it might be possible to just build longer text prompts by using img-to-text on the aesthetic inputs… but I've played with that and it feels very weak currently; the prompts don't seem like good ones, and feeding them back in as generation prompts doesn't result in related images very often.
I have always found img-to-text mostly useless, so I'm not super surprised
tyvm !
Hello, I need help. I've tried both this and a number of other tutorials but for some reason I don't get the Aesthetic Gradient window to appear in my Automatic1111. It's currently up to date with git pull, I tried opening 1.5 and 2.0 models, all other extensions are working just fine, and I've restarted it many times.
Any reason not to use flipped images for the aesthetic embeddings? Also, did you try the embedding on the novelai ckpt or waifu diffusion? Probably more likely to get the ghibli embeddings to work nicely. Also try AND operators for when you're having trouble with targeting colors.
what's the upshot of using flipped images, like, just to have more images? Since all you need is 20, it doesn't seem like a huge benefit...
Hey, any success downloading the json file? It looks like it's probably "limiting" the amount of links to something around 60?
Thanks
How different is it from textual inversion, which tries to create a matching embedding based on the input images and the fixed model? By the way, in the case of Dreambooth: isn't the U-net left untouched and only the text encoder modified to properly understand your new embeddings?
I'm pretty certain that in dreambooth both are modified, but I'm no expert! Textual inversion is a bit meh as far as I'm aware. I don't trust it with a 10 foot pole
Thanks!
Hi, I found yours has an "Image Browser" tab in the webui, but mine went missing after I did a git pull today! Do you know how to revert to that version?
I ended up at commit "696cb33", "after initial launch, disable --autolaunch for subsequent restarts". You can revert to that using github desktop... I imagine they had a good reason for changing the way image browsing worked though.
Aesthetic Gradients are now an extension; you can install it using git or from the webui's Extensions tab.
yoo this made a HUGE DIFFERENCE FOR ME AHAHAH XD thats amazing ,
Appreciate the hard work, thanks very much!
great tutorial, thanks.
i am getting *ModuleNotFoundError: No module named 'requests'* error from clip download script, what may the issue be?
yo! you probably have the wrong version of python, check out this stackoverflow for more: stackoverflow.com/questions/17309288/importerror-no-module-named-requests
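Following the reply above: a ModuleNotFoundError for 'requests' usually means the script is running under a different Python than the one you installed packages into (e.g. another venv or an older version). A minimal stdlib-only sketch to check before running the download script (my illustration, not part of the repo):

```python
import importlib.util

def has_module(name: str) -> bool:
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None

# If this prints, install into *this* interpreter so the right Python
# (the one running the script) gets the package:
if not has_module("requests"):
    print("missing -- run: python -m pip install requests")
```

Using `python -m pip` rather than bare `pip` guarantees the package lands in the interpreter you are actually invoking.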
Bloody legend!
spoken like a fellow OZZY, thanks mate!
@@lewingtonn haha and a happy one at that! Great material as always I will give it a try if I can get it working in a Colab!
Does this still work today in a1111 ?
I try different settings and nothing changes.
Bailed at 'they barely even do' so thanks for the heads up!
that is very fair my dude, I thought I'd give everyone some warning
great video, lots of info to digest, thanks...
That's interesting, if a bit situational. I was hoping it could be used to create a consistent style for the output, but it seems not.
Does the original checkpoint influence the training of the VAE? For example if I select the waifu diffusion model or robo diffusion or something else
glad you asked BUCKO, yes, a lot
Maybe add a link to that fine portrait embedding? That'd be super nice. Thank you for this awesome explanation.
good point, I just uploaded them (link in description). Kinda sad that the 20 image one was as good as the 5000 image one tbh
What exactly does learning rate do?
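To the question above: the learning rate scales how big a step the optimizer takes along the gradient at each training iteration. A toy, framework-free illustration (my sketch, not the extension's training code), minimizing f(w) = (w - 3)²:

```python
def train(lr: float, steps: int = 50, w: float = 0.0) -> float:
    """Plain gradient descent on f(w) = (w - 3)^2."""
    for _ in range(steps):
        grad = 2 * (w - 3)  # derivative of (w - 3)^2
        w -= lr * grad      # the update the learning rate scales
    return w

print(round(train(0.1), 3))    # converges near the minimum at 3
print(round(train(0.001), 3))  # same number of steps, barely moved
```

Too small a rate and training crawls; too large and it overshoots the minimum and diverges, which is why embedding trainers expose it as a tunable.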
Another fine addition to my collection :)
haha are you training an AI on my videos??
Our analysts guarantee that all responses are genuine; we can assure you that you are not being harassed by bots.
@@ceaselessvibing5997 lol
Wait, so it's basically image prompts? Cause variation generation would be pretty nice.
Very interesting tutorial. Thank you.
I have tried to use the aivazovski embedding, and I get this error message:
AttributeError: 'FrozenOpenCLIPEmbedderWithCustomWords' object has no attribute 'tokenizer'
How can I fix that?
I also get an error when trying to create an aesthetic embedding.
Thank you for your time and help.
im having same issue, did you fix it?
@@real_snl No, I did not.
hey could somebody help me out? I m getting: AttributeError: 'NoneType' object has no attribute 'T'
37:03 that probably happened for those lights lines (second and fifth image)
wish i could like twice
I feel aesthetic gradients are more about pushing the prompt towards a pre-trained subject instead of a style… if you train the aesthetic embed on comic pages it will generate comic pages and not the comic style.
If it is trained on images with lots of dogs, it will generate dogs even if you asked for a cat
but it still never gives good hands :'(
@@lewingtonn That may be a fault with the model/training, not necessarily in the communication with it.
I don't get it. So I have Stable Diffusion, and I downloaded the code from github. But where do I put the aesthetic gradient folder? In my stable diffusion folder? There is no tab called "Extensions" there. Help please? :D
i just checked the repo to make sure, and there IS an "extensions" folder in the repo. as long as you are using the latest version of the stable diffusion webui, the extensions folder WILL be there
github.com/AUTOMATIC1111/stable-diffusion-webui
@@lewingtonn hey thanks! I had to update my stable diffusion, even though I installed it just a few days ago; I probably didn't get the newest version!
@@lewingtonn subscribed and liked, great service and fast answer thank youuuu!
How far is this from how the main 1.5 model was trained? I don't think we are there yet. But for practical uses I would like to curate a set of architectural images and be able to generate more predictable results that can be used for business purposes.
this isn't the way to do that. You'd be looking at dreambooth, but even then actually selling it would be hard af
Thank you
And does it have to be with 5000? Can't I just do it with 100? I don't want my computer to explode :(
PS: now I saw that you did explain that, but I hadn't reached that minute yet haha, thanks again
Awesome run-through, Thanks! By the way, your clip-download link set to private on github??
OOOOOPS!!!! Lemme fix that real quick, thanks!
@@lewingtonn Thank you! This CLIP downloader is a game changer for experimentation. I do run into an error from line 2 of download.py: ModuleNotFoundError: no module named 'requests'. Is there a requirement that the requirements.txt didn't express, perhaps?
Follow up - I needed to pip install 'requests' -- heads up that this might need to be in some folks' requirements. Script working!
I honestly never had much luck with them either.
Shift + right click on folder and no need to "cd" )
If you try your anime embeddings on Waifu Diffusion or the leaked model-that-shall-not-be-named, they'll probably look much better.
Ivazonsky? )
why'd you lower the batch size to 128?
wow, i'm surprised how worthless this feature is. are we sure it's implemented properly? simply adding a few words to the prompt would have a better effect (like you said)
I think that's a bit of a harsh view... I can definitely see cases where downloading 8 images and creating an embedding would be quicker and easier than iterating on different prompts... but yeah it's not world breaking by any means
@@lewingtonn Not world breaking BUT the ones they give you are FANTASTIC and damn drastic changes. My problem is with 192 images I sure couldn't do what they did. Oh, btw, the more images you do I did find it the end result was better. Did a txt2img and his head stopped being a lightbulb (was blue and glowed) and more of a head with 192 images. I just wonder if all this shit needs 24 gigs of ram? Oh, another issue is that there is absolutely no vram conservations implemented into this NOW script. If you get an OOM for the gpu then ctrl-c it and start all over because you will always get it even if you have only 1 image as happened to me.
@@generalawareness101 yeah EXACTLY the same thing happened to me, the OOM propagates to all subsequent attempts, if you have an embedding that makes really sick results can you upload it somewhere? I'd be really keen to see one that is actually really good
@@lewingtonn Good is subjective but what I have found from my own creations is that what you train is not what you will get. I trained a cartoon for instance. Well, I don't get the characters, or even its style, BUT it does turn everything into cartoons. What I found is the sampler used on it is VERY critical. I also found there is a sweet spot on the cartoon one where it demands to be 18-20 but 18 is the best. Another I made demands to be 7. The best sampler I found for them is dpm2 fast. I tried other samplers and I couldn't find a sweet spot. I will say this that using it I have a hell of a time fixing eyes now in inpainting. Pretty much the majority of them I can't do or just inpaint the entire head.
It would've been more meaningful if you tried more aesthetic steps.
Buy a microphone dude
I literally have a good mic!!! it's the razer siren one...
@@lewingtonn you made a good video. It was hard to listen to. Poor sound quality. Work on it and the audience will pick up. Good presentation of the topic. I watched the whole video.
Nah, you're right, thanks for the heads up
"Millions" of possibilities? Oh dear... how many bits of entropy do you think exist in the noise seed? PER PROMPT there are probably quadrillions, quintillions, sextillions... or even more possible results.