How to use Aesthetic Gradients: Stable Diffusion Tutorial

  • Published: 3 Jun 2024
  • A new paper, "Personalizing Text-to-Image Generation via Aesthetic Gradients", was published which describes how to train a special "aesthetic embedding" that lets the user specify more clearly what they want from any existing Stable Diffusion model. In this tutorial we walk through how to train an aesthetic embedding and use it to generate images.
    Discord: / discord
    00:00 - Summary
    01:07 - Paper Explanation
    06:51 - Webui Installation
    10:25 - Aesthetic Gradients Installation
    11:45 - Using Pretrained Embeddings
    21:50 - Training New Embedding
    29:40 - Comparing Embeddings
    34:39 - Experiment Outcomes
    ------- Requirements -------
    python 3.10
    Local Nvidia GPU
    CUDA 11.3+
    ------- Links -------
    Aesthetic Gradients Paper: arxiv.org/abs/2209.12330
    GitHub Desktop: desktop.github.com/
    Webui: github.com/AUTOMATIC1111/stab...
    Huggingface Stable Diffusion 1.4 Model: huggingface.co/CompVis/stable...
    Aesthetic Gradients Extension: github.com/AUTOMATIC1111/stab...
    Premade Aesthetic Embeddings: github.com/vicgalle/stable-di...
    CLIP search of LAION: rom1504.github.io/clip-retrie...
    Code for downloading LAION images: github.com/Lewington-pitsos/c...
    Script to download LAION: github.com/Lewington-pitsos/c...
    Webui thread on embeddings: github.com/AUTOMATIC1111/stab...
    Another webui thread on embeddings: github.com/AUTOMATIC1111/stab...
    Github Repo for original implementation: github.com/vicgalle/stable-di...
    Useful Blog Post: metaphysic.ai/custom-styles-i...
    Github Repo containing the portrait embeddings: github.com/Lewington-pitsos/c...
    ------- Misc -------
    ffmpeg commands to extract images from videos:
    ffmpeg resize and crop video: ffmpeg -i ghib.mp4 -c:a copy -filter:v "scale=960:512,crop=iw-448:ih-0" smol-ghib.mp4
    ffmpeg extract images: ffmpeg -i smol-ghib.mp4 -r 0.3 -f image2 image-%4d.jpeg
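    (What those two commands do, for reference: the first scales the clip to 960x512 and then crops 448 px off the width, leaving a 512x512 video while copying the audio stream untouched; the second samples frames at 0.3 fps, i.e. roughly one frame every 3-4 seconds, and writes them out as numbered JPEGs ready to use as embedding training images.)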
    ------- Music -------
    Music from freetousemusic.com
    ‘Late Morning’ by ‘LuKremBo’: • (no copyright music) c...
    ‘Daily’ by ‘LuKremBo’: • (no copyright music) c...
    ‘Marshmallow’ by ‘LuKremBo’: • lukrembo - marshmallow...
    ‘Travel’ by ‘LuKremBo’: • lukrembo - travel (roy...
    ‘Sunset’ by ‘LuKremBo’: • (no copyright music) j...
    ‘Biscuit’ by ‘LuKremBo’: • (no copyright music) l...
    ‘Sunflower’ by ‘LuKremBo’: • lukrembo - sunflower (...
    ‘Chocolate’ by ‘LuKremBo’: • (no copyright music) l...
    ‘Branch’ by ‘LuKremBo’: • (no copyright music) c...
    ‘Rose’ by ‘LuKremBo’: • lukrembo - rose (royal...
    ‘Butter’ by LuKremBo: • lukrembo - butter (roy...
    ‘Onion’ by LuKremBo: • (no copyright music) l...
    ‘Animal Friends’ by LuKremBo: • lukrembo - animal frie...
    ‘Snow’ by LuKremBo: • lukrembo - snow (royal...
    ‘Affogato’ by LuKremBo: • lukrembo - affogato (r...
    Many thanks to LuKremBo
    #stablediffusion #aiart #tutorials #techtutorials #promptcrafting #install #installation #researchpaper
  • Science

Comments • 99

  • @TheLifeOfRyanB
    @TheLifeOfRyanB 1 year ago +27

    The latest code from automatic1111 doesn't require you to call it model.ckpt; it gives you a drop-down to select from all .ckpt files in that directory. So you can have 1.4, 1.5, etc. and just select between them.

    • @lewingtonn
      @lewingtonn  1 year ago +3

      Thaaaaank you, I was suspecting that might be the case but hadn't had time to test it myself

    • @Sully365
      @Sully365 1 year ago +1

      I FINALLY got automatic1111 installed and noticed that it works well with the inpaint/1.5 ckpt and it's great. I was not sure if it made a diff between that ckpt and the one in the instructions

  • @jimdelsol1941
    @jimdelsol1941 1 year ago +12

    25:45 Just in case you didn't know, two ways to avoid having to type "cd (folder path)":
    1) Shift + right-click in the folder inside File Explorer --> Open PowerShell window here.
    2) Replace the path in File Explorer's address bar with "cmd" and press Enter.
    Thanks for the tutorial.

    • @lewingtonn
      @lewingtonn  1 year ago

      Oh damn, that second one is kind of cool thanks dude!

  • @matthewleiner4262
    @matthewleiner4262 1 year ago +4

    The way I see it, it feels like the aesthetic embedding creation basically does a CLIP interrogation on the set of images you provide it.
    So if you give it 1000 "closeup photograph beautiful woman" images, it is the same as 20 "closeup photograph of beautiful woman" images. It gets the gist.
    And that's all the aesthetic does to mesh with the text embedding. It takes a (100% weight "hamburger") text and can't reconcile it with a (5% "painting of a cabin on a sunset horizon", 3% "painting of a sailboat in rough waves"...).
    But it obviously works very well when it sees the text "portrait of a woman" and an aesthetic of beautiful women. So it says, oh okay, you want a portrait of a woman, but you want that woman to be beautiful.
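    (For anyone wondering how that "meshing" works mechanically, here is a rough, minimal sketch of the idea from the paper — not the webui extension's actual code, and every name below is illustrative: the aesthetic embedding is just the normalised mean of the CLIP image embeddings of your reference set, and at generation time the CLIP text encoder is nudged for a few gradient steps so the prompt embedding gains similarity with it.)

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def build_aesthetic_embedding(image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (N, d) CLIP image embeddings of the reference images
        e = image_features.mean(dim=0)
        return e / e.norm()

    def personalize(text_encoder: nn.Module, tokens: torch.Tensor,
                    aesthetic: torch.Tensor, steps: int = 20, lr: float = 1e-4) -> nn.Module:
        # nudge a copy of the text encoder so its pooled prompt embedding
        # moves towards the aesthetic embedding (cosine-similarity ascent)
        opt = torch.optim.Adam(text_encoder.parameters(), lr=lr)
        for _ in range(steps):
            prompt_emb = text_encoder(tokens).mean(dim=0)
            loss = -F.cosine_similarity(prompt_emb, aesthetic, dim=0)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return text_encoder

    # toy stand-ins so the sketch runs end to end
    d = 768
    aesthetic = build_aesthetic_embedding(torch.randn(20, d))  # 20 pretend reference images
    encoder = nn.Linear(d, d)                                  # stand-in for the CLIP text encoder
    tokens = torch.randn(8, d)                                 # stand-in for embedded prompt tokens
    personalize(encoder, tokens, aesthetic)

    (This is also roughly why more images stop helping after a point: everything gets averaged into a single direction, so 20 consistent images and 5000 consistent images end up producing nearly the same embedding.)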

  • @henrischomacker6097
    @henrischomacker6097 1 year ago +4

    Loved this tutorial, but your background sound drove me crazy!
    Please: Don't even think about background sound! - We're not interested in that disturbing noise while listening to a topic where you have to concentrate.
    Again: Great video. I'm already hooked and am ordering parts for my new "numbercruncher" because of these new AI opportunities, now open to us as well.
    And I will definitely try this tomorrow! - It's soo cool ;-)

  • @bobdelul
    @bobdelul 1 year ago +5

    Awesome that you're sharing your failed experiments. You just saved me tons of time!

    • @lewingtonn
      @lewingtonn  1 year ago +1

      That's why im here lol, to serve the hive mind

  • @unknownuser3000
    @unknownuser3000 1 year ago +3

    Great video, thank you. I tried aesthetic images for the past few days and was trying to use the prompt from the embedding etc. You helped me understand it a lot more, thank you.

    • @lewingtonn
      @lewingtonn  1 year ago +1

      if I can save anyone from suffering the same despair and turmoil as I did then it's all worth it

  • @Sully365
    @Sully365 1 year ago +3

    This was absolutely fantastic. Added the aesthetic model and I'm FINALLY getting outputs like I wanted which look more like Midjourney.

    • @lewingtonn
      @lewingtonn  1 year ago

      that's sick dude, pls link if you happen to upload any examples

    • @LM-zj7xp
      @LM-zj7xp 1 year ago

      That's awesome, did you create embeddings from images in MJ?

    • @Sully365
      @Sully365 1 year ago

      @@LM-zj7xp I only got to use Midjourney for a bit but I loved the style. With stable 1.5 and the sac8 aesthetic, I can actually get faces to look like faces with all the engagements. It's just so good and a lot faster than what I've been able to use until now

  • @swannschilling474
    @swannschilling474 1 year ago +1

    Thank you so much for this one! 😊

  • @woszkar
    @woszkar 1 year ago

    Thank you! You solved my long running main problem!

  • @simonbronson
    @simonbronson 1 year ago +2

    Thanks, well explained and investigated 😁

  • @plasticpippo201
    @plasticpippo201 1 year ago +1

    thanks for the great video! very informative

  • @crimsonmagenta6852
    @crimsonmagenta6852 1 year ago +1

    You are wonderful to listen to 👏👏👏

  • @tomm5765
    @tomm5765 1 year ago

    Great video! Very helpful ☺️ 👍

  • @filmmaster001
    @filmmaster001 7 months ago

    Amazing work!

  • @EnricoRos
    @EnricoRos 1 year ago

    Thanks, very helpful.

  • @synthoelectro
    @synthoelectro 1 year ago +2

    This is the same thing that happened to me during the beta 2nd wave, when I typed murca and open shutter on a highway. I had both a woman and the highway and lights put together, and it was interesting looking. Using those weights appears to be what is happening in both cases.

  • @aaronsj80
    @aaronsj80 1 year ago

    I've actually had success using it for a specific person's face. I trained an embedding on cartoonized pictures of my friend and it was able to make an embedding that generates faces that are strikingly similar to my friend, so now I can make cartoon avatars for my friend. If you think about it, the model should already know about different facial features, different nose shapes, eye angles, jawlines, etc. So it shouldn't be hard for it to combine those features to generate a specific face. I trained it for 15 hours on 20 images.

  • @MrDevidu
    @MrDevidu 1 year ago +1

    Thanks a lot man

  • @j.j.maverick9252
    @j.j.maverick9252 1 year ago

    very helpful! liked and subbed. Looking forward to the follow-up. I wondered whether it might just be a matter of building longer text prompts by using img-to-text on the aesthetic inputs… but I've played with that and it feels very weak currently; the prompts don't seem like good ones, and feeding them back in as generation prompts doesn't result in related images very often.

    • @lewingtonn
      @lewingtonn  1 year ago +1

      I have always found img-to-text mostly useless, so I'm not super surprised

  • @salted_peanuts
    @salted_peanuts 1 year ago

    tyvm !

  • @Spaceisprettybig
    @Spaceisprettybig 1 year ago

    Hello, I need help. I've tried both this and a number of other tutorials but for some reason I don't get the Aesthetic Gradient window to appear in my Automatic1111. It's currently up to date with git pull, I tried opening 1.5 and 2.0 models, all other extensions are working just fine, and I've restarted it many times.

  • @unknownuser3000
    @unknownuser3000 1 year ago +1

    any reason to not use flipped images for the aesthetic embeddings? Also, did you try the embedding on the novelai ckpt or waifu diffusion? Prob more likely to get the ghibli embeddings to work nicely. Also try AND operators for when you're having trouble with targeting colors.

    • @lewingtonn
      @lewingtonn  1 year ago

      what's the upshot of using flipped images, like, just to have more images? Since all you need is 20, it doesn't seem like a huge benefit...

  • @Ivangarciafilho
    @Ivangarciafilho 1 year ago

    Hey, any success downloading the json file? It looks like it's probably "limiting" the amount of links to something around 60?

  • @FusionDeveloper
    @FusionDeveloper 1 year ago +1

    Thanks

  • @BartoszBielecki
    @BartoszBielecki 1 year ago +2

    How different is it from textual inversion, which tries to create a matching embedding based on the input images and the fixed model? By the way - in the case of Dreambooth: isn't the U-net left untouched and only the text generator modified to properly understand your new embeddings?

    • @lewingtonn
      @lewingtonn  1 year ago

      I'm pretty certain that in dreambooth both are modified, but I'm no expert! Textual inversion is a bit meh as far as I'm aware. I don't trust it with a 10 foot pole

  • @devnull_
    @devnull_ 1 year ago

    Thanks!

  • @arixerchan3807
    @arixerchan3807 1 year ago +1

    hi, I noticed yours has an "Image Browser" tab in the webui, but mine went missing since I did a git pull today! Do you know how to revert to that version?

    • @lewingtonn
      @lewingtonn  1 year ago

      I ended up at commit "696cb33", "after initial launch, disable --autolaunch for subsequent restarts". You can revert to that using GitHub Desktop... I imagine they had a good reason for changing the way image browsing worked though.
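      (If you'd rather use the command line than GitHub Desktop, a rough equivalent run from inside the stable-diffusion-webui folder: "git checkout 696cb33" pins you to that commit, and "git checkout master" followed by "git pull" brings you back to the latest code later.)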

  • @TheGreatBizarro
    @TheGreatBizarro 1 year ago +2

    Aesthetic Gradients are now an extension. You can install it using git:
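    (A hedged sketch of the git route, assuming the aesthetic gradients extension repo linked in the description above: from inside the stable-diffusion-webui folder run "git clone <aesthetic-gradients extension repo URL> extensions/aesthetic-gradients", then restart the webui so the new tab shows up.)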

  • @keisaboru1155
    @keisaboru1155 1 year ago

    yoo this made a HUGE DIFFERENCE FOR ME AHAHAH XD that's amazing

  • @decoryder
    @decoryder 1 year ago +3

    Appreciate the hard work, thanks very much!

  • @lunaticinorbit
    @lunaticinorbit 1 year ago +1

    great tutorial, thanks.
    I am getting a *ModuleNotFoundError: No module named 'requests'* error from the CLIP download script, what might the issue be?

    • @lewingtonn
      @lewingtonn  1 year ago

      yo! you probably have the wrong version of python, check out this stackoverflow for more: stackoverflow.com/questions/17309288/importerror-no-module-named-requests
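      (Another thing worth trying first, assuming the package simply isn't installed in whichever Python the script runs with: "pip install requests" — a commenter further down reports that this fixed the same error for them.)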

  • @TheCopernicus1
    @TheCopernicus1 1 year ago +1

    Bloody legend!

    • @lewingtonn
      @lewingtonn  1 year ago +1

      spoken like a fellow OZZY, thanks mate!

    • @TheCopernicus1
      @TheCopernicus1 1 year ago +1

      @@lewingtonn haha and a happy one at that! Great material as always I will give it a try if I can get it working in a Colab!

  • @razvanab
    @razvanab 9 months ago

    Does this still work today in A1111?
    I've tried different settings and nothing changes.

  • @audiogus2651
    @audiogus2651 1 year ago

    Bailed at 'they barely even do' so thanks for the heads up!

    • @lewingtonn
      @lewingtonn  1 year ago

      that is very fair my dude, I thought I'd give everyone some warning

  • @cryptotastic8167
    @cryptotastic8167 1 year ago +1

    great video, lots of info to digest, thanks...

  • @lilowhitney8614
    @lilowhitney8614 1 year ago +1

    That's interesting, if a bit situational. I was hoping it could be used to create a consistent style for the output but it seems not.

  • @bonfire4254
    @bonfire4254 1 year ago

    Does the original checkpoint influence the training? For example, if I select the waifu diffusion model or robo diffusion or something else

    • @lewingtonn
      @lewingtonn  1 year ago +2

      glad you asked BUCKO, yes, a lot

  • @earthequalsmissingcurvesqu9359

    Maybe get a link to that fine portrait embedding? That'd be super nice. Thank you for this awesome explanation.

    • @lewingtonn
      @lewingtonn  1 year ago +2

      good point, I just uploaded them (link in description). Kinda sad that the 20 image one was as good as the 5000 image one tbh

  • @phantomhydra
    @phantomhydra 3 months ago

    What exactly does learning rate do?

  • @ceaselessvibing5997
    @ceaselessvibing5997 1 year ago +2

    Another fine addition to my collection :)

    • @lewingtonn
      @lewingtonn  1 year ago +1

      haha are you training an AI on my videos??

    • @ceaselessvibing5997
      @ceaselessvibing5997 1 year ago +2

      Our analysts guarantee that all responses are genuine; we can assure you that you are not being harassed by bots.

    • @lewingtonn
      @lewingtonn  1 year ago +1

      @@ceaselessvibing5997 lol

  • @desu38
    @desu38 1 year ago

    Wait, so it's basically image prompts? Cause variation generation would be pretty nice.

  • @titushora8147
    @titushora8147 1 year ago

    Very interesting tutorial. Thank you.
    I have tried to use the aivazovski embedding, and I get this error message:
    AttributeError: 'FrozenOpenCLIPEmbedderWithCustomWords' object has no attribute 'tokenizer'
    How can I fix that?
    I also get an error when trying to create an aesthetic embedding.
    Thank you for your time and help.

    • @real_snl
      @real_snl 1 year ago

      I'm having the same issue, did you fix it?

    • @titushora8147
      @titushora8147 1 year ago

      @@real_snl No, I did not.

  • @lievengouwy1972
    @lievengouwy1972 1 year ago

    hey could somebody help me out? I'm getting: AttributeError: 'NoneType' object has no attribute 'T'

  • @p_p
    @p_p 1 year ago

    37:03 that probably happened because of those light lines (second and fifth image)

  • @slashkeyAI
    @slashkeyAI 1 year ago

    wish i could like twice

  • @kuromiLayfe
    @kuromiLayfe 1 year ago +4

    I feel aesthetic gradients are more about pushing the prompt towards a pre-trained subject instead of a style… if you train the aesthetic embed on comic pages it will generate comic pages and not the comic style.
    If it is trained on images with lots of dogs… it will generate dogs even if you asked for a cat

    • @lewingtonn
      @lewingtonn  1 year ago +1

      but it still never gives good hands :'(

    • @LM-zj7xp
      @LM-zj7xp 1 year ago

      @@lewingtonn That may be a fault with the model/training, not necessarily in the communication with it.

  • @stefanleithner6922
    @stefanleithner6922 1 year ago

    I don't get it. So I have Stable Diffusion, and I downloaded the code from github. But where do I have to put the aesthetic gradient folder? In my stable diffusion folder? There is no tab called "Extensions" there. Help please? :D

    • @lewingtonn
      @lewingtonn  1 year ago +1

      i just checked the repo to make sure, and there IS an "extensions" folder in the repo. as long as you are using the latest version of the stable diffusion webui, the extensions folder WILL be there
      github.com/AUTOMATIC1111/stable-diffusion-webui

    • @stefanleithner6922
      @stefanleithner6922 1 year ago

      @@lewingtonn hey thanks! I had to update my stable diffusion, even though I installed it just a few days ago; probably I didn't get the newest version!

    • @stefanleithner6922
      @stefanleithner6922 1 year ago

      @@lewingtonn subscribed and liked, great service and fast answer thank youuuu!

  • @FilmFactry
    @FilmFactry 1 year ago +1

    How far is this from how the main 1.5 model was trained? I don't think we are there yet. But for practical uses I would like to curate a set of architectural images, and be able to generate more predictable results that can be used for business purposes.

    • @lewingtonn
      @lewingtonn  1 year ago

      this isn't the way to do that. You'd be looking at dreambooth, but even then actually selling it would be hard af

  • @Aesio92
    @Aesio92 1 year ago

    Thank you
    And does it have to be with 5000? Can't I just do it with 100? I don't want my computer to explode :(
    PS: now I saw that you did explain that, but I hadn't reached that minute yet hahah, thanks again

  • @cateyefotos
    @cateyefotos 1 year ago +1

    Awesome run-through, thanks! By the way, your clip-download link is set to private on github??

    • @lewingtonn
      @lewingtonn  1 year ago

      OOOOOPS!!!! Lemmie fix that real quick, thanks

    • @cateyefotos
      @cateyefotos 1 year ago +1

      @@lewingtonn Thank you! This CLIP downloader is a game changer for experimentation. I do run into an error from line 2 of download.py: ModuleNotFoundError: no module named 'requests'. Is there a requirement that the requirements.txt didn't express, perhaps?

    • @cateyefotos
      @cateyefotos 1 year ago

      Follow up - I needed to pip install 'requests' -- heads up that this might need to be in some folks' requirements. Script working!

  • @Because_Reasons
    @Because_Reasons 1 year ago

    I honestly never had much luck with them either.

  • @dan323609
    @dan323609 1 year ago

    Shift + right click on folder and no need to "cd" )

  • @AnOmegastick
    @AnOmegastick 1 year ago

    If you try your anime embeddings on Waifu Diffusion or the leaked model-that-shall-not-be-named, they'll probably look much better.

  • @dan323609
    @dan323609 1 year ago

    Ivazonsky? )

  • @hardkur
    @hardkur 1 year ago

    why did you lower the batch size to 128?

  • @stephenkamenar
    @stephenkamenar 1 year ago +2

    wow, i'm surprised how worthless this feature is. are we sure it's implemented properly? simply adding a few words to the prompt would have a better effect (like you said)

    • @lewingtonn
      @lewingtonn  1 year ago +2

      I think that's a bit of a harsh view... I can definitely see cases where downloading 8 images and creating an embedding would be quicker and easier than iterating on different prompts... but yeah it's not world breaking by any means

    • @generalawareness101
      @generalawareness101 1 year ago +2

      @@lewingtonn Not world breaking BUT the ones they give you are FANTASTIC and make damn drastic changes. My problem is that with 192 images I sure couldn't do what they did. Oh, btw, the more images you use, I did find the end result was better. Did a txt2img and his head stopped being a lightbulb (was blue and glowed) and became more of a head with 192 images. I just wonder if all this shit needs 24 gigs of ram? Oh, another issue is that there is absolutely no vram conservation implemented into this script right now. If you get an OOM on the gpu then ctrl-c it and start all over, because you will always get it even if you have only 1 image, as happened to me.

    • @lewingtonn
      @lewingtonn  1 year ago

      yeah EXACTLY the same thing happened to me, the OOM propagates to all subsequent attempts. If you have an embedding that makes really sick results, can you upload it somewhere? I'd be really keen to see one that is actually really good

    • @generalawareness101
      @generalawareness101 1 year ago +1

      @@lewingtonn Good is subjective, but what I have found from my own creations is that what you train is not what you will get. I trained a cartoon for instance. Well, I don't get the characters, or even its style, BUT it does turn everything into cartoons. What I found is the sampler used on it is VERY critical. I also found there is a sweet spot on the cartoon one where it demands to be 18-20, but 18 is the best. Another I made demands to be 7. The best sampler I found for them is dpm2 fast. I tried other samplers and I couldn't find a sweet spot. I will say that using it I have a hell of a time fixing eyes now in inpainting. Pretty much the majority of them I can't do, or I just inpaint the entire head.

  • @guzu672
    @guzu672 1 year ago

    It would've been more meaningful if you tried more aesthetic steps.

  • @StgKross
    @StgKross 1 year ago +3

    Buy a microphone dude

    • @lewingtonn
      @lewingtonn  1 year ago

      I literally have a good mic!!! it's the razer siren one...

    • @StgKross
      @StgKross 1 year ago +3

      @@lewingtonn you made a good video. It was hard to listen to. Poor sound quality. Work on it and the audience will pick up. Good presentation of the topic. I watched the whole video.

    • @lewingtonn
      @lewingtonn  1 year ago

      Nah, you're right, thanks for the heads up

  • @Chris-xo2rq
    @Chris-xo2rq 1 year ago +1

    "Millions" of possibilities? Oh dear... how many bits of entropy do you think exist in the noise seed? PER PROMPT there are probably quadrillions, quintillions, sextillions... or even more possible results.