Do THIS to speed up SDXL image generation by 10x+ in A1111! Must see trick for smaller VRAM GPUs!

  • Published: 10 Jun 2024
  • #SDXL, #automatic1111, #stablediffusiontutorial
    Is your SDXL 1.0 crawling at a snail's pace? Make this one change to speed up your SDXL image generation by 10x or more in Automatic1111! I have an 8GB 3060 Ti GPU and it was unbearably slow, until I put in this command line argument; it sped up my image generation with the base model plus refiner by 10x to 14x for a single image. It's actually very simple. In this video tutorial, I will show you how to speed up your image generation, and how to install and use the Refiner model within Txt2Img without manually switching models.
    UPDATE: With the release of A1111 webUI version 1.6.0, there is a new command line argument we can use: '--medvram-sdxl' instead of '--medvram'. '--medvram-sdxl' enables the model optimization only for SDXL models. This is nice because if you want to generate images with SD v1.5 models, which don't need '--medvram', you won't have to manually edit your webui-user.bat file each time (having '--medvram' enabled when you don't need it tends to slow down your image generation). I hope this helps, cheers!
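    For anyone unsure where the flag goes, here is a minimal sketch of an edited webui-user.bat (the stock file with only the COMMANDLINE_ARGS line changed; '--no-half-vae' and '--xformers' are the other flags I use, adjust to taste):
    @echo off
    set PYTHON=
    set GIT=
    set VENV_DIR=
    rem Enable memory optimizations only for SDXL models (webUI 1.6.0+).
    rem On older versions, use --medvram instead of --medvram-sdxl.
    set COMMANDLINE_ARGS=--medvram-sdxl --no-half-vae --xformers
    call webui.bat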
    If you have not installed SDXL 1.0 base and refiner models, watch my previous video for a step-by-step guide • How to Install and Use...
    Chapters:
    00:00 Intro
    00:38 Add the magic words to the 'webui-user.bat' file
    01:48 Image generation test with SDXL 1.0
    02:05 Only 41 seconds for a single image (11x faster)
    02:20 Trying '--lowvram'
    03:00 Does this negatively impact my Image Quality?
    03:30 Comparing images generated with medvram ON or OFF
    04:25 Install Refiner Extension
    05:00 How to use Refiner Extension
    06:07 Refiner works better with fewer steps
    06:32 Images generated in SDXL 1.0 with the Refiner Extension
    Useful links
    A1111 Command Line Arguments:
    github.com/AUTOMATIC1111/stab...
    Refiner extension:
    github.com/wcde/sd-webui-refiner
    sdxl-vae-fp16-fix on Hugging Face:
    huggingface.co/madebyollin/sd...
    This VAE fixes an issue where decoding in float16 precision can produce an entirely black image.
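    For example, one way to install it (assuming curl is available and you run this from the webUI root folder) is to download the file into the models\VAE folder and then select it under Settings > SD VAE:
    rem Download the fp16-fixed SDXL VAE into A1111's VAE folder.
    curl -L -o models\VAE\sdxl-vae-fp16-fix.safetensors https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/resolve/main/sdxl_vae.safetensors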
    **If you enjoy my videos, consider supporting me on Ko-fi**
    ko-fi.com/keyboardalchemist

Comments • 81

  • @lordsirmoist1594
    @lordsirmoist1594 9 months ago +3

    That refiner extension will be amazing, also love the generated images at the end

  • @DrAmro
    @DrAmro 10 months ago +3

    Keep up the good work! You're really good at explaining everything, better than many out there with thousands of subscribers. You are the best, bro 👍

    • @KeyboardAlchemist
      @KeyboardAlchemist  10 months ago +1

      Thank you for your kind words! I appreciate it!

    • @DrAmro
      @DrAmro 10 months ago +1

      ❤@@KeyboardAlchemist

  • @Nrek_AI
    @Nrek_AI 8 months ago +1

    Thank you so much for this dude... your content has been truly valuable for the community

  • @rexs2185
    @rexs2185 10 months ago +1

    Great content! Thank you for sharing this tip!

  • @chrisfox961
    @chrisfox961 10 months ago +1

    Thank you for these great tips!

  • @ehsanrt
    @ehsanrt 10 months ago +2

    Hi, just here for moral support, you're doing amazing.

  • @3diva01
    @3diva01 10 months ago +2

    Thank you SO MUCH! This helped me a LOT! It brought the image generation time down from around 5 minutes to about 2 1/2 minutes, so it cut my render time in half. THANK YOU!
    I'm hoping that someone will eventually find a way to get SDXL to run as fast as SD 1.5 when it comes to image generation times. The same size image that renders in about 2.5 minutes in SDXL renders in about 45 seconds in 1.5. It's a pretty big time increase, and so far the images, at least in A1111, don't seem to be as high quality as a lot of the SD 1.5 models. I'm sure the community will eventually make amazing models for SDXL; we just have to be patient. I have to remember that when SD 1.5 first launched, the images it produced weren't great either. lol
    Sorry for the rant! Thank you again for the great video! SO HELPFUL!
    Edit: It seems lowering the Steps makes a much bigger impact on render time with SDXL. I find that lowering the steps by quite a bit REALLY helps speed things up. I'm sure there's probably a loss in image quality doing that, but I'll have to do some testing with the same seed to see.

    • @KeyboardAlchemist
      @KeyboardAlchemist  10 months ago +1

      Thank you for sharing your experience! I'm so glad this video helped you cut down your render time! Yeah, I'm sure A1111 will provide optimization fixes very soon. And like you said, it's still very early for SDXL. There are definitely going to be a LOT more fine tuned models in the near future that will make things easier. Also, thanks for sharing the tip on lowering Steps. Cheers!

  • @mrBrownstoneist
    @mrBrownstoneist 9 months ago +6

    Adding "--medvram" makes SD 1.5 slower. Try adding "--medvram-sdxl", which enables medvram only for SDXL models.

  • @testtest-bb2dt
    @testtest-bb2dt 10 months ago +1

    Thank you so much!!

  • @aiart7702
    @aiart7702 9 months ago +1

    Legend!!

  • @shaolinmonk1537
    @shaolinmonk1537 4 months ago +1

    Works awesome, thanks

  • @mada_faka
    @mada_faka 8 months ago +1

    THANKS FOR THIS TUTORIAL, SUBSSS

  • @KeyboardAlchemist
    @KeyboardAlchemist  10 months ago +3

    Did this trick help you speed up your image generation with SDXL 1.0? I hope it did. Feel free to comment below. I would love to hear your success stories.

    • @tobinrysenga1894
      @tobinrysenga1894 10 months ago +1

      I've never gotten XL to work. I have enough VRAM but I'm still battling random errors. May just reinstall.

    • @KeyboardAlchemist
      @KeyboardAlchemist  10 months ago

      @@tobinrysenga1894 When all else fails, a fresh install is probably the way to go. I hope you get SDXL to work for you soon.

    • @tobinrysenga1894
      @tobinrysenga1894 10 months ago +1

      @@KeyboardAlchemist Yay, the reinstall worked! Well, I also noticed that I had multiple versions of Python installed for some reason, which could also have been hurting me. Too much playing with AI on the same computer...

    • @KeyboardAlchemist
      @KeyboardAlchemist  9 months ago

      @@tobinrysenga1894 Haha, I'm glad to hear that it worked out! 🙂

    • @tripleheadedmonkey6613
      @tripleheadedmonkey6613 9 months ago

      @@tobinrysenga1894 You're supposed to install the dependencies for each AI in its own virtual environment, which keeps them contained within the install folder of the webUI you are using instead of overwriting your system-wide Python install.

  • @meko264
    @meko264 9 months ago

    Doesn't Stable Diffusion use the Tensor cores when in half precision mode?

  • @akarshrao47
    @akarshrao47 10 months ago

    Hey, it might be out of context, but clicking on 'Load Available' in extensions gives me this error: URLError:

  • @prixmalcollects9332
    @prixmalcollects9332 4 months ago

    Hi, so I somewhat fixed it, but I ran into another issue: generation with LoRAs is pretty slow, while generation without them is very fast. Do you have a solution for that? Thank you!

  • @canaldetestes4517
    @canaldetestes4517 9 months ago +1

    Hi, first, thank you very much for sharing this tip with us. Question: do you think this hack can speed up other rendering in A1111? I ask because I have SadTalker 0.01 installed in it, and since I have an Nvidia 970 with 4GB, it takes too long to render the character talking; it took almost 4 hours to render a 4-minute character talk.

    • @KeyboardAlchemist
      @KeyboardAlchemist  9 months ago +1

      You're very welcome! Thank you for the question, unfortunately, I don't have experience with SadTalker. But just to take an educated guess, you can try using '--lowvram' and '--xformers' together to see if it will help. Best of luck!

    • @canaldetestes4517
      @canaldetestes4517 9 months ago

      @@KeyboardAlchemist Hi, thank you for your attention and answer. I will do what you said, and later I will come back here and write up the result.

  • @streamdungeon5166
    @streamdungeon5166 10 months ago +1

    Tried the same with a Pinokio install of SDXL 1.0 on my laptop with 16GB DDR5-4800 RAM and a mobile RTX 3050 Ti (4GB + 8GB shared). Without your setting, a 512x512px image takes about 1 min and is rendered step by step at roughly equal intervals. With your setting, the steps get slower and the final step freezes the entire machine almost completely and takes an extra 2-3 min. Just to let you know this is not a universally good setting for low-VRAM GPUs. Might be because mobile GPUs can use shared RAM anyway? Or because my laptop uses DDR5?

    • @KeyboardAlchemist
      @KeyboardAlchemist  10 months ago

      Thank you for sharing your experience! I'm very surprised that you were able to run SDXL with just 4GB of dedicated VRAM. Did you have to tweak any other settings? Maybe it is because your mobile GPU is making good use of the shared memory. But you make a great point: if a rig has more than 8GB of VRAM, putting in '--medvram' might actually be detrimental to the image generation speed, and this command is by no means a magic bullet. It's a tool to help those who have less VRAM, and it will need to be tested depending on the individual's PC and situation.

    • @streamdungeon5166
      @streamdungeon5166 10 months ago +1

      @@KeyboardAlchemist I used the default SDXL 1.0 install for Pinokio with no changes (other than adding your parameter for comparison, and then again without it):
      set COMMANDLINE_ARGS=--no-download-sd-model --xformers --no-half-vae --api
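      (With the video's parameter added, the comparison run would presumably have used:
      set COMMANDLINE_ARGS=--no-download-sd-model --xformers --no-half-vae --api --medvram)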

  • @MadazzaMusik
    @MadazzaMusik 10 months ago +1

    Would I get better times with this command if I had a 24GB card?

    • @KeyboardAlchemist
      @KeyboardAlchemist  10 months ago

      If you are not using '--xformers', that command may help you increase speed. But '--medvram' and '--lowvram' are just for machines with medium or low VRAM. In your case, they will probably make things slower.

  • @pavi013
    @pavi013 5 months ago

    I have 4GB and medvram works fine.

  • @kuvjason7236
    @kuvjason7236 6 months ago

    I got an RTX 3060 Ti. Before, I was getting 30 s/image; now I am getting 17 s/image. A great improvement, but I have seen people pump out images at 2 s/image. I need that kind of speed.

  • @FailedMaster
    @FailedMaster 8 months ago +1

    You are amazing. Holy shit, does this work well. I'm generating images in about 30 seconds now on an RTX 3070. It took me about 4 minutes before. This is definitely a game changer, thank you!
    Edit: For some reason it doesn't work sometimes. Restarting the webUI fixes it, but if I create a few pictures in a row, the "magic words" seem to be ignored. Any idea why that could be?

    • @KeyboardAlchemist
      @KeyboardAlchemist  8 months ago

      I'm glad to hear this helped you generate images faster in SDXL! Regarding your question, I'm not sure why that would happen. Here is a guess: older versions of A1111 may have a memory leak somewhere that builds up as you generate more images, since restarting the webUI fixes your issue. Are you using webUI v1.6.0? If not, it might be worth updating the webUI version.

    • @FailedMaster
      @FailedMaster 8 months ago +1

      @@KeyboardAlchemist I am using the newest version, but maybe you're right. Could just be a bug. Well, it doesn't matter that much, since restarting and generating is still fast as hell.

    • @KeyboardAlchemist
      @KeyboardAlchemist  8 months ago

      @@FailedMaster Cool beans, happy creating!

  • @user-yf4fh6bd7t
    @user-yf4fh6bd7t 8 months ago +1

    I did everything just as you explain in the video, but my SDXL images take about 8 to 10 minutes to render. IDK what I can do :( I have an RTX 3070 8GB.

    • @KeyboardAlchemist
      @KeyboardAlchemist  8 months ago

      There could be many different factors slowing down your image generation. Here are some ideas to try: (1) turn off your extensions and turn them back on one by one while testing your image gen speed, as some extensions can cause problems; (2) update your Nvidia driver to the latest version, or roll the driver back to an older version (try ver 536.67); (3) use an app like GPU-Z to see what else might be taking up your VRAM in the background. Best of luck!

  • @hairy7653
    @hairy7653 10 months ago +2

    My issue is not VRAM but RAM... at 16GB, loading models is slow and sometimes stalls my PC. What have you got?

    • @3diva01
      @3diva01 10 months ago +2

      Yeah, loading both the SDXL main model and then the refiner takes AGES in AUTOMATIC1111. After that, the first image generation ALSO TAKES AGES. But after the first image generation of the session, it speeds up a lot, in my experience. On my machine it takes about 15-20 minutes to load the SDXL models (main model + refiner model) and then render the first image. So I always set it up to render the first image and then go make coffee before I can get started with my image generation session.
      A few things to note from my experimenting so far: the sampling method DPM++ 2M SDE Karras seems to render images pretty fast with SDXL and creates decent-looking images even at lower sample steps (14-35 steps). The "refiner" can make things look super ugly super quickly; I recommend not putting it above 3 steps, particularly if you're using a low step count with the main model.

    • @KeyboardAlchemist
      @KeyboardAlchemist  10 months ago +1

      If you have VRAM to spare, then use '--lowram'. This will load your model/checkpoint weights into VRAM instead of RAM. Hope this helps!
      Alternatively, since RAM is cheap compared to VRAM, you can just get a couple of new sticks and plug them in (this would actually be the better solution).
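      As a sketch, the relevant webui-user.bat line could look like this (companion flags will vary with your setup):
      rem Load checkpoint weights directly into VRAM instead of system RAM.
      set COMMANDLINE_ARGS=--lowram --xformers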

    • @hairy7653
      @hairy7653 10 months ago +1

      @@KeyboardAlchemist Oh wow, I'm gonna try it, thanks!

    • @carlinite
      @carlinite 9 months ago +2

      I couldn't load SDXL with 16GB of RAM; I threw in an older 16GB stick for 24 total, and now it's all smooth. It loads in about 30 seconds.

    • @hairy7653
      @hairy7653 9 months ago +1

      @@carlinite Same here, I upgraded to 32 and now it works fine!

  • @darkjanissary5718
    @darkjanissary5718 7 months ago +1

    On my 3070 Ti, using exactly the same prompt and settings on A1111 1.6.0 with medvram, it takes 5 mins to complete. How can you render it in 41 seconds??? Do you use any other extension or something?

    • @KeyboardAlchemist
      @KeyboardAlchemist  7 months ago

      Hello, thanks for watching! Here are the parameters that I use in COMMANDLINE_ARGS: --medvram-sdxl --no-half-vae --xformers. Before --medvram-sdxl was available in the webUI, I used --medvram, which does the same thing.
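      Written out as a full webui-user.bat line, that is:
      set COMMANDLINE_ARGS=--medvram-sdxl --no-half-vae --xformers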

    • @darkjanissary5718
      @darkjanissary5718 7 months ago

      @@KeyboardAlchemist Yeah, I fixed it; the problem was the Nvidia driver. I upgraded to the latest 545.84, which supports TensorRT, and reinstalled A1111 from scratch. Now it is normal and takes 15 seconds to render a 1024x1024 image in SDXL.

    • @KeyboardAlchemist
      @KeyboardAlchemist  7 months ago

      @@darkjanissary5718 That's great to hear! I did not think it could be a driver issue. Just out of curiosity, what version of the driver were you using before the update?

    • @darkjanissary5718
      @darkjanissary5718 7 months ago

      @@KeyboardAlchemist It was 537.

    • @prixmalcollects9332
      @prixmalcollects9332 4 months ago

      @@darkjanissary5718 Hi, how did you reinstall A1111? Is there any quick way? I don't wanna lose my models... (sorry, very noob)

  • @Woolfio
    @Woolfio 9 months ago

    I am using AMD's 6750 XT with 12GB and I cannot do 1024x1024, while others with 8GB Nvidia cards can do 1024 and even a 2x upscale. I regret getting an AMD GPU so much.

    • @alt666
      @alt666 9 months ago

      Sure, you may not get all the AI goodness, but hey, gaming is way better on the AMD side for the price.

    • @Woolfio
      @Woolfio 9 months ago

      @@alt666 Why is it way better? It is just cheaper to get the same performance.

  • @sairampv1
    @sairampv1 9 months ago +1

    I have a laptop with a 3050 Ti with 4GB VRAM. Is it possible to produce images on my laptop?

    • @KeyboardAlchemist
      @KeyboardAlchemist  9 months ago

      Yes, you absolutely can produce images on a 3050 Ti with 4GB of VRAM, BUT it will likely be very slow if you are going to use the SDXL model. I would recommend doing image generation with fine-tuned SD v1.5 models. They will give you great results, and the generation speeds are a LOT faster than SDXL.

    • @sairampv1
      @sairampv1 9 months ago

      @@KeyboardAlchemist I read that Linux-based computers give higher it/sec. Is it a good idea to set up a VM and try SD 1.5 on that?

  • @DARKNESSMANZ
    @DARKNESSMANZ 9 months ago

    It went to 1 hour... My graphics card is a 1080, with 32GB RAM. Vlad Diffusion is much faster than A1111 1.6.0 but does not support SDXL.

  • @guruabyss
    @guruabyss 9 months ago +2

    Wouldn't this trick make image generation even faster on a 4090?

    • @KeyboardAlchemist
      @KeyboardAlchemist  9 months ago +1

      If you are not using '--xformers' already, adding that command may help you increase speed. But '--medvram' and '--lowvram' are for GPUs with medium or low VRAM; adding these to a 4090 will likely slow it down. A 4090 already has enough VRAM to handle anything SDXL throws at it.

  • @maxstepaniuk4355
    @maxstepaniuk4355 7 months ago +1

    SDXL should work better with --opt-sdp-attention instead of xformers. It takes ~11 sec on a 3080 Ti with only --opt-sdp-attention --no-half-vae.
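    As a webui-user.bat line, that suggestion would read:
    rem SDP attention is built into PyTorch 2.0+, so no xformers install is needed.
    set COMMANDLINE_ARGS=--opt-sdp-attention --no-half-vae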

    • @KeyboardAlchemist
      @KeyboardAlchemist  7 months ago

      Hello, thanks for the tip! The point I was making in the video is that for GPUs with less VRAM, it is more about the memory optimization with '--medvram', which helps speed things up by keeping all the processing within your GPU. I have tested both '--xformers' and '--opt-sdp-attention' previously; on my 8GB 3060 Ti, both commands generated a 1024x1024 image in about 7 minutes, but adding '--medvram-sdxl' brought the generation down to about 28 seconds for both. I'm guessing your 3080 Ti GPU has 12GB of VRAM, so you don't have to use '--medvram-sdxl' when running SDXL, and maybe in that case '--opt-sdp-attention' edges out '--xformers'; but based on what I can test, both of these cross-attention optimization commands provide similar results. I would be interested in knowing how long it takes you to generate the same image using '--xformers' versus '--opt-sdp-attention'. Thanks for watching!

    • @prixmalcollects9332
      @prixmalcollects9332 4 months ago

      Hi, can you share your command line? I don't know how to write this.

    • @prixmalcollects9332
      @prixmalcollects9332 4 months ago

      I tried this and it is still slow T_T

  • @Samogub
    @Samogub 7 months ago +1

    Thx

  • @earm5779
    @earm5779 9 months ago +1

    What about 4GB VRAM? Does it work?

    • @KeyboardAlchemist
      @KeyboardAlchemist  9 months ago

      Try using '--lowvram' and '--xformers' together. Hope it helps!

    • @earm5779
      @earm5779 9 months ago

      @@KeyboardAlchemist Tried it already, but it's not working. If I use Fooocus, it works.

  • @neetcoomer
    @neetcoomer 2 months ago +1

    It works