ELLA - A Powerful Adapter for Complex Stable Diffusion Prompts

  • Published: 10 Apr 2024
  • Diffusion models have made incredible strides in text-to-image generation, but they still struggle with dense, complex prompts that involve multiple objects, detailed attributes, and intricate relationships.
    Enter ELLA - the Efficient Large Language Model Adapter that's poised to revolutionize how diffusion models handle sophisticated prompts. This ingenious adapter equips text-to-image models with the power of large language models, without the need to train either the U-Net or the language model itself.
    In this video, I dive into using ELLA in ComfyUI and explore how it tackles the limitations of current text encoders like CLIP. Prepare to be amazed as I showcase the superior performance of ELLA compared to CLIP conditioning, unlocking a new level of sophistication in text-to-image generation. If you've ever struggled to craft the perfect prompt, this video is a must-watch!
    (* Oops! I was playing with scheduling in the video and forgot to switch back to sgm_uniform, but it really doesn't make much difference. It also turns out they are no longer going to release the SDXL weights.)
    However! For SDXL, check this video out!
    Pixart Sigma - Like ELLA but for SDXL+ Resolutions in ComfyUI!
    • Pixart Sigma - Like EL...
    Want to support the channel?
    / nerdyrodent
    Links:
    ella-diffusion.github.io/
    github.com/TencentQQGYLab/ELLA
    github.com/ExponentialML/Comf...
    huggingface.co/QQGYLab/ELLA/b...
    huggingface.co/google/flan-t5...
    huggingface.co/Kijai/flan-t5-...
    See also: github.com/TencentQQGYLab/Com...
    == More Stable Diffusion Related Stuff! ==
    * Installing Anaconda for MS Windows Beginners - • Anaconda - Python Inst...
    * Installing ComfyUI - • How to Install ComfyUI...
    * ComfyUI Workflow Creation Essentials For Beginners - • ComfyUI Workflow Creat...
    * Faster Stable Diffusions with the LCM LoRA - • LCM LoRA = Speedy Stab...
    * Make an Animated, Talking Avatar - • Create your own animat...
    * One Image Gets You a Consistent Character in ANY pose - • Reposer = Consistent S...
  • Science

Comments • 75

  • @Cingku
    @Cingku 1 month ago +14

    Finally...I don't need SD3 anymore if this is the case. Who needs to download more SD3 models when this is doing prompt adherence that well...because my disk space is suffering these days.. :D

    • @Cingku
      @Cingku 1 month ago

      I take that back. It seems that I need SD3 after all. I cannot get the style I want because of limitations in prompting (like Nerdy said in the video), and prompt adherence is only good with the raw prompt, without a styling prompt. So it is basically useless.

    • @nowshinnur
      @nowshinnur 1 month ago

      @@Cingku Plus they won't release SDXL.

    • @hphector6
      @hphector6 1 month ago

      SD3 has a wayyy better VAE too. That's the main thing I'm looking forward to

    • @user-cz3io5tg5l
      @user-cz3io5tg5l 29 days ago

      I see small improvements with ELLA, but it is not even close to the SDXL examples on their GitHub page, which they won't release :(
      Edit: oh wait, seems like I used an outdated extension. Will try again.
      Edit: still a very tiny improvement.

  • @kofteburger
    @kofteburger 1 month ago +7

    I've been looking forward to this.

  • @MrSporf
    @MrSporf 1 month ago +6

    That is quite the improvement!

  • @Pauluz_The_Web_Gnome
    @Pauluz_The_Web_Gnome 1 month ago +3

    Thanks for the flow, man. I'm one of your Patreon supporters now! 😀

  • @Remianr
    @Remianr 1 month ago +4

    6:38 Love your sense of humor, Nerdy Rodent :D (pretty sure I would have made a similar joke too lol).
    It's also amazing that such a simple additional tool on top of an already known one can make such a huge difference; it shows that these models are actually more powerful than we thought. Really impressive nerdy tech news over the past 3-6 months :)

    • @NerdyRodent
      @NerdyRodent 1 month ago +1

      I can’t wait to see what we get two papers down the line!

  • @Pending22
    @Pending22 1 month ago

    Epic!! Brilliant tutorial as always, thanks Nerdy :)

  • @LIMBICNATIONARTIST
    @LIMBICNATIONARTIST 1 month ago

    Absolutely amazing!

  • @godpunisher
    @godpunisher 1 month ago +2

    Nerdy's always on time 👍

  • @GyattGPT
    @GyattGPT 1 month ago +1

    This really seems to be getting closer to the ability of the private imagegen models to follow the prompt better.

  • @marschantescorcio1778
    @marschantescorcio1778 1 month ago

    Thank goodness I can write my mini-novel prompts! This is quite a game changer.

  • @kenmillionx
    @kenmillionx 1 month ago

    Cool video. Much love ❤❤❤❤. Cool video. Am waiting for next video 😊😊😊😊

  • @southcoastinventors6583
    @southcoastinventors6583 1 month ago

    How well does it output text, and does it work with artist names, like "art by Picasso"? Also, can it run with 8 GB of VRAM like normal SDXL models can? Thanks for the great video; it's like getting a sneak peek at SD3 capability.

  • @testales
    @testales 1 month ago

    Very impressive, I need to try this out as soon as possible! :) What does the sigma node do, and what kind of impact does it have? Should it use the same sampler and scheduler as the KSampler?

  • @SLAMINGKICKS
    @SLAMINGKICKS 28 days ago

    Perfect, just became your patron too.

  • @DrMattPhillips
    @DrMattPhillips 1 month ago +3

    I installed all the T5 files linked but keep getting a "T5Tokenizer requires the SentencePiece library but it was not found in your environment." error. I re-downloaded the spiece.model file, which I assume is the file in question, but still get the error. It could be something I'm doing, since I'm new to Comfy (still learning how to navigate it).
    Edit: managed to fix it somehow. I installed and reinstalled ELLA, also pip-installed sentencepiece, and installed the other ELLA add-on in Comfy, so if anyone has a similar issue, one of those might help (can't be more specific as I genuinely have no idea why it happened or how it was fixed).

    • @stevenkosin6965
      @stevenkosin6965 1 month ago +1

      I wonder if the SentencePiece library has to be installed prior to installing the custom node? I was having the same issue but hadn't tried removing ELLA's node and reinstalling it.
      What worked for me was uninstalling it and then using the experimental pip installer in the ComfyUI Manager to install it. No clue why that worked.
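The fixes reported in this thread all come down to the same thing: getting SentencePiece into the Python environment that ComfyUI actually runs, which is easy to miss on portable builds with a bundled interpreter. A minimal, stdlib-only diagnostic sketch (not part of the ELLA extension; the printed hint is only a suggestion) that shows which interpreter is active and whether it can see the package:

```python
# Hypothetical diagnostic, not part of the ELLA extension: the
# "T5Tokenizer requires the SentencePiece library" error usually means
# the sentencepiece package is missing from the interpreter ComfyUI
# runs with, not that spiece.model is broken or misplaced.
import importlib.util
import sys

spec = importlib.util.find_spec("sentencepiece")
if spec is None:
    print(f"sentencepiece NOT visible to {sys.executable}")
    print("try: python -m pip install sentencepiece  (with that same interpreter)")
else:
    print("sentencepiece is installed; the tokenizer error lies elsewhere")
```

Run it with the same Python that launches ComfyUI; installing sentencepiece into a different system Python is the usual reason the error persists after a reinstall.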

  • @TUSHARGOPALKA-nj7jx
    @TUSHARGOPALKA-nj7jx 29 days ago

    How does it do at low step counts if we want to combine it with the LCM LoRA?

  • @michalgonda7301
    @michalgonda7301 27 days ago

    Hey, thanks for your videos ;) you are awesome! ...
    Did you try it with AnimateDiff? :) Do you think it would work? :) Maybe even some ControlNets would help it with understanding, so it can color things correctly :)

  • @harshitpruthi4022
    @harshitpruthi4022 17 days ago

    The T5 model is showing as unidentified even after putting it in the right folder. Any help? I also placed it in the ella-embedd folder.

  • @peacetoall1858
    @peacetoall1858 4 days ago

    Would have loved something like this for SD 1.5 on Forge

  • @suffolkcountysheriff
    @suffolkcountysheriff 1 month ago +1

    Would love to see this with a control net depth map

  • @dkracingfan2503
    @dkracingfan2503 1 month ago

    Is there a huggingface space where I can try this out?

  • @mettlerai
    @mettlerai 10 days ago

    Hi NerdyRodent, great video! What operating system do you use to install ELLA with ComfyUI?

    • @NerdyRodent
      @NerdyRodent 9 days ago

      For anything to do with artificial intelligence, your best bet is to run Linux with an Nvidia card 🙂

    • @mettlerai
      @mettlerai 8 days ago

      OK, awesome! Thanks @NerdyRodent! I was wondering, can you make a tutorial on how to install ComfyUI on Linux? And what version of Linux do you use for artificial intelligence?

    • @NerdyRodent
      @NerdyRodent 7 days ago

      @@mettlerai links are in the video description!

  • @97BuckeyeGuy
    @97BuckeyeGuy 1 month ago +3

    Unfortunately, this group has stated definitively that they are NOT releasing their SDXL weights. I'm waiting to see what the RPG Dungeon Master group comes up with.

    • @NerdyRodent
      @NerdyRodent 1 month ago +3

      We will also have to see what license they have on that 🫤

  • @DemShion
    @DemShion 1 month ago +1

    A shame it was Tencent who came up with this. They stated that SDXL won't be made publicly available, and they also didn't release the training process for 1.5. There is a community effort to reverse engineer the training process; hopefully they'll pull it off.

  • @jibcot8541
    @jibcot8541 1 month ago +3

    Shame they probably aren't ever going to release the SDXL weights (only the SD 1.5 version).

    • @97BuckeyeGuy
      @97BuckeyeGuy 1 month ago

      They have stated definitively that they are NOT releasing the SDXL weights.

    • @NerdyRodent
      @NerdyRodent 1 month ago

      Ah yes, I see they said after I’d made the video 😞 Still, maybe someone will do it in the future!

    • @prettyawesomeperson2188
      @prettyawesomeperson2188 1 month ago +2

      Any reason for not releasing the weights for SDXL?

    • @97BuckeyeGuy
      @97BuckeyeGuy 1 month ago +1

      @@prettyawesomeperson2188 Business reasons... i.e. they want to make money with the good stuff.

    • @lalayblog
      @lalayblog 1 month ago +1

      I can see it might not be so profitable to release an SDXL version of this technique. 1. On its own it produces anime-style or semi-realistic pictures (the flan-t5 encoder alone gives anime); you need to combine its output conditioning with CLIP conditioning. 2. SDXL prompt adherence is already comparable with ELLA on SD 1.5, and I can't expect ELLA adherence to be significantly better for SDXL (no claims on that yet).

  • @jacket8818
    @jacket8818 29 days ago

    OK, cool.
    But how good is it with cartoonish or anime styles?

  • @shirwan
    @shirwan 1 month ago +1

    I'm getting this error "RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'" on a GTX 1660S, could the limited VRAM be the cause?

    • @NerdyRodent
      @NerdyRodent 1 month ago +1

      My guess would be due to the older GPU 😕

    • @shirwan
      @shirwan 1 month ago

      @@NerdyRodent I figured that would be it; thank you anyway. I'm sure the requirements will be halved in the future.

    • @rebeldeconcausa9227
      @rebeldeconcausa9227 14 days ago +1

      @@shirwan The GTX 1660S is obsolete for AI due to its scarce 6 GB; I recommend the RTX 3060 12 GB, which is very economical since the launch of the RTX 4060.

  • @USBEN.
    @USBEN. 1 month ago +7

    FINALLY! Stable Diffusion is at the same natural-language prompt adherence level as DALL-E!!

  • @jeffkilgore2693
    @jeffkilgore2693 1 month ago +1

    now I want chips

  • @prolamer7
    @prolamer7 1 month ago

    I still don't understand HOW ELLA does what it does.

  • @jonmichaelgalindo
    @jonmichaelgalindo 1 month ago +2

    Just tried it. The prompt adherence is good (not as good as SD3), but the quality of SD1.5 is terrible. The SDXL version would probably be a lot better.

    • @itycagameplays
      @itycagameplays 1 month ago +2

      How about using it as a first pass and then an SDXL model as a second pass? The prompt adherence would help a lot.

    • @jonmichaelgalindo
      @jonmichaelgalindo 1 month ago

      @@itycagameplays Might be worthwhile.

    • @rebeldeconcausa9227
      @rebeldeconcausa9227 14 days ago +1

      @@itycagameplays I've been doing AI images for a long time, and the idea of using an SD 1.5 model to create hundreds of quick images and then improve the one I like in SDXL didn't occur to me. Thanks, man 🤣

  • @MarcSpctr
    @MarcSpctr 1 month ago +2

    NO SDXL release and NO training code either.
    I just don't bother working with such things, as it's a dead end.
    Edit: feels like they just released it to create hype for something else they want to monetize.

    • @lalayblog
      @lalayblog 1 month ago

      Flan-t5-encoder-only-bf16 is useless alone because it produces anime style only.
      So I agree that without fine-tuning for realism it is useless.

  • @animetechs2191
    @animetechs2191 1 month ago +1

    Can we see this anywhere in Automatic1111 or Forge SD? I don't want to switch to ComfyUI, as I'm already comfortable with Forge.

  • @kariannecrysler640
    @kariannecrysler640 1 month ago +2

    🐀 ❤🤘

  • @virtualalias
    @virtualalias 1 month ago

    Doesn't OpenAI already do this with DALL-E 3?

  • @thevoid6756
    @thevoid6756 1 month ago

    Combine this with FreeU and Self-Attention Guidance.

  • @xXxPRxXx
    @xXxPRxXx 1 month ago +1

    CHISP!

  • @pragmaticcrystal
    @pragmaticcrystal 1 month ago +1

    🫶

  • @Mika43344
    @Mika43344 1 month ago

    YOU USED 2 DIFFERENT SCHEDULERS, now you have to redo everything dude))))

  • @therookiesplaybook
    @therookiesplaybook 25 days ago +1

    Could you please give clearer instructions on the paths? ComfyUI is complicated enough as it is, without having to work out how you set all this spaghetti up.

  • @Valentinebej
    @Valentinebej 6 days ago

    Can you do a HiDiffusion ComfyUI tutorial?🙏

  • @blahblahdrugs
    @blahblahdrugs 1 month ago

    So this only works with SD 1.5 models?

    • @lalayblog
      @lalayblog 1 month ago +1

      So this thing is useful mostly for SD 1.5. The only advantage for SDXL would be support for different languages, but at the price of huge space consumption on the SSD.

    • @blahblahdrugs
      @blahblahdrugs 1 month ago

      @@lalayblog It's working for me with SD 1.5, but I prefer SDXL and it just crashes with SDXL.