Textual Inversion with Automatic1111 (I Read The Paper)

  • Published: 14 May 2024
  • Textual inversion is very similar to dreambooth: in both cases you use 3-5 sample images to teach stable diffusion a concept or style, which the model then learns to generate. Textual inversion has two key advantages: (1) it is non-destructive and does not affect the original model, and (2) it produces a highly portable embedding rather than a new model.
    We talk about how Textual Inversion works according to the authors of "An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion", and then once we understand the underlying mechanics we download some checkpoints from civitai and actually use them with Automatic1111's webui.
    Discord: / discord
    ======= Links =======
    The Paper: arxiv.org/abs/2208.01618
    Video about using dreambooth to destroy a model: • I drove a Machine Insane
    ======= Music =======
    Music from freetousemusic.com
    ‘Onion’ by LuKremBo: • (no copyright music) l...
    ‘Late Morning’ by ‘LuKremBo’: • (no copyright music) c...
    ‘Marshmallow’ by ‘LuKremBo’: • lukrembo - marshmallow...
    ‘Butter’ by LuKremBo: • lukrembo - butter (roy...
    From the YouTube Audio Library:
    Escapism Yung Logos
    #tutorials #techtutorials #ai #text2image #art #aiart #stablediffusion #automatic1111
  • Science
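The mechanism described above can be made concrete with a toy sketch: textual inversion freezes the entire model and optimizes only the embedding vector of one new pseudo-token. Everything below is an illustrative stand-in (made-up 8-dim embeddings and a synthetic target vector replacing the real denoising loss), not the actual Stable Diffusion training code:

```python
import numpy as np

# Toy sketch of textual inversion: the model is frozen; only the new
# token's embedding row is trained. (Assumptions: 8-dim embeddings and a
# made-up target vector standing in for the real reconstruction loss.)
rng = np.random.default_rng(0)

vocab = {"a": 0, "photo": 1, "of": 2, "<my-concept>": 3}
emb = rng.normal(size=(4, 8))   # pretrained embedding table + one new row
frozen = emb.copy()             # snapshot to verify nothing else changes

# Pretend the 3-5 sample images imply this ideal embedding for the concept.
target = rng.normal(size=8)

lr = 0.1
for _ in range(500):
    # gradient of ||v - target||^2 w.r.t. v -- only the new row is updated
    grad = 2 * (emb[vocab["<my-concept>"]] - target)
    emb[vocab["<my-concept>"]] -= lr * grad

# Non-destructive: every pretrained embedding is bit-for-bit unchanged.
print(np.allclose(emb[:3], frozen[:3]))                     # True
print(np.linalg.norm(emb[vocab["<my-concept>"]] - target))  # ~0
```

The learned artifact is just that one new row; the real thing is the same shape of object (a 768-dim vector for SD1.x), which is why an embedding file is a few KB while a full checkpoint is gigabytes.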

Comments • 51

  • @SoCalGuitarist · 1 year ago +31

    Hey, I'm the author of the SDA768 Embed, glad you like it! It's actually one of my earlier embeds, I should probably run a V2 on it at some point. Thanks for the shoutout! 👍🏼

    • @GG666KING · 1 year ago

      AI "art" will never replace human art. What you're doing is illegal

    • @jeff_clayton · 1 year ago +1

      The courts decide that -- and it is different in every country what is legal there. This type of battle happens EVERY SINGLE TIME there is a new big technology. Not everything created by AI is using copyrighted work either. There are bezillions of free and public domain (defined as not copyrighted or out of copyright) works that can be used with these new technologies even if the rest gets ruled unlawful later.

    • @jeff_clayton · 1 year ago

      I for one am dying to see how it all pans out, or if it doesn't just continue to go back and forth in courts for the next several decades. In other big cases there have been varying results... settling out of court, laws changing, or a real loss - but sometimes the losing party moved to a country where their thing was not against the law. Internet companies can do that, so something may be deemed illegal where you are, but NOT where they are.

    • @ArisenProdigy · 1 year ago

      I love your textual inversions. I haven't gotten quite to the point of training one myself, but I really want a negative text/words/letters inversion ;)

  • @meanwhiles432 · 1 year ago +8

    Just wanted to point out that Empire also has a negative embedding file you can also download. That will likely be the reason for the difference. For some reason the negative embedding improves my outputs.

  • @generalawareness101 · 1 year ago +8

    I love making embeddings; I've released a few now, and I'm working on purely negative-prompt embeddings with amazing results.

    • @lewingtonn · 1 year ago +5

      if you have any cool embeddings pls share on the discord!

  • @ismailtibba · 1 year ago +3

    2.0 embeddings work great with the 2.1 model

  • @techviking23 · 1 year ago +1

    Thanks for the great explanation!

  • @Z10T10 · 1 year ago +3

    @koiboi Please make a video about training an embedding. There are already some videos out there about TI and how to train one, but no one gives detailed info about settings, learning rate, steps, the ideal number of images (and why), or TI templates for a person's face or a style.
    There is so much to mention, but everyone just sets the learning rate to 0.005 and the steps to 15000, and the results are horrible

  • @swannschilling474 · 1 year ago +1

    Thanks for the recap!! 😇

  • @cryptidsNstuff · 1 year ago +4

    Nice work as always.

  • @poisenbery · 1 year ago

    Dude, thank you for going over the technical details.
    I was raging with confusion as to why textual inversion takes massively more time than making a new model.
    It makes sense that having to train something, essentially from scratch, would take longer than to build on existing knowledge.
    I also was not aware that dreambooth is a destructive process. It's very obvious to me now that you said it, but WOW I did not make that connection before.

  • @ratside9485 · 1 year ago +5

    Donates a Windows key to the man.

  • @jordandavis406 · 1 year ago

    You're one of the best YouTubers of all time. Don't change a thing.

  • @simonbronson · 1 year ago

    Nice one... Looking forward to a how-to for Textual Inversion 😃

  • @lkewis · 1 year ago +1

    Great video and explanation! Though Textual Inversion actually came before Dreambooth: it was originally the only way to easily teach a new concept to Stable Diffusion with a limited dataset, and Dreambooth came out afterwards and was implemented for SD instead of Imagen.

  • @MultiMam12345 · 1 year ago

    Amazing tools and workflow ---> spend time trying to replicate a seed with copy-paste, only to think it's garbage because it's not a copy-paste result of the embedding example. Great art, you must have an amazing paintbrush.

  • @reijin999 · 1 year ago

    thanks for reading the paper

  • @zerodefcts · 1 year ago

    Thanks very much for such an excellent video as usual. A question for you: in your description of how DreamBooth works, how do the regularization images fit into that explanation? Thanks!!!

  • @kesar · 1 year ago +1

    Thoughts on Textual Inversion vs LoRA?

  • @abuelos84 · 1 year ago +1

    If anything, watching the images generated as the TI is being trained is pretty funny. Especially if you're training your own face.

  • @UnderstandingCode · 1 year ago

    230 would like to hear more on these

  • @Nalestech · 1 year ago +2

    Great explanation! I am finally understanding how it all works. I have made heaps of successful ckpt models, but embeddings have been a challenge. They only produce the images I train them on; I ask for something different and it spits out the same images. You mentioned using 3-5 images while I have been using ~20. Perhaps that is my issue? 🤪

    • @lewingtonn · 1 year ago +1

      there are a lot of ways you can do training wrong sadly, definitely have a go with 3-5 images, maybe that will help

    • @Nalestech · 1 year ago +4

      @@lewingtonn I went with 5 and it worked much better. I also went with fewer steps: 7500 worked better than 20k. Rather counterintuitive, but I'm thinking I overtrained the earlier attempts.

    • @lewingtonn · 1 year ago +1

      @@Nalestech that's siiiiiiick!!!!

  • @haydenmartin5866 · 1 year ago

    I seem to be having issues creating hypernetworks in 2.1... as I monitor my textual-inversion images they are all identical, even though I have a training set and everything else set up

  • @KeinNiemand · 1 year ago

    What about hypernetworks? Also, what's the difference between dreambooth and real fine-tuning?

  • @RexelBartolome · 1 year ago

    I really like the thought of a few KB worth of data that's easily shareable, but the quality just isn't the same as dreambooth or general fine-tuning... So for now, I'll have to make do with 2GB models or look into merging to cut them down a bit

    • @juanjesusligero391 · 1 year ago

      I read somewhere that textual inversion works much better on Stable Diffusion 2.0 and 2.1 (I haven't tried it though). Maybe on future model versions its quality will improve even more?

    • @RexelBartolome · 1 year ago

      @@juanjesusligero391 That's true, 2.1 embeddings are already powerful, but for my use case (specific art styles) it's still not as good as a 1.5 dreambooth/finetune. Hopefully it gets even better though :)

  • @bobbob9821 · 1 year ago

    Textual inversion - best for training one very specific object or person that you'd like to use on multiple models.
    Models - Best for training a larger "class" of persons or objects or a certain style.

  • @Peppermint_juice · 1 year ago

    Can you explain how we can add a different checkpoint in Automatic1111 on Google Colab?

  • @tag_of_frank · 2 months ago

    Are they training for a specific sampler? If so, how?

  • @p_p · 1 year ago +1

    10:00 maybe it's the size, because it wasn't square... I dunno

    • @lewingtonn · 1 year ago

      lol, maybe, there are a lot of ways to make it NOT work

    • @p_p · 1 year ago

      @@lewingtonn hjahahah true buddy

  • @SteveWarner · 1 year ago

    Keep in mind that your model has a massive impact on the TI/embed, both in terms of its creation and its end use. You're using the base SD1.5 model, which is more or less garbage. A decent model will significantly improve your results with any embed you add on top of it.

  • @juanjesusligero391 · 1 year ago +1

    How much VRAM do I need to create textual embeddings? (I've got an Nvidia with 8GB VRAM, I hope I can! ^^)

    • @Z10T10 · 1 year ago +3

      I'm doing it on an RTX 2060 (6 GB)

    • @juanjesusligero391 · 1 year ago

      @@Z10T10 That's really good news for me! Thank you very much! :D Are you using the automatic1111 repo to create them? Or maybe another method?

    • @Z10T10 · 1 year ago +3

      @@juanjesusligero391 I'm using the typical a1111 method (the standard Train tab). To decrease VRAM usage, go to Settings and under Training check "Move VAE and CLIP to RAM when training if possible. Saves VRAM." and "Use cross attention optimizations while training".
      When training, set "Save an image to log directory every N steps, 0 to disable" to 0; this way it will not generate any images while training, saving some VRAM. For the other settings I wouldn't claim to be an expert, but I found 15,000 steps with an embedding learning rate of 0.00005 fine, but not the best.

  • @darshitgoswami · 1 year ago +1

    Can it replace dreambooth?? Can you compare with a real person, not just a style?

    • @pipinstallyp · 1 year ago +2

      They fundamentally work in different ways: textual inversion doesn't add any new images to the model itself. Instead, it finds the parts of text-embedding space that correlate with features of an image.
      It's like this -> you give a bunch of images to the TI script, TI identifies features of the images and then makes associations with the text prompt/keyword.
      Dreambooth, on the other hand, burns new images into the model itself, by converting images to noise and then having the model turn that noise back into the images. The model has a new concept introduced, tied to just a token. That's dreambooth.
      So since TI doesn't introduce new images, it's pretty much not recommended for training a non-celebrity person: the native SD model simply doesn't have our pictures. Though if you want to introduce new art styles, sure! SD has a lot of art styles in it, and TI only makes better connections/word associations.

    • @techviking23 · 1 year ago +1

      @@pipinstallyp 🙏 Thanks for the explanation! Are you saying that TI works better for styles than for a personal character, and Dreambooth is better at a character than a style?
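The TI-vs-dreambooth distinction in this thread can be sketched numerically: textual inversion optimizes an input embedding against frozen weights, while dreambooth-style fine-tuning moves the weights themselves, which also drifts the outputs of every other token. A toy illustration (assumption: one small random matrix stands in for the whole model; none of this is real SD code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "model": output = W @ embedding. (Assumption: a single matrix stands
# in for the whole frozen diffusion model; real SD is of course far bigger.)
W = rng.normal(size=(4, 4)) / 2
tokens = rng.normal(size=(3, 4))      # existing vocabulary embeddings
target = rng.normal(size=4)           # desired output for the new concept

# --- Textual inversion: optimize a new input embedding v, W stays frozen ---
v = rng.normal(size=4)
loss0 = np.sum((W @ v - target) ** 2)
for _ in range(2000):
    v -= 0.05 * (2 * W.T @ (W @ v - target))   # grad of ||W v - target||^2

# --- Dreambooth-style: optimize the weights W for one fixed token ---
W2 = W.copy()
t0 = tokens[0] / np.linalg.norm(tokens[0])     # unit norm keeps GD stable
for _ in range(2000):
    W2 -= 0.05 * np.outer(2 * (W2 @ t0 - target), t0)

print(np.sum((W @ v - target) ** 2) < loss0)  # True: v moved toward the target
print(np.allclose(W2, W))                     # False: fine-tuning moved W...
print(np.allclose(W2 @ tokens[1], W @ tokens[1]))  # ...so other tokens drift too
```

This is the sense in which TI is non-destructive: in the first loop `W` is never written to, so every other prompt behaves exactly as before, while the fine-tuning loop changes behavior globally.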

  • @khirondb · 1 year ago

    I might be dumb, but I think your first test was off because of the height x width.

  • @poisenbery · 1 year ago

    7:48
    I should mention that those details are user-generated during upload. CivitAI doesn't auto-generate them from any special data; it's the user's responsibility to put in the correct info when they upload.
    There are a lot of users who input incorrect settings.
    One glaring example is people who make NAI-based merges and recommend a clip skip of 2, but list "SD 1.5" as the base model.
    EDIT: Yeah, the user clearly did not put the correct checkpoint. There's no fkn way they got that with SD1.5; it's very obviously an NAI-based model they used.

  • @MultiMam12345 · 1 year ago

    I love using AI to create, using my own material to build models. But using existing IP without permission and then selling it is no different than stealing bread, or thinking that tickets to a concert by your favorite band should be free. These models exist because the artists who made them possible could buy and eat bread. Let's wait until AI gets hungry; I trust it will be smart enough to know whom to eat first 😎