I drove a Machine Insane

  • Published: 9 Jun 2024
  • I used the Dreambooth script to slowly dissolve all the accumulated knowledge in a Stable Diffusion 1.5 model and documented its decline by having it output images every 10 training steps until everything went black (a rough sketch of that sampling loop follows the description).
    This was inspired by a battery of failed experiments with Dreambooth and some Midjourney v4 images, which you can read more about here: medium.com/@lewingtonpitsos/c...
    Discord: / discord
    00:00 - Summary
    01:11 - What I was trying to do
    02:34 - How "training" works
    06:45 - Animation by Training
    08:58 - Descent
    ------- Links -------
    Huge dumb spreadsheet full of failed experimental conditions: docs.google.com/spreadsheets/...
    You can find some of the models I trained earlier on here: huggingface.co/lewington/MJv4...
    The final model from the animation outputs only completely black images, so there was no point in uploading it.
    The final video can be found by itself on my alt channel: • A Stable Diffusion Mod...
    SD animation video was pilfered from Purz: • Stable Diffusion + Def...
    Dreambooth Paper: arxiv.org/abs/2208.12242
    Madness Colab Notebook (you can use this to replicate, but you will probably need to know how to code): colab.research.google.com/dri...
    ------- Music -------
    Background Music from the YouTube Audio Library
    ELPHNT - A Great Darkness Approaches, Can You Feel It
    ELPHNT - Subterranean Howl
    Anno Domini Beats - Sinister
    The final track (J1 - Stage 4 Post Awareness Confusions) is from The Caretaker, an (excellent) series exploring dementia.
    It does not belong to me.
    Many thanks to all involved: • The Caretaker - Everyw...
    #stablediffusion #aiart #art #ai #machinelearning #tech #midjourney #dreambooth
  • Science
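
    A rough sketch of the sampling loop described above, assuming the Hugging Face diffusers library; the model ID, prompt, and seed are placeholders, and the actual Dreambooth update step is elided:

    import os
    import torch
    from diffusers import StableDiffusionPipeline

    # Load the model being fine-tuned (placeholder ID; the video used SD 1.5).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    PROMPT = "a photo of a happy dog"   # placeholder prompt, kept fixed for every frame
    SEED = 42                           # fixed seed so only the changing weights alter the image
    os.makedirs("frames", exist_ok=True)

    for step in range(1, 10_001):
        # ... one Dreambooth training step on pipe.unet would go here ...

        if step % 10 == 0:  # sample a frame every 10 training steps
            generator = torch.Generator("cuda").manual_seed(SEED)
            image = pipe(PROMPT, num_inference_steps=30, generator=generator).images[0]
            image.save(f"frames/step_{step:06d}.png")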

Comments • 44

  • @LoneRanger.801
    @LoneRanger.801 A year ago +12

    Now THIS is a uniquely awesome piece of content. Love it my friend. I would very much love to see more experiments like this. It's not about what works; it's a lot more about what doesn't work and what learnings and experiences we get through such experiments. Brilliant. Keep them coming! Love this content.
    PS - would love for you to share more technical details about your experiments. What did you do (detailed steps)? How did you do it? What variables did you change or variations did you try? Outcomes, etc. Thanks again for this lovely video.

    • @lewingtonn
      @lewingtonn A year ago

      dude that's so nice, thanks for saying. I might do a related follow-up video soon!

    • @LoneRanger.801
      @LoneRanger.801 A year ago

      @@lewingtonn 😊👍🏼

  • @pavpav8627
    @pavpav8627 A year ago +5

    Captivating and spooky. If you're feeling sorry for the model, remember: he can always train it back to coherence, one word at a time! :D

  • @bencomo28idart41
    @bencomo28idart41 A year ago +3

    This animation is much more interesting than the usual dizzy kind, which I hate because it makes me feel sick. This one is surprisingly interesting and watchable. Thanks for sharing your experiments, and congratulations.

    • @lewingtonn
      @lewingtonn A year ago +1

      dude, thank you so much for saying that

  • @autonomousreviews2521
    @autonomousreviews2521 A year ago +1

    Fantastic experiment! Thank you for your time!

  • @bencomo28idart41
    @bencomo28idart41 A year ago +1

    It's like reversing the life of the poor old model before he was born!!!

  • @PeppePascale_
    @PeppePascale_ A year ago +2

    you are a damn genius!

  • @-Belshazzar-
    @-Belshazzar- A year ago +1

    Awesome stuff! Project justified.
    I think it's also interesting because this way you can clearly see at what step the model breaks, around 5,000

  • @MysteryFinery
    @MysteryFinery A year ago +2

    Very interesting

  • @alefalfa
    @alefalfa 9 months ago

    This is so awesome! Thank you for the video

  • @keisaboru1155
    @keisaboru1155 A year ago +2

    I feel like this is some kind or form of torture that I don't want to see again 😳😭.

    • @lewingtonn
      @lewingtonn A year ago +1

      yeah, it feels kind of bad watching it hey...

    • @LoneRanger.801
      @LoneRanger.801 A year ago

      Why don’t you unsubscribe then? No one’s forcing you to watch it Mr Entitled. Stop complaining. Learn to appreciate other people and their work.

  • @brezl8
    @brezl8 A year ago

    thanks mate, super useful content

  • @ThatTrueCJ201
    @ThatTrueCJ201 A year ago +6

    I think your idea might have actually worked if each input image had been paired with its own matching text, rather than a generic "mjz" keyword, which amalgamates all the input images into a single term.

    • @lewingtonn
      @lewingtonn A year ago +2

      yes! so at that point we're no longer doing dreambooth training, we'd just be doing normal training, which means you need a label for each image... which IS possible but I think the midjourney people would have been super mad at me so I chickened out

    • @calmmarketing
      @calmmarketing A year ago

      So would this be what people refer to as captioning? I've seen projects like StableTuner and EveryDream doing this

  • @mordokai597
    @mordokai597 A year ago +3

    Secondary comment: use the Midjourney art as REGULARIZATION images, train the model with a SINGLE image of ANYTHING, but use complete gibberish for the instance and class tokens, then train it for about 5-10k steps with NO text encoder or EMA training ;)

    • @lewingtonn
      @lewingtonn A year ago

      you mean with a unique label for each image? Isn't that just literally doing normal non-dreambooth finetuning?
      I wanted to do that but it's not really allowed :(

    • @bencomo28idart41
      @bencomo28idart41 A year ago +1

      have you done this?

    • @mordokai597
      @mordokai597 A year ago

      @@lewingtonn it's not "recommended" if your goal is a coherent model still capable of photorealistic representations, but there's no one disallowing you from doing it - the "AI police" don't kick your door down for not following best practices xP. But I meant use the amassed Midjourney art as your folder of regularization images. It is possible to give every image an individual instance/class token if you're using thelastben colab or the current auto1111 dreambooth implementation (look up the [filewords] label to read instance/class prompts directly from filenames), but I'm not talking about that either. To overfit the aesthetic style of a whole model, without expunging the core EMA/latent-space weights completely, use the style art as the reg images, use any random image as the single training image, and set the instance and class token of that ONE image to a gibberish string that has no prior associations to class objects and is impossible to replicate with typos or simulate through strings of human-readable text (because you're never going to type it in again). You've just given it another impossible task, but in a way that will cause it to start making random associations with the instance image, and the only way to decrease the loss is for the model to STOP trying to actually learn the new image. The ONLY way to yield images with lower loss in this scenario is for ALL future images to share as many aspects of the reg images as possible, so that some aspect of the color or detail or SOMETHING about the reg images ends up matching the instance image. (i.e. if you train on an image of a random woman, set the instance and class tokens to something like kjdfngk;jhsdfkujgnbsdlkrfhbgkjdkj, and use a bunch of nude CGI art with "octane render, unreal engine" as regularization images, you eventually end up with a model that ONLY wants to make CGI lewds of women with high color saturation, lens flares and misty ambient occlusion)
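
      If I'm reading this right, the setup maps onto something like the following call to the diffusers Dreambooth example script; the flag names come from that example, but all paths, the gibberish token, and the step count are placeholders, and I haven't verified that it reproduces the effect described:

      import subprocess

      subprocess.run([
          "accelerate", "launch", "train_dreambooth.py",
          "--pretrained_model_name_or_path", "runwayml/stable-diffusion-v1-5",
          "--instance_data_dir", "data/single_image",      # ONE arbitrary training image
          "--class_data_dir", "data/midjourney_style",     # the Midjourney art as "regularization" images
          "--instance_prompt", "kjdfngkjhsdfkujgnb",        # gibberish string with no prior associations
          "--class_prompt", "kjdfngkjhsdfkujgnb",
          "--with_prior_preservation",
          "--prior_loss_weight", "1.0",
          "--resolution", "512",
          "--train_batch_size", "1",
          "--learning_rate", "1e-6",
          "--max_train_steps", "7500",                      # within the suggested 5-10k range
          "--output_dir", "out/style_overfit",
          # note: --train_text_encoder is deliberately not passed, per the suggestion
      ], check=True)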

  • @mordokai597
    @mordokai597 A year ago +1

    lol, I do this on a daily basis as part of benchmarking local settings, stress testing a retrofit Tesla M40 I added to the build, and just for laughs. Have you tried using the same images as training images and regularization images at the same time, while using a JSON file to vector the same instance token onto multiple class tokens? Like take pictures of a frog, and tell it it's man, woman, boy, girl, and person... the speed at which it learns to render frogs, then quickly devolves into incoherent madness, is pretty impressive. I also do things like use anime as reg images while I train it on a photorealistic subject, vectored to the class token minimalism, and simultaneously train it on some other style based on my own vanilla model output

    • @lewingtonn
      @lewingtonn A year ago

      well at the point where you're creating a bespoke prompt for each image you may as well train normally right (as opposed to dreambooth)? I was going to do that, but it's a clear breach of the midjourney Ts and Cs 😢
      sounds like you really know your stuff though, what's the coolest SD finetune at the moment in your opinion?

  • @RexelBartolome
    @RexelBartolome A year ago

    Dreambooth is limited in its training method; there are already general finetuning methods like StableTuner that offer a better way to train on images without the model going goblin mode

  • @swannschilling474
    @swannschilling474 A year ago +2

    I am not sure, but maybe you should have had more regularization images? But also yes, it is obvious that at some point you were breaking the model!! Good job!!! 😁

    • @lewingtonn
      @lewingtonn A year ago +2

      yeah, the rest of my experiments (e.g. row 40 here: docs.google.com/spreadsheets/d/1JXPFBfxoaEzVc0c-t0UDLYBVjiOPAtm36BXTELMYnl0/edit?usp=sharing) indicate that even with reg images it would have gone nutso, but it would probably have taken longer, and it already took 8 hours

  • @joe.todddq
    @joe.todddq A year ago +2

    SD 2.0 video soon?

  • @teac117
    @teac117 9 months ago

    Lovecraftian horror for AI is... modern art. Gotcha :P

  • @phizc
    @phizc A year ago

    Fixing the seed and prompt and making mixed models could make for cool animations. E.g.
    Frame 1 : 1.0*SD15
    Frame 2: 0.95*SD15 + 0.05*Waifu
    Frame 3: 0.9*SD15 + 0.1*Waifu
    ...
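
    That would amount to linearly interpolating the two models' weights frame by frame; a minimal sketch with diffusers (the second model ID, prompt, and seed are placeholders, and both checkpoints need matching architectures):

    import torch
    from diffusers import StableDiffusionPipeline, UNet2DConditionModel

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")
    unet_a = {k: v.detach().cpu().clone() for k, v in pipe.unet.state_dict().items()}  # SD 1.5 UNet weights
    unet_b = UNet2DConditionModel.from_pretrained(
        "hakurei/waifu-diffusion", subfolder="unet"                                    # placeholder second model
    ).state_dict()

    PROMPT, SEED = "a castle on a hill at dusk", 42                                    # fixed for every frame

    for i in range(21):                                                                # 0.05 increments, as above
        alpha = i / 20
        merged = {k: (1 - alpha) * unet_a[k] + alpha * unet_b[k] for k in unet_a}      # blend the weights
        pipe.unet.load_state_dict(merged)                                              # copy blend into the pipeline
        generator = torch.Generator("cuda").manual_seed(SEED)
        pipe(PROMPT, generator=generator).images[0].save(f"frame_{i:03d}.png")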

  • @qaesarx
    @qaesarx A year ago

    Just remember, machines can REMEMBER! Who knows, in the future AI WILL remember! Let's be careful! 😀

  • @TheKdcool
    @TheKdcool A year ago +2

    Maybe the learning rate is way too high; you completely lost all weights at 5,000 steps, so it's like you lost 0.0002 at every step

    • @lewingtonn
      @lewingtonn A year ago +1

      sooooo, I ended up using a custom scheduler which increased the learning rate as the steps increased (otherwise it would have taken 800 hours rather than 8)... I think that's what ended up blowing it out at the very, very end.
      I started with a very reasonable LR though; it was below 1e-5 until around step 6000
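
      For anyone curious, an increasing schedule like that can be sketched with PyTorch's LambdaLR; the base LR and the growth curve here are placeholders, not the exact schedule used in the video:

      import torch

      params = [torch.nn.Parameter(torch.zeros(1))]   # stand-in for the UNet parameters
      optimizer = torch.optim.AdamW(params, lr=5e-6)  # base LR below 1e-5, as mentioned above

      # Multiplier on the base LR grows with the step count, e.g. doubling every 2000 steps.
      scheduler = torch.optim.lr_scheduler.LambdaLR(
          optimizer, lr_lambda=lambda step: 2.0 ** (step / 2000)
      )

      for step in range(8000):
          # ... training step (forward, loss, backward) would go here ...
          optimizer.step()
          scheduler.step()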

    • @TheKdcool
      @TheKdcool A year ago

      Got it. I guess you could train like one subject at a time to be in Midjourney style using this technique though.
      If not, textual inversion using "in the style of" as if it were an artist could maybe help

    • @lewingtonn
      @lewingtonn A year ago +1

      @@TheKdcool yes, if you trained it long enough you'd definitely experience the same language drift, etc etc, but the cool thing about feeding it a TON of MJ art is that it overfits in this really nice artistic manner, like, look at those clouds, or the mystical "happy dog" it keeps manifesting.
      In my experiments with 7 images you start overfitting those 7 subjects really quick.

    • @hendrikoosthuizen4002
      @hendrikoosthuizen4002 A year ago +1

      My learning rate is very high when I watch these videos :)

    • @lewingtonn
      @lewingtonn A year ago +1

      @@hendrikoosthuizen4002 if it's too high your internal structure will deteriorate, be careful

  • @j.j.maverick9252
    @j.j.maverick9252 A year ago

    I think your diagram is inaccurate. You show the model feeding back to train itself, but from what you’ve said I believe there’s an evaluation step that looks at the output image, rates it, then provides that rating to the back prop learning stage that alters the model. Without this, the experiment is just random walking.
    With that extra step in place, the question is clear… why didn’t it “improve” according to the rating… and the answer must be either the learning is broken, the rating is broken, or the rating is random.

    • @lewingtonn
      @lewingtonn A year ago +1

      when I read the dreambooth paper I thought it meant "we compare the instance images (the 1601) to the outputs, and penalize the model for making things that don't match them" for some complex definition of "match", so I am assuming that this is what is going on under the hood. That's what you mean by "evaluate", right?
      But yes, my diagram is very high level. I feel like I'm always trying to walk this fine line between rushing the content and a 40 minute video...

    • @j.j.maverick9252
      @j.j.maverick9252 A year ago +1

      @@lewingtonn yep, in this context they’re judging what constitutes a match so I’d say that’s an evaluation. Totally get the difficulty in pitching things at an accessible level while trying to remain accurate, in a limited time too! I think my point may be valid though, when you draw that extra step into the loop-back, it seems clear that the behaviour you’ve discovered is bizarre and (possibly) one of the systems is not working as expected

    • @lewingtonn
      @lewingtonn A year ago

      @@j.j.maverick9252 possibly.... I think I'd say it's more likely that the training problem is just too hard.. like, making an image that somehow "matches" 1601 very different images is likely to confuse the heck out of the model. I could be wrong though

  • @ernestobenson2948
    @ernestobenson2948 A year ago

    🙋 'Promosm'.

  • @daniel99497
    @daniel99497 A year ago

    first