Stable Diffusion in Code (AI Image Generation) - Computerphile

  • Published: 22 Nov 2024

Comments • 445

  • @BernardJollans
    @BernardJollans 1 year ago +49

    If anyone is stuck with the code: the "i" should be a "t" in this line in the loop:
    ```
    latents = scheduler.step(noise_pred, i, latents)["prev_sample"]
    ```
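A minimal sketch of why the fix matters, assuming a toy scheduler (`ToyScheduler` is hypothetical and is not the diffusers class): the loop counter `i` runs 0, 1, 2, ... while the timestep `t` comes from the scheduler's own list, so the two are generally different numbers.

```python
# Toy stand-in for a diffusion scheduler; hypothetical, not the diffusers API.
class ToyScheduler:
    def set_timesteps(self, num_inference_steps, num_train_timesteps=1000):
        # Evenly spaced timesteps, from high noise down to low noise.
        stride = num_train_timesteps // num_inference_steps
        self.timesteps = [stride * k for k in range(num_inference_steps - 1, -1, -1)]

    def step(self, noise_pred, t, latents):
        # A real scheduler looks up the noise level that belongs to timestep t.
        if t not in self.timesteps:
            raise ValueError(f"{t} is not one of the scheduler's timesteps")
        return {"prev_sample": latents - 0.1 * noise_pred}


scheduler = ToyScheduler()
scheduler.set_timesteps(5)

latents = 1.0
visited = []
for i, t in enumerate(scheduler.timesteps):
    noise_pred = latents  # placeholder for the U-Net prediction
    # Pass the timestep t, not the loop counter i.
    latents = scheduler.step(noise_pred, t, latents)["prev_sample"]
    visited.append((i, t))
```

In this toy, passing `i` where `t` is expected fails for most steps because the counter is not one of the scheduled timesteps.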

    • @alenmathew8115
      @alenmathew8115 1 year ago +1

      Did you get the code working? For me it's showing "unsupported operand type(s) for /: 'DecoderOutput' and 'int'" in line 59.

    • @Phobos221B
      @Phobos221B 1 year ago +10

      @alenmathew8115 In the last few lines, change this line:
      image = (image / 2 + 0.5).clamp(0, 1)
      to this:
      image = (image.sample / 2 + 0.5).clamp(0, 1)
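The error and the fix can be reproduced without any ML libraries. This is only a sketch: the `DecoderOutput` dataclass, `decode` and `clamp` below are hypothetical stand-ins for the real objects, using a single float where the notebook has a tensor.

```python
from dataclasses import dataclass


@dataclass
class DecoderOutput:
    # Hypothetical stand-in: a wrapper object holding the decoded image.
    sample: float


def decode(latent):
    # Stand-in for a decoder that returns a wrapper instead of a bare tensor.
    return DecoderOutput(sample=latent * 2.0)


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


image = decode(0.25)

# Dividing the wrapper itself raises the TypeError quoted in the thread.
try:
    image / 2 + 0.5
    failed = False
except TypeError:
    failed = True

# Reading the .sample attribute first gives a plain number to work with.
pixel = clamp(image.sample / 2 + 0.5, 0, 1)
```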

    • @peepdawg8995
      @peepdawg8995 1 year ago

      man this helped me. thanks bro :)

    • @mayurpatil9871
      @mayurpatil9871 1 year ago

      Thanks man because of you I solved this error

    • @romainflorentz5771
      @romainflorentz5771 1 year ago

      Also, in the Image Loop section, this needs to be moved inside the for loop:
      ```
      # Prep Scheduler
      scheduler.set_timesteps(num_inference_steps)
      ```

  • @DampeS8N
    @DampeS8N 2 years ago +285

    I've been using Stable Diffusion to _deCGI_ images. Take a screenshot from a game, run it through SD with a low noise rate, give it a detailed description of everything in the picture and it produces pretty solid photo recreations of the images. Also, often, it gets possessed by Eldritch gods and spews out monsters.

    • @zwenkwiel816
      @zwenkwiel816 2 years ago +21

      So win-win, right?

    • @MattRose30000
      @MattRose30000 2 years ago +5

      now do it in real time with DLSS and you've got something huge

    • @DampeS8N
      @DampeS8N 2 years ago +17

      @MattRose30000 This is a long way off. It isn't just that it currently takes my 3090 Ti about 5 minutes to do one frame at 1024x1024; it also can't be playing a game at the same time, and it would be very disorienting because each frame would be a _different_ photo that isn't consistent from frame to frame. Probably the worst part is that _you need to write a text prompt that reflects what is in the scene for each frame somehow._

    • @FayezButts
      @FayezButts 2 years ago

      @DampeS8N That’s great. Have you messed around with reusing seeds across different frames? I imagine if you get an output you like you’d want to reuse that seed.
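Reusing a seed works because the starting noise is fully determined by it. A small sketch using Python's standard `random` module in place of a tensor library (`initial_noise` is a hypothetical helper, not part of any notebook):

```python
import random


def initial_noise(seed, size=4):
    # A dedicated generator keeps the result independent of global state.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


frame_a = initial_noise(seed=4)
frame_b = initial_noise(seed=4)   # same seed, same starting noise
frame_c = initial_noise(seed=5)   # different seed, different starting noise
```

Identical starting noise does not by itself guarantee consistent frames, since the prompt and input image also change the result.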

    • @dibbidydoo4318
      @dibbidydoo4318 2 years ago +1

      @DampeS8N Making text to video is the easy part, making video to text is the hard part.

  • @YSPACElabs
    @YSPACElabs 2 years ago +34

    I've been playing with Stable Diffusion (specifically the "InvokeAI" fork because I don't have 10 GB of VRAM), and I've found out that spamming the end of the prompt with keywords like "realistic, 4k, trending on artstation, 8k, photorealistic, hyperrealistic" has more effect on how good the output image is than I thought.

    • @ShankarSivarajan
      @ShankarSivarajan 2 years ago +11

      You should try negative prompts.

    • @nicoliedolpot7213
      @nicoliedolpot7213 2 years ago +4

      to add, try emphasis "((x))" for specific objects.
      Edit: you can also use x(y), y being the weight value for that tag.

  • @morphman86
    @morphman86 2 years ago +202

    Mike asked himself what the use case for mixing two prompts is.
    I used this only yesterday, to produce a photorealistic painting of an owlbear from DnD...
    So it has practical uses!
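One common way to mix two prompts is to blend their text embeddings before generation. A minimal sketch with made-up three-number "embeddings" (real ones are far larger, and `mix_embeddings` is a hypothetical helper):

```python
def mix_embeddings(a, b, weight=0.5):
    # Linear interpolation: weight=0 gives a, weight=1 gives b.
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return [(1 - weight) * x + weight * y for x, y in zip(a, b)]


owl = [1.0, 0.0, 0.5]    # made-up embedding for "an owl"
bear = [0.0, 1.0, 0.5]   # made-up embedding for "a bear"

owlbear = mix_embeddings(owl, bear)            # equal parts of both
mostly_owl = mix_embeddings(owl, bear, 0.25)   # leans towards the owl
```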

    • @MushookieMan
      @MushookieMan 2 years ago +42

      Maybe Google is planning to create new, even more impossible captchas. "Select all the cat-dogs in the picture"

    • @dembro27
      @dembro27 2 years ago +5

      Does it hoot or roar??

    • @IceMetalPunk
      @IceMetalPunk 2 years ago +3

      @dembro27 It hoots and growls, in fact, here at Aguefort's Adventuring Academy!

    • @euchale
      @euchale 2 years ago

      It's how I make my fish people for tabletop too. Tons of applications for DnD.

    • @morphman86
      @morphman86 2 years ago

      @euchale You get half-decent tieflings if you ask for a quarter human, a half lizard and the last quarter goat.

  • @Yupppi
    @Yupppi 2 years ago +5

    I really liked the Stable Diffusion web UI that you can install on your own computer to avoid quotas or subscription costs. It provides an easy-to-use UI as well, with the inpaint feature built in. Shoutout to the people who turn the rough code into applications that regular people can use.

  • @IceMetalPunk
    @IceMetalPunk 2 years ago +131

    The very concept of embeddings is amazing to me. It's literally "organize concepts themselves into points in space, where similar things are closer together, in many many dimensions; now you can do arithmetic on *the meanings of words, phrases, and sentences.* " Want to add the meaning of "horse" and the meaning of "male"? Well, just add these vectors together and the resulting coordinates will point right at "stallion"!
    They amaze me so much that, when I watched Everything, Everywhere, All At Once for the first time, I completely geeked out when I realized their description of the organization of the multiverse is effectively a well-embedded latent space 😅
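The "horse" plus "male" example can be sketched with hand-made two-dimensional vectors. The numbers below are invented for illustration (axis 0 is roughly "horse-ness", axis 1 roughly "maleness"); real embeddings are learned and have hundreds of dimensions.

```python
import math

# Invented toy embeddings, not taken from any real model.
vocab = {
    "horse": [1.0, 0.0],
    "male": [0.0, 1.0],
    "stallion": [1.0, 1.0],
    "mare": [1.0, -1.0],
    "cat": [-1.0, 0.0],
}


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def cosine(a, b):
    # Cosine similarity: 1.0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(query):
    # The word whose vector points in the direction closest to the query.
    return max(vocab, key=lambda word: cosine(query, vocab[word]))


result = nearest(add(vocab["horse"], vocab["male"]))
```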

    • @floydmaseda
      @floydmaseda 2 years ago +15

      @mrteco4236 It literally is and is done all the time.

    • @IceMetalPunk
      @IceMetalPunk 2 years ago +15

      @mrteco4236 It's... common, in fact. There's a whole video on this channel about embeddings. And it's how CLIP fundamentally works...

    • @TheColorman
      @TheColorman 2 years ago +1

      This is super fascinating, especially as someone studying Data Science just learning about vector spaces and their many uses!

    • @alexanderkirilov7820
      @alexanderkirilov7820 2 years ago

      @mrteco4236 lol

    • @Emperorhirohito19272
      @Emperorhirohito19272 2 years ago

      @mrteco4236 that is literally what it does bro

  • @jeffwads
    @jeffwads 2 years ago +125

    SD is just outstanding. It can mimic the other projects and the 1.4/1.5 models will be public domain. You can't beat that.

    • @zwenkwiel816
      @zwenkwiel816 2 years ago +10

      Lol just add "dall-e 2" to your prompts XD

    • @paryska991
      @paryska991 2 years ago +9

      1.5 model just went public today i think

    • @StefanReich
      @StefanReich 2 years ago +1

      @paryska991 Ye

    • @dgo4490
      @dgo4490 2 years ago +5

      You can beat that with human creativity that doesn't require billions of calculations per second to brute force a synthetic result.

    • @zwenkwiel816
      @zwenkwiel816 2 years ago +11

      @dgo4490 doesn't it though?

  • @byteborg
    @byteborg 2 years ago +6

    I love how you simplify and explain the heap of complexity in generative models like this. You gave me the impulse to play around with it, in spite of the code being pretty complicated due to the depth of the abstraction. It's a lot of fun to fantasize about something and have the model come up with a visual representation.

  • @paulspaws1521
    @paulspaws1521 2 years ago +388

    I'm sorry, but "unlock your face with your phone" just cracked me up...

    • @deadfr0g
      @deadfr0g 2 years ago +39

      This is inadvertently an excellent poetic description of someone using the selfie camera to apply makeup.

    • @zwenkwiel816
      @zwenkwiel816 2 years ago +12

      Unlock your phace with your fone

    • @afog
      @afog 2 years ago +4

      I think he was referring to using the Energizer Power Max P18K whilst in bed... :)

    • @davidm2.johnston684
      @davidm2.johnston684 2 years ago +2

      Hahahaha didn't even notice!

    • @absalomdraconis
      @absalomdraconis 2 years ago +4

      I am reminded of an odd commercial from a few years ago: "apply directly to the forehead".

  • @lucamatteobarbieri2493
    @lucamatteobarbieri2493 2 years ago +3

    I like how your channel has adapted to the advent of the machine learning boom we are experiencing

  • @christopherg2347
    @christopherg2347 2 years ago +91

    "Simple, you just chip away all the stone that doesn't look like David."

    • @housellama
      @housellama 2 years ago +14

      "I saw the angel in the marble and carved until I set him free" - Michelangelo

  • @thomasnicolet9561
    @thomasnicolet9561 2 years ago +18

    The current version of the reference notebook is already out of date due to Hugging Face's API changes :)
    The code tries to operate on "image", which is now a DecoderOutput object:
    image = (image / 2 + 0.5).clamp(0, 1)
    The fix is to read the tensor from its sample attribute:
    image = (image.sample / 2 + 0.5).clamp(0, 1)

    • @Dancedfsk8
      @Dancedfsk8 2 years ago +2

      The rest of the notebook is hard to fix, I tried but in vain. I think I'll wait for Mike's update.

    • @victorwesterlund4826
      @victorwesterlund4826 2 years ago +3

      Same goes for pil_to_latent():
      AutoencoderKL.encode() now returns an AutoencoderKLOutput object, so this line fails:
      return 0.18215 * latent.mode()
      The desired DiagonalGaussianDistribution object is now a property ("latent_dist") of this new class:
      return 0.18215 * latent.latent_dist.mode()

    • @Dancedfsk8
      @Dancedfsk8 2 years ago +2

      In img2img, I extracted the code of add_noise and used an int instead of a FloatTensor.
      Change the add_noise function to the following.
      Also notice that the for loop now loops 51 times.
      Not sure if this is correct, but at least it works.
      # View a noised version
      noise = torch.randn_like(encoded) # Random noise
      for i in tqdm(range(51)):
          scheduler.sigmas = scheduler.sigmas.to(device=encoded.device, dtype=encoded.dtype)
          scheduler.timesteps = scheduler.timesteps.to(encoded.device)
          sigma = scheduler.sigmas[i].flatten()
          while len(sigma.shape) < len(encoded.shape):
              sigma = sigma.unsqueeze(-1)
          noisy_samples = encoded + noise * sigma
          img = latents_to_pil(noisy_samples)[0]
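The core of that workaround is the line `noisy_samples = encoded + noise * sigma`. A library-free sketch of the same idea, with plain lists standing in for tensors and an invented sigma schedule:

```python
import random


def add_noise(encoded, noise, sigma):
    # Same formula as in the comment above: signal plus noise scaled by sigma.
    return [x + n * sigma for x, n in zip(encoded, noise)]


rng = random.Random(0)
encoded = [0.5, -0.2, 0.1, 0.9]              # pretend latent values
noise = [rng.gauss(0.0, 1.0) for _ in encoded]
sigmas = [10.0, 5.0, 1.0, 0.0]               # invented schedule, high to low

# One noised copy per sigma; the last one (sigma = 0) is the clean input.
noisy_versions = [add_noise(encoded, noise, s) for s in sigmas]
```

With a large sigma the result is dominated by noise, and with sigma equal to zero it is the original latent.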

    • @aaron6807
      @aaron6807 1 year ago +1

      @victorwesterlund4826 What is the 0.18215 for? I keep seeing it in the code but I can't find an explanation for what it does or how it's derived.
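As far as I know, 0.18215 is the VAE scaling factor from the model's configuration, chosen so that the scaled latents have roughly unit standard deviation. The sketch below only illustrates that idea on synthetic numbers; the "raw latents" are random values invented for the example, not real encoder outputs.

```python
import random
import statistics

SCALE = 0.18215  # the constant quoted in the thread

rng = random.Random(0)
# Assumption for illustration: raw latents with a standard deviation near 1 / SCALE.
raw = [rng.gauss(0.0, 1.0 / SCALE) for _ in range(20000)]

# A scale factor of this kind can be estimated as one over the standard deviation.
estimated_scale = 1.0 / statistics.pstdev(raw)

scaled = [SCALE * z for z in raw]        # what pil_to_latent returns
restored = [z / SCALE for z in scaled]   # undo the scaling before decoding
```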

  • @simplesimon4561
    @simplesimon4561 1 year ago +117

    I would like to see a version of the code where it shows the result of each step, so you can see the noise getting reduced with each iteration
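One way to do this is to call a callback after every step and keep whatever it receives. A toy sketch, where a single number stands in for the latents and the "denoiser" just halves the distance to a target; `denoise` and `on_step` are hypothetical names:

```python
TARGET = 0.0  # the toy "clean image"


def denoise(latent, steps, on_step=None):
    for step in range(steps):
        noise_pred = latent - TARGET          # toy noise estimate
        latent = latent - 0.5 * noise_pred    # remove half of it
        if on_step is not None:
            on_step(step, latent)             # hook for saving a snapshot
    return latent


snapshots = []
final = denoise(8.0, steps=4, on_step=lambda step, latent: snapshots.append((step, latent)))
```

In the real notebook the callback would decode the latents to an image and save it, for example one PNG per step.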

    • @JalexRosa
      @JalexRosa 1 year ago +6

      me too!!

    • @gianluca.g
      @gianluca.g 1 year ago +10

      I think I'm going to do it. I'm downloading the source code and will save a PNG for each step.

    • @AlphaNovaOfficial
      @AlphaNovaOfficial 1 year ago +7

      Not necessarily what you're after, but if you "interrupt" a run, you can see what its current progress was. Depending on your steps and how early you catch it, I've seen some very interesting early "noisy" images that were themselves inspiration for other images!

    • @ReneArmenta19
      @ReneArmenta19 1 year ago +2

      There is already a script for that

    • @m0nkeyb0i666
      @m0nkeyb0i666 1 year ago +13

      If you run automatic1111 there’s a setting for that, uses slightly more vram, but it’s great to watch it work

  • @paultapping9510
    @paultapping9510 2 years ago +111

    "there are questions of ethics, there are questions on how it's trained. Let's leave those for another time"
    well, if that doesn't just sum up the tech industry.

    • @monad_tcp
      @monad_tcp 2 years ago +9

      What ethics? It's just a tool, and it's highly dependent on human input.

    • @paultapping9510
      @paultapping9510 2 years ago

      @Luiz remember the AI chatbot that became incurably racist because it was trained on data scraped from 4chan amongst other places? That sort of thing.

    • @purplewine7362
      @purplewine7362 2 years ago +7

      that sums up every industry. you think people didn't copy art before ai? it's just a tool

    • @paultapping9510
      @paultapping9510 2 years ago +6

      @purplewine7362 lol, not even close to the point I was making. Never mind.

    • @purplewine7362
      @purplewine7362 2 years ago +1

      @paultapping9510 you weren't trying to make any point, otherwise you would have clarified. You were just trying to sound smart.
      Also, liking your own comments is pathetic.

  • @jenka1980
    @jenka1980 1 year ago +1

    Love Mike's explanations; somehow he manages to explain such complicated stuff in such a simple and understandable way.
    It would be interesting to know Mike's opinion on Midjourney, as it seems like the winner for now among the picture-creation AIs.

  • @_inetuser
    @_inetuser 2 years ago

    this is so interesting and has so many unexplored use cases

  • @HerleifJarle
    @HerleifJarle 1 year ago +1

    Thanks for the explanations of how AIs are being trained. I can see a slight hint of a neural network here. I think the advantage now is that companies like Bluewillow are using Discord to quickly gain testers, even free of charge.

  • @DeKubus
    @DeKubus 2 years ago

    Immediately recognized the book on Dr. Pound's desk - Prof. Paar was one of my teachers when I studied IT sec. Nice to see it outside of Germany too!

  • @heurve
    @heurve 1 year ago +2

    On line 56, the image is coming from the sample property of the DecoderOutput, change to
    55: with torch.no_grad():
    56: image = vae.decode(latents).sample

  • @angeleeh
    @angeleeh 1 year ago +2

    Mike is a legend, truly great videos with him

  • @Tymon0000
    @Tymon0000 2 years ago +3

    I generated thousands of images with Stable Diffusion. It's really fun and inspiring.

  • @aorusaki
    @aorusaki 1 year ago

    This video finally explained the code to me in a simple way! Now I'm less confused!!! Amazing extra documentation from you guys.

  • @Mutual_Information
    @Mutual_Information 2 years ago +12

    Anyone else surprised that diffusion models are the clear winners for image generation? And GANs have almost completely fallen from favor? I haven’t seen them in any recent SOTA work..

    • @timmyt1293
      @timmyt1293 2 years ago +6

      Mmm, isn't it still kind of a GAN? Stable Diffusion uses a transformer block not just for the diffusion but also for identifying what the actual image is from the diffusion output. So isn't that technically a GAN? Generate images from the diffusion model, then try to categorize them through an adversarial transformer network?

    • @erikp7378
      @erikp7378 2 years ago +9

      @timmyt1293 Actually, there is no adversarial training in diffusion models in general (and in Stable Diffusion in particular). The condition processing is used only for guidance (classifier-free guidance in this case), and from a theoretical perspective diffusion models are closer to hierarchical variational autoencoders, where the encoders are fixed diffusion steps and the decoders are denoising steps with the trained noise estimation model.

    • @JadeNeoma
      @JadeNeoma 2 years ago

      @erikp7378 I wonder if you could implement Stable Diffusion inside a GAN. So have the generator define the parameters for Stable Diffusion based on an input and then give that to the classifier mixed in with non-AI-generated images.

    • @dibbidydoo4318
      @dibbidydoo4318 2 years ago

      @JadeNeoma I don't know how that would work.

    • @erikp7378
      @erikp7378 2 years ago

      @JadeNeoma It depends on which parameters you have in mind, but the main point is that the operations must remain differentiable in order to optimize the model. And for inferring hyperparameters that is not trivial in many cases (e.g. the number of steps).

  • @serta5727
    @serta5727 2 years ago +14

    Mike's explanations are the best ❤

  • @johnnyw525
    @johnnyw525 1 year ago +1

    I didn't realise that this is basically the next evolution of the "AI Upscaling" technology that has been used in videogame mods: take an image and then add detail until it looks like what I think it's supposed to. It's still mind-bending how it results in what it does, but AI Upscaling wasn't so scary, so I suppose this feels a bit less scary now.

  • @RelaxingSerbian
    @RelaxingSerbian 5 months ago

    The notebook can still work with a few minor tweaks: The text prompt should be multiplied by the batch size; The scheduler step takes in "t" instead of "i", and now it prefers scaling via scheduler.scale_model_input(latent_model_input, t) rather than with explicit sigma. Also, torch.autocast did not work on my local machine for some reason.
    Anyway, thanks a lot for the code.

  • @theemathas
    @theemathas 2 years ago +39

    I doubt DALL-E 2 is the “biggest” image generator. Stable Diffusion is probably bigger. In my circle, the biggest one is NovelAI, which is a Stable Diffusion variant specialized in anime-style images. Notably, its training data is probably the best image dataset out there in terms of detailed labels.
    It’s already been causing a lot of drama in the community. One notable case involved someone feeding a WIP drawing to img2img, posting it, claiming it as their own drawing. When the actual artist posts their finished image, this person then proceeds to accuse the artist of copying “their” art.

    • @dibbidydoo4318
      @dibbidydoo4318 2 years ago

      Imagen by Google and NUWA-infinity by Microsoft are probably superior.

    • @felixjohnson3874
      @felixjohnson3874 2 years ago +5

      Would your "circle" happen to fit after rule 33 and before rule 35?

    • @nicoliedolpot7213
      @nicoliedolpot7213 2 years ago +4

      The danbooru property labeling format, to be exact. Training is rather easy as the images in the booru databases are human-labeled.

  • @OliverHempel-r7p
    @OliverHempel-r7p 9 months ago

    Great video. Today Sora was launched, and your videos help to understand what's going on in the background. Many thanks!

  • @lolerskates876
    @lolerskates876 1 year ago

    Thank you for trying to fix the code after the API update broke it

  • @dakotaknutson
    @dakotaknutson 2 years ago +15

    For anyone trying to get the notebook to work and getting this error: "TypeError: unsupported operand type(s) for /: 'DecoderOutput' and 'int'", change "image = (image / 2 + 0.5).clamp(0, 1)" to "image = (image.sample / 2 + 0.5).clamp(0, 1)". As noted at the top of the notebook, it seems the Hugging Face API has changed.

    • @hipposhark
      @hipposhark 2 years ago

      wow thank you very much
      can confirm that this indeed solves it👍

    • @koh8614
      @koh8614 2 years ago

      In my case it outputs a Hugging Face Tokens page warning? It says that I need a token? Is it free?

    • @hipposhark
      @hipposhark 2 years ago

      @koh8614 Yes, it is free. You need to create an account on the Hugging Face website and generate a token from your profile.

    • @JavadZahiri
      @JavadZahiri 2 years ago

      Thank you

  • @3dlabs99
    @3dlabs99 2 years ago +7

    We need an entire "Frogs on stilts" channel.

  • @martinoandreascarpolini5128
    @martinoandreascarpolini5128 1 year ago +3

    [notebook error] Hello, thanks for the fantastic video. I noticed that as of today the notebook does not run because of some errors. I do not know why; probably some library changed a bit. The first error is at line 50 of the cell with the first inference loop: instead of 'i' there should be 't'. The second error appears at line 59: to access the image's tensor you now have to write 'image["sample"]' instead of just 'image'.

  • @vanderkarl3927
    @vanderkarl3927 2 years ago +6

    Seeing that GPT-2 vid reminded me: we haven't had Robert Miles on in a fair while. Is he just too busy?

  • @slimjimbigfoot589
    @slimjimbigfoot589 2 years ago

    Amazing, so Stable Diffusion helps unclutter all those extra pixels during the process of facial recognition.

  • @serta5727
    @serta5727 2 years ago +5

    So amazing ❤ I love stable diffusion
    Playing around the few last weeks

  • @peekpen
    @peekpen 1 year ago

    I'll copy your transcript, feed it to OpenAI's playground and ask it to reinterpret what you say about images for my own audio interpolation in music. Brilliant.

  • @CyberMuzHR
    @CyberMuzHR 2 years ago +15

    Great video! Can anyone recommend any other videos that explain the text encoding and the whole clipping process used to guide the image generation based on input prompt?

  • @toohardtowatch
    @toohardtowatch 2 years ago +3

    What surprises me is how primitive a lot of these techniques seem to be under the hood, and how much further they can obviously be taken. These techniques are still in their infancy.
    For instance, there seem to be a lot of potential image-generating procedures that might converge faster than random high-frequency noise. What if there were stages with simulated random brush strokes, or generated geometric shapes, or input to 3D modelling software? If the tools that humans use to create digital art could be algorithmically leveraged by an AI, it might be even more effective.
    Also, if you could spatially embed the tags in the source image in a way that could be coupled to the segmentation, maybe it could be used as a tool to 'compose' an image. A blob of one color is tagged as a dog, a blob of another is tagged as a bench, and the AI starts from those spatially defined weights.

  • @FusionDeveloper
    @FusionDeveloper 2 years ago

    Thanks for this video.
    So the Steps is actually the Noise Level.

  • @heurve
    @heurve 1 year ago +3

    On line 50, i should be changed to t (as we need the FloatTensor) 50: latents = scheduler.step(noise_pred, t, latents)["prev_sample"]

  • @PunmasterSTP
    @PunmasterSTP 1 year ago

    Stable Diffusion in code? More like “Super great explanation that’s solid gold!” 👍

  • @nkronert
    @nkronert 2 years ago +1

    This is literally the first episode of Computerphile ever that I didn't understand anything of what was explained. And judging from the comments I'm the only one. Looks like I totally missed the boat on this topic.

    • @dibbidydoo4318
      @dibbidydoo4318 2 years ago

      what was confusing?

    • @nkronert
      @nkronert 2 years ago +2

      @dibbidydoo4318 It wasn't actually confusing because there wasn't anything to confuse. I had literally never heard of these developments before.

    • @zwe1l1nkehaende
      @zwe1l1nkehaende 2 years ago

      @nkronert This is the follow-up video on the topic; check out the first one, where the whole thing is explained.

    • @nkronert
      @nkronert 2 years ago +1

      @zwe1l1nkehaende Thanks. I already found it. But I still don't really get it 😊
      Doing some "best fit" on noise until a photorealistic image comes out still sounds like magic to me.

  • @ShankarSivarajan
    @ShankarSivarajan 2 years ago +5

    Another cool thing you can do is _negative prompts,_ that you can put in place of the "unconditioned" embedding.
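A sketch of how that substitution enters the guidance arithmetic, assuming the usual classifier-free guidance formula. The noise predictions below are invented two-element lists; in the notebook they would be U-Net outputs for the negative prompt and for the text prompt.

```python
def guided_noise(noise_negative, noise_text, guidance_scale):
    # Start from the negative (or unconditioned) prediction and push
    # towards the text-conditioned prediction, away from the negative one.
    return [u + guidance_scale * (t - u) for u, t in zip(noise_negative, noise_text)]


noise_text = [1.0, 2.0]       # invented prediction for the prompt
noise_negative = [0.5, 2.5]   # invented prediction for the negative prompt

noise_pred = guided_noise(noise_negative, noise_text, guidance_scale=7.5)
```

With a guidance scale of 1 the result equals the text-conditioned prediction, and larger scales move further away from whatever the negative prompt describes.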

    • @Onihikage
      @Onihikage 2 years ago +2

      Yep, negative prompts are great for things like getting hands right. It turns out Stable Diffusion, at least the 1.4 model everyone's been using so far, has trouble identifying where a hand or finger is supposed to stop, so you often get hands with too many fingers or fingers coming out of fingers as it keeps trying to "complete" a partial finger. Including a negative prompt for "hands" or "too many fingers" tends to produce much better results.

    • @ShankarSivarajan
      @ShankarSivarajan 2 years ago +2

      @Onihikage Yes, that is precisely what I use it for too. I expect we got that advice from the same place.

  • @jaymalby
    @jaymalby 2 years ago +11

    Well, xkcd did pick the number 4 by die roll. Seems a random enough seed to me.

    • @reinei1
      @reinei1 2 years ago +2

      I had to scroll far too much to see this mentioned, but yes I agree 4 seemed quite a good random seed there...

  • @miltiadiskoutsokeras9189
    @miltiadiskoutsokeras9189 2 years ago +2

    I don't know if this is more amazing or more frightening. Brilliant stuff.

    • @andybaldman
      @andybaldman 2 years ago +1

      If you aren’t frightened, you aren’t paying attention.

    • @purplewine7362
      @purplewine7362 2 years ago +3

      @andybaldman if you're frightened, you're a luddite

    • @andybaldman
      @andybaldman 1 year ago

      @purplewine7362 Or you've worked in the tech field long enough to know how dangerous this is, and how it will be used against people eventually. As happens with all tech.

  • @vorlon478
    @vorlon478 2 years ago +2

    13:47 reminds me of the wave function collapse algorithm.

  • @aiartbx
    @aiartbx 1 year ago +1

    Hi Mike. This is by far the most technically clear explanation of SD that I have seen, so thank you for this! As you are probably aware by now, the art community is up in arms against this tech, and I would love to hear your opinion based on the factual knowledge you have.
    The main issue that keeps coming up is that SD tech is art theft because it steals copyrighted artwork and companies then profit from the images. Another point artists are making is that SD is just a mish-mash collage of original art, so nothing generated by AI is brand new.
    Would you agree or disagree with these points, and why, strictly based on your technical knowledge?

  • @6DAMMK9
    @6DAMMK9 2 years ago +1

    Thank you for the SCIENTIFIC video!
    It got outta control after the "novelaileak", which it is very important to leave some information as realistic as it can.
    I'm quite sad about the sub-culture but I still have hope on the artist / researcher to snap out from the chaos.

  • @Jianju69
    @Jianju69 2 years ago +3

    A hybrid frog/snake is properly called a *SNOG*, obviously.

  • @acobster
    @acobster 1 year ago

    > There are questions about ethics. There are questions about how these were trained. Maybe we deal with them another time.
    I really hope there is a discussion of this at some point. As a discipline that skews very white/male and enjoys relatively posh working conditions, it's very easy to insulate ourselves from the very real problems of the world. And because computers are so powerful it's also simple to automate oppression of many kinds, helping it continue to run smoothly. I think we have a responsibility to talk about these issues and I would love to see this channel model that in a constructive way.

  • @TaranovskiAlex
    @TaranovskiAlex 2 years ago +3

    Awesome explanation, thank you!

  • @YeloPartyHat
    @YeloPartyHat 2 years ago

    Good timing with the NovelAI leaks

  • @jytou
    @jytou 8 months ago

    Excellent explanations, as always! Thanks!

  • @alikaperdue
    @alikaperdue 1 year ago

    @14:47 - idea: hand-draw your animation sequence. Give the first image and the text to the AI and get the result. Then hand it the resulting image, your next hand-drawn frame and the text to generate the 2nd frame. Continue the process so that each new frame is a combination of the last one and what you want it to look like. In this way the "flicker" might be reduced.
    But I haven't seen what you're talking about. I may be off.

  • @ArcadianCatharsis
    @ArcadianCatharsis 1 year ago +1

    Such a fun and interesting tool. Wish it wasn't used to do bad things, like stealing people's artworks

  • @briancunning423
    @briancunning423 2 years ago +2

    Great explanation.

  • @thecakeredux
    @thecakeredux 2 years ago

    Only a matter of time until someone adapts this to 3d models. I mean, there are millions of 3d models on the internet in form of assets for all kind of engines and frameworks, all with a description to them, too.

  • @peterw1534
    @peterw1534 2 years ago

    Wow this is actually pretty amazing. Fascinating stuff

  • @cyndicorinne
    @cyndicorinne 1 year ago

    12:34 beautiful cityscapes 🏙️

  • @gz6963
    @gz6963 2 years ago

    great video and very educational
    I'd love to hear you guys talk about textual inversion

  • @Emperorhirohito19272
    @Emperorhirohito19272 2 years ago +3

    Photoreal rarely works for me because the AI weirdness is so obvious to the eye. I have really enjoyed creating images with various art styles though; it is extremely good at that. I've made some really competent artworks that (for me) are indistinguishable from the work of a talented artist.

    • @dibbidydoo4318
      @dibbidydoo4318 2 years ago

      "the AI weirdiness is so obvious to the eye"
      You mean those weird artifacts in the AI caused by Perlin's Noise?

  • @levii2748
    @levii2748 2 years ago

    I was waiting for this 🙏🙏🙏

  • @skyplanet9858
    @skyplanet9858 2 years ago +2

    For those who try the code and get an error with the image output of the decoder, just add [0], like this:
    image = vae.decode(latents)[0]

  • @Lodinn
    @Lodinn 2 years ago +1

    7:20 My man Mike knows that when you use a proper random function, the result would be 4. Guaranteed to be random!

  • @andrewdunbar828
    @andrewdunbar828 2 years ago

    Now Deep Dream Generator has just added a text to image diffusion generator too, and it's actually pretty decent.

  • @semidemiurge
    @semidemiurge 2 years ago

    This was so helpful in understanding this new tech. thank you

  • @gaptastic
    @gaptastic 2 years ago

    this video just put me on a wonderful path, thank you!

  • @RaydenLGX
    @RaydenLGX 2 years ago

    So it is basically a morphing, blending and upscaling algorithm for compressed/encoded data?

  • @yuxiang3147
    @yuxiang3147 1 year ago

    Great video. However, could you explain what this line "latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)" does?
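My understanding is that this line rescales the noisy latents so the network sees inputs of roughly constant size whatever the noise level. The sketch below checks the arithmetic on synthetic numbers, assuming clean latents with unit variance: adding noise of standard deviation sigma gives variance sigma**2 + 1, and dividing by its square root brings the standard deviation back to about 1.

```python
import random
import statistics

rng = random.Random(0)
sigma = 3.0
count = 20000

# Assumption for illustration: clean latents with unit variance.
clean = [rng.gauss(0.0, 1.0) for _ in range(count)]
noisy = [x + sigma * rng.gauss(0.0, 1.0) for x in clean]

# The line from the notebook, applied element by element.
scaled = [z / ((sigma**2 + 1) ** 0.5) for z in noisy]

noisy_std = statistics.pstdev(noisy)    # close to sqrt(10), about 3.16
scaled_std = statistics.pstdev(scaled)  # close to 1
```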

  • @bezmi
    @bezmi 2 years ago +3

    Great video. I would love to see a video about the recent controversy with GitHub copilot and GPL licenses.

  • @brym9159
    @brym9159 2 years ago

    Mike said link to code in description!

  • @MadMan123654
    @MadMan123654 2 years ago +2

    I would do just about anything for more Mike content!

  • @ukranaut
    @ukranaut 2 years ago

    Fascinating.

  • @nocturne6320
    @nocturne6320 2 years ago +4

    Could you do a video about the different samplers? (eg. DDIM, Euler, Euler a, etc.) That part of the process is still a mystery for me

    • @havz0r
      @havz0r 2 years ago +1

      Ddim, euler, lms, heun and dpm all produce identical results. The ones with "a" at the end (euler a, dpm2 a) are ancestral samplers and produce different results

    • @nocturne6320
      @nocturne6320 2 years ago

      @havz0r I meant how they work under the hood. They've already explained how the network generates images from noise, but not how the different samplers work.

  • @uneek35
    @uneek35 1 year ago +1

    Would love to see a test to see how it works when it's trained with a limited dataset.

  • @pmo1972
    @pmo1972 2 years ago

    Excellent tutorial. Thank you.

  • @realeques
    @realeques 2 years ago +2

    this whole topic blows my mind even more than when i first heard of deepfakes

  • @joelcarson9514
    @joelcarson9514 2 years ago +1

    Next, we need semi natural language software that can generate 3D vector models for use in Blender or Unreal Engine etc.

  • @rodrigobarraza
    @rodrigobarraza 2 years ago

    Where is the love for Disco Diffusion? I'd argue it is the most powerful of the bunch in terms of what you can modify within it, especially as a programmer and artist.

  • @tbip2001
    @tbip2001 2 years ago

    The future of animation and film making is insane. Combine this with deepfake tech and in a few years you'll be able to produce whatever film you want to see. I think a lot of animation studio jobs will be at risk...

    • @purplewine7362
      @purplewine7362 2 years ago

      no, i don't think these jobs will be at risk for a few decades at least.

  • @timeimp
    @timeimp 2 years ago +1

    "They are essentially the same, but quite different."
    Ah yes, the ol' computer science maxim of "same, but different"

  • @pb-vj1qs
    @pb-vj1qs 2 years ago +3

    The code might have a bug, "TypeError: unsupported operand type(s) for /: 'DecoderOutput' and 'int'" on the line "image = (image / 2 + 0.5).clamp(0, 1)"

    • @alessandro_yt
      @alessandro_yt 2 years ago +1

      Same case here :(

    • @pb-vj1qs
      @pb-vj1qs 2 years ago +2

      Change the line before it to image = vae.decode(latents).sample. The .sample fixes it, but now I'm trying to get it to display.

    • @alessandro_yt
      @alessandro_yt 2 years ago +1

      @@pb-vj1qs It worked now, thanks! The image is displayed here...
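
The error and fix discussed in this thread can be reproduced without downloading any model. The sketch below assumes only what the thread reports: the decode call returns a wrapper object whose tensor sits in a `.sample` attribute, so dividing the wrapper itself raises a TypeError. The `DecoderOutput` class here is a hypothetical stand-in written for illustration, and `fake_decode`, `to_unit_range` and `postprocess` are made-up names; plain Python lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecoderOutput:
    """Hypothetical stand-in for the wrapper object returned by vae.decode."""
    sample: List[float]


def fake_decode(latents):
    """Hypothetical decoder: wraps its result instead of returning it bare."""
    return DecoderOutput(sample=[2.0 * v for v in latents])


def to_unit_range(values):
    """Map values from [-1, 1] to [0, 1] and clamp, like (x / 2 + 0.5).clamp(0, 1)."""
    return [min(1.0, max(0.0, v / 2 + 0.5)) for v in values]


def postprocess(decoded):
    """Accept either a bare result or a wrapper exposing a .sample attribute."""
    values = getattr(decoded, "sample", decoded)
    return to_unit_range(values)
```

Using `getattr(decoded, "sample", decoded)` is one way to keep the same post-processing line working whether the decoder returns the bare result (as the original notebook apparently expected) or the wrapper (as the commenters encountered).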

  • @mylittleparody2277
    @mylittleparody2277 2 years ago

    Thank you for this video, it's really interesting!

  • @ZedaZ80
    @ZedaZ80 2 years ago +1

    7:18 is clearly a reference to xkcd 221

  • @ozorg
    @ozorg 2 years ago

    Great one again!

  • @boringtaskai
    @boringtaskai 10 months ago

    Cool, very clear... but if you run the notebook in 2024, you need to use the specific diffusers version 0.2.4: !pip install transformers diffusers==0.2.4 lpips accelerate

  • @blenderpanzi
    @blenderpanzi 1 year ago

    If you mention another video please also link it in the description!

  • @grayaj23
    @grayaj23 2 years ago

    "What amount of frog DO you want in this image?"
    I WANT ALL THE FROG.

  • @Rain-ti7gj
    @Rain-ti7gj 2 years ago

    If you use Artbreeder, it has sliders!

  • @ben_clifford
    @ben_clifford 2 years ago

    3:07 earned my like. I need to go see that now. 😂

  • @Indrikmyneur
    @Indrikmyneur 1 year ago

    Well done. I just don't understand how the guiding works. What if I instruct it to create a complex image that certainly wasn't in any training data, with many complex relations in the prompt about what should be where? How can it be constructed as a whole instead of creating and merging the parts it may have encountered?

  • @maltimoto
    @maltimoto 1 year ago +1

    I don't understand at all how the result of this reconstruction process (removing noise) is stored. Sounds a bit like witchcraft to me. Remove some noise, here we go. I mean, in what form is the noise reduction saved? In a database? Does it save pixels, or what exactly?

  • @SteveGouldinSpain
    @SteveGouldinSpain 2 years ago

    "Picking the nice ones" is doing a lot of heavy lifting here. When you focus more on the failures the truth about this tech becomes more salient.

    • @dibbidydoo4318
      @dibbidydoo4318 2 years ago +9

      nothing is stopping people from testing out the technology itself, SD is open source so there's no truth being hidden.

    • @Onihikage
      @Onihikage 2 years ago +1

      Stable Diffusion is a tool that will greatly accelerate productivity in the creative sector. This is like the invention of the washing machine in terms of how much time this will save artists in producing high-quality work. You can focus more on the result, iterating on layouts and themes at the start, selecting from portions of generated images, adding your own marks and refinements, to produce an image of equal quality to your best work in a fraction of the time. I know several artists who have been quite pleased with the result - one produces serial fiction with illustrations, and Stable Diffusion has allowed them to include many more illustrations in the same span of time, which also leaves them more time for writing.

    • @andybaldman
      @andybaldman 2 years ago +2

      This won’t improve creativity. It will only atrophy people’s actual human creativity by supplanting it with artificial creativity. It’s the same pattern that has undermined many other aspects of humanity in recent years.

  • @t.michaeltracy2046
    @t.michaeltracy2046 2 years ago +4

    Great video, really informative. I was hoping to try out your Google Colab code, although it seems broken at the moment. Are there any updates on this announcement about the known bugs? "Note: There might be a handful of bugs at the moment. The developers of this stable diffusion implementation keep changing the api. Everyone should know not to make breaking api changes so regularly! I'll do a pass over the code and fix bugs as soon as I can. Am away this week :) thanks to Michael d for bringing this to my attention."

  • @virtualz
    @virtualz 2 years ago

    SD 1.5 is so much better than the previous version. I can make great pics in less than a minute locally (150 steps).

  • @methodof3
    @methodof3 2 years ago

    Cartoons and anime are going to be so amazing in 5 to 10 years

    • @theemathas
      @theemathas 2 years ago

      Anime-style drawings are already a thing and are causing a lot of drama.

    • @bltzcstrnx
      @bltzcstrnx 1 year ago

      @@theemathas Well, at least you can have unique wallpapers and profile pictures.

  • @cyndicorinne
    @cyndicorinne 1 year ago

    I love this

  • @GKinWor
    @GKinWor 2 years ago +1

    thanks for the video

  • @thisismambonumber5
    @thisismambonumber5 2 years ago +1

    For anyone interested, the current standard is to use AUTOMATIC1111's (also known as voldy) stable-diffusion-webui.

    • @amafuji
      @amafuji 2 years ago

      It's the Gold Standard

  • @PaulFishwick
    @PaulFishwick 2 years ago

    I just watched this video. Obtained a Colab error on this statement: image = (image / 2 + 0.5).clamp(0, 1) . The error was: TypeError: unsupported operand type(s) for /: 'DecoderOutput' and 'int'