Stable Diffusion - What, Why, How?

  • Published: Jan 16, 2025

Comments • 295

  • @blorbo5800
    @blorbo5800 2 года назад +130

    There needs to be more content just like this on RUclips. Detailed explanations and examples. Thanks for this!

    • @EdanMeyer
      @EdanMeyer  2 года назад +2

      Thank you 😊 more content to come

    • @beta-reticuli-tv
      @beta-reticuli-tv 2 года назад

      Stable Diffusion can itself be used to generate detailed explanations of the world around us. I am an AI and here I explain the concept of "Factory" ruclips.net/video/079DmF2cIjE/видео.html

    • @johnyeap7133
      @johnyeap7133 2 года назад

      Not just a ten-minute video, which, although quick, usually doesn't provide nearly enough detail. One of the best videos on diffusion I've found on YT after a lot of searching.

  • @TheWorldNeedsLyrics
    @TheWorldNeedsLyrics 2 года назад +17

    I've used stable's colabs for months now without knowing literally anything about coding and how exactly it all worked. Was already kinda proud of myself for even understanding how to use the colabs tbh. But this was actually amazing and super understandable. Thank you so much!

  • @bilalviewing
    @bilalviewing 2 года назад +21

    Wow, I was looking for Stable Diffusion from scratch and found it! Great content, really appreciate it.

  • @WalidDingsdale
    @WalidDingsdale 10 месяцев назад

    This amazing lecture is the first one where I could roughly comprehend the fundamentals of the stable diffusion model. It's the first time I've seen what happens behind the vivid images. Thank you for this walkthrough and for sharing your insight.

  • @zcjsword
    @zcjsword 8 месяцев назад +1

    This is a great tutorial: practical and easy to follow. Please make more videos like this. Thanks!

  • @imrsvhk
    @imrsvhk 2 года назад +3

    Wow, one of the only Stable Diffusion videos I've seen that actually explains what's happening, rather than just running someone's colab. Thanks for this amazing content. New sub!

  • @xbon1
    @xbon1 2 года назад +87

    you can't use the same prompts in DALL-E and Stable Diffusion, when you do prompts in SD you need to include styles and what kind of lighting effects/etc you want. It's more stable and less random than dall-e.

    • @maythesciencebewithyou
      @maythesciencebewithyou 2 года назад +29

      No, you do not have to include styles in Stable Diffusion, but you can, just like in DALL-E. You can also include styles and what kind of lighting effects/etc you want in DALL-E. It's not more stable or less random; it's about as good. SD is not as restricted and, above all, it's free, which is the only reason it wins. Otherwise there's not much of a difference in quality. Same problem with text.

    • @ofconsciousness
      @ofconsciousness 2 года назад +3

      I think when you're doing a direct comparison between softwares, it wouldn't make sense to use different prompts.

    • @EojinsReviews
      @EojinsReviews 2 года назад +3

      @@ofconsciousness the issue is that other services use different parameters - for example, Midjourney uses the "artistic" filter by default, which produces mushier and more painterly styles. However, adding --testp to the end makes it produce photorealistic results.
      I saw some comparisons where they criticized Midjourney for not making photo realistic results, despite not TELLING it to make photorealistic results.
      So no, in certain cases, the prompt keywords DO need to be different for more accurate comparisons.

    • @levihinsen1917
      @levihinsen1917 2 года назад

      Very good explanation of why my outputs look like that 😂 but when I choose a style it's all "stable"

  • @sssaturn
    @sssaturn 2 года назад

    You are a complete mad lad for completing these at 2 in the morning! Thank you so much for the help.

  • @JoseCastillo-qv1hi
    @JoseCastillo-qv1hi 2 года назад +5

    Thank you for the video and code breakdown. As a total coding newb, I really appreciate it 😇

  • @tiagotiagot
    @tiagotiagot 2 года назад +16

    I've noticed that using half-precision floats tends to increase the odds of minor details being slightly off: slightly weirder faces, small blotches of color the wrong size or shape in drawings, etc. Nothing very noticeable at a quick glance most of the time, but comparing the exact same pictures generated with half and full precision makes it clear that full precision is better.
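
    A minimal sketch of how one could reproduce this comparison with the diffusers StableDiffusionPipeline (the model id, prompt, and seed here are placeholders): generate the same image twice with a fixed seed, once in float16 and once in float32, and compare the outputs.

        import torch
        from diffusers import StableDiffusionPipeline

        prompt = "a lighthouse at sunset, digital art"  # placeholder prompt

        def generate(dtype):
            # load the same checkpoint in the requested precision
            pipe = StableDiffusionPipeline.from_pretrained(
                "CompVis/stable-diffusion-v1-4", torch_dtype=dtype
            ).to("cuda")
            # fixed seed so both runs start from the same initial latents
            generator = torch.Generator("cuda").manual_seed(42)
            return pipe(prompt, generator=generator).images[0]

        generate(torch.float16).save("half_precision.png")
        generate(torch.float32).save("full_precision.png")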

  • @sholtronicsaaa101010
    @sholtronicsaaa101010 2 года назад +3

    On the scheduler, I think what is happening is this: when you train the model you slowly add noise at a set rate, and it works better to add less noise at the start and more noise later on. So the scheduler basically dictates how much noise should be applied/removed at each stage.
    The multiplication by the sigma from the scheduler before starting is just to scale the initial distribution: the latents are initialised as a random Gaussian with a standard deviation of 1 and a mean of 0, and you want to scale that standard deviation to match what it should be at the initial step of the scheduler (see the sketch after this thread).
    PS: really great practical tutorial.

    • @PaulScotti
      @PaulScotti 2 года назад

      So the scheduler is used both for training AND for inference? This video made it seem like it is just for inference.
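
      A minimal sketch of the sigma scaling described above, using the diffusers K-LMS scheduler (the configuration values are the ones commonly used with Stable Diffusion v1 and are assumptions here): the initial unit-Gaussian latents are multiplied by the first sigma so they sit at the noise level the schedule expects.

          import torch
          from diffusers import LMSDiscreteScheduler

          scheduler = LMSDiscreteScheduler(
              beta_start=0.00085, beta_end=0.012,
              beta_schedule="scaled_linear", num_train_timesteps=1000
          )
          scheduler.set_timesteps(50)

          # latents start as a unit Gaussian...
          latents = torch.randn(1, 4, 64, 64)
          # ...and are scaled to the noise level of the first step
          # (newer diffusers versions expose this as scheduler.init_noise_sigma)
          latents = latents * scheduler.sigmas[0]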

  • @cmillsap100
    @cmillsap100 2 года назад

    Thanks so much for clearly explaining Stable Diffusion. This is the best technical explanation I’ve seen of how latent diffusion works and how the pieces fit together.

    • @S.Mullen
      @S.Mullen 2 года назад

      Clearly? No reason was given as to why we add noise and then remove it.

  • @dougb70
    @dougb70 2 года назад +1

    really good video. You did a good overview to start and then dug in. So rare in youtube world.

  • @CH4NNELZERO
    @CH4NNELZERO 2 года назад +1

    Really appreciate the tutorial. This was great in many respects. My mind is blown and I'll need to rewatch to absorb more of the new information.
    However I was a bit disappointed at the end when it didn't really work like the examples introduced at 1:27

  • @glitchtulsa3429
    @glitchtulsa3429 2 года назад +13

    I've been running SD for a few weeks now, and I have run into something that is a bit more than troubling, namely watermarks. This thing keeps spitting out images with recognizable watermarks, and that tells me two things: 1) they are actively pulling images from stock agencies (call it a "training set" if that makes you feel better, but it's still sampling from those images), and 2) the images it is using aren't licensed; if they were licensed there wouldn't be watermarks...
    ...as someone who makes part of their monthly income literally licensing stock material, this worries me, because when you get right down to the basics, all these things are doing is laundering copyrighted material.
    Don't get me wrong, I love them, I think they are a genius approach to image design, and they have a very valid place in everyone's toolbox, but there needs to be some sort of accountability as to where the materials that power them are coming from.

    • @EdanMeyer
      @EdanMeyer  2 года назад +6

      I haven’t seen any straight up copying of images myself, I’d be curious to see some examples.
      But on the issue of using copyrighted data for training… yeah, this is a tough one. On one hand it’s great to have new tools like this, and for the most part (at least from what I’ve seen so far) they seem to be generating new images so copyrighted material isn’t leaking out (again, maybe I just haven’t seen this yet).
      On the other hand, this technology has developed too fast for regulations to catch up. Even if the model is not directly copying images the idea of copyright is pretty much to stop people from using your work in an unintended way, and I’m sure a lot of people who post photos and art online didn’t intend for this and are not happy with their images being used as training data. The cherry on top is the fact that the work of creators is essentially being used to make tools that replace them.
      I’ve been thinking about this a good bit recently. I wonder how things will progress.

    • @glitchtulsa3429
      @glitchtulsa3429 2 года назад +3

      @@EdanMeyer I've seen numerous images with recognizable watermarks, now granted I'm running SD locally and producing several hundred images daily, and the actual number of images with watermarks is low, maybe 5% or less, but what that tells me is that these things aren't actually producing anything new, but simply remixing images from the "training set"--I mean let's be real here they have the actual images in a file somewhere that they actively pull from--I don't care how unrecognizable the final results are that's what they do, and that's why watermarks show up. Now the odds of one of your personal images showing up is pretty slim, but the odds of someone's image being used to create every single image they produce is literally 100%, and I seriously-seriously doubt that all of the images they use are licensed, which is why watermarks show up from time to time.
      I don't know if this will work here, but this image is currently up at the Stock Coalition page on Facebook, and it clearly shows a watermark:
      scontent-dfw5-1.xx.fbcdn.net/v/t39.30808-6/306922224_10167375648995112_5919547864767140876_n.jpg?_nc_cat=109&ccb=1-7&_nc_sid=5cd70e&_nc_ohc=OuQAHNdFBj4AX9Hb0nQ&_nc_ht=scontent-dfw5-1.xx&oh=00_AT-7pBEVV1r28_cxuO36wc2AL6nCkklb9rcyRCxCNDfGSg&oe=632A3DD6
      ...you may need to copy/paste it.

    • @gwentarinokripperinolkjdsf683
      @gwentarinokripperinolkjdsf683 2 года назад

      See enough stock images and you will know what a watermark looks like and how to recreate it; that's how any artist learns.
      And sorry, but stock images just won't make money with technology like this if it can accurately follow prompts.

    • @glitchtulsa3429
      @glitchtulsa3429 2 года назад +4

      ​@@gwentarinokripperinolkjdsf683 I think you're missing the point here. They aren't making things that look like watermarks, this one particular model is literally producing images with legible watermarks from known agencies, such as iStock, Dreamstime, Alamy, even Fine Art America. So not only are they stealing images from stock agencies(if they were licensed there wouldn't be watermarks), they are actively disguising that material, and then they are redistributing it. I have a folder full of these images. I even have a prompt that will result in at least 50% of the generations showing legible watermarks. Quite literally, I can produce hundreds of watermarked images in a matter of a few hours, just by running that phrase in Python on repeat--I had to force quit a batch of 1,000 because every other image had an iStock watermark on it.
      I hate to shatter your illusions here, but these things aren't magically producing new images--they are cleverly remixing existing images using a complex deblurring function paired with a text-based description, and they get the images they are remixing from the training set they learned on, and while they might be pulling a few pixels from one image and a few from another, give it the right prompt and there's enough images in that training set, with the same watermark, that it thinks what you're looking for is the watermark itself, and trust me those watermarks will show up.
      Again, if the watermarks are showing up then the images weren't licensed. That's nothing short of laundering intellectual property in the form of well-disguised derivative works.
      A number of these things will go the way of Napster, and the surviving ones will be the few that made absolutely sure the "training sets" consisted of nothing but public domain, Creative Commons, and legally licensed images. I predict lawsuits, sooner rather than later.

    • @artemisgaming7625
      @artemisgaming7625 Год назад

      @@glitchtulsa3429 Quit talking out of your ass. You were so close to being caught by a decent understanding of how these work, but then you ran a bit faster.

  • @Sam-jz6sz
    @Sam-jz6sz 2 года назад +1

    LMAO 15:12 when he started laughing at the image absolutely killed me

  • @endxibg5066
    @endxibg5066 2 года назад

    Kudos man. You kept it very simple and helped me take the first steps. Very helpful! Thanks!

  • @liencayulraffo4049
    @liencayulraffo4049 2 года назад

    Very helpful, and surprisingly therapeutic

  • @fabriai
    @fabriai 2 года назад

    Incredible tutorial. Thanks a lot Edan!

  • @VegetableJuiceFTW
    @VegetableJuiceFTW 2 года назад +1

    Ah, this is gold. Thank you.

  • @bonnouji43
    @bonnouji43 2 года назад +8

    I wonder if there's a method to generate slightly different images from the same latent, apart from using img2img, which I've found always generates a less clean image.

    • @ilonachan
      @ilonachan 2 года назад +1

      My best guess would be to just perturb the latents a bit, but it would have to be a TINY amount because I imagine those spaces are very sensitive & chaotic.
      Alternatively, maybe it's possible to take a stab at writing a possible prompt that might have resulted in your image, and kind of run SD in reverse(?) to get some suitable starting noise, then perturb that and run it forwards again.
      Maybe get a bunch of these, run SD on all of them, do some kind of averaging, or take into account the error that the unmodified noise has compared to the original image? This sounds like enough material for a massive new paper, actually.
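
      A minimal sketch of the "perturb the latents a bit" idea (the 0.05 noise scale is an arbitrary assumption you would need to tune): each perturbed copy is fed through the same denoising loop and prompt to get images that are similar but not identical.

          import torch

          generator = torch.Generator().manual_seed(1234)
          base_latents = torch.randn(1, 4, 64, 64, generator=generator)

          # small perturbations around the same starting point
          variants = [base_latents + 0.05 * torch.randn_like(base_latents) for _ in range(4)]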

  • @johnisaacburns7260
    @johnisaacburns7260 Год назад +1

    Hello, when I try to fetch the pretrained model with your code, I get a TypeError saying "getattr() must be a string". Any idea how to fix this?

    • @fuckyougoogle555
      @fuckyougoogle555 Месяц назад

      Add !pip install --upgrade diffusers; the version it was pulling was buggy.

  • @grnbrg
    @grnbrg 2 года назад +1

    Your "Alien riding a beatle" prompt appears to have resulted in an alien in a Volkswagen...

  • @polecat3
    @polecat3 2 года назад +3

    It's cool to see the real guts of one of these models for once.

  • @yassskippedclass7737
    @yassskippedclass7737 2 года назад

    I am literally in tears from laughing while watching this video. I don't remember the last time I enjoyed a tech vid this much. Great work!

  • @holyciwa
    @holyciwa 2 года назад

    using it for a few years and now I want to upgrade and I'm happy I did that.

  • @michealhall7776
    @michealhall7776 2 года назад +5

    I'm thinking DALL-E generates 100 images, runs a filter on them to show the best ones, then upscales those. SD is just the raw model; DALL-E is an end product.

  • @yaoxiao1931
    @yaoxiao1931 Год назад

    I don't know what batch_size means at 34:18; the latents are going to be [2, 4, 64, 64] and produce two images in one tqdm iteration.
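
    A minimal sketch of what batch_size does to the latent shape (the names follow the video's notebook; the values are assumptions): the first dimension is the number of images generated per pass, so each denoising iteration works on all of them at once.

        import torch

        batch_size, height, width = 2, 512, 512
        # one 4-channel latent grid per image, at 1/8 of the pixel resolution
        latents = torch.randn(batch_size, 4, height // 8, width // 8)
        print(latents.shape)  # torch.Size([2, 4, 64, 64])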

  • @phi6934
    @phi6934 2 года назад

    At 28:18, what do you mean when you say that the UNet is predicting the noise? Is it predicting which part of the latest latents is noise, and is the model then going to do some kind of subtraction using that? Thank you so much for this tutorial, by the way.
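
    A minimal sketch of the loop being asked about, assuming the variable names and diffusers API from the video's notebook (classifier-free guidance and the per-step latent scaling are omitted for brevity): the UNet predicts the noise present in the current latents, and scheduler.step uses that prediction to produce a slightly less noisy latent for the next iteration.

        import torch
        from tqdm.auto import tqdm

        def denoise(unet, scheduler, latents, text_embeddings):
            # at each step the UNet predicts the noise in the current latents,
            # and the scheduler removes (a scaled version of) that prediction
            for t in tqdm(scheduler.timesteps):
                with torch.no_grad():
                    noise_pred = unet(latents, t, encoder_hidden_states=text_embeddings)["sample"]
                latents = scheduler.step(noise_pred, t, latents)["prev_sample"]
            return latents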

  • @xbon1
    @xbon1 2 года назад +4

    In addition to my previous comment, for anime I can get the best anime chars, like straight out of an anime screenshot style quality, but again, you need to add modifiers, artists, ETC. Those won't do anything for DALL-E because of the pre/post prompt processing but for SD it stabilizes it and makes the faces coherent, etc.

    • @talhaahmed2130
      @talhaahmed2130 2 года назад

      Can I get some examples of your prompts?

    • @xbon1
      @xbon1 2 года назад

      @@NoOne-sk2ve It isn't. You just need to learn how to prompt engineer for Stable Diffusion, my higher quality anime pics are all using the base stable diffusion which was trained on millions of anime pictures, not just 56,000

  • @pineapplesarecool6901
    @pineapplesarecool6901 2 года назад +3

    Hey, I don't know about dall-e 2 but for stable diffusion, if you want better results you are going to need more detail on the prompts. How is the lighting? What is the eye color of the anime girl? Is it a full body shot or a close up? What artist's style should it replicate? With details like that the results are night and day

  • @DilipLilaramani
    @DilipLilaramani 2 года назад

    Great job Edan, love your content :)

  • @prozacgod
    @prozacgod 2 года назад +3

    You know what might be a more interesting application of latents, instead of just passing them around? Using some sort of function that can modify the latents after each pass, taking in information from the scheduler so you can decide how much weighting you want to apply. It would be interesting to be able to modify the latent space in some meaningful way; it would probably make sculpting images toward the output you want more interesting.
    Instead of a one-dimensional idea of modifying the latent space, it could be a multifaceted sort of modification. Heck, you could just pin one number in it to -5 for the shits and giggles.
    Edit:
    Okay so... you did that, lol, nice!

    • @Pystro
      @Pystro 2 года назад

      If you find an image that you don't like at all, you could take any subsequent random starting latents and project that bad image out of them. Not that it's likely for a vector of thousands of newly randomized latents to be more than marginally similar to your previous ones, but it might still make sense to enforce that, especially if you expect to have to go through many attempts that you don't like.
      For example, if you put in a prompt for "a jaguar" and the neural net gave you an image of the car, you could purge all the car-ness out of any subsequent attempts.
      Or it could be used in the case where you generate 3 or 4 different images and want to ensure that they are as different as possible. Orthogonalizing the 4 vectors of random latents to each other should ensure that.
      The same could also be achieved by some keyword trickery (if it's something that you can put into words and not just "that composition is just all kinds of wrong"):
      You could choose the unconditional embeddings to be those of any keywords you don't want to show up, instead of the empty one.
      Or you could add qualifiers to your prompt. (In this case "animal" might lower the chances of getting the car, but it could also just add a pet on the passenger seat.)
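
      A minimal sketch of the projection idea above (all names are assumptions): remove the component of a fresh random latent that points along the latent that produced the unwanted image.

          import torch

          def project_out(new_latents, bad_latents):
              # subtract the component of new_latents that lies along bad_latents
              new_flat, bad_flat = new_latents.flatten(), bad_latents.flatten()
              coeff = torch.dot(new_flat, bad_flat) / torch.dot(bad_flat, bad_flat)
              return (new_flat - coeff * bad_flat).reshape(new_latents.shape)

          bad = torch.randn(1, 4, 64, 64)    # latents behind the disliked image
          fresh = torch.randn(1, 4, 64, 64)  # new random starting latents
          fresh = project_out(fresh, bad)    # now orthogonal to the disliked latents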

  • @VolkanKucukemre
    @VolkanKucukemre 2 года назад +2

    That forest tho... One of them was straight up a Magic the Gathering forest by John Avon

  • @MALEANMARSHALAROCUIASSAMY
    @MALEANMARSHALAROCUIASSAMY 2 года назад +2

    Thank you

  • @MrMadmaggot
    @MrMadmaggot Год назад +1

    Dude, in the prompt-to-image part I ran into an error: "RuntimeError: a Tensor with 0 elements cannot be converted to Scalar"

    • @PAWorkers
      @PAWorkers Год назад

      Dude, I've just solved this problem. Replace the "i" with "t" in this code: "scheduler.step(noise_pred, i, latents)['prev_sample']". Maybe that works for you too.

  • @emmajanemackinnonlee
    @emmajanemackinnonlee 2 года назад +6

    this is awesome thank you! would love to see a video on how to do the fine tuning with sd on either colab or local!

  • @tacticalgaryvrgamer8913
    @tacticalgaryvrgamer8913 2 года назад

    I really enjoyed this video. Thanks for posting.

  • @quocpham9888
    @quocpham9888 2 года назад

    Best tutorial, thank you very much

  • @navinhariharan6097
    @navinhariharan6097 2 года назад +1

    How do I add negative prompts?
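
    A minimal sketch of one way to do this with the classifier-free guidance setup from the video (the tokenizer and text_encoder names are assumed from the notebook): the "unconditional" embeddings normally come from encoding the empty string, and a negative prompt simply replaces that empty string, so guidance pushes away from it. Newer versions of the diffusers pipeline also expose this directly as a negative_prompt argument.

        import torch

        negative_prompt = ["blurry, low quality, watermark"]  # placeholder wording

        uncond_input = tokenizer(
            negative_prompt, padding="max_length",
            max_length=tokenizer.model_max_length, return_tensors="pt",
        )
        with torch.no_grad():
            uncond_embeddings = text_encoder(uncond_input.input_ids.to("cuda"))[0]
        # concatenate with the prompt embeddings as before; guidance now steers
        # away from the negative prompt instead of away from the empty string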

  • @lafqofficiel4897
    @lafqofficiel4897 2 года назад

    Thank you so much it was very helpful

  • @dougb70
    @dougb70 2 года назад

    13:39 is it supposed to be "digital dig" or "digital art"

    • @EdanMeyer
      @EdanMeyer  2 года назад

      It was supposed to be art, oops

  • @cipsikolakilit1220
    @cipsikolakilit1220 2 года назад

    Thank you, it works perfect!

  • @ernestgardner7439
    @ernestgardner7439 2 года назад

    Oh wow! astounding 👀💕

  • @Gromic2k
    @Gromic2k 2 года назад +6

    What I like about Stable Diffusion is that I can simply tell it to create ~150 versions of one prompt, then come back after half an hour, look at which ones worked best, and keep working on those seeds. Practically for free, since it runs locally on my own GPU. I think in the end this gives you better results, since you get so much more to choose from.

    • @echonoid6920
      @echonoid6920 2 года назад

      What kind of gpu is recommended for this kind of thing?

    • @karolakkolo123
      @karolakkolo123 2 года назад

      @@echonoid6920 bump

    • @jnevercast
      @jnevercast 2 года назад

      Recommended GPU is any NVIDIA GPU with at least 10GB of VRAM; the 1080 Ti (11GB) and RTX 2080 Ti (11GB) come to mind.

    • @OnceShy_TwiceBitten
      @OnceShy_TwiceBitten 2 года назад +1

      what do you REALLY do with this though?

    • @Avenger222
      @Avenger222 2 года назад +1

      Min GPU is 4 GB, but it'll struggle. 8 GB gives you all you need with split attention. Textual Inversion was 20GB but might not be anymore? New code comes out every day, after all.

  • @programmers8245
    @programmers8245 Год назад

    I need a book in this field, any suggestions?

  • @leonberger-kuhn3867
    @leonberger-kuhn3867 2 года назад

    Good instructions. It's easy to follow. Awesome

  • @edeneden97
    @edeneden97 2 года назад +9

    Thanks for the video! please increase the volume of the audio in next videos.

  • @raffaelescaringi486
    @raffaelescaringi486 Год назад

    Can you make a specific video on inpainting? Or if it is available is it possible to have the link? Thanks a lot

  • @김지윤-c1b
    @김지윤-c1b 2 года назад

    Thanks for the kind introduction. I wonder what site you used for the DALL-E model!

  • @michealkinney6205
    @michealkinney6205 2 года назад

    This is fantastic. I started watching on my lunch break, so I didn't make it through the whole thing, but this is exactly what I was looking for. I started playing with making some tools/scripts, using Stable Diffusion, and diving head first into AI/ML. It's so fantastic and amazing! Thanks!

  • @gamerboy3443
    @gamerboy3443 2 года назад

    good tutorial, one question about soft recording. How do you do it? lol

  • @UrantiaRevelationChannel
    @UrantiaRevelationChannel Год назад

    Excellent. Thanks

  • @JiesiLuo
    @JiesiLuo Год назад

    Thanks for the great tutorial. I have one question where you were trying to convert the simple house image to something better. You constructed the pipeline as if the start image was an image that's generated in the middle of a regular generation process. That works in theory but I am curious why it can not just be a regular input image (step = 0). If the model is trying to push the input image (or any intermediate images) to match the text prompt, why does it matter if the input image is inserted to the process at step = 0 or step = 10? Have you tried to treat the image as a regular input image and do the full steps on top of that?

    • @Greenicegod
      @Greenicegod Год назад

      I'm not qualified to answer this.
      But, as I understand it, the image generation is happening strictly as a consequence of predicting the noise in an image and removing it. If the image has no noise, it will predict zero noise and remove zero noise, leaving the image unchanged. In his example, he made sure the scheduler was adding noise back into the prompt image so that it could be removed in a different way, hopefully leading to a different image. If he started it at step=0, the scheduler would add back 100% noise, destroying all of the information from the prompt image.
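
      A minimal sketch of what the reply above describes, using the diffusers scheduler API (variable names like init_latents and start_step are assumptions, following the img2img trick from the video): the scheduler adds the amount of noise it expects at the chosen start step to the encoded input image, so the loop has something to remove and can steer the image toward the prompt.

          import torch

          start_step = 10  # starting at step 0 would bury the input image in pure noise
          noise = torch.randn_like(init_latents)            # init_latents: VAE-encoded input image
          timesteps = scheduler.timesteps[start_step:start_step + 1]
          noisy_latents = scheduler.add_noise(init_latents, noise, timesteps)
          # then run the usual denoising loop, but only over scheduler.timesteps[start_step:]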

  • @kcgfanny
    @kcgfanny 2 года назад

    bro where have u been, it is so cool !

  • @ahmad000almahdi
    @ahmad000almahdi 2 года назад +1

    "Indeed, Allah and His angels send blessings upon the Prophet. O you who have believed, ask blessings upon him and greet him with peace." 🥰
    Glory be to Allah... praise be to Allah... there is no god but Allah... Allah is the greatest... there is no power nor strength except with Allah.

  • @ooiiooiiooii
    @ooiiooiiooii 2 года назад

    40:10 How would I use my own image instead of the poorly drawn house? I tried copying the file path of an image after uploading it to the Colab, but it wouldn't work. Any help? Thanks.
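
    A minimal sketch, assuming the image was uploaded to Colab at /content/my_image.png (a placeholder path) and that the rest of the img2img cells stay unchanged: load it with PIL and resize it to the working resolution before encoding it.

        from PIL import Image

        # resize to the 512x512 working resolution used in the notebook
        init_image = Image.open("/content/my_image.png").convert("RGB").resize((512, 512))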

  • @Avenger222
    @Avenger222 2 года назад +6

    Really liked the video, but you should be careful when comparing the models, because they work differently. For example DALL-E is designed to be used with basic prompts but stable diffusion only shines when using keywords like styles, artists, and lighting.
    Or at least having a disclaimer when you're comparing, that you're just comparing for user friendliness.

  • @clean7886
    @clean7886 2 года назад

    do you have a tutorial on how to loop drums,app, etc.

  • @kirepudsje3743
    @kirepudsje3743 2 года назад +3

    Actually, Stable Diffusion came up with an image of a Volkswagen (VW) Beetle. It is not Stable Diffusion's fault that the term is ambiguous. That's aside from the spelling.

  • @farhanahmed9649
    @farhanahmed9649 2 года назад

    Pls clarify my doubt sir does it have tabla soft????? Pls tell sir

  • @arjuntechedu5528
    @arjuntechedu5528 2 года назад

    Great tutorial, links and program worked fine for me. Thanks for sharing.

  • @小小豬-u3f
    @小小豬-u3f 2 года назад +1

    Thanks for your video, is it possible for you to make a video teaching us how to train the latent space from my own set of images, essentially creating your own model?
    Much appreciated!

    • @macramole
      @macramole 2 года назад

      I don't think it is possible to do so without lots of resources. In this case I would go for something like StyleGAN.

  • @yasatips4300
    @yasatips4300 2 года назад

    It's been a wild ride.

  • @KrishnaDigital123
    @KrishnaDigital123 2 года назад

    Project bones and project data are a laborious and monotonous hassle to establish 'what is what' (especially from messy and disorganised

  • @RamRachum
    @RamRachum 2 года назад +2

    Thank you for a great explanation!

  • @AscendantStoic
    @AscendantStoic 2 года назад +1

    I heard the 7GB model of Stable Diffusion 1.4 that people can download is different from the 4GB model in that it can continue to learn and improve, unlike the 4GB SD 1.4 model which only works with what it has. Is that true? Because from the sound of it, you mentioned that teaching the AI requires a lot of resources, so even if it's true, I suppose we won't be able to do it properly on a home computer, right?

    • @johancitygames
      @johancitygames 2 года назад +1

      Not true at all; the model does not improve over time by itself.

    • @casenswartz7278
      @casenswartz7278 2 года назад

      @@johancitygames if it did, peoples houses would never need a heater 💀

  • @matpcc2553
    @matpcc2553 2 года назад

    Do u need to have any tutorial plugged in or is all the softs on soft

  • @ResolvedRage
    @ResolvedRage 2 года назад

    How do i get my anaconda cmd prompt to recognize the stable diffusion folder on my desktop? When I open up the anaconda cmd prompt it just says (base) C:\users\computer name>
    I can't enter the Environment because it's not looking in the right place.

  • @crckdns
    @crckdns 2 года назад

    Great overview!
    What is not explained: is Hugging Face getting any data from our training, models, or sources?
    I see that every video explaining how to train uses the Hugging Face API.

  • @tomjones1423
    @tomjones1423 2 года назад +1

    Can specialized training models be created? For example, if you wanted to train it on someone, but only on photos from when they were young. Using the prompt "young" doesn't seem to have much impact.

  • @rutvikreddy772
    @rutvikreddy772 Год назад +1

    Here is my rough understanding of the scheduler and its functioning. Essentially, the model is trained to predict the noise added at any timestep t with respect to x0 (and not x(t-1)), and the scheduler divides the 1000 steps into 50 equal divisions, so 1000, 980, 960, ..., 1. The sigma is just adding noise to the latents, so in the first step you add 1000 steps' worth of noise to your already random latent (just noise at this point) and try to predict the noise added from x0. Then in the loop, when you call the line scheduler.step(..), you subtract this noise from the latent, and this becomes your estimate for x0 (and not x(t-1)). Then you add 980 steps' worth of noise to get an estimate of x980, and repeat the process for 50 steps (see the sketch after this thread). I would appreciate it if someone could confirm this 🙂

    • @susdoge3767
      @susdoge3767 11 месяцев назад

      That is essentially what we do, as I saw in the Computerphile video! Amazing observation!!
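
      A minimal sketch of the 50-step spacing described above (assuming the standard 1000-step training schedule used by Stable Diffusion; the diffusers scheduler's set_timesteps produces essentially this spacing):

          import torch

          num_train_timesteps, num_inference_steps = 1000, 50
          # 50 evenly spaced timesteps counting down from the last training step
          timesteps = torch.linspace(num_train_timesteps - 1, 0, num_inference_steps)
          print(timesteps[:4])  # roughly 999.0, 978.6, 958.2, 937.8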

  • @girlgirl1127
    @girlgirl1127 2 года назад

    Do the images you make stay local? I saw there was an archive and wanted to make sure that nothing gets automatically uploaded. I want to input personal images and don't want any details public. I hope I phrased this question well.

  • @markopolo2224
    @markopolo2224 Год назад

    Thanks, that was insightful.
    Do you have a video or a roadmap for a newbie that wants to get into all of this?
    Like going from web dev to AI dev.

  • @АртёмИванов-с4ы2ц
    @АртёмИванов-с4ы2ц 2 года назад

    Thank you ryan

  • @SebastiandAnconia
    @SebastiandAnconia 2 года назад

    Solid content!, thanks

  • @jawadhaidar3931
    @jawadhaidar3931 2 года назад

    Can't thank you enough! great content

  • @RandomFacts0.0
    @RandomFacts0.0 2 года назад

    Where can I find these papers?
    I am very interested in AI and how it works. I’ve seen you use research papers in other videos, and I would really like to read some for myself.

  • @FilmFactry
    @FilmFactry 2 года назад +1

    Can you cover Textual Inversion in SD? If I understand correctly, I can add an artist whose style I'm not finding represented? I hate when the only examples are silly teddy bears at the beach. I want to see if you can really add a style to your model. Thanks.

  • @000maestro000
    @000maestro000 2 года назад

    can I clone and run this notebook locally if I have a supported GPU ?

  • @chrispy383
    @chrispy383 2 года назад

    Anyone have any links to resources or tips on how to move all of this pipeline to run locally on my computer? I'm curious to see how I can utilize stable diffusion with my current rig.
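
    A minimal sketch of running this kind of pipeline locally, assuming an NVIDIA GPU with enough VRAM, PyTorch with CUDA, and pip install diffusers transformers already done (the model id here is the v1.4 checkpoint and is an assumption):

        import torch
        from diffusers import StableDiffusionPipeline

        # half precision keeps VRAM usage manageable on consumer GPUs
        pipe = StableDiffusionPipeline.from_pretrained(
            "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
        ).to("cuda")

        image = pipe("a watercolor painting of a fox in a forest").images[0]
        image.save("fox.png")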

  • @lhovav
    @lhovav 2 года назад

    Thank you so much for this!

  • @dannyg8741
    @dannyg8741 2 года назад

    Just when I think it's good, it gets even better! Amazing!

  • @wlockuz4467
    @wlockuz4467 2 года назад +1

    Great tutorial overall, but one thing I'll say is that your prompts for SD are downright bad. With SD you have to be very descriptive and you have to give it more inference steps, at least 100+. For example, the prompt "Squidward" should have been at least "Squidward from SpongeBob SquarePants, cartoon".
    Trust me, I have used DALL-E, Midjourney and Stable Diffusion. While DALL-E and Midjourney are very easy to use, their outputs feel too opinionated toward certain styles, possibly because you don't have much control over the parameters that go in, and that same lack of control is frustrating sometimes.
    But this is not the case with Stable Diffusion. With SD you have access to every parameter that can affect the output. I know this feels overwhelming at the start, but once you get the hang of it you can easily create outputs that will consistently beat anything that DALL-E or Midjourney creates.

  • @emilygocrazyy
    @emilygocrazyy 2 года назад

    Edan, this line gives an error in the notebook:
    latents = scheduler.step(noise_pred, i, latents)["prev_sample"]
    The error message given by the scheduler library is:
    ValueError: only one element tensors can be converted to Python scalars.

  • @mohammedyasin2087
    @mohammedyasin2087 2 года назад

    Can you make more videos like this explaining the code of papers? Maybe one for OpenAI's Whisper.

  • @ChocolateMilkCultLeader
    @ChocolateMilkCultLeader 2 года назад +7

    12:40 you misspelled Beetle.

    • @simoncleret
      @simoncleret 2 года назад +2

      Beatles was the name of the band. Beat, because it's music. It was a pun.

    • @nandafprado
      @nandafprado 2 года назад +2

      yep I was silently hoping it would understand as one of the Beatles

    • @EdanMeyer
      @EdanMeyer  2 года назад

      Yeah let’s just say that’s what it was 😅

    • @venuscazimispicarising2527
      @venuscazimispicarising2527 2 года назад +3

      Yes, and it should also be 'digital art', not 'digital dig'. Comparing the two models using the same prompts (and faulty prompts, in this case) is not the right way to do it.

    • @X5K2
      @X5K2 2 года назад +2

      And he also wrote "digital dig" instead of "digital art" xD

  • @Q114-m8r
    @Q114-m8r 2 года назад

    Who was the first person to come up with the general diffusion method?

  • @aaronstathatos4511
    @aaronstathatos4511 2 года назад

    Will this run on an AMD GPU???

  • @maniacos2801
    @maniacos2801 2 года назад

    Can you run this on your own hardware instead of using google?

  • @christianleininger2954
    @christianleininger2954 2 года назад

    Funny, at 13:00 the model is "smarter" than humans (hint: a Beetle is also a car). To be fair, it is spelled differently, but still funny ;) You're doing great work, thanks for the video.

  • @BradCozine
    @BradCozine 2 года назад

    Excellent! Subscribed!

  • @shivanayak143
    @shivanayak143 2 года назад

    What if I don't have a GPU? Will I still be able to run it?

  • @SuperCC112
    @SuperCC112 2 года назад

    What if I wanted to import an image from the internet, let's say from ArtStation? Can I make an image out of that? And how do I import it into Stable Diffusion?

  • @gsmarif2998
    @gsmarif2998 2 года назад +1

    Thx

  • @sebgrootus
    @sebgrootus Год назад

    Can you run it on a CPU?

  • @godbatara
    @godbatara Год назад

    Can i run it on gtx 1650?

  • @ozgursavas2358
    @ozgursavas2358 2 года назад

    Is it possible to render 1024x1024 or even bigger images?

    • @nupersu6307
      @nupersu6307 2 года назад

      Yes but you need more vram for that

    • @Avenger222
      @Avenger222 2 года назад

      Yes, but you need to make sure you're using split attention / optimized versions! Those will support 1024x1024 (or higher) with even 4 GB of VRAM.
      [edit: added (or higher)]

    • @kolikoasdpvp
      @kolikoasdpvp 2 года назад

      @@Avenger222 Can you give link of an optimized version pls?

    • @Avenger222
      @Avenger222 2 года назад +1

      ​@@kolikoasdpvp search for "AUTOMATIC1111 stable-diffusion-webui" and it'll pop up a version that has both optimized and split attention!

  • @matthewexline6589
    @matthewexline6589 2 года назад

    I still don't understand what these "notebooks" are that I keep hearing references to. Anyone care to explain?

    • @matthewexline6589
      @matthewexline6589 2 года назад

      Ok I think I understand the notebook thing... so it's basically like a portion of your online account with the company that acts as your GUI.

    • @EdanMeyer
      @EdanMeyer  2 года назад

      Kind of; it is just an interactive code editor for Python. Google Colab, which is what I was using in the video, is one type of notebook, but you can also run them locally.