How Stable Diffusion Works (AI Image Generation)

  • Published: 25 Jun 2023
  • Get NordVPN 2Y plan + 1 month free here ➼ nordvpn.com/gonkee It’s risk-free with Nord’s 30-day money-back guarantee!
    Support me on Patreon: / gonkee
    Kaggle fish dataset: www.kaggle.com/datasets/croww...
  • Science

Comments • 220

  • @Gonkee
    @Gonkee  11 months ago +12

    Get NordVPN 2Y plan + 1 month free here ➼ nordvpn.com/gonkee It’s risk-free with Nord’s 30-day money-back guarantee!

    • @jungervin8765
      @jungervin8765 11 months ago +2

      Could you do more videos about 2D soft bodies and 2D physics simulation? Your videos are super helpful.

    • @algee2005
      @algee2005 3 months ago

      That was an interesting experience... I've seen some videos on this topic, and most left me with more question marks than I had before. But this was somewhat like opening my brain and pouring a full book in there, like in The Matrix. I can't really tell what I've just learned, just that it was a lot, and that it will be helpful later on. You surely are some dark magic ninja teacher :D

  • @ahumon9947
    @ahumon9947 11 months ago +154

    It’s insane just how informative this is. When I watch a Gonkee video, I expect a funny video, but this has taught me more than any Wikipedia article ever could.

    • @treasureking8142
      @treasureking8142 3 months ago

      Agreed 👍

    • @dreamyrhodes
      @dreamyrhodes 3 months ago

      To be fair, WP articles are mostly pretty shit on anything unless you already bring basic knowledge of the methodology a given science field uses; otherwise it's pretty hard to make out any of the details.

  • @ujjwalkumar-uf8nj
    @ujjwalkumar-uf8nj 11 months ago +24

    Buddy, YouTube needs more creators like you. I watched a lot of videos on stable diffusion, but none of them made this much sense. Please upload more deep learning content like this; the way you summarized computer vision is just amazing.

  • @KBRoller
    @KBRoller 3 months ago +17

    An important aspect of this that a lot of people misunderstand is that the things which are learned are kernels and visually semantic text embeddings, which are very different from just memorizing/copying the training data, despite so many people insisting these models are, in fact, just copying/stealing training data. This is much more akin to how humans learn to make art: studying existing works, extracting those observations into features and techniques (kernels), learning how words describe visuals by... learning to talk, basically (visual semantic embeddings), and then combining it all to make new images. Admittedly, the diffusion model is more like "learning to remove stuff that doesn't match the requested visuals" -- but there's a famous quote attributed to Michelangelo about how his sculpting is just "removing the marble that is not David" from the block, which is very similar.

    • @algee2005
      @algee2005 3 months ago +3

      Yeah, by that logic every artist is basically stealing art, because they learn on and from already existing art and have that art influence their own work. Those thinking AI is stealing art need to reconsider the way art is actually made. People don't understand the concept at hand, and they are afraid of things they don't understand, and that is the actual problem here.

    • @KBRoller
      @KBRoller 2 months ago +1

      @algee2005 They're afraid of the consequences of capitalism, but they either don't realize that or they don't want to blame capitalism, so they project their fears onto the AI that's making them face it and make up rationalizations for why they think we should fear the AI.
      Capitalism is what enforces the idea of "if you can't sell it, then it's not worth doing and you deserve to die"; AI isn't what says that.

    • @Katatonya
      @Katatonya 1 month ago +1

      @algee2005 Yeess! That was exactly what I was thinking when I saw artists say "I demand to get paid for these models trained on my art." But dude, what about all the people who like your art and are learning how to replicate it, should they pay you as well? NO, lmao. You put it up for the public. One consequence of that is that other people (meat or machine) can learn your style. And I'm pretty sure he himself learned from others.

    • @user-uj2iq7rc3p
      @user-uj2iq7rc3p 5 days ago

      Although I get the idea that "an important aspect of this that a lot of people misunderstand is that the things which are learned are kernels and visually semantic text embeddings", what I do not understand is how text is injected during the training process (it is covered almost at the end, 29:37).

    • @KBRoller
      @KBRoller 4 days ago

      @user-uj2iq7rc3p Like it says in the video: where an attention head would normally calculate its key and query vectors from the same input -- the same text, for instance, or the same image -- these models instead calculate the query vectors from the image tokens, and the key (and value) vectors from the text embeddings. As such, at each attention block, both the image and text semantics are used when determining what the image should contain.
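
      (For readers who want the mechanics spelled out, below is a minimal numpy sketch of one cross-attention step along the lines described above. The sizes and the random matrices standing in for the learned projections are illustrative assumptions, not taken from the video or from any real Stable Diffusion code.)

      ```python
      import numpy as np

      # Toy sizes -- purely illustrative, not the real Stable Diffusion dimensions.
      n_img_tokens, n_txt_tokens, d = 16, 8, 32

      rng = np.random.default_rng(0)
      image_tokens = rng.normal(size=(n_img_tokens, d))      # stand-in for U-Net feature-map tokens
      text_embeddings = rng.normal(size=(n_txt_tokens, d))    # stand-in for CLIP text embeddings

      # Learned projection matrices (random here; trained in the real model).
      W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

      Q = image_tokens @ W_q       # queries come from the image tokens
      K = text_embeddings @ W_k    # keys come from the text embeddings
      V = text_embeddings @ W_v    # values come from the text embeddings

      # Scaled dot-product attention: each image token gathers information from the text.
      scores = Q @ K.T / np.sqrt(d)                                         # (16, 8)
      weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax over text tokens
      out = weights @ V                                                     # (16, 32) text-conditioned image features
      print(out.shape)
      ```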

  • @andrewstephens9787
    @andrewstephens9787 7 months ago +16

    Please make more videos like this regarding AI. I've learned more from this in 30 minutes than in the past 3 hours I've been studying. Informative, entertaining and hilarious. You sir are a gem 🙌

  • @notanengineer
    @notanengineer 11 months ago +36

    Yay Gonkee video

    • @JohnJCB
      @JohnJCB 9 months ago

      Yay Gonkee video

    • @Omsip123
      @Omsip123 7 months ago

      Yay Gonkee video

    • @mister_meatloaf
      @mister_meatloaf 16 days ago

      Yay Gonkee video.

  • @dexterathferth
    @dexterathferth 8 months ago +4

    Excellent, excellent video. So refreshing to have someone who pulls back a bit from the camera and doesn't talk in an obnoxious, gamer way, and instead explains very clearly and precisely, using helpful accompanying materials. Definitely one of the best deeper SD explanations.😃

  • @SombreroMan716
    @SombreroMan716 11 months ago +6

    Before watching this video, I had no idea how any of this stuff worked. You were able to explain these concepts to me in a way that many other videos failed, and in only 30 minutes as well. Wow.

  • @NotTofuFood
    @NotTofuFood 11 months ago +33

    I found your channel close to 3 years ago off a reddit post and commented on your very first video. I did not know this was the same person going into this video, and I was absolutely MIND BLOWN. Insane progress and I am so proud of your videos. This video was incredible and actually blew my mind. Great job.

  • @danieljunginger
    @danieljunginger 5 months ago +3

    21:01 yo, I had to stop the video just after you said "if you don't get anything from here on now, don't stress out...", to tell you that I think you are amazing for daring to make videos like this, where you maybe don't have the biggest audience (yet) but provide people who actually care with so much value. I feel like you really stand out with this combination of explaining things in easy terms (relative to the complexity of the topic) while going deep enough to give others an actual grasp of the subject and enable them to do more advanced research afterwards. In some weird sense, the saying "Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime" describes pretty well how I think about your content. Thank you very much for your efforts, Gonkee, and keep it going!

  • @chemaguerra1635
    @chemaguerra1635 11 months ago +19

    Absolutely wonderful effort. Much liked and appreciated, Gonkee.

  • @naytron210
    @naytron210 8 months ago +3

    This is, by far, the best explanation I've come across in my search to understand how SD is doing what it does. Bravo sir, and thank you!

  • @allygg
    @allygg 9 months ago +1

    I've spent the last few weeks consuming tens of hours of content on deep learning, from neural nets through transformers, and found this when trying to get into stable diffusion. Honestly, this is some of the best informative content I've ever consumed - it totally outdoes everything else on Coursera/YouTube/wherever for visual explanation. Plus you got bants as a bonus, what a guy, please keep it up!

  • @Nuwiz
    @Nuwiz 10 months ago +2

    Thanks for the video!
    The receptive field of each convolution is also effectively enlarged by doing several convolutions in sequence, not just by pooling operations.
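
    (To make that point concrete, here is a tiny back-of-the-envelope Python sketch using the standard receptive-field recurrence; the layer stacks below are made-up examples, not the ones from the video.)

    ```python
    # Receptive-field size of a stack of layers, via the standard recurrence:
    #   rf_new = rf_old + (kernel - 1) * jump;  jump_new = jump_old * stride

    def receptive_field(layers):
        rf, jump = 1, 1
        for kernel, stride in layers:
            rf += (kernel - 1) * jump
            jump *= stride
        return rf

    # Three 3x3 convolutions in sequence, no pooling at all: 7x7 receptive field.
    print(receptive_field([(3, 1), (3, 1), (3, 1)]))   # 7

    # 3x3 conv, then a 2x2 max-pool with stride 2, then another 3x3 conv: 8x8.
    print(receptive_field([(3, 1), (2, 2), (3, 1)]))   # 8
    ```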

  • @MrRobot-xs5jf
    @MrRobot-xs5jf 8 months ago +1

    Literally the BEST video on this topic on the site; everyone else piles on too much jargon, but you keep it straightforward and break down concepts very well.

  • @nintishia
    @nintishia 11 months ago

    This is the first video I have watched on this channel and I am absolutely wowed by the way you have explained it all. Thanks a ton.

  • @BinaryDash
    @BinaryDash 11 months ago +7

    I should note that if you are having trouble connecting to a wireless printer (or a chromecast in my case) check that you're _not_ connected to a VPN

  • @ingyukoh
    @ingyukoh 10 months ago +1

    Very impressed by Gonkee's intuitive grasp of semantic segmentation and U-Net. One of the best lectures.

  • @arnavjain7564
    @arnavjain7564 11 months ago +1

    This was a fabulous explanation! Makes it way easier to wrap my head around!

  • @DruidpathXo2
    @DruidpathXo2 9 months ago +2

    This is an amazing video. I've been struggling with the concept (well, and laziness), but this video has inspired me to continue my project, which contains some aspects of stable diffusion. Thank you.

  • @CrushedAsian255
    @CrushedAsian255 11 months ago +1

    This is such an impressive video, I remember you from your first videos and it’s so impressive how far you’ve come

  • @hurtjonnegut
    @hurtjonnegut 10 months ago +1

    Thank you for the encouragement, I’m at the 21 minute mark and I’ve struggled despite your clear and straightforward explanations thus far. Great video.

  • @thrillscience
    @thrillscience 4 months ago

    Thank you! I've read half a dozen blog posts where it was clear the author had no idea how it all worked, then finally I saw your video and got a great overview of the whole thing!

  • @ManderO9
    @ManderO9 11 months ago +7

    Bro, God bless you for how much effort you put into this. You have learned what people study in uni for years. I look up to you, thank you so much for the educational content.

  • @BiosensualSensualcharm
    @BiosensualSensualcharm 1 month ago +1

    27:51 break; literally 😂 ❤love your style. Congratulations

  • @3KnoWell
    @3KnoWell 4 months ago +1

    Excellent presentation. I especially like your statement, "If you have made it this far in the video? That's already pretty impressive."
    What is truly impressive is the level of detail that you deliver in a short amount of time.
    Thank you,
    ~3K

  • @Zylop6
    @Zylop6 11 months ago +2

    Congratz on the 110,000 subs, my dude, absolutely crazy!

  • @WebGrrrlToni
    @WebGrrrlToni 1 month ago

    Gonkee, you are a gifted trainer/explainer! This video makes so much sense (even though I am a total noob) because your visualizations walk through the concepts so beautifully. I am so glad I found this!!
    I now follow you and look forward to reviewing your other vids. Thank you for helping me (not a math person, but an artist and visual learner) become more familiar with how AI image generation works.

  • @OgatRamastef
    @OgatRamastef 5 months ago

    It clarifies a lot about how these generative AIs work. Thank you!!

  • @meeradad
    @meeradad 3 months ago

    You have an amazing ability to teach with visuals to make intricate concepts accessible.

  • @pretzelbat.m
    @pretzelbat.m 7 months ago +2

    Amazing job breaking this down, highly comprehensible!

  • @inzaynclasses7129
    @inzaynclasses7129 5 months ago

    The best video on stable diffusion out there. It deserves so much more attention than it gets now; I can see how much effort he put in, frame by frame.

  • @rahulpant9807
    @rahulpant9807 11 months ago +4

    wake up babe new gonkee vid just dropped

  • @saeidanwar8587
    @saeidanwar8587 11 months ago +4

    One of the best videos that I have ever seen about AI.

  • @pharos640
    @pharos640 11 months ago +1

    Finally, you didn't make the video too informative or too funny. I like this kind of video. Keep up the good work! (I'm waiting for 1M subs)

  • @greenrabbit4075
    @greenrabbit4075 10 months ago +1

    the "2 hours to connect to a printer" got me! Yesterday I tried it for the 4th time with my super smart hp knobhead. The little smart effer finally made it ^^

  • @pcguidelk
    @pcguidelk 3 months ago

    Appreciate your work. Although I could not grasp many of the things explained, I realized this is a darn lot of processing, and it's amazing to just be able to type and get a video clip or an image!

  • @PaperpapaSmurf
    @PaperpapaSmurf 9 months ago +1

    This blew my mind, but was totally digestible. Thanks for making this!

  • @svg98
    @svg98 10 months ago

    Awesome explanation, really clear and excellent visuals. Sure to be a source of information for many in the future, thanks!!

  • @DrDaab
    @DrDaab 9 months ago

    Terrific Explanation, Thanks ! Now I understand how clicking a button will give you a verbal description of an image (just the reverse of the text prompt).

  • @dirtandpines
    @dirtandpines 9 months ago +1

    Thanks for making this! Learned a ton and didn't need to get a computer science degree first to understand it!

  • @TheGamingHungary
    @TheGamingHungary 3 months ago

    Even though I probably won't make use of the knowledge in the near future, I watch your videos in general because the topics and your explanations are top notch. I really wish we had uni teachers/tutors like you. Straight to the point, no bullshit, keeping it interesting, maintaining the attention of the viewer. Well done! 10/10

  • @lukerobertson01
    @lukerobertson01 2 months ago

    This really made a lot of sense. I understood self-attention before, and now I magically know cross-attention! Thanks

  • @thomas.milburn
    @thomas.milburn 11 months ago +2

    Great video, very well explained. Would love to see a similar video on LLMs.

  • @imKaku
    @imKaku 9 months ago

    Thanks a lot for the video; this was a really informative overview of the things I feel I need without diving into the direct math, since it's been a few years since I've done that.
    It definitely went quite a bit into things I've bumped into and have had questions about, regarding how they work together at a conceptual level.

  • @TupacsStepSisterlocoman
    @TupacsStepSisterlocoman 11 months ago +1

    awesome video, helped me grasp a pretty complex concept

  • @robbysun3137
    @robbysun3137 3 months ago

    Bravo! Well done. The right dose of mathematics and the mention of U-Net make it really easy to understand the basic concepts of how stable diffusion works.

  • @endgameyt7735
    @endgameyt7735 6 months ago +1

    Simply the best video on stable diffusion I have found so far. Thanks a lot! I like your humor.

  • @JasemMutlaq
    @JasemMutlaq 2 months ago

    Stable Diffusion finally makes sense. I watched many other videos and they always talk about "denoising" an image, which by itself makes absolutely no sense when it comes to image generation. It's just like when some creators explain generative AI as simply a "next-token predictor", as if that were sufficient to explain how it really works. Thank you for the wonderful video, and thanks to the geniuses behind these technologies.

  • @thefellowbreather
    @thefellowbreather 4 months ago

    This is next level! I owe my understanding of these fundamentals to you. Keep up the amazing work bro!

  • @greenlight2k
    @greenlight2k 4 months ago

    The "London - England + Japan = Tokyo" example really blew my mind, because it makes totally sense, yet I never saw this logic approach this way before.
    Think of "London" and every aspect of it: This is the Vector (Mind-Map). Now remove the Vector of "England" from that London-Vector,
    and you will be left with the most prominent aspect of it: "A Capital City". Add the "Japan"-Vector to that and you basically end up with "Capital City of Japan"
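
    (A toy numpy sketch of that arithmetic, for anyone who wants to poke at it: the 3-dimensional vectors below are hand-picked for illustration, not real learned embeddings, and the tiny vocabulary is obviously made up.)

    ```python
    import numpy as np

    # Hand-picked toy "embeddings"; the three dimensions roughly mean
    # (is-a-capital-city, is-in-england, is-in-japan). Real word vectors are
    # learned and have hundreds of dimensions -- this is just for illustration.
    vocab = {
        "london":  np.array([1.0, 1.0, 0.0]),
        "england": np.array([0.0, 1.0, 0.0]),
        "japan":   np.array([0.0, 0.0, 1.0]),
        "tokyo":   np.array([1.0, 0.0, 1.0]),
        "berlin":  np.array([1.0, 0.0, 0.0]),
    }

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    def nearest(vec, exclude=()):
        # Return the vocabulary word whose vector is most similar to vec.
        return max((w for w in vocab if w not in exclude),
                   key=lambda w: cosine(vocab[w], vec))

    result = vocab["london"] - vocab["england"] + vocab["japan"]
    print(nearest(result, exclude={"london", "england", "japan"}))  # tokyo
    ```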

  • @dexterman6361
    @dexterman6361 11 months ago +2

    Wow this video is amazing, really well made, and informative, thank you

  • @rishiktiwari
    @rishiktiwari 10 months ago

    Excellent Explanation, fully worth the time. Thanks a lot 🙏🏼

  • @seyedmatintavakoliafshari8272
    @seyedmatintavakoliafshari8272 11 months ago +3

    I've seen your previous videos and, as a grad student majoring in AI, I'm very impressed by how you delivered! Learned a lot here, so definite thumbs up and share!!

  • @pranavarora1799
    @pranavarora1799 10 months ago +1

    Great explanation for people of all levels. Keep it up!!

  • @chunji2321
    @chunji2321 11 months ago +4

    Love your hoodie

  • @RohanVetale
    @RohanVetale 7 months ago +1

    Damn, the video is detailed but at the same time easy to understand for a beginner like me. Thanks for the video!

  • @FreePal334
    @FreePal334 28 days ago

    Finally, I have some intuition of what query, key, and value might mean 🙏

  • @Grow_Channel
    @Grow_Channel 9 months ago

    Exactly the explanation I was looking for. Awesome, brother ❤

  • @jamesli6875
    @jamesli6875 9 months ago +1

    This is really good! Deep, conceptually accurate, and clear! Thank you for sharing!

  • @Gamamaha
    @Gamamaha 3 months ago

    Wow your explanations are brilliant, thank you.

  • @cekuhnen
    @cekuhnen 9 months ago

    Super detailed video - very much enjoyed it!

  • @gameofpj3286
    @gameofpj3286 11 months ago +2

    Amazing! Thank you so much for making this :D

  • @ThankYouESM
    @ThankYouESM 11 months ago +1

    I'm working on a very new approach I call Focus-Select-Image-Generation, which is an image-evolution process driven by repeatedly clicking real-time generated thumbnails, almost like an image-enhancing process. The main difference is... we can have full control over what results we want, although not as a text-to-image generator. My version will still require AI to capture important details such as correct light refractions, correct proportions, correct shadowing, etc.

  • @neonelll
    @neonelll 11 months ago +1

    Insane explanation. One of the best.

  • @connor4440
    @connor4440 11 months ago +2

    Crazy good video nice job gonk

  • @MohamedKrar
    @MohamedKrar 10 months ago

    A great self-explanatory video full of information...

  • @ALDUIINN
    @ALDUIINN 7 months ago

    Excellent video. I'm ready to dive into the most technical parts of it, thanks.

  • @laura-gm5ur
    @laura-gm5ur 7 days ago

    Thank you very much for this video. It was so informative and so well made. Thank you!!

  • @FunGuyInDFW
    @FunGuyInDFW 10 months ago +2

    @Gonkee
    This is a great video. However, I've watched a lot of videos on Stable Diffusion recently, and I've found that there are a few important questions that none of them have addressed. Perhaps you could make an addendum and address these questions because I'm sure many people want to know.
    1. Is the "seed" for Stable Diffusion the same as an image containing maximum noise?
    2. Does Stable Diffusion use a randomly generated image with maximum noise as the starting point for the new image it will generate?
    3. If so, are the pixels in this maximum noisy image truly random, or do the pixels already have a pre-chosen mathematical relationship to adjacent pixels or to images from the training library?
    4. The training of the neural network appears to work in 2D with the pixels of the training images, but the images are usually 2D representations of 3D objects. Does the network develop an understanding of the 3D characteristics of the objects in the images, or does it strictly build up probabilities of pixel colors (or something else) based on pixel color patterns from the training images?
    5. When building a new image, does Stable Diffusion just find ways to insert the pixels of components or subcomponents from objects in images from the training library into the new image, or does it build the images strictly based on the likely pixel colors (pixel by pixel) deduced from the neural network, or some other process entirely?
    6. Do you think the way Stable Diffusion builds new images is similar to how the human brain can see objects or animals in a Rorschach inkblot test or in the shape of clouds in the sky, or would you say the process is not similar?

    • @sallami6627
      @sallami6627 6 months ago +1

      Thanks for asking these questions, as many of them have perplexed me and I can't find any resources that give an explanation.

  • @evaar440
    @evaar440 10 months ago

    bruh .. I can't describe in words how amazing this video is!

  • @Eswarramesh2428
    @Eswarramesh2428 11 months ago +5

    Love your channel bro

  • @hendrysconovianus8546
    @hendrysconovianus8546 1 month ago

    This is very informative!! Thank you very much!

  • @davidniquot6423
    @davidniquot6423 6 months ago

    Very good explanation .. ! Thanks.

  • @itssoaztek4592
    @itssoaztek4592 11 months ago

    Thank you for this masterpiece of STEM education. Incredibly well made.

  • @musthavechannel5262
    @musthavechannel5262 2 months ago

    Great explanation. Thank you

  • @StonerSquirrel
    @StonerSquirrel 5 months ago

    Basically like traditional painting using the alla prima method, but instead of handling brush strokes everywhere on the canvas evenly, it handles every damn pixel on the screen.

  • @sandanable
    @sandanable 2 months ago +1

    Can you please share how you did research for this video, or what sources you used most heavily?

  • @FergalByrne
    @FergalByrne 11 months ago

    Sure you’re not some future tech AGI sent to help us get to your timeline? No other explanation for such an extensive lossless compression into 30mins. Bravo!

  • @pedrogorilla483
    @pedrogorilla483 5 months ago

    I’m revisiting this video after having learned ComfyUI. Much easier to understand Stable Diffusion now.

  • @feztrath9470
    @feztrath9470 11 months ago

    Best explanation I've watched.

  • @carlovonterragon
    @carlovonterragon 10 months ago

    Very good explanation ☺️ thanks

  • @Sebastian-lw1ei
    @Sebastian-lw1ei 2 months ago

    Excellent primer!

  • @artmusic6937
    @artmusic6937 5 months ago

    Great video. Could you please provide the relevant papers for each concept you explained?

  • @walkieer
    @walkieer 10 months ago

    This is really good. I actually got most of it until the cross attention part.

  • @RandomZzzz
    @RandomZzzz 10 months ago

    This is pure quality, this is insane. I want you to explain the transformer architecture next.

  • @1218omaroo
    @1218omaroo 4 months ago

    To me, the base processes now make sense. Thank you, very much appreciated.

  • @laucoinna2415
    @laucoinna2415 3 months ago

    Such a wonderful explanation. Bravo! I learned a lot from you. I wish I could find your personal website or Google Scholar 😅

  • @sagsolyukariasagi
    @sagsolyukariasagi 10 months ago

    What were the sources you used to get this information? How can I get a clear explanation of NNs?

  • @adrianfels2985
    @adrianfels2985 23 days ago

    17:28 - What input did you give the network here?
    Just noise? Because the network denoised images which didn't exist before, didn't it?

  • @billy.n2813
    @billy.n2813 9 months ago

    Thank you very much for this.

  • @haedrichowen
    @haedrichowen 4 months ago

    This is one of the best videos on AI.

  • @yehonatan8371
    @yehonatan8371 1 month ago

    Great video, thanks!

  • @lebronjames635
    @lebronjames635 11 months ago +2

    I understood like half of it, can’t wait to start with CSE 👍 great video regardless

  • @christopherhornle4513
    @christopherhornle4513 10 months ago

    So what you mean at around 30:00 is that the U-Net is trained (VAE frozen) to reproduce the CLIP images, injecting the CLIP text embeddings into the U-Net layers via cross-attention? Is that correct? @Gonkee @Everyone

    • @user-is7fl8zx9h
      @user-is7fl8zx9h 10 days ago

      CLIP is a dual-purpose encoder that can encode text sentences and images. During the training process, an image encoding is calculated with CLIP and injected through the U-Net. During the inference process, the user inputs a text prompt that is encoded by CLIP. This text encoding is used for cross-attention during the U-Net denoising process. With cross-attention, the denoising process steers to match the text prompt. Also, the self-attention layer steers toward a more coherent image, attending image tokens to other image tokens. There are some modifications to boost the ability of self-attention, such as PAG (perturbed attention guidance).
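
      (A heavily simplified sketch of where that text encoding enters the inference loop: everything below is a placeholder -- `predict_noise` stands in for the real cross-attention U-Net and the update rule is a bare-bones stand-in for an actual sampler schedule -- so this only shows the shape of text-conditioned denoising, not real Stable Diffusion code.)

      ```python
      import numpy as np

      rng = np.random.default_rng(0)

      def predict_noise(latent, t, text_embedding):
          # Placeholder for the trained U-Net. In the real model, this is where the
          # text encoding is injected at every cross-attention block.
          return 0.1 * latent + 0.0 * text_embedding.mean()

      text_embedding = rng.normal(size=(8, 32))   # stand-in for the CLIP encoding of the prompt
      latent = rng.normal(size=(4, 64, 64))       # start from pure noise in latent space

      num_steps = 50
      for t in reversed(range(num_steps)):
          eps = predict_noise(latent, t, text_embedding)   # text-conditioned noise estimate
          latent = latent - eps / num_steps                # crude update; real samplers follow a noise schedule

      # In Stable Diffusion, the final latent would now be decoded to pixels by the VAE decoder.
      print(latent.shape)   # (4, 64, 64)
      ```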

  • @sizex1966
    @sizex1966 8 months ago

    Thanks for uploading. I'm having fun with these AI generators. I saw some diffusion footage a month ago and asked myself "how?"; funnily enough, I shot some footage today that I hope I'll be able to put diffusion to work on. Thanks for the explanation; it makes sense to me because I've been a computer geek since my college days (I left in 1984) and have been working with computers ever since. The future is AI. Like yourself, I'm not concerned about AI taking over, because you can just pull the plug.
    Guidance & Protection
    Mannerz & Respect!!

  • @theworldsco-creators7073
    @theworldsco-creators7073 9 months ago

    Amazing video. Thanks

  • @anomalousdelirium
    @anomalousdelirium 2 months ago

    It's kind of amazing that I'm also learning this video using the neural network of my brain.
    Like, AI exists because of the brain to begin with, after all.

  • @MikeSieko17
    @MikeSieko17 3 months ago

    Query, key and value are just SVD; they correspond like query = U, key = Sigma and value = V (26:12).

  • @alaad1009
    @alaad1009 5 months ago

    This man has an IQ of 200 and explains complex stuff like a champ !