Stable Diffusion - How to build amazing images with AI

  • Published: 1 Aug 2024
  • This video is about Stable Diffusion, the AI method to build amazing images from a prompt.
    If you like this material, check out LLM University from Cohere!
    llm.university
    Get the Grokking Machine Learning book!
    manning.com/books/grokking-ma...
    Discount code (40%): serranoyt
    (Use the discount code at checkout)
    0:00 Introduction
    1:27 How does Stable Diffusion work?
    2:55 Embeddings
    12:55 Diffusion Model
    15:00 Numerical Example
    17:39 Embedding Example
    19:37 Image Generator Example
    28:37 The Sigmoid Function
    34:39 Diffusion Model Example
    41:03 Summary
  • Science

Comments • 46

  • @krajanna
    @krajanna 5 months ago +2

    I am a fan of your work. I read your "Grokking Machine Learning" book; it's awesome, and I am totally impressed. I've stopped watching other AI videos and now follow you for most of this material. Simple and practical explanations. Thanks a lot, and I'm grateful to you for spreading the knowledge.

  • @thebigFIDDLES
    @thebigFIDDLES 7 months ago +7

    These videos are always incredibly helpful, informative, and understandable. Very grateful

  • @shafiqahmed3246
    @shafiqahmed3246 2 months ago +1

    Serrano, you are a genius, bro. Your channel is so underrated.

  • @jasekraft430
    @jasekraft430 5 months ago

    Always impressed with how understandable, yet detailed, your videos are. Thank you!

  • @enginnakus9550
    @enginnakus9550 7 months ago +2

    I respect your concise explanation.

  • @wanggogo1979
    @wanggogo1979 7 months ago +2

    Amazing, I hope to truly understand the mechanism of stable diffusion through this video!

  • @avijitsen8096
    @avijitsen8096 6 months ago +1

    Superb, such an elegant explanation. Big thanks, sir!

  • @MikeTon
    @MikeTon 4 months ago

    Really incredible job of stepping through the HELLO WORLD of image generation, especially how the video compresses the key output into a 4x4 pixel grid and clearly hand-computes each step of the way!

  • @anthonymalagutti3517
    @anthonymalagutti3517 7 months ago +3

    Excellent explanation - thank you so much!

  • @kyn-ss4kc
    @kyn-ss4kc 7 months ago +1

    Amazing!! Thanks for this high level overview. It was really helpful and fun 👍

  • @reyhanehhashempour7157
    @reyhanehhashempour7157 7 months ago

    Amazing as always!

  • @skytoin
    @skytoin 7 months ago

    Great video; it gives good intuition for deep network architectures. Thanks

  • @amirkidwai6451
    @amirkidwai6451 7 months ago +4

    Arguably the greatest teacher alive

  • @abhaymishra-uj6jp
    @abhaymishra-uj6jp 5 months ago

    Really amazing work, easy to understand and grasp. You're doing a great deal for the community. Thanks a lot.

  • @NigusBasicEnglish
    @NigusBasicEnglish 2 months ago

    You are the best explainer ever. You are amazing.

  • @priyankavarma1054
    @priyankavarma1054 7 months ago

    Thank you so much!!!

  • @samirelzein1095
    @samirelzein1095 7 months ago

    An amazing job of deeply dismantling complex structures. That's real ML/AI democratization.

  • @qwertyntarantino1937
    @qwertyntarantino1937 6 months ago

    thank you

  • @AravindUkrd
    @AravindUkrd 7 months ago

    Thank you for such a wonderful visualization that conveys an overview of complex mathematical concepts.
    Can you please do a video detailing the underlying architecture of the neural network that forms the diffusion model?
    Also, are Generative Adversarial Networks (GANs) not used anymore for image generation?

  • @aswinosbalaji4224
    @aswinosbalaji4224 1 month ago +1

    In the intermediate result, it is said that after the sigmoid we will not get a sharp image of the ball and bat. How can there be fractional pixel values? Since the image is monochromatic, shouldn't every pixel be either 0 or 1? Rounding to the nearest integer would give the same result as before the sigmoid. Even if it's not monochrome, pixels can't be fractions, right?
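    For readers stuck on the same point: the sigmoid maps any real-valued score into the open interval (0, 1), and that fractional value can be rendered directly as a grayscale intensity rather than rounded. A minimal sketch in NumPy (the 4x4 scores below are made-up illustrative values, not the video's exact numbers):

        import numpy as np

        def sigmoid(x):
            # Squash any real number into the open interval (0, 1).
            return 1.0 / (1.0 + np.exp(-x))

        # Hypothetical raw scores for a 4x4 monochrome image
        # (positive -> bright pixel, negative -> dark pixel).
        scores = np.array([
            [ 3.0, -2.0, -2.0,  3.0],
            [-2.0,  0.5, -0.3, -2.0],
            [-2.0, -0.3,  0.5, -2.0],
            [ 3.0, -2.0, -2.0,  3.0],
        ])

        pixels = sigmoid(scores)
        print(pixels.round(2))  # values like 0.95, 0.12, 0.62 render as shades of gray

    Rounding each pixel to 0 or 1 would indeed give a hard black-and-white image, but the fractional values are kept so the intermediate result stays a smooth, soft image that later denoising steps can keep refining.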

  • @BigAsciiHappyStar
    @BigAsciiHappyStar 2 months ago

    Muy BALL-issimo 😄 Loved the puns!!!!!😋😋😋

  • @olesik
    @olesik 7 months ago +2

    Thanks for the teaching, Mr. Luis! I still fondly remember you teaching me machine learning basics over drinks in SF.

    • @SerranoAcademy
      @SerranoAcademy 7 months ago

      Thanks Jon!!! Great to hear from you! How’s it going?

  • @melihozcan8676
    @melihozcan8676 5 months ago

    Serrano Academy: The art of Understanding
    Luis Serrano: The GOD of Understanding

    • @SerranoAcademy
      @SerranoAcademy 5 months ago +1

      Thank you so much, what an honour! :)

    • @melihozcan8676
      @melihozcan8676 5 months ago

      @SerranoAcademy Thank you, the honour is ours! :)

  • @ASdASd-kr1ft
    @ASdASd-kr1ft 7 months ago

    Could it be that the diffusion model is trained to predict the amount of noise that has to be removed from the input image, rather than the less-noisy image itself? That is what I understood from other sources, since they say that is easier for the model. Thank you, and good video, very enlightening.
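    For anyone wondering, this is indeed the standard DDPM training objective: the network is trained to predict the noise that was added, not the cleaner image. A minimal sketch, assuming a placeholder model(noisy, t) network and a precomputed cumulative noise schedule alpha_bar (both hypothetical names):

        import torch

        def ddpm_loss(model, x0, alpha_bar):
            # x0: batch of clean images (B, C, H, W); alpha_bar: 1-D schedule tensor.
            t = torch.randint(0, len(alpha_bar), (x0.shape[0],))  # random timestep per image
            eps = torch.randn_like(x0)                            # the noise we add
            a = alpha_bar[t].view(-1, 1, 1, 1)
            x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps            # one-shot forward diffusion
            eps_pred = model(x_t, t)                              # network guesses the noise
            return torch.nn.functional.mse_loss(eps_pred, eps)    # learn eps, not the image

    Predicting the noise turns out to be an easier regression target than predicting the denoised image directly, which matches what the commenter read; subtracting the (suitably scaled) predicted noise then recovers the less-noisy image.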

  • @hamidalavi5595
    @hamidalavi5595 2 months ago

    Thank you for your amazing educational videos!
    I have a question, though: are there any transformers (plus attention mechanisms) involved in the text-to-image generator (the diffusion model)?
    If not, then how are the semantics of the text captured?
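    For what it's worth, in Stable Diffusion the answer is yes: the prompt is encoded by a transformer text encoder (CLIP's), and the denoising U-Net reads those embeddings through cross-attention layers, which is how the text semantics steer the image. A minimal sketch of one cross-attention step (all dimensions and the random projection matrices are illustrative placeholders, not the real model's weights):

        import torch

        def cross_attention(latent_tokens, text_tokens, d_k=64):
            # latent_tokens: (n_pixels, d) flattened image features (queries)
            # text_tokens:   (n_words, d) prompt embeddings (keys and values)
            d = latent_tokens.shape[-1]
            Wq, Wk, Wv = (torch.randn(d, d_k) for _ in range(3))  # stand-ins for learned projections
            Q, K, V = latent_tokens @ Wq, text_tokens @ Wk, text_tokens @ Wv
            weights = torch.softmax(Q @ K.T / d_k ** 0.5, dim=-1)  # which words each pixel attends to
            return weights @ V                                     # text-informed image features

        out = cross_attention(torch.randn(16, 32), torch.randn(5, 32))
        print(out.shape)  # torch.Size([16, 64])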

  • @olesik
    @olesik 7 months ago +2

    So can we just use the diffusion model to denoise low-quality or nighttime shots?

    • @SerranoAcademy
      @SerranoAcademy 7 months ago +1

      Yes, absolutely; they can be used to denoise already-existing images.
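      In practice this is often done with the img2img / SDEdit recipe: push the existing photo partway up the noise schedule, then run the learned reverse process back down. A simplified deterministic (DDIM-style) sketch, reusing the hypothetical model and alpha_bar from the training snippet above:

          import torch

          @torch.no_grad()
          def denoise_existing(model, photo, alpha_bar, start_t=200):
              # Smaller start_t stays faithful to the input photo;
              # larger start_t lets the model repaint more aggressively.
              a = alpha_bar[start_t]
              x = a.sqrt() * photo + (1 - a).sqrt() * torch.randn_like(photo)
              for t in reversed(range(start_t)):
                  eps = model(x, torch.tensor([t]))                   # predicted noise
                  a_t = alpha_bar[t]
                  x0_est = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # estimated clean image
                  if t > 0:                                           # re-noise down to step t-1
                      a_prev = alpha_bar[t - 1]
                      x = a_prev.sqrt() * x0_est + (1 - a_prev).sqrt() * eps
                  else:
                      x = x0_est
              return x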

  • @abhishek-zm7tx
    @abhishek-zm7tx 4 months ago

    Hi @Luis. Your videos are very informative and I love them. Thank you so much for sharing your knowledge with us.
    I wanted to know if "Fourier Transforms in AI" is in your pipeline. Please give some intuition around that in a video. Thanks in advance.

    • @SerranoAcademy
      @SerranoAcademy 4 months ago

      Thanks for the suggestion! It's definitely a great idea. In the meantime, 3blue1brown has great videos on Fourier transforms; take a look!

  • @AI_Financier
    @AI_Financier 7 months ago

    Finally, the diffusion penny dropped for me. Many thanks!

  • @NVHdoc
    @NVHdoc 7 months ago

    (At 17:25) in the image on the right, the baseball and bat should have 3 gray squares, right? Very nice channel; I just subscribed.

    • @SerranoAcademy
      @SerranoAcademy 7 months ago

      Thank you! Yes, the ball and bat should be three gray or black squares. Since these images are not exact, there could also be dark gray or some variation.

  • @parmarsuraj99
    @parmarsuraj99 7 months ago +1

    🙏

  • @maxxu8818
    @maxxu8818 5 months ago

    Hello Serrano, is there a paper like "Attention Is All You Need" for Stable Diffusion?

    • @SerranoAcademy
      @SerranoAcademy 4 months ago +1

      Good question; I'm not fully aware. There's this, but I'm not 100% sure if it's the original: stability.ai/news/stable-diffusion-public-release
      I always use this explanation as a reference; there may be some good leads there: jalammar.github.io/illustrated-stable-diffusion/

    • @maxxu8818
      @maxxu8818 4 months ago

      Thanks @SerranoAcademy 🙂

  • @850mph
    @850mph 3 months ago

    This is wonderful…
    Perhaps the best low-level description of the diffusion process I’ve seen…
    But discrete images of bats and balls represented as single pixels are a long way from a PHOTOREALISTIC pirate standing on a ship at sunrise.
    What I can’t get my head around is how these discrete images (which actually exist in the multi-dimensional dataset space) are combined, really grafted together (parts pulled from each existing image), into a single image with correct composition, scaling, coloring, shadows, etc.
    Suppose I lay even specifically chosen (by the NN) bat and ball pictures over each other to produce a “fuzzy” combined image (composition), and then use another NN to sharpen the fuzzy image into a crisp composition with all the attributes defined in the prompt and pointed to by the embeddings…
    There’s still too much magic inside the DIFFUSION black box which I just don’t understand, even after understanding the denoising and self-attention processes.

    • @850mph
      @850mph 3 months ago

      I guess what I have not been able to determine, after watching maybe 30-35 hours of diffusion videos, is specifically how the black box COMPOSES a complicated scene BEFORE the process begins that “tightens” the image up by removing noise between the given and target in successive passes of the decoder.
      I get the fact (one) that the prompts correspond to embeddings, and the embeddings point to some point in multi-dimensional space which contains all sorts of related info and perhaps a close image representation of the prompted request… or perhaps not.
      I get the fact (two) that the diffusion process is able to generate virtually any complicated scene starting from random noise when gently persuaded toward a target by the prompt…
      What I don’t understand is how the black box builds a complicated FUZZY image once the various “parts” of the composition are identified.
      Does the composing process start with a single image, if available in the dataset, and scale individual attributes to correspond with the prompt?
      -or-
      Does the composing process start with segmented attributes, scale all appropriately, and combine them into a single image?
      A closer look at how scene COMPOSITION works would be a great addition to your very helpful library of vids, thnx.

    • @850mph
      @850mph 3 months ago

      OK… for those with the same “problem”:
      The missing part, at least for me, is the “classifier” portion of the model, which I have NOT seen explained in the high-level diffusion explainer vids.
      This tripped me up…
      Here is a good vid, and corresponding paper, which helps in understanding the “feature” set extraction within the image convolution process, which ultimately creates an “area/segment-aware” data set (image) that can be directed to include the visual requirements described in a text prompt.
      ruclips.net/video/N15mjfAEPqw/видео.htmlsi=6sZxibtFvjrVNHeE
      In a nutshell… the features extracted from each image are MUCH more descriptive than I had pictured, allowing for much better interpolation, composition, and reconstruction of multiple complex forms in each image.
      Of course the cues to build these complex images all happen as the model interpolates its learned data, converging on the visual representation of the text prompt somewhere in the multi-dimensional space which we cannot comprehend… so in a sense it’s still all a black box.
      I don’t pretend to understand it all… but it does give the gist of how certain abstract features within the model’s convolutional layers blow themselves up into full-blown objects.

    • @850mph
      @850mph 3 months ago

      Another good short vid which shows how diffusion accomplishes image COMPOSITION:
      ruclips.net/video/xtlxCz349WU/видео.htmlsi=PJl_vWueiQdZxLn1

    • @850mph
      @850mph 2 months ago

      Another good vid which gets into composition:
      ruclips.net/video/3b7kMvrPZX8/видео.htmlsi=AwNQJAjABKn-iV4F

    • @850mph
      @850mph 2 months ago

      Another good set of vids which get into IMAGE COMPOSITION:
      ruclips.net/video/vyfq3SgXQyU/видео.htmlsi=ShiOXaQH_0baU8Z-
      Especially helpful is the last vid; URL posted above.