Get your copy of "Building LLMs for Production": amzn.to/4bqYU9b
Thank you for cutting through the hype. The aim of every new AI model is to do things not just better but also more efficiently than the competition. In that respect, Stable Diffusion wins hands down. SD is also free of the censorship hampering users of the other models, whose content policies are so vague that users don't know whether they are violating them or not.
Agreed! Thanks to Emad and everyone behind SD.
References:
►Read the full article: www.louisbouchard.ai/latent-diffusion-models/
►Rombach, R., Blattmann, A., Lorenz, D., Esser, P. and Ommer, B., 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10684-10695), arxiv.org/pdf/2112.10752.pdf
►Latent Diffusion Code: github.com/CompVis/latent-diffusion
►Stable Diffusion Code (text-to-image based on LD): github.com/CompVis/stable-diffusion
►Try it yourself: huggingface.co/spaces/stabilityai/stable-diffusion
►Web application: stabilityai.us.auth0.com/u/login?state=hKFo2SA4MFJLR1M4cVhJcllLVmlsSV9vcXNYYy11Q25rRkVzZaFur3VuaXZlcnNhbC1sb2dpbqN0aWTZIFRjV2p5dHkzNGQzdkFKZUdyUEprRnhGeFl6ZVdVUDRZo2NpZNkgS3ZZWkpLU2htVW9PalhwY2xRbEtZVXh1Y0FWZXNsSE4
►My Newsletter (A new AI application explained weekly to your emails!): www.louisbouchard.ai/newsletter/
To get better!
Oh wow, that is a lot of hate in a single message. I'm sorry you cannot stand how I speak. It is hard for me to speak a second language and I do my best at it. Hopefully I will get better over time, as I am now also able to chat with people in English, which will surely help too.
Wow, that is a first! I actually also write the articles, so if you hate the voiceover that much, you don't have to listen to it.
I'm really surprised that it is that hard to understand. I'm sorry that a video can hurt you this much.
I'm not sure whether to be amused by your comments or sad about whatever reality is causing you to insult a random person online like this. I hope you'll figure out how to feel good and be as happy as I am! Maybe you should try focusing on yourself for a little while, but you should talk to a specialist and not listen to me.
Nice! Your quality is great. I'm trying to get my quality to this level on my own YouTube channel.
Thank you! I am sure you can do even better haha!
This video is short and super to the point!
That was the goal! We don’t play around 😎
Just discovered this channel - well done! Excellent coverage of Stable Diffusion. I like that you didn't skimp on the technical details
Glad you think so! That’s exactly my goal :)
Hey guys! Please have a look at Qwak's website for me. I'm grateful to have my friends sponsoring this video, and I'm sure their tool will be useful to some of you :)
www.qwak.com
I deleted my comment after asking my colleagues a similar question, but before realizing you replied, since I didn’t want to add to anyone’s confusion with the question 😅
But, I was able to read most of your response through the YT notification. Thank you for replying!
My pleasure! :)
Hey, I think you may have gotten one fact wrong. The diffusion models were trained on "hundreds of GPUs", I'm sure, but I don't think the prompts are being run through a bunch of GPUs, or we would probably get them back instantly. I say this because I have the free GRisk Stable Diffusion, which uses your own graphics card (a GTX 1080 in my case), and it works just as well if not better than some of the others. It's just limited to 512x512, but most of them are, unless you upscale within the model. GRisk is limited, but it's really good and I encourage everyone to try it, especially if you have a beefy GPU. And if you don't, you could use Topaz Gigapixel AI in trial mode to upscale your 512x512 images.
You are right and I may have said that wrong, sorry for this error in the video! I haven’t heard of GRisk, thank you for sharing, will check it out!
This tool is about to end careers, it's incredible. The models are only about 2-5 GB, which is just insane. I wonder if you can train it on videos and noise/denoise the frames.
There are custom models out there that are 40gb+ trained on 'special' art websites that exhaustively tag all of their art with various highly descriptive parameters. The results are better than ever but most of the models are more or less 'secret'
@@Sammysapphira what models are these?
I wonder if there will be a rough sketch to detailed image AI in the future.
Stable Diffusion can in fact already do this! There is an image-to-image feature to which you can feed a rough sketch, and it works quite well. Not perfect obviously, but really cool!
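For anyone who wants to try the image-to-image route, here is a rough sketch assuming the Hugging Face diffusers library (the model ID, file names and exact argument names are assumptions and may differ between versions):

```python
# Rough image-to-image sketch (assumed diffusers API; check your installed version).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A rough sketch as the starting image (hypothetical file name).
sketch = Image.open("rough_sketch.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="a detailed fantasy castle, digital painting",
    image=sketch,
    strength=0.75,       # how much the model may repaint the input
    guidance_scale=7.5,  # how strongly to follow the prompt
).images[0]
result.save("detailed_image.png")
```

The "strength" value is the main knob: low values keep the result close to the sketch, high values let the model repaint more freely.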
You can give it a very simple image (stick figures) and it will spit out gold.
There is an app/platform that you can get on the waitlist for which does this. I forgot the name but you can Google it
This rocks thank you for the insight
Always my pleasure Job! 😊
To learn to reconstruct the original image... I would be very interested in how this "learned result" is saved, because it has to be saved somewhere. In a database? Since an image can have, let's say, 12 megapixels, how is it even possible to save the result of the learning process? It is not clear to me in which form the result is stored. As a 3D model?
Isn't it saved via model weights?
Yes, it is saved in the model's weights, which consist of millions of parameters and apply transformations through simple functions. Adding up all those functions and learned parameters, you are able to reconstruct an image starting from the right point in a "latent space". This is basically a very tiny representation of the image that is learned during training, and it moves thanks to the text prompt you give it or a conditioning image :)
The model then reconstructs the image thanks to the millions of functions which together represent one very complex function that, in theory, could "predict" any signal (if big enough and trained enough). And in this case our signal is a 12 MP image :)
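To make that concrete, here is a tiny, purely illustrative PyTorch sketch (the architecture and sizes are invented for the example): the only thing that is "stored" is the decoder's weights, and an image only comes into existence when a latent vector is pushed through them.

```python
# Minimal sketch: no database, no stored pixels. The "memory" is the weights.
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        # A stack of simple learned functions (linear + upsampling convolutions).
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 128, 8, 8)
        return self.up(x)  # (batch, 3, 32, 32) RGB image

decoder = TinyDecoder()
z = torch.randn(1, 64)   # a tiny latent "starting point"
image = decoder(z)       # the image only exists after this forward pass
print(image.shape)       # torch.Size([1, 3, 32, 32])
```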
Please, I can't find the video where you discussed attention, mentioned at 04:54. Can you point me to it, or did you just mention it in passing in another video?
Hi! I discuss the attention process in two videos.
Here, a long time ago, with transformer networks: youtube.com/watch?v=sMCHC7XFynM
And here, more recently, when vision transformers were introduced: youtube.com/watch?v=QcCJJOLCeJQ
@@WhatsAI Thank you very much
Can someone explain to me or point to a very high level explanation of how the text prompt is combined with the latent space data to create a new image?
You transform the text into token embeddings of the same shape as the image representation. This puts the text information into a higher-dimensional space. Then you add this information to the latent representation of the image by multiplication, addition or other techniques. To not overwrite the complete model, skip connections are used.
Hope that helped a bit!
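As a rough illustration of that idea, here is a minimal PyTorch sketch (the shapes and the simple addition-based mixing are assumptions made for clarity; Stable Diffusion itself injects the text through cross-attention inside the UNet):

```python
# Minimal sketch: project a text embedding into the latent's channel space
# and mix it into the latent by addition.
import torch
import torch.nn as nn

batch, channels, height, width = 1, 4, 64, 64
text_dim = 768                                 # e.g. a CLIP-sized embedding

latent = torch.randn(batch, channels, height, width)
text_embedding = torch.randn(batch, text_dim)  # stand-in for a real text encoder output

project = nn.Linear(text_dim, channels)        # learned projection to latent channels
cond = project(text_embedding).view(batch, channels, 1, 1)

conditioned_latent = latent + cond             # broadcast-add the text information
print(conditioned_latent.shape)                # torch.Size([1, 4, 64, 64])
```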
@@abail7010 Thank you - that does help a bit - this aspect seems to be less covered in the high level explanations so I appreciate that - cheers.
Holy shit! Brother, I just wanted to know what Stable Diffusion is! What did you do like that!
Impressive man🔥
I just tried to generate an image using this tool hosted on Hugging Face. Guess what, I'm in a long queue 😂 people are going crazy.
First, text-to-text language models like GPT...
Next, text-to-image models like DALL-E...
I think text-to-video is up next...
Oh yes it is! In fact, the Transframer model shared a few days ago by DeepMind does just that haha, and there was another one too. They are just the first steps, but it is definitely coming.
@@WhatsAI Yeah, I read about it. For now it can only generate a short clip in low resolution. We may see high-resolution full movies in the near future. Having all these tools makes me wonder how AI might compromise some jobs and disrupt some industries. Time will tell.
Indeed, only time will tell!
How does it compare with the SDEdit paper from Stanford University?
I actually covered SDEdit on my channel! Stable Diffusion is different in that it learns, from a dataset, to denoise an input (in this case an image in the latent space, i.e. the space encoded with a VAE) using a UNet that learns Gaussian parameters to remove the noise step by step. Then you can simply send in noise and get images back in seconds. SDEdit works similarly but directly in image space, so it is much slower, and it uses stochastic differential equations to sample the Gaussian parameters for removing the noise, instead of a UNet predicting them.
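For readers who want to see what that training objective looks like, here is a minimal, illustrative PyTorch sketch of one latent-diffusion training step (the tiny network and the noise schedule are stand-ins, not the real UNet or scheduler):

```python
# Minimal sketch: noise a latent at a random timestep, train a network to predict that noise.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for the UNet; predicts the noise that was added to the latent."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, noisy_latent, t):
        return self.net(noisy_latent)  # a real UNet also uses t and text embeddings

denoiser = TinyDenoiser()
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

latent = torch.randn(8, 4, 32, 32)        # pretend output of the frozen VAE encoder
t = torch.randint(0, 1000, (8,))          # random diffusion timesteps
noise = torch.randn_like(latent)
alpha = (1 - t / 1000).view(-1, 1, 1, 1)  # toy noise schedule
noisy_latent = alpha.sqrt() * latent + (1 - alpha).sqrt() * noise

loss = nn.functional.mse_loss(denoiser(noisy_latent, t), noise)
loss.backward()
optimizer.step()
print(loss.item())
```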
@@WhatsAI Thanks for the reply, I really appreciate your time. Would you make a video or blog post about how researchers come up with neural network architectures for a specific job, like GANs, diffusion models, etc.? I'm really curious to know how they approach the problem and then work their way out of it.
All my pleasure, thanks to you for following my work!
That is a great subject to cover, thank you for the suggestion! I feel it will be quite complex, maybe an interview format would work best 🤔
Would love to know how you think this should be done to get the best possible video format.
@@WhatsAI Yes please, as long as it serves the purpose: the intuition or key idea behind their approach. Interviews would help give insight into the approach they follow.
Hi Louis, thanks so much for your great video :D I saw a somewhat similar model for brain anomaly detection and segmentation in the paper "Fast Unsupervised Brain Anomaly Detection and Segmentation with Diffusion Models". From what I observe, it also uses an encoder-decoder architecture with the latent diffusion model to learn the latent distribution. My question is: during training, can I train the encoder-decoder (VQ-VAE) separately from the diffusion model? Let's say I first train the VAE model and then freeze the VAE weights to train my diffusion model. I'm not sure how it's done for this paper; I've been checking the code and it looks like the autoencoder and diffusion model are trained separately, but I might be misunderstanding the code :D Thank you
Hi! Thank you very much.
From what I understood, they are trained separately! :)
The VAE was trained only to encode and decode a signal, and then the diffusion part is trained using the fixed VAE!
Are you working in the medical field? Because I am too and the paper you referred seems pertinent for my work haha!
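To make the two-stage recipe from the reply above concrete, here is a minimal PyTorch sketch (both models are placeholders, not the paper's code): the autoencoder is trained first, then frozen while only the diffusion model is updated.

```python
# Minimal sketch of the two-stage setup: frozen autoencoder, trainable diffusion model.
import torch
import torch.nn as nn

vae = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1))        # placeholder "encoder"
diffusion = nn.Sequential(nn.Conv2d(4, 4, 3, padding=1))  # placeholder denoiser

# Stage 1: train the VAE on a reconstruction objective (omitted here).
# Stage 2: freeze the VAE and train only the diffusion model on its latents.
for p in vae.parameters():
    p.requires_grad = False
vae.eval()

optimizer = torch.optim.Adam(diffusion.parameters(), lr=1e-4)

images = torch.randn(2, 3, 64, 64)
with torch.no_grad():              # no gradients flow into the frozen VAE
    latents = vae(images)

noise = torch.randn_like(latents)
loss = nn.functional.mse_loss(diffusion(latents + noise), noise)
loss.backward()
optimizer.step()
print(loss.item())
```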
@@WhatsAI Thanks for your fast response Louis. Yes, I've been working on unsupervised brain anomaly segmentation, and the paper I mentioned seems to be one of the newest MICCAI 2022 papers related to the topic :D I'm currently re-implementing the code, but there is one section of the inference trick mentioned in the paper that I still can't reproduce T.T Please let me know if you also decide to re-implement the paper :D
Will do! Could you message me on Twitter, LinkedIn or by email? I would love to work on that with you if you are working on it, or just share results! I also work on a very similar application, so it's definitely worth staying in touch!
I understood nothing.
Same to me
Wow!
😊
😮😮🎉q
You wouldn't happen to be French by any chance, hahaha?
Almost! Québécois :)
first
Congrats!! 😉
this is way too complicated to understand
Sorry, I would love to watch your video, but I simply cannot understand your English, so I am off.
That is unfortunate! I didn’t know it was a hard accent to understand.
@@WhatsAI I understand you perfectly and clearly.
After reading the comment, I listened to your pronunciation in detail. (I work in speech recognition, so pronunciation is a familiar topic to me.) I think your pronunciation is not too far off from regular English pronunciation, but one weak point is sentence prosody, i.e. the pitch of your voice over the course of the sentence. It seems you do not think about the sentence as a whole, but only about short segments at a time, which leads to an unnatural and "chunky" prosody that almost sounds like last-generation speech synthesis. If you want to improve, mainly focus on sentence prosody; pronunciation is only a secondary and minor issue, in my humble opinion.
Thank you very much for this amazing feedback, Charles! I think it may come from reading a script and not having the whole sentence in mind while saying it! It would be incredible if you could have a listen to some of my recent, longer podcast-format episodes and let me know if the same problem is there or how I could improve!