this account is seriously underrated, will definitely blow up soon
Agree, I just subscribed ❤
Agreed 💯
💯
I can't believe how calmly and clearly you explain difficult topics!
Well that's the magic of text to speech 😁
haha I knew it was AI voice
Finally, the concept of VAE is clear. Thanks a ton.
You're welcome, thanks for the comment !
I first saw this in Chinese media with Chinese subtitles, then came back to subscribe to the original author. The clearest introduction I've ever seen, with such nice, proper animation. It will blow up for sure.
Well thank you ! Can you send me more info about that through my email ? ytdeepia@gmail.com
This channel is amazing, you should be very proud of what you have produced Thibaut!!
Thanks !
The amount of effort you put into these works is really commendable. You are a blessing to humanity.
Wow it must be a lot of work to obtain such amazing animations! The video is really dynamic and easy to follow, congratulations
The program is called manim, it’s from 3blue1brown
Love the clarification at 00:13, because I've also felt that the misconception is widespread. I've heard people say: I am not using GANs anymore, I am using Generative AI. The word "Generative" is literally what G in GAN stands for. 😂😂
I made this intro because I couldn't stand the number of "GenAI experts" on my LinkedIn feed :(
Great video. Looking forward to your video on contrastive learning, it is my favourite subject in deep learning.
Your videos combine great production skills (animations, colour selection, movement between frames) with in-depth understanding of complex concepts.
Thanks for the kind words !
You're a master at your craft, it is a testament to your studies!
This channel needs more subscribers
Thank you !
This is a really good video, and the animations are top-notch. I feel this video is good not just for those learning about AI but also those learning Statistics.
Amazing quality, hope the channel takes off ! Great use of manim
Thanks !
3blue1brown specifically for AI models??? sign me up!!! I'll fs be linking to you in my own vids whenever relevant, this was great
Your videos and explanations are really excellent, please keep doing this.
Thank you for finally making me understand the reparametrization trick!! It was thrown at me several times during a DRL class I took last year and I never really understood what we did. This made it much much clearer, thank you! Also: great video overall!
Glad it helped !
cant wait for the next video! this was great!!!!!
Thank you !
Thank you very much for providing and sharing the lecture. Excellent explanation and such a high-quality video!
Thank you !
Thank you again for the great content and the amazing animations 🎉💪👍 Keep going, hopefully your channel explodes with more subscribers.. I will recommend it for sure to other people
Thank you :)
Great visualizations, and good explanation! Congratulations and thanks for the nice video :)
Thank you !
At 12:40 we scale by the standard deviation, not the variance
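A minimal sketch of that sampling step, assuming (as is common) that the encoder outputs the log-variance:

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # sigma = exp(log(sigma^2) / 2): scale the noise by the standard
    # deviation sigma, not the variance sigma^2
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)  # eps ~ N(0, I)
    return mu + std * eps
```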
perfect video, it's easy to understand the VAE, subscribed!
Thanks !
Thanks to you everything is clear now, thank you Deepia
thanks miss conti
Really amazing content, thank you for spreading knowledge! Thanks a lot :)
Thanks for the comment, keeps me motivated :)
Great content! It was a really good explanation
hey, so well explained thanks for the video!! really nailed those animations as well, would be cool to make a video on adam/rmsprop as well, i have a hard time properly understanding why they work. anyway much love to you my friend
Thank you , still looking for VAE variants videos
15:49 - What is convex interpolation?
Basically a linear interpolation between two points, with "t" in front of one of the points and "(1-t)" in front of the other, for t between 0 and 1. The set of all these points is convex, hence "convex interpolation" :)
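A tiny sketch of what that looks like for two latent codes z1 and z2 (hypothetical names; any two encoded vectors work):

```python
import torch

def convex_interpolation(z1: torch.Tensor, z2: torch.Tensor, num_steps: int = 10):
    # Points t * z1 + (1 - t) * z2 for t in [0, 1]: they trace the segment
    # between z1 and z2, which is a convex set.
    ts = torch.linspace(0.0, 1.0, num_steps)
    return [t * z1 + (1.0 - t) * z2 for t in ts]
```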
Just a tip: can you make it super clear that the reason we sample in the middle is to produce a nice continuous latent space where the different dimensions encode different meanings?
Thanks
Oh my, thank you so much !
I find math speak very hard to grok. I was always good at math, but always got turned off by the navel gazing and geekery. You do a great job keeping it engaging without assuming that I am a math geek
@@HenrikVendelbo Yeah sometimes it do be like that in math classes.
I think it's important to look at equations when they tell us something about the models, but computational tricks or complex equations are not that interesting.
Great video 🎉. I've never had such a great explanation of VAE. Waiting for VQVAE.....
Thank you !
Furthermore, at 11:39 and 12:39 you refer to σ as the variance. But isn't σ the standard deviation and σ² the variance? (Nevertheless, the video is perfect. Excellent work!)
Thanks, indeed there might be some mistakes !
Amazing content :D
I hope you'll do your next videos on VQ-VAE and VQ-VAE 2, I enjoyed reading those papers so much !
Thanks, I really gotta take another look at the paper
Thank you very much! It's pretty clear
Incredible explanation!! Thank you for sharing your knowledge! 😁😁
Thanks !
thanks for the wonderful animations and explanation
Thanks
Finally I understand this concept.
The audio sounds like it's coming from underwater starting at 2:09 btw
Thank you, I had some issues with copyrighted music which led to YouTube removing it but also degrading the audio...
Great vid! Commenting for algorithmic reasons
Thanks !
Watching this while my first VAE is training
Wowowowowowow 🎉🎉🎉 amazing video for VAE. Pls ~ make more videos
@@griterjaden Thanks, I'm on it :)
This is excellent. Thank you!
Thanks !
Excellent video, I subscribed because of it :)
thanks !
At 7:45, why is the assumption that p(z) is a normal distribution important ? Without it, are further calculations not possible ?
At 8:01, why is the posterior assumed to be Gaussian ?
@@rishidixit7939 Hi again, indeed further calculations are intractable without assuming both the prior and the posterior are Gaussian.
Some other research works have replaced these assumptions with other well-known distributions, such as mixtures of Gaussians, which results in another training objective.
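Concretely, the Gaussian choice is what gives the KL term of the ELBO in closed form. For a diagonal Gaussian posterior against a standard normal prior, it reduces to this standard identity, stated here for reference:

```latex
\mathrm{KL}\left(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\middle\|\, \mathcal{N}(0, I)\right)
= \frac{1}{2} \sum_{j=1}^{d} \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right)
```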
0:23 does it create data from scratch?
Yep, basically modern image generation techniques (diffusion models / flow matching) create new data starting from pure noise !
@@Deepia-ls2fo Does it learn from existing data? If yes, how does it generate data from scratch? Denoising involves learning the state and adding some randomness to that state only 🤔
Thanks!
Is this manim?!!! Nice work dude!
It is indeed Manim, thank you !
good stuff! keep it going
It is the best VAE visualization.
Great content ! What software are u using to animate ?
Thanks ! For most animations I use Manim, a python module originally made by Grant Sanderson from 3blue1brown.
@@Deepia-ls2fo thank you
Great Video!
Thanks !
Great content
Next video on diffusion models please , thanks in advance ❤
It's on the to-do list but the next 3 videos will be about self-supervised learning !
Is there a statistical property or proof that might show a graphRAG "transfer function" to be the same as a VAE or maybe a CVAE? Perhaps in terms of entropy? It would be interesting to make two identical systems, one using a VAE and one using graphRAG, and see if they can match up statistically. I can't shake the idea that software 3.0 might be the more sound approach for developing new GenAI tools vs software 2.0.
Hi Jason ! Unfortunately I know close to nothing about RAG so I have no idea if what you describe might be feasible. I hear about RAG everywhere these days, I should get up to date on that.
@@Deepia-ls2fo I'd love to hear your take on it if you ever do a deep dive.
Great video. What is your educational background?
Thanks ! Bachelor in math, bachelor in computer science, master in AI/ML, currently doing a PhD in applied maths and deep learning
@ Legendary. Good luck on the PhD! I'm a 3rd-year EE PhD student, you have phenomenal content. Looking forward to watching your channel grow.
Thanks for the video ❤😊
Thank you !
🤍 Please, can you create a course to learn the Manim library, from scratch to a professional level? I need it very much. Please reply ❤😊
Thanks for your comment, I would love to but I have many other topics I want to talk about first, and not much time on my hands! There are very good resources on YouTube though, if you want to start learning Manim. :)
@@Deepia-ls2fo Thank you, but I hope you find enough time to create a course on Manim, even if it's just one video every week. It would also help increase your views, because your explanations are very beautiful and clear, and I can understand them easily even though I am an Arab 🤍☺️
Can you share the video's code?
The link is in the description!
Hey, could you make a video talking about swav in unsupervised learning?
great content bro.
Thanks
Amazing Graphics and explanation. I have one question - if we use MNIST dataset (like what is shown in the video) does it mean that the mu and sigma are vectors of dimension 10x1? What if we use a dataset where the number of different classes are unknown? What will be the dimension of mu and sigma in that case?
Thank you, the latent dimension is not directly related to the number of classes in your dataset.
In fact a very good encoder could very well classify perfectly the 10 classes on a single dimension, but it makes things way harder to reconstruct for the decoder.
As you mention, in most datasets we don't even know the number of classes or the number of relevant features, so we just pick an ad hoc latent dimension (16, 32) and see if it's enough for the encoder to produce a useful representation and for the decoder to reconstruct correctly.
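To make the shapes concrete, here is a minimal sketch of an encoder head (the 256-dimensional feature input is an arbitrary assumption): mu and logvar each have `latent_dim` entries per image, regardless of how many classes the dataset has.

```python
import torch
import torch.nn as nn

latent_dim = 16  # ad hoc choice, not tied to the 10 MNIST classes

class EncoderHead(nn.Module):
    """Maps a feature vector to mu and logvar, each of shape (batch, latent_dim)."""
    def __init__(self, feature_dim: int = 256):
        super().__init__()
        self.mu = nn.Linear(feature_dim, latent_dim)
        self.logvar = nn.Linear(feature_dim, latent_dim)

    def forward(self, h: torch.Tensor):
        return self.mu(h), self.logvar(h)
```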
@@Deepia-ls2fo Thanks a lot for your response. Can't wait for your next video.
Could you please make a video talking about why diffusion models, GANs, and VQ-VAE can make images sharper?
Hey, nice video!
This question came to my mind: What would happen if we ignored the encoder part and tried to train only the decoder? For example, by sampling from a standard Gaussian vector and attempting to reconstruct a digit. I don't really understand the purpose of the encoder.
If you don't condition the latent space you sample from at all, I'm not sure the model will be able to learn anything.
Here the encoder explicitly approximates the posterior distribution so that we can then sample from the distribution of images.
This is all a theoretical interpretation of course, but learning to reconstruct any digit from pure unconditioned noise seems a bit hard!
Diffusion models kind of do it (in image space), but this usually takes a lot of steps.
Anyway, the experiment you describe would be very easy to implement, if you want to try it out. :D
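For what it's worth, a toy version of that experiment could look like the sketch below (random tensors stand in for MNIST batches; the architecture and sizes are arbitrary assumptions). The point is that nothing ties a given z to a given target, so the decoder can at best learn an average digit:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Decoder-only training on unconditioned noise, no encoder involved.
decoder = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 784))
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3)

for _ in range(100):
    x = torch.rand(64, 784)            # stand-in for a batch of flattened digits
    z = torch.randn(64, 16)            # z ~ N(0, I)
    loss = F.mse_loss(decoder(z), x)   # nothing links this z to this x
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```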
Great video! However, I am slightly confused: For your loss function you are subtracting KL divergence rather than adding it. Wouldn't you want to add it to penalize the difference between the latent distribution and the standard normal distribution? At least, in all implementations I have seen they add KL divergence rather than subtract it.
Edit: I understand my mistake now!
Hi ! Thanks for the comment, I'm afraid I might have flipped a sign at one point.
When you derive the ELBO (which you then maximize via training), a minus sign appears in front of the KL. But in practice you minimize the opposite of this quantity, which is equivalent to minimizing the L2 plus the KL.
I hope it's not too confusing. :)
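To make the sign convention concrete, here is a minimal sketch of the loss as it is usually implemented (assuming an MSE reconstruction term and a log-variance encoder output): the KL is added, and minimizing the total maximizes the ELBO.

```python
import torch
import torch.nn.functional as F

def vae_loss(x_hat, x, mu, logvar):
    # Negative ELBO: reconstruction error plus (not minus) the closed-form KL
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # minimizing this maximizes the ELBO
```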
@@Deepia-ls2fo Oooooh I understand, so ELBO is the quantity that should be maximized, and you were denoting the ELBO quantity with L(x), not the loss itself. I understand now, thanks!
Are there videos with the same approach for Reinforcement Learning? :D
Hi, unfortunately I don't know anything about Reinforcement Learning, so I don't think I'll be able to make videos about that any time soon. However, I believe Steve Brunton has very good videos on the topic :)
do u use manim for animations ?
@@shashankjha8454 Yes indeed!
top tier video!
13:05 That's so funny, VAE and Adam were both proposed by the same person, Kingma.
He's quite the man, also a co-author on some key diffusion model papers :)
'now that we've got the basics down' ... lol yea ok, professor.
Now if only the latent space could be of variable size and discrete, then maybe we could do effective AI lossy/lossless compression 🤔
Hi, I don't know about variable dimension latent space, but discrete sure sounds like VQ-VAE :)
DeepIA absolutely killed it with this video on Variational Autoencoders. As a government official, medical doctor, and law PhD, it's not often I come across something that genuinely teaches me something new. But this video? Wow.
The way Variational Autoencoders map data to a latent distribution instead of a fixed point, and the balance between reconstruction loss and Kullback-Leibler divergence, was explained so clearly that I picked it up right away.
Whether I'm shaping policies, treating patients, or analyzing legal cases, this video added value in ways I didn’t expect. Props to DeepIA for delivering content that even someone as busy (and brilliant) as me can appreciate!
And let's not forget the genius behind it all. Honestly, the mind that creates content like this is nothing short of extraordinary. I don't say this lightly, but DeepIA might just be the most insightful, brilliant, and generous creator on YouTube. The precision, the depth, the clarity: it's rare to find someone who can not only understand such complex topics but also make them accessible to mere mortals like us. It's an honor to witness this level of mastery. Truly, we're not worthy.
thx 🤖
Nothing less 🤣
You need that music, I don't remember it
There are so many trashy channels with AI-generated nonsense, while some channels (like this one) have clear explanations and just a few views. I think YouTube should add some "peer-review" feature, and while there is no such tool I encourage supporting good channels like this with likes and comments, and hitting dislike on useless AI "blah-blah" channels.
I'm not against AI as a helper tool (like script writing / voice generation), but if there are no fact checks from the authors, that makes it garbage, and the platform doesn't have a proper garbage collector yet.
Comment to up this channel
Please create videos on Auto-Regressive Models, particularly RNN, LSTM, PixelCNN, as soon as you can. I have a mid-term exam in the third week of October which will cover these topics.
Is this an AI voice?
Yes, I cloned my voice using a text-to-speech service called ElevenLabs
The auto-generated voice-over is super annoying. Any chance a real human can narrate it?
No, to be honest that would take way too much time on my side, so it's probably never going to happen. Hopefully text-to-speech services get better over time!
3:25 - 6:20 is so distracting. Just assume your audience knows these things. No need to pitch it at the general public; just assume senior-year undergraduates, please.
Thanks