- Videos: 6
- Views: 106,546
Deepia
France
Joined 17 Mar 2024
Welcome to Deepia, where I animate deep learning concepts with Manim.
Contrastive Learning with SimCLR | Deep Learning Animated
To try everything Brilliant has to offer, free for a full 30 days, visit brilliant.org/Deepia.
You’ll also get 20% off an annual premium subscription.
In this video you will learn the basics of contrastive learning, and how these approaches were used successfully in SimCLR.
If you want to know more about contrastive learning, you should definitely read the following papers:
- FaceNet: A Unified Embedding for Face Recognition and Clustering arxiv.org/abs/1503.03832
- Deep metric learning using Triplet network arxiv.org/abs/1412.6622
- A Simple Framework for Contrastive Learning of Visual Representations arxiv.org/abs/2002.05709
And for self-supervised learning in general, I strongly recommend t...
Views: 4,066
Videos
Variational Autoencoders | Generative AI Animated
34K views · 2 months ago
In this video you will learn everything about variational autoencoders. These generative models have been popular for more than a decade, and are still used in many applications. If you want to dive even deeper into this topic, I would suggest you read the original paper from Kingma, and an overview he wrote later on: - Auto-Encoding Variational Bayes arxiv.org/abs/1312.6114 - An Introduction t...
Latent Space Visualisation: PCA, t-SNE, UMAP | Deep Learning Animated
46K views · 3 months ago
In this video you will learn about three very common methods for data dimensionality reduction: PCA, t-SNE and UMAP. These are especially useful when you want to visualise the latent space of an autoencoder. If you want to learn more about these techniques, here are some key papers: - UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction arxiv.org/abs/1802.03426 - Stochast...
Autoencoders | Deep Learning Animated
13K views · 5 months ago
In this video, we dive into the world of autoencoders, a fundamental concept in deep learning. You'll learn how autoencoders simplify complex data into essential representations, known as latent spaces. We'll break down the architecture, training process, and real-world applications of autoencoders, explaining how and why we use the latent space of these models. We start by defining what an aut...
CNN Receptive Field | Deep Learning Animated
6K views · 5 months ago
In this video, we explore the critical concept of the receptive field in convolutional neural networks (CNNs). Understanding the receptive field is essential for grasping how CNNs process images and detect patterns. We will explain both the theoretical and effective receptive fields, highlighting how they influence network performance and design. We start by defining the receptive field and its...
Convolutional Neural Networks | Deep Learning Animated
4K views · 6 months ago
In this video we will dive into the inner workings of Convolutional Neural Networks. These networks are among the most widely used building blocks in Deep Learning, and were at the origin of the rapid growth of this field around 10 years ago. We will first see how the convolution operation works, then how it integrates into a neural network architecture. We will also see some operations specific to CNNs,...
Audio comes out from under water at 2:09 btw
Thank you, I had some issues with copyrighted music which led to YouTube removing it but also degrading the audio...
Thanks!
A really solid explanation. Well done! You are a wonderful communicator and your visualizations are top notch. I do have one very small suggestion that might help. When sweeping through hyperparameters and showing their effect on the embedding, it can help to correct a bit of the stochastic nature of the layout. When transitioning between your embeddings in low dimensions, it can be helpful to run a Procrustes algorithm on the two embeddings. This will just flip, rotate and scale the point clouds to be best aligned. It really helps users see consistent patterns as hyperparameters change, without altering the embedding in any meaningful way. Keep up the fantastic work. I'll definitely be following your channel.
Thanks for the tips!
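A minimal sketch of the Procrustes alignment suggested above, using SciPy's `scipy.spatial.procrustes`; the two embeddings here are hypothetical stand-ins for layouts produced with different hyperparameters:

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(500, 2))            # embedding for hyperparameter setting A
emb_b = emb_a @ np.array([[0.0, -1.0],       # setting B: the same cloud,
                          [1.0, 0.0]])       # rotated by 90 degrees

# procrustes standardises both point clouds, then rotates, reflects and
# scales the second to best match the first in the least-squares sense.
aligned_a, aligned_b, disparity = procrustes(emb_a, emb_b)
print(f"disparity after alignment: {disparity:.6f}")  # ~0 for a pure rotation
```

Transitioning between `aligned_a` and `aligned_b` instead of the raw layouts keeps the clouds visually stable as hyperparameters change.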
Watching this while my first VAE is training
Thank you, still looking forward to videos on VAE variants
Dude awesome video! I did a similar one a few days ago.
Thanks I'll check it out :)
Great video. What is your educational background?
Thanks! Bachelor's in math, bachelor's in computer science, master's in AI/ML, currently doing a PhD in applied maths and deep learning
@ legendary. Good luck on the PhD! I'm a 3rd-year EE PhD student; you have phenomenal content. Looking forward to watching your channel grow.
Great video
Thank you very much! It's pretty clear
I can visualize autoencoders better now. Keep doing animations. My brain just encodes animation data easily, and I need to decode it in exam papers / seminars.
wow, that was such a good video! Thanks for that
I am sharing this and will encourage people in my contacts to subscribe, as explaining these "not so common" topics with this much ease is really an art, and these efforts of yours deserve a great amount of respect and appreciation.
top tier video!
do you use Manim for the animations?
@@shashankjha8454 Yes indeed!
good stuff! keep it going
At 7:45, why is the assumption that p(z) is a normal distribution important? Without it, are further calculations not possible? And at 8:01, why is the posterior assumed to be Gaussian?
@@rishidixit7939 Hi again, indeed further calculations are intractable without assuming both the prior and the posterior to be Gaussian. Some other research works have replaced these assumptions with other well-known distributions, such as mixtures of Gaussians, which results in a different training objective.
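To make the tractability point concrete: with a standard normal prior and a diagonal-Gaussian posterior, the KL term of the VAE objective has a closed form, so no sampling is needed to evaluate it. A minimal sketch in PyTorch, with hypothetical batch and latent sizes:

```python
import torch

# Closed-form KL between q(z|x) = N(mu, sigma^2) and the prior N(0, I):
#   KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
mu = torch.randn(16, 8)       # hypothetical batch of posterior means
log_var = torch.randn(16, 8)  # hypothetical log-variances

kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1)
print(kl.mean())  # average KL over the batch
```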
For applications like domain adaptation and image colorization, what does the loss function look like for an autoencoder? Also, you said that the MSE loss is used, but in that case a trivial solution exists where the image is copied pixel by pixel and the network learns nothing. How is that problem taken care of?
@@rishidixit7939 Hi, I'm not familiar with those two tasks, but for image colorization an MSE would probably do just fine? As for preventing the network from simply copying the image pixel by pixel, we have the bottleneck layer! Remember that this layer has far fewer neurons than there are pixels, so you can't just "copy" the values :)
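A toy sketch of the bottleneck argument in that reply: with far fewer bottleneck units than input pixels, the MSE cannot be minimised by a trivial identity mapping. All layer sizes here are illustrative:

```python
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 16), nn.ReLU(),      # bottleneck: 16 values must summarise 784 pixels
    nn.Linear(16, 128), nn.ReLU(),
    nn.Linear(128, 784),
)
x = torch.rand(32, 784)                 # hypothetical batch of flattened images
loss = nn.MSELoss()(autoencoder(x), x)  # reconstruction error; no way to just copy
```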
Amazing wow!
3:25 - 6:20 is so distracting. Just assume your audience already knows these. No need to aim your target group at the general public. Just assume a senior-year undergraduate, please.
Your channel is astounding brobro thank you
I SWEAR I was trying to understand BYOL just a few minutes ago and was struggling, then this video came up, THANK YOU! CAN'T WAIT! Also, please do SwAV as well!
When's the next video? Love these visualizations!
@@dhurbatripathi6924 Thanks! By the end of November!
The GOAT
You're a master at your craft, it is a testament to your studies!
4:10 Latent Space.
You are amazing. The visualisations in your lectures are top notch
Great video as always
Very nice video, please continue with this wonderful work! Thanks a lot.
YouTube is not performing well; the view count for this video suggests a change of CEO is needed.
Hey, so well explained, thanks for the video!! Really nailed those animations as well. It would be cool to make a video on Adam/RMSProp too; I have a hard time properly understanding why they work. Anyway, much love to you my friend.
How does the model/programmer know if two pictures are a positive or negative pair without labels?
@@user-ht4rw5wp4x Well, you have several ways of defining the pairs; for instance, you create positive pairs with data augmentation as in SimCLR!
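A rough sketch of that idea: two independent random augmentations of the same image form a positive pair, no labels required. The transform list is illustrative, not SimCLR's exact recipe:

```python
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(32),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.ToTensor(),
])

def make_positive_pair(image):
    # Each call draws fresh random parameters, so the two views differ.
    return augment(image), augment(image)
```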
It's amazing how high-quality your videos are. Hope you will have many more subscribers soon enough. This quality definitely deserves that.
Really nice video. Love your presentation style, so clean and well explained!
This is a really good video, and the animations are top-notch. I feel this video is good not just for those learning about AI but also those learning Statistics.
In the original contrastive loss at 6:27, how exactly does y = 0 in the positive case and y = 1 in the negative case? Also, what does y represent here?
Is it just a positive and negative pair label which forces the contrastive loss to focus on the positive and negative metrics in the loss function?
Yes, exactly!
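For reference, a hedged sketch of that pairwise contrastive loss, with y = 0 for positive pairs (pulled together) and y = 1 for negative pairs (pushed apart until the margin is reached); shapes and the margin value are assumptions:

```python
import torch

def contrastive_loss(z1, z2, y, margin=1.0):
    """z1, z2: (N, D) embeddings; y: (N,) with 0 = positive pair, 1 = negative."""
    d = torch.norm(z1 - z2, dim=1)                             # Euclidean distance per pair
    positive_term = (1 - y) * d.pow(2)                         # active when y == 0
    negative_term = y * torch.clamp(margin - d, min=0).pow(2)  # active when y == 1
    return 0.5 * (positive_term + negative_term).mean()
```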
Amazing presentation again 🎉 thank you for your efforts and time
Great video! You mention that the contrastive loss pushes/pulls points; how does the loss function "push away" a point exactly?
Thanks, it pushes negative pairs apart until their distance reaches the margin, by minimizing the difference between the margin and the distance between the points. This is the quantity in red at 06:40 :)
awesome content!
I like that you're focusing on computer vision
The InfoNCE loss at 11:14 looks odd, as Dp is the distance notation at 9:00, but you say it's related to probabilities. It would break the flow to introduce new notation though. But as it stands, it was a little confusing to me to see that the loss would be minimized by maximizing Dp. I checked the paper, and it seems the term is an approximator for "mutual information", which we want bigger for positive samples. At least that's my rough understanding... Thanks for the video, it's a fantastic explanation!
Indeed, I should have taken the time to introduce it properly and use the correct notations.
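A simplified sketch of an InfoNCE-style loss for paired embeddings, to make the probabilistic reading concrete: each positive pair's similarity is pushed up relative to all other pairings in the batch via a softmax. Full SimCLR's NT-Xent also uses within-view negatives, which this omits; the temperature value is an assumption:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature     # cosine similarities, scaled
    targets = torch.arange(z1.size(0))     # each row's positive is on the diagonal
    # Cross-entropy = -log softmax of the positive against the negatives.
    return F.cross_entropy(logits, targets)
```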
At 12:04 you say that SimCLR selects multiple negative pairs, and then you show a picture of a cat and a dog. I am confused: the second dog picture is also considered a negative pair even though it's the same animal? If yes, does this mean the model trains to lower the distance ONLY with the original image, even though the others could be dogs?
Exactly! The negatives can be any other image in the batch, including very similar objects
@@Deepia-ls2fo That is very interesting, thank you for your answer. I have another question if you do not mind. At the end, when comparing classification accuracy, you compare supervised, SimCLR+finetune and SimCLR. The last one has me confused: how can the model work for classification without any finetuning? Or do they not count a trained dense layer that learns to use the latent space of SimCLR for classification, and does SimCLR+finetune mean finetuning the latent space instead? My question is: does fine-tune mean finetuning a dense layer or the latent space? Your videos are high quality and I really love them, sometimes I just wish they were longer and went slightly more into the implementation details, thank you!
Edit: Regarding my first question, since a negative pair can be of the same class (if we imagine the ultimate goal is classification), would a low number of classes (let's say only 2) lower the quality of the latent space due to a high amount of class "collision"? And conversely, with hundreds of classes it would rarely select the same class as a negative pair and improve the latent space representation?
@@itz_lucky6472 I strongly advise you to read the SimCLR paper, as it is a very easy read and they detail everything. About the classification task: for SimCLR they use what we call "linear eval", meaning they plug a fully connected head onto the model and train only this part. The difference between "SimCLR" and "SimCLR fine-tune" is that for "SimCLR fine-tune" the weights of the backbone are modified in a supervised fashion with a small portion of the data. For your second question, I did not read a lot about this, and I'm myself new to self-supervised learning in general, so I can't answer for sure. I guess you could easily do the experiment with 2 MNIST classes though. Intuitively, I think taking many semantically similar objects and treating them as negatives is bad for the representation space.
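A sketch of the "linear eval" protocol described in that reply: freeze the self-supervised backbone and train only a linear head. The backbone and all sizes here are hypothetical placeholders, not the actual SimCLR encoder:

```python
import torch.nn as nn

backbone = nn.Sequential(                  # stand-in for a pretrained encoder
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128),
)
for p in backbone.parameters():
    p.requires_grad = False                # linear eval: backbone stays frozen

linear_head = nn.Linear(128, 10)           # the only trainable part
model = nn.Sequential(backbone, linear_head)
```

For "SimCLR fine-tune", the backbone parameters would stay trainable instead.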
Augmentations ARE the labels, labels of "ignore".
Awesome Video :D
Nice explanation! It still isn't clear to me how to choose the metric that determines how similar or dissimilar two samples are. Is it also learned by the network?
You can choose any differentiable metric, that's one of the strengths of this framework :)
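Two common interchangeable choices, as a small illustration of that reply; either can replace the distance inside a contrastive loss like the one sketched earlier:

```python
import torch
import torch.nn.functional as F

def euclidean_distance(z1, z2):
    return torch.norm(z1 - z2, dim=1)

def cosine_distance(z1, z2):
    # 1 - cosine similarity: 0 for identical directions, up to 2 for opposite.
    return 1 - F.cosine_similarity(z1, z2, dim=1)
```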
Day by day, we inch closer and closer to creating The Great Compressor.
Like the one in the Silicon Valley TV series.
I'd love to be compressed between my robot anime waifu's thighs 🤤
Outstanding technique :D Thank you, subscribing to the channel was not a mistake :D
is the voice in the vid the output of a TTS model?
Yes! It's my voice though :)
Amazing content! Looking forward to the next videos 😄
Insane technique! awesome video thanks for explaining this with tons of examples.
Hmmmmmm YES