Wow that mammoth 2D visualization using UMAP looked like it was opened up and flattened, you could tell it was a living thing of some sort. Incredible!
Thank you ! Indeed it looks like a fossil in the ground :)
Super cool
My only comment to the video is that PCA real advantage is not speed, is interpretability. It's easy to read a principal component in terms of how it correlates with the original variables. Something you cannot do with t-SNE or UMAP. The video is an excellent work!
Thanks for the comment !
Absolutely clear and crisp visualization of PCA!!
Amazing visualization for a very difficult topic grasp. Many thanks!
Thank you !
Prediction - A channel that's going to explode.
Watched multiple videos. Very crisp clear explanation with good animation. Thank you :)
I must say, the way you teach is just brilliant! Those visualizations and all, i mean even a 10yo could understand it if he focuses just a lil bit! Can't wait till you reach the level of teaching us the Transformer models!
Thank you for the insight without the fuss. I am a UMAP user and I am glad about your conclusion. Subscribed!
Thank you
You are amazing. The visualisations in your lectures are top notch
A really solid explanation. Well done! You are a wonderful communicator and your visualizations are top notch.
I do have one very small suggestion that might help. When sweeping through hyperparameters and showing their effect on the embedding, it can help to correct for the stochastic nature of the layout. When transitioning between low-dimensional embeddings, running a Procrustes algorithm on the two embeddings will flip, rotate, and scale the point clouds into best alignment. It really helps viewers see consistent patterns as hyperparameters change, without altering the embeddings in any meaningful way.
Keep up the fantastic work. I'll definitely be following your channel.
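The alignment step suggested above can be sketched in a few lines with SciPy's `procrustes`. This is a minimal illustration on synthetic point clouds (not the video's actual embeddings): the second cloud is a rotated, scaled, shifted copy of the first, and Procrustes recovers the alignment almost exactly.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(100, 2))  # embedding from one hyperparameter setting
# emb_b: the same cloud rotated, scaled, and shifted, as if from another run
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
emb_b = 2.5 * emb_a @ rot.T + 3.0

# procrustes standardizes both clouds and aligns the second to the first;
# disparity is the residual sum of squared differences after alignment
mtx_a, mtx_b, disparity = procrustes(emb_a, emb_b)
print(round(disparity, 6))  # ~0.0: the clouds differ only by a similarity transform
```

Since the transform only flips, rotates, scales, and translates, the relative structure of the embedding is untouched, which is exactly why it is safe to use for visual transitions.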
Thanks for the tips !
Excellent and clear animations, graphs, and explanation, keep it up!
Incredible visualization and simplification on the topic, especially with UMAP! The superiority of UMAP over t-SNE in terms of the final results and its lower sensitivity to hyperparameters really shows the power of math
Killing it man, loving these videos, I'm so glad I found your channel!
Thank you :)
Wow, super cool! Love the visualizations! Very informative, much better than the PowerPoint presentations out there lol
Thank you !
Very clearly discussed. Thanks.
Your channel is astounding brobro thank you
Your videos are such high quality! Thank you so much for putting this effort into them. I do data visualization, and I would love to start including more advanced machine learning models in what I make. I got a lot out of this video. I can't wait to see what is next :)
Thank you for the kind words, the next videos will be about VAEs and their variants !
Amazing visualization, pace, and aesthetics. Looking forward to seeing more from you. Best of luck 🤞
Thank you !
I forgot about t-SNE. In my own research I have been using UMAP. But, I haven't heard of TriMAP and PaCMAP before. I am going to dive in deeper!
Awesome video! I was hoping for a bit more of a friendly, intuitive explanation of the equations in t-SNE. Instead of just jumping into the equations, it would be great to get a sense of why they work the way they do and how they fit into the whole picture.
Thanks for the feedback! I would have liked to spend more time on each method, but unfortunately the video got longer than I usually aim for. :/
Hey I’ve been looking for a visual representation of feature selection and dimensionality reduction and your video is just amazing. The tone and animation make me think about 3b1b, and that’s a compliment ! 😊
@@TheHHadouKen Thank you !
Very nice video, please continue with this wonderful work ! Thanks a lot.
This was amazing! Gotta try this on a dataset.
@@ardhidattatreyavarma5337 thank you :)
Great video! Complex concept in simple language! Do please make more videos about this topic!
Thank you :)
this is suuuuuper helpful!!!!! thank you so much for the work!!!
Thank you !
Just stunning!
Thank you !
Keep it up! Really appreciate this work
Thank you !
WOW AMAZING THANK YOU SO MUCH This helped me a lot 🎉
Thank you I'm glad it helped :)
very good explanation! Cool 😀 I would love to see a video from you explaining how transformers work
Thank you ! It's written on a list somewhere but it's not a priority right now :)
Thank you so much! Straight to the point❤
Thank you !
amazing tutorial!
Whoa ! AI, Data Science and Data Visualization in 3b1b-Manim style. Awesome
Thank you !
Thank you, great video again 🎉 learned many new things 😊
Thank you for watching !
this is really great, thank you very much!
@@jabrikolo Thanks for the comment
Great video! Hopefully this ultimately leads to a video about the interpretability work by Anthropic using VAEs
Thank you, I've not heard of their work I'll look into it :)
masterpiece, also beautiful on OLED monitor. Easy sub from me
Great video, very well explained. One question for UMAP; I understand the concept behind the loss function creating a lower level representation with similar distances between points, but how do we represent an autoencoder for MNIST as a vector space? Is there some kind of transformation we can perform on the weights to visualize them as such?
@@loicmurumba8493 Thanks, we don't project the network directly, but rather the inner representation it has of the MNIST data. For this, you encode several numbers, which results in vectors of 16 dimensions (for instance, but it could be any dimension). Then you apply UMAP to these vectors.
We don't really visualize the weights directly.
@@Deepia-ls2fo Really appreciate the quick answer; I had been thinking about the encoder as a neural net classifying the images into their numbers for some reason. In fact, the autoencoder simply compresses the data as accurately as it can into some lower dimensional space (16 dimensions in this example), then we're able to visualize those 16-D representations by running the algorithms described here
I subscribed and will stay tuned for your next video!
Thank you
Incredible work, I learned a lot. I will say the pacing was a bit fast at times in my opinion (I had to pause to take notes).
Thank you I'll try to adapt the pace
Big thumbs up for the awesome video! May I know how is the video animated?
Thank you ! 99% of the video is animated using Manim, a python library. Some 3D stuff is animated using Blender.
I'll publish the code for each video soon.
@@Deepia-ls2fo That's cool! Your channel deserves millions of subscribers and I'm honoured to be one of the earliest who subscribed!
new sub! great video quality!
Thanks !
Amazing video. I am using a short clip of it for my next project, hope you don't mind. With credit, of course.
Thanks ! Can you send me an email so that we can discuss this ? I usually don't allow my content to be reused on YouTube
unrecognized genius
amazing video! please keep it up :)
Thank you !
Amazing wow!
Would love a video on mixture of a million experts 🙃
loved the video and it came at a great time for me, thank you! one small detail: at 8:12 the distance of the y's in the denominator should also be squared, right?
Hi thanks for the comment, indeed it should be squared, that's a mistake. Another detail I did not take time to mention in the video is that there is no sigma parameter in the low dimensional representation.
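For reference, the corrected low-dimensional affinity (the standard t-SNE formulation, with the squared distance in both numerator and denominator and no sigma parameter):

```latex
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
```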
Well done. But you did not mention LDA (linear discriminant analysis)
Great explanation and visualization! I'm looking at your code to learn how to perform these visualizations.
How did you manage to visualize the rotating 3D mammoth on the right at 17:25? I'd like to do it with the t-rex data
Thank you ! Unfortunately Manim is very bad at handling 3D, so I loaded the data into Blender using a script. If you're struggling you can reach out by email and I'll provide the code (YouTube does not like external links in the comments).
Thank you for your reply.
I resorted to the same method instead of wasting time on the Manim documentation.
@@arnaldosantoro6812 You're welcome ! Manim is great, just not for 3D :)
great!!
I love this ♥
Very glad you like it
Such a soothing AI voice too!
Thanks ! That's actually my voice cloned into elevenlabs and slightly modified :)
Is it safe to say that when "choosing your own adventure" in an agentic RAG workflow, if you can visualize the high dimensional latent space, you can make decisions on iterating your workflow that would be more conservative the closer they are clustered together? For example, if I'm adjusting the parameters of a 3D model and pulling from a dataset of known shapes and "widgets", working within the boundaries of a closely clustered embeddings might be safer than pulling a wacky far-off embedding as my decision outcome for my next iteration of whatever I'm doing in the workflow? If I'm understanding this correctly, the visualizations can help guide a generative workflow.
Hi thanks for the comment, I'm not familiar with all the terms you used. But if I understand correctly yes, checking that the objects you are dealing with are within the known distribution that your model knows how to handle can be done using these visualizations.
The broad topic would be out of distribution detection.
@@Deepia-ls2fo I'm still formulating my thoughts on how exactly to implement some of what I'm working on, but the idea is to allow users of my generative workflow to have some intuitive guidance as they progress through iterations. I think exposing the latent space as part of the user experience might not be a bad thing. Rather than it being inside the black box. Also, great videos, new subscriber here!
Hey just to let you know that I think I saw a thing on linkedin related to what you were talking about. This seems to definitely be a use case.
Hello! Thanks for the superb explanation, but I'm wondering, do the datasets really have to be vector spaces for these methods to work? Wouldn't they work with only a metric space where we only have distance information? Like levenshtein word distances between files?
Hi thanks for the comment, I don't know about this particular distance. Since t-SNE and UMAP both convert vector data into distances, I don't think that you could directly work with the distance information. But what you could do is modify UMAP to include this distance in the first step (the one turning the dataset into a weighted graph). I think t-SNE works only with the standard Euclidean distance though. :)
@@Deepia-ls2fo In many UMAP implementations you can use metric="precomputed" and pass a distance matrix rather than input vectors. The Python implementation also supports using a precomputed knn-graph, so you need not compute all-pairs distances, just distances to nearest neighbors. These approaches would allow you to use UMAP with a (precomputed) Levenshtein distance.
11:59 and elsewhere - when you are running PCA, are the classes not included in your dimensions? Eg. in the 3D spiral with color (actually 4 dimensions), are you only using 3 (x,y,z) for PCA (and SNE)?
Thanks for the comment, indeed only the position information is used, not the class !
@@Deepia-ls2fo so if you added the class to PCA wouldn't it separate very cleanly?
Yes, it would indeed separate the representations very cleanly.
We usually don't use them when testing a method because the idea is to see if the method can reveal relationships that we know are true and exist in the high dimensional space.
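The point in this thread can be illustrated with a minimal sklearn sketch on a synthetic spiral (not the video's dataset): appending a heavily weighted class column hands PCA an axis that trivially separates the classes, which is why it is left out when evaluating the method.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.uniform(0, 4 * np.pi, 300)
cls = (t > 2 * np.pi).astype(float)                    # two classes along the spiral
xyz = np.column_stack([np.cos(t), np.sin(t), t / 10])  # 3-D spiral coordinates only

# PCA on position alone: any class structure must emerge from geometry
pos_2d = PCA(n_components=2).fit_transform(xyz)

# PCA on position + a weighted class column: separation becomes trivial
with_label = np.column_stack([xyz, 10 * cls])
lab_2d = PCA(n_components=2).fit_transform(with_label)
print(pos_2d.shape, lab_2d.shape)
```

In the second projection the first principal component is dominated by the class column, so the two classes land in two well-separated clusters regardless of the spiral's geometry.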
Thank you
The video is great, thank you. Btw, I have a small question, what is the name of the background music?
Thank you !
It's a copyright free music I found on Pixabay: Documentary - Coma-Media.
@@Deepia-ls2fo Many thanks!
great video
@@Jinom Thanks !
In the UMAP case, how does the high-dimensional graph representation have more points than the lower-dimensional graph representation? I am definitely missing something here. 16:10
Oh no, that's my bad, they are supposed to have the same number of points of course !
These are just illustrative examples though, not results of the actual algorithm.
@@Deepia-ls2fo oh okay, thanks!
Are there any dimensionality reduction algorithms that work online (i.e. one sample at a time)?
Thanks for the comment, I don't know about any dimensionality reduction technique that would work with one sample only except a simple projection. :/
Edit: I think I misunderstood your question, the only online techniques that come to mind are deep learning models. Once trained, you can feed them one sample at a time.
@@Deepia-ls2fo I was wondering about an algorithm that does not wait to get the entire dataset but learns the reduction as it gets the samples one at a time (or a small batch at a time), kinda like online learning. But yeah sounds like some trick with deep learning might be the closest possible solution at this time. Probably some pretrained VAE
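For what it's worth, one classical method does fit the mini-batch setting described here: scikit-learn's `IncrementalPCA` updates its components as batches arrive. A minimal sketch (on random data, and note it is linear, so not a drop-in replacement for t-SNE or UMAP):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
ipca = IncrementalPCA(n_components=2)

# data arrives in mini-batches; partial_fit updates the components incrementally
for _ in range(10):
    batch = rng.normal(size=(50, 16))
    ipca.partial_fit(batch)

# once fitted, new samples can be projected one at a time
new_sample = rng.normal(size=(1, 16))
print(ipca.transform(new_sample).shape)  # (1, 2)
```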
What's it called when you turn a high dimensional representation into a single point to compare with other similar high dimensional representations?
Hi, what you are referring to can be described using many words. Maybe "representation learning" is what you are looking for ?
Edit: maybe "embedding" is the word you're looking for.
@@Deepia-ls2fo Yes! Thanks.
This area of research is SO fascinating. I feel we're are grasping at something bigger.
very nice, make sure to submit this to #SoMEpi (deadline is Aug18)
Hi, thanks for your comment. Could you send me the info ? I can't seem to find anything online except the tag on YouTube :/
Edit : Nevermind I found it :)
@@Deepia-ls2fo just search for "Summer of Math Exposition," I cannot post the link because YouTube does not let viewers post links
@@Deepia-ls2fo great to see that your entry won one of the Honorable Mentions, congrats!
@@jkzero Thanks glad to see you got one too :)
@@Deepia-ls2fo thanks, I was not expecting this at all. I hope you got exposure to a greater audience and some constructive feedback from reviewers
Bro when did UMAP turn into a paper about archaeology and mammoths, what in the actual black maths
Awesome !
Thank you !
❤❤
ISOMAP and PaCMAP are two newer algorithms.
that data at the beginning looks like a country map
Thanks for the comment, well it was just MNIST lol
Assumed it was magic
I just realized that I am an absolute nerd because I cringe every time you say Gaussian with a hard s
isn't that correct though
@@deltamico I have never heard gaussian with a hard s in academia, it’s always with an sh. But “correct” is a soft condition when it comes to pronunciation.
sir. you are amazing..
Thank you !
It sounds like LEEbler, not LIEbler.
Thanks for your comment, as a French person I always assumed he was German, it never occurred to me he was American !