Thanks for this -- it is a very nice short succinct description (with good visuals) that still manages to capture all the important core ideas. I'll be sure to recommend this to people looking for a quick introduction to UMAP.
Wow, we feel honoured by your comment! Thanks.
Wow, this channel is a gold mine
I beg to differ. It is a coffee bean mine. 😉
That baby plot really looks amazing!!
I wish you were the teacher of all subjects in the world! Many thanks
Wow, this is so heartwarming! Thanks for this awesome comment! 🤗
Didn't know about babyplots... thanks for sharing!
wow! that is a very well dimensionally reduced version of UMAP algo
Haha, good pun! 👍
I have seen and 'interpreted' so many UMAP plots and have not understood its utility until today. Thank you.
I didn't know about this before! Thanks for this video Letitia!
Glad it was helpful! UMAP is a must-know for dimensionality reduction nowadays.
Thanks. I'd never heard of UMAP. Now I'll definitely be trying it as a replacement the next time I reach for PCA.
1st video i saw. Loved it. Subscribed.
The visuals are amazing
You're amazing! *Insert Keanu Reeves meme here* 👀
Thanks for making this clear and entertaining! I love the coffee bean 😂
Love it, thanks Ms. Coffee and Letitia!
This is really good. Absolutely love the simplicity 👍
I would like to point out that the statement around 6:44, which says that changing the hyperparameters of tSNE completely changes the resulting embedding, is very likely a consequence of tSNE's random initialisation, whereas the UMAP implementation you are using keeps the same initialisation for each set of hyperparameters. It is good practice to initialise tSNE with PCA; if that had been done in the video, the results across hyperparameter changes would be comparable between tSNE and UMAP.
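If it helps, here is a minimal sketch of what I mean (assuming scikit-learn; the dataset and perplexity are just placeholders, not the video's setup):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# default random initialisation: layouts can look very different between settings/runs
emb_random = TSNE(n_components=2, init="random", perplexity=30, random_state=0).fit_transform(X)

# PCA initialisation: layouts stay much more comparable across hyperparameter changes
emb_pca = TSNE(n_components=2, init="pca", perplexity=30, random_state=0).fit_transform(X)
```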
Great work, Letitia! Needed this kind of introduction to UMAP :) And thanks for the links!
Glad it was helpful, Denis!
Are you interested in UMAP for word embedding visualization? Or for something entirely different?
@@AICoffeeBreak Yeah, something similar. Actually I found its use in BERTopic very interesting, where we reduce the dimensionality of document embeddings (which leverage sentence-transformers) to later cluster and visualize different topics :)
towardsdatascience.com/topic-modeling-with-bert-779f7db187e6
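Roughly that pipeline, as I understand it (the model name, dataset, and parameters below are just illustrative assumptions, not BERTopic's exact defaults):

```python
from sklearn.datasets import fetch_20newsgroups
from sentence_transformers import SentenceTransformer
import umap
import hdbscan

# any collection of documents works; 20 newsgroups is just a stand-in here
docs = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes")).data[:1000]

embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)              # ~384-dim sentence embeddings
reduced = umap.UMAP(n_neighbors=15, n_components=5).fit_transform(embeddings)  # shrink before clustering
clusters = hdbscan.HDBSCAN(min_cluster_size=15).fit_predict(reduced)           # topic-like clusters (-1 = noise)
```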
2 videos in and I’m already a fan of this channel. Cool stuff! 😎
Hey thanks! Great to have you here.
Great introduction to UMAP, thanks
very fun and educative explanation of a difficult method! keep the vids coming ms coffeebean!!
Thank you! 😃 There will be more to come.
Hey Letitia, really amazing Video on UMAP. Love your easy to follow explanations :D Keep up the good work
This is incredibly helpful. Thanks!
Very nice explanation!
Glad you think so! 😊
Thanks for making this video! Very helpful
Thank you!
I finally understand!
Thank you so much !
Hello 😎
sooo well explained, brilliant!
Thanks!
Congratulations on an excellent channel
Thank you very much for the appreciation!
Thank you for explaining it wonderfully 😊
So nice of you to leave this lovely comment here! 😊
Fantastic! Such a good explanation, and thanks for the babyplot tip. Awesome channel!!!
So glad you like it! ☺️
@@AICoffeeBreak It'll be very helpful. In geochemistry we usually work with 10+ variable, so having a complement to PCA will make analysis more robust
Find a great channel! Thanks for sharing
Thanks for coming! :)
Very cool, thanks for it!
Glad you enjoyed it!
great explanation!
Happy it was helpful! 👍
Awesome as always!
Amazing. Reminds me of Gephi.
UMAP rocks! The only problem I see is the explainability of this kind of dimensionality reduction, which is easy to get with PCA. In other words, with PCA you can find the variables that best explain the clustering, which is important when you are focusing on variable selection. What do you think?
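To make the contrast concrete, a small sketch of the PCA-side interpretability I mean (variable loadings per component); as far as I know UMAP has no direct equivalent:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_iris()
X = StandardScaler().fit_transform(data.data)

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)           # variance captured by each component
for i, comp in enumerate(pca.components_):
    top = np.argsort(np.abs(comp))[::-1][:2]   # variables with the largest loadings
    print(f"PC{i + 1}:", [data.feature_names[j] for j in top])
```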
Great video!
This is really fantastic stuff! Thanks for teaching it in such an easy-to-grasp way. I must admit I didn't manage to get through the original paper, since I am "just" a biologist. But this video helped a lot.
I would have a question: I wanted to project the phenological similarity of animals at certain stations, to see which stations were most similar in that respect. For each day at each station there is a value for the presence or absence of a certain species. Obviously there is also temporal autocorrelation involved here. My first try with UMAP gave a very reasonable result, but I am unsure if it is a valid method for my purposes. What do you think, Letitia or others?
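Not an answer on validity, but for what it's worth, this is roughly how I would picture the setup (the station matrix below is random dummy data, and 'jaccard' is only one plausible metric choice for binary presence/absence vectors, not a recommendation):

```python
import numpy as np
import umap

# dummy data: one binary row per station (days x species, flattened); replace with the real matrix
rng = np.random.default_rng(0)
stations = rng.integers(0, 2, size=(40, 365 * 10)).astype(bool)

embedding = umap.UMAP(n_neighbors=10, metric="jaccard").fit_transform(stations)  # 2D map of stations
```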
Love you so much.
Thank you very much! that was a great explanation 😊
Very nice explanation. Do you have any other videos with more information about umap? What are the limitations as compared with e.g. deep neural nets?
Hello, What is the complexity of UMAP? . Thanks for the video.
I think the answer to your question is here 👉github.com/lmcinnes/umap/issues/8#issuecomment-343693402
Great vid
Thanks!
I really like your vids
I am afraid you did not fully understand the mechanism of information geometry behind UMAP and how the KL-divergence acts as the "spring-dampener" mechanism. Keenan Crane and Melvin Leok have great educational materials on the topic.
Thank you!
You're welcome!
Wow! Wow! I like it!
Cold Mirror, is that you? 👀
This almost sounds like an extension of KNN to the unsupervised domain….very cool🥳🧐🤓
Great introduction! What is your background if I may ask?
I'm from physics and computer science. 🙃 Ms. Coffee Bean is from my coffee roaster.
What is your background if we may ask? And what brings you to UMAP?
@@AICoffeeBreak Hello :) I thought as much. My background is in theoretical physics, but I am making a living in analyzing neuroscience (calcium imaging) data. It seems that neuroscience is now very excited in using the latest data reduction techniques, hence my interest in UMAP. :) I really like the "coffee bean" idea: friendly, very approachable and to the point.
Theoretical physicist in neuroscience! I'm impressed.
Thank you
Amazing!
You're amazing! [Insert Keanu Reeves meme here] 👀
Thanks for watching and for dropping this wholesome comment!
Interesting how the 2D graph of the mammoth becomes kind of like the mammoth lying on its stomach with its limbs spread out
I think it's pip install umap-learn, but then import umap. Great video. Just weird I cannot get it to run on Google Colab. When I run the cell with the bp variable, it is just blank. No errors. Weird.
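For reference, a minimal sketch of the install/import combination that works for me (not the exact notebook from the video):

```python
# pip install umap-learn    <- package name on PyPI
import umap                  # <- but the import is just "umap"
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
embedding = umap.UMAP(n_components=2).fit_transform(X)
print(embedding.shape)       # (1797, 2)
```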
I am curious if anyone knows if it is possible to use UMAP (or other projection algorithms) in the other direction: From a low dimensional projection -> a spot in high dimensional space?
An example would be picking a spot between clusters in the 0-9 digit example (either 2d or 3d) and seeing what the new resulting "number" looked like (in pixel space).
What you are asking for is a generative model. But let's start from the bottom.
I don't want to say that dimensionality reduction is easy, but let's put it like this: summarizing stuff (dim. reduction) is easier than inventing new stuff (going from low to high dimensions), because the problem you are asking about is a little more loosely defined since all these new dimensions have to be filled *meaningfully*.
Happily, there are methods that do these kinds of generation. In a nutshell, one trains them on lots and lots of data to generate the whole data sample (an image of a handwritten digit) from summaries. Pointer -> you might want to look into (variational) Autoencoders and Generative Adversarial Networks.
@@AICoffeeBreak Thank you for the long response! I am moderately familiar with both GANs and VQ-VAEs but did not know if a generated sample could be chosen from the UMAP low dimensional projected space.
For example, the VAE takes images, compresses it to an embedded space and then restores the original. UMAP could take that embedded space and further reduce it to represent it in a 2D graph.
So what I want is 2D representation -> embedding -> full reconstructed new sample. I was uncertain if that 1st step is permitted.
@@terryr9052 I would say yes, this is possible and I think you are on the right track, so I'll push further. :)
With GANs, this is minimally different, I will focus on VAEs for now:
*During training* a VAE does exactly as you say: image (I) -> low-dim. embedding (E) -> image (I), hence the name AUTOencoder. What I think is relevant for you is that E can be 2-dimensional. The dimensionality of E is actually a hyperparameter and you can adjust it flexibly like the rest of your architecture. Choosing such a low dimensionality of E might only mean that when you go from I -> E -> I, the whole process is lossy. I -> E (the summary, the encoder) is simple. But E -> I, the reconstruction or in a sense the re-invention of information (the decoder) in many dimensions, is complicated to achieve from only 2 dimensions. Therefore it is easier when the dimensionality of E is bigger (something like 128-ish in "usual" VAEs).
In a nutshell, the I -> E step I just described is what any other dimensionality reduction algorithm does too (PCA, UMAP, t-SNE), but this time it's implemented by a VAE. The E -> I step is what you want, and here it comes for free, because what you need is the *testing step*.
You have trained a VAE that can take any image, encode it (to 2 dims) and decode it. But now with the trained model, you can just drop the I -> E and position yourself somewhere in the E space (i.e. give it an E vector) and let the E -> I routine run.
I do not know how far I should go, because I also have thoughts for the case where you really, really want I -> E to be the UMAP routine specifically and not a VAE encoder. In that case, you would need to train only a decoder architecture. Or a GAN. Sorry, it gets a little too much to put into a comment. 😅
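If it helps, here is a toy sketch of that last "drop the encoder at test time" step (PyTorch; the architecture sizes and the latent point are made up, and in practice the decoder would be the trained half of your VAE):

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    def __init__(self, latent_dim=2, out_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, out_dim), nn.Sigmoid(),   # pixel intensities in [0, 1]
        )

    def forward(self, z):
        return self.net(z)

decoder = TinyDecoder()                  # in practice: the decoder half of a trained VAE
z = torch.tensor([[1.5, -0.3]])          # pick a point in the 2D latent space "by hand"
image = decoder(z).reshape(28, 28)       # decode it into an image (meaningless here, since untrained)
```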
@@AICoffeeBreak Thanks again! I'm going to read this carefully and give it some thought.
Will there be a video on tSNE?
Nice video! And 784 :D
Thank you very much! Did Ms. Coffee Bean say something wrong with 784? 😅
Ah, now I noticed. She said 764 instead of 784. Seems like Ms. Coffee Bean cannot be trusted with numbers. 🤫
Very nice :D
Thank you! Cheers!
wow.
ValueError: cannot reshape array of size 47040000 into shape (60000,784)
What's the matter with this xD
Ok I solved this, I had 6k instead of 60k
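In case someone else hits the same thing: letting numpy infer the sample count avoids this kind of mismatch in the first place (just a small sketch; the zeros array stands in for the loaded pixel buffer):

```python
import numpy as np

raw = np.zeros(47040000, dtype=np.uint8)   # stand-in for the flat pixel buffer
images = raw.reshape(-1, 784)              # -1 lets numpy compute the sample count (60000 here)
print(images.shape)                        # (60000, 784)
```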
Hopefully lots of friends' trust! I am bringing my girlfriend to buy her house!
It looks like we have a strong Cold Mirror fanbase here. Ms. Coffee Bean is also a fan of hers, btw.
Is the babyplots library still supported? It does not work for me in any of the envs I've tried.. :(
Hi! I'm the creator of babyplots. Yes, the library is still actively supported. If you're having issues with getting started, please join the babyplots discord server, which you'll find on our support page: bp.bleb.li/support or write an issue on one of the github repositories. I'll be sure to help you there.
How do you judge the performance of UMAP on your data? In PCA you can look at the explained variance, but what about UMAP?
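One partial option I'm aware of (not a full substitute for explained variance): scikit-learn's trustworthiness score, which checks how well local neighbourhoods survive the embedding:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import trustworthiness
import umap

X, _ = load_digits(return_X_y=True)
emb = umap.UMAP(n_components=2, random_state=42).fit_transform(X)
print(trustworthiness(X, emb, n_neighbors=15))   # closer to 1.0 = local structure better preserved
```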
You can't say that PCA "can be put in company with SVD". SVD is one of available implementations of PCA. PCA means "a linear transformation, that transform data into a bases with first component aligned with direction of maximum variation, second component aligned with direction of maximum variation of data, projected on hyperplane orthogonal to first component, etc". SVD is a matrix factorisation method. It turns out, that when you perform SVD you get PCA. But it doesn't mean that SVD is dimensionality reduction algorithm - SVD is a way to represent a matrix. It can be used for many different purposes (ex. for quadratic programming), not necessarily reduction of dimensionality. Same for PCA, it can be performed using SVD, but other numerical methods exist as well.
You make some good observations, but we do not entirely agree. We think there are important differences between SVD and PCA. In any case, there by "put into company" we did not mean to go into the specific details about the relationship between these algorithms. It was meant more like "if you think about PCA, you should think about matrix factorization like SVD or NMF", this is what we understand by "put into company" as we do not say "it is" or "is absolutely and totally *equivalent* with".
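For readers following along, a minimal numpy sketch of the relationship under discussion: PCA obtained by running SVD on the centered data matrix (toy data; component signs may differ from sklearn's PCA):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))               # toy data

Xc = X - X.mean(axis=0)                     # center the columns
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt[:2]                         # first two principal directions
scores = Xc @ components.T                  # data projected onto those PCs
explained_variance = (S ** 2) / (len(X) - 1)
```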
Bookmarking this to come back to later.
I saw no proof that it's the best, so you failed to answer your own question.
that coffee bean looks like a "shit"