UMAP explained | The best dimensionality reduction?

  • Published: 26 Nov 2024

Comments • 107

  • @lelandmcinnes9501
    @lelandmcinnes9501 3 years ago +95

    Thanks for this -- it is a very nice short succinct description (with good visuals) that still manages to capture all the important core ideas. I'll be sure to recommend this to people looking for a quick introduction to UMAP.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +13

      Wow, we feel honoured by your comment! Thanks.

  • @gregorysech7981
    @gregorysech7981 3 years ago +17

    Wow, this channel is a gold mine

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +10

      I beg to differ. It is a coffee bean mine. 😉

  • @dengzhonghan5125
    @dengzhonghan5125 3 years ago +4

    That baby plot really looks amazing!!

  • @fleurvanille7668
    @fleurvanille7668 3 years ago +5

    I wish you were the teacher of all subjects in the world! Many thanks

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +4

      Wow, this is so heartwarming! Thanks for this awesome comment! 🤗

  • @bosepukur
    @bosepukur 3 years ago +3

    didn't know about babyplots... thanks for sharing!

  • @dexterdev
    @dexterdev 3 years ago +10

    Wow! That is a very well dimensionally reduced version of the UMAP algo.

  • @Shinigami537
    @Shinigami537 2 years ago +2

    I have seen and "interpreted" so many UMAP plots, and I did not understand their utility until today. Thank you.

  • @ShubhamYadav-xr8tw
    @ShubhamYadav-xr8tw 3 years ago +8

    I didn't know about this before! Thanks for this video Letitia!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      Glad it was helpful! UMAP is a must-know for dimensionality reduction nowadays.

  • @willsmithorg
    @willsmithorg 2 years ago +3

    Thanks. I'd never heard of UMAP. Now I'll definitely be trying it as a replacement the next time I reach for PCA.

  • @20Stephanus
    @20Stephanus 2 years ago +2

    First video I saw. Loved it. Subscribed.

  • @hiramcoriarodriguez1252
    @hiramcoriarodriguez1252 3 years ago +8

    The visuals are amazing

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +4

      You're amazing! *Insert Keanu Reeves meme here* 👀

  • @dzanaga
    @dzanaga 3 years ago +4

    Thanks for making this clear and entertaining! I love the coffee bean 😂

  • @capcloud
    @capcloud 2 years ago +1

    Love it, thanks Ms. Coffee and Letitia!

  • @CodeEmporium
    @CodeEmporium 3 years ago +3

    This is really good. Absolutely love the simplicity 👍

  • @pl1840
    @pl1840 2 years ago +7

    I would like to point out that the statement around 6:44, that changing the hyperparameters of t-SNE completely changes the resulting embedding, is very likely an artifact of t-SNE's random initialisation, whereas the UMAP implementation you are using reuses the same initialisation for each set of hyperparameters. It is good practice to initialise t-SNE with PCA; had that been done in the video, the results across hyperparameter changes would have been comparable between t-SNE and UMAP.
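
    A minimal sketch of that comparison, assuming scikit-learn is installed; init="pca" is a real TSNE option, the rest of the setup is illustrative:

      from sklearn.datasets import load_digits
      from sklearn.manifold import TSNE

      X, _ = load_digits(return_X_y=True)

      # With init="pca" the optimisation starts from the same deterministic
      # layout, so changing perplexity no longer reshuffles the global
      # structure at random.
      for perplexity in (10, 30, 50):
          emb = TSNE(n_components=2, init="pca", perplexity=perplexity,
                     random_state=0).fit_transform(X)
          print(perplexity, emb.shape)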

  • @denlogv
    @denlogv 3 years ago +7

    Great work, Letitia! Needed this kind of introduction to UMAP :) And thanks for the links!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +4

      Glad it was helpful, Denis!
      Are you interested in UMAP for word embedding visualization? Or for something entirely different?

    • @denlogv
      @denlogv 3 years ago +3

      @@AICoffeeBreak Yeah, something similar. Actually, I found its use in BERTopic very interesting, where we reduce the dimensionality of document embeddings (which leverage sentence-transformers) to later cluster and visualize different topics :)
      towardsdatascience.com/topic-modeling-with-bert-779f7db187e6
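
      A rough sketch of that pipeline, assuming the sentence-transformers, umap-learn and hdbscan packages are installed (the model name and parameters are illustrative, not the exact BERTopic defaults):

        import umap
        import hdbscan
        from sentence_transformers import SentenceTransformer
        from sklearn.datasets import fetch_20newsgroups

        # Stand-in corpus; use your own documents here
        docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data[:500]

        # 1) Embed documents with a sentence-transformer
        embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)

        # 2) Reduce dimensionality with UMAP before clustering
        reduced = umap.UMAP(n_neighbors=15, n_components=5,
                            metric="cosine").fit_transform(embeddings)

        # 3) Cluster the reduced embeddings into topics (-1 = outliers)
        topics = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(reduced)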

  • @gurudevilangovan
    @gurudevilangovan 3 years ago +2

    2 videos in and I’m already a fan of this channel. Cool stuff! 😎

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      Hey thanks! Great to have you here.

  • @HighlyShifty
    @HighlyShifty 1 year ago +1

    Great introduction to UMAP, thanks

  • @ehtax
    @ehtax 3 years ago +4

    Very fun and educational explanation of a difficult method! Keep the vids coming, Ms. Coffee Bean!!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      Thank you! 😃 There will be more to come.

  • @ImbaFerkelchen
    @ImbaFerkelchen 1 year ago +1

    Hey Letitia, really amazing video on UMAP. Love your easy-to-follow explanations :D Keep up the good work

  • @python-programming
    @python-programming 3 years ago +3

    This is incredibly helpful. Thanks!

  • @floriankowarsch8682
    @floriankowarsch8682 3 years ago +3

    Very nice explanation!

  • @emanuelgerber
    @emanuelgerber 7 months ago +1

    Thanks for making this video! Very helpful

  • @DerPylz
    @DerPylz 3 years ago +8

    I finally understand!

  • @vi5hnupradeep
    @vi5hnupradeep 3 years ago +3

    Thank you so much!

  • @MachineLearningStreetTalk
    @MachineLearningStreetTalk 3 years ago +3

    Hello 😎

  • @luise.suelves8270
    @luise.suelves8270 11 days ago +1

    Sooo well explained, brilliant!

  • @HoriaCristescu
    @HoriaCristescu 3 years ago +4

    Congratulations on an excellent channel

  • @rohaangeorgen4055
    @rohaangeorgen4055 3 years ago +3

    Thank you for explaining it wonderfully 😊

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      So nice of you to leave this lovely comment here! 😊

  • @damp8277
    @damp8277 3 years ago +2

    Fantastic! Such a good explanation, and thanks for the babyplot tip. Awesome channel!!!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      So glad you like it! ☺️

    • @damp8277
      @damp8277 3 years ago +1

      @@AICoffeeBreak It'll be very helpful. In geochemistry we usually work with 10+ variables, so having a complement to PCA will make the analysis more robust.

  • @老师你好-c7t
    @老师你好-c7t 3 years ago +3

    Found a great channel! Thanks for sharing

  • @sumailsumailov1572
    @sumailsumailov1572 3 years ago +2

    Very cool, thanks for it!

  • @BitBlastBroadcast
    @BitBlastBroadcast 3 years ago +2

    great explanation!

  • @ylazerson
    @ylazerson 2 years ago +1

    Awesome as always!

  • @jcwfh
    @jcwfh 3 years ago +2

    Amazing. Reminds me of Gephi.

  • @wapsyed
    @wapsyed 6 days ago

    UMAP rocks! The only problem I see is the explainability of this dimensionality reduction, which comes easily with PCA. In other words, with PCA you can get the variables that best explain the clustering, which is important when you are focusing on variable selection. What do you think?
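
    For the PCA side of this, a minimal scikit-learn sketch of pulling out the loadings (which variables drive each component); UMAP has no direct equivalent, though correlating the original features with the embedding axes is a common workaround:

      from sklearn.datasets import load_iris
      from sklearn.decomposition import PCA

      X, _ = load_iris(return_X_y=True)
      pca = PCA(n_components=2).fit(X)

      # Loadings: rows are components, columns are the original variables.
      # Large absolute values mark the variables that drive each axis.
      print(pca.components_)
      print(pca.explained_variance_ratio_)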

  • @babakravandi
    @babakravandi 1 year ago +1

    Great video!

  • @arminrose4946
    @arminrose4946 3 years ago +5

    This is really fantastic stuff! Thanks for teaching it in such an easy-to-grasp way. I must admit I didn't manage to get through the original paper, since I am "just" a biologist. But this video helped a lot.
    I have a question: I wanted to project the phenological similarity of animals at certain stations, to see which stations were most similar in that respect. For each day at each station there is a value of presence or absence of a certain species. Obviously there is also temporal autocorrelation involved here. My first try with UMAP gave a very reasonable result, but I am unsure if it is a valid method for my purposes. What do you think, Letitia or others?
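
    One concrete thing to try for binary presence/absence data: UMAP accepts set-similarity metrics directly. A minimal sketch with random stand-in data, assuming umap-learn is installed:

      import numpy as np
      import umap

      # Stand-in data: 50 stations x 365 days of presence/absence
      rng = np.random.default_rng(0)
      X = rng.integers(0, 2, size=(50, 365)).astype(bool)

      # "jaccard" is one of UMAP's built-in metrics and suits binary data
      emb = umap.UMAP(metric="jaccard", n_neighbors=10,
                      random_state=0).fit_transform(X)
      print(emb.shape)  # (50, 2)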

  • @DungPham-ai
    @DungPham-ai 3 years ago +5

    Love you so much.

  • @marianagonzales3201
    @marianagonzales3201 2 years ago +1

    Thank you very much! That was a great explanation 😊

  • @dionbridger5944
    @dionbridger5944 1 year ago +1

    Very nice explanation. Do you have any other videos with more information about UMAP? What are its limitations compared with, e.g., deep neural nets?

  • @arnoldchristianloaizafabia4657
    @arnoldchristianloaizafabia4657 3 years ago +4

    Hello, what is the complexity of UMAP? Thanks for the video.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      I think the answer to your question is here 👉github.com/lmcinnes/umap/issues/8#issuecomment-343693402
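
      For a hands-on feel (the linked issue discusses the theory), a crude timing sketch; the data is random and the absolute numbers depend on your machine:

        import time
        import numpy as np
        import umap

        for n in (2000, 4000, 8000):
            X = np.random.rand(n, 50)
            start = time.perf_counter()
            umap.UMAP(random_state=0).fit_transform(X)
            print(n, round(time.perf_counter() - start, 1), "s")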

  • @ChocolateMilkCultLeader
    @ChocolateMilkCultLeader 3 years ago +3

    Great vid

  • @AntiJew964
    @AntiJew964 1 month ago

    I really like your vids

  • @renanmonteirobarbosa8129
    @renanmonteirobarbosa8129 5 months ago

    I am afraid you did not fully understand the mechanism of information geometry behind UMAP and how the KL divergence acts as the "spring-damper" mechanism. Keenan Crane and Melvin Leok have great educational materials on the topic.

  • @furek5
    @furek5 3 years ago +2

    Thank you!

  • @talithatrost3813
    @talithatrost3813 3 years ago +5

    Wow! Wow! I like it!

  • @klammer75
    @klammer75 2 years ago

    This almost sounds like an extension of k-NN to the unsupervised domain... very cool 🥳🧐🤓

  • @thomascorner3009
    @thomascorner3009 3 years ago +2

    Great introduction! What is your background if I may ask?

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      I'm from physics and computer science. 🙃 Ms. Coffee Bean is from my coffee roaster.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      What is your background if we may ask? And what brings you to UMAP?

    • @thomascorner3009
      @thomascorner3009 3 years ago +1

      @@AICoffeeBreak Hello :) I thought as much. My background is in theoretical physics, but I am making a living analyzing neuroscience (calcium imaging) data. It seems that neuroscience is now very excited about using the latest data reduction techniques, hence my interest in UMAP. :) I really like the "coffee bean" idea: friendly, very approachable and to the point.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      Theoretical physicist in neuroscience! I'm impressed.

  • @shashankkumaryerukola
    @shashankkumaryerukola 1 year ago +1

    Thank you

  • @AmruteshPuranik
    @AmruteshPuranik 3 years ago +2

    Amazing!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      You're amazing! [Insert Keanu Reeves meme here] 👀
      Thanks for watching and for dropping this wholesome comment!

  • @cw9249
    @cw9249 1 year ago +1

    Interesting how the 2D graph of the mammoth becomes kind of like the mammoth on its stomach with its limbs spread out.

  • @TooManyPBJs
    @TooManyPBJs 3 years ago

    I think it is import umap-learn instead of import umap. Great video. Just weird that I cannot get it to run on Google Colab. When I run the cell with the bp variable, it is just blank. No errors. Weird.
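
    For the record, the package name and the import name differ: the PyPI package is umap-learn, but the module you import is umap. A minimal sanity check:

      # pip install umap-learn    <- package name on PyPI
      import umap                 # <- module name you actually import

      reducer = umap.UMAP(n_components=2)
      print(reducer)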

  • @terryr9052
    @terryr9052 3 years ago +1

    I am curious if anyone knows whether it is possible to use UMAP (or other projection algorithms) in the other direction: from a low-dimensional projection -> a spot in high-dimensional space?
    An example would be picking a spot between clusters in the 0-9 digit example (either 2D or 3D) and seeing what the new resulting "number" looked like (in pixel space).

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      What you are asking for is a generative model. But let's start from the bottom.
      I don't want to say that dimensionality reduction is easy, but let's put it like this: summarizing stuff (dim. reduction) is easier than inventing new stuff (going from low to high dimensions), because the problem you are asking about is more loosely defined, since all these new dimensions have to be filled *meaningfully*.
      Happily, there are methods that do these kinds of generation. In a nutshell, one trains them on lots and lots of data to generate the whole data sample (an image of handwritten digits) from summaries. Pointer -> you might want to look into (variational) Autoencoders and Generative Adversarial Networks.

    • @terryr9052
      @terryr9052 3 years ago +1

      @@AICoffeeBreak Thank you for the long response! I am moderately familiar with both GANs and VQ-VAEs but did not know if a generated sample could be chosen from the UMAP low-dimensional projected space.
      For example, a VAE takes images, compresses them to an embedding space and then restores the originals. UMAP could take that embedding space and further reduce it to represent it in a 2D graph.
      So what I want is 2D representation -> embedding -> full reconstructed new sample. I was uncertain if that 1st step is permitted.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      @@terryr9052 I would say yes, this is possible, and I think you are on the right track, so I'll push further. :)
      With GANs this is minimally different, so I will focus on VAEs for now:
      *During training* a VAE does exactly as you say: image (I) -> low-dim. embedding (E) -> image (I), hence the name AUTOencoder. What I think is relevant for you is that E can be 2-dimensional. The dimensionality of E is actually a hyperparameter, and you can adjust it flexibly like the rest of your architecture. Choosing such a low dimensionality of E just means that the whole I -> E -> I process is lossy. I -> E (the summary, encoder) is simple. But E -> I, the reconstruction, or in a sense the re-invention of information (decoder), is complicated to achieve from only 2 dimensions. It is therefore easier when the dimensionality of E is bigger (something like 128-ish in "usual" VAEs).
      In a nutshell, the I -> E step I just described is what any other dimensionality reduction algorithm does too (PCA, UMAP, t-SNE), but this time it's implemented by a VAE. The E -> I step is what you want, and here it comes for free, because what you need is the *testing step*:
      You have trained a VAE that can take any image, encode it (to 2 dims) and decode it. But now, with the trained model, you can just drop the I -> E part, position yourself somewhere in the E space (i.e., give it an E vector) and let the E -> I routine run.
      I do not know how far I should go, because I also have thoughts for the case where you really, really want I -> E to be the UMAP routine and not a VAE encoder. In that case, you would need to train only a decoder architecture. Or a GAN. Sorry, it gets a little too much to put into a comment. 😅

    • @terryr9052
      @terryr9052 3 years ago +2

      @@AICoffeeBreak Thanks again! I'm going to read this carefully and give it some thought.
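
      Worth noting alongside the VAE route discussed above: the umap library itself documents an approximate inverse_transform for exactly this direction (embedding point -> data space). A minimal sketch on the digits data; reconstruction is lossy, as the discussion above explains:

        import umap
        from sklearn.datasets import load_digits

        X, _ = load_digits(return_X_y=True)
        mapper = umap.UMAP(n_components=2, random_state=0).fit(X)

        # Pick a point in the 2D embedding (here: midway between two
        # embedded samples) ...
        midpoint = mapper.embedding_[:2].mean(axis=0, keepdims=True)

        # ... and map it back to a 64-pixel "image" in the original space
        pixels = mapper.inverse_transform(midpoint)
        print(pixels.shape)  # (1, 64)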

  • @divergenny
    @divergenny 2 years ago

    Will there be a video on t-SNE?

  • @hannesstark5024
    @hannesstark5024 3 years ago +2

    Nice video! And 784 :D

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      Thank you very much! Did Ms. Coffee Bean say something wrong with 784? 😅

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      Ah, now I noticed. She said 764 instead of 784. Seems like Ms. Coffee Bean cannot be trusted with numbers. 🤫

  • @kiliankleemann4251
    @kiliankleemann4251 1 year ago

    Very nice :D

  • @nogribin
    @nogribin 2 years ago +1

    wow.

  • @sonOfLiberty100
    @sonOfLiberty100 3 years ago +1

    ValueError: cannot reshape array of size 47040000 into shape (60000,784)
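
    For reference, 60000 x 784 is exactly 47,040,000, so that reshape should succeed on a correctly loaded array; a stale or truncated download is a more likely culprit. A minimal sketch of the intended loading step, assuming TensorFlow/Keras is available:

      import numpy as np
      from tensorflow.keras.datasets import mnist

      # x_train arrives as (60000, 28, 28): 60000 * 28 * 28 = 47,040,000 values
      (x_train, y_train), _ = mnist.load_data()
      X = x_train.reshape(60000, 784).astype(np.float32) / 255.0
      print(X.shape)  # (60000, 784)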

  • @lisatrost7486
    @lisatrost7486 3 years ago +4

    Hopefully many friends trust! I am bringing my girlfriend to buy her house!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      It looks like we have a strong Cold Mirror fanbase here. Ms. Coffee Bean is also a fan of hers, btw.

  • @pvlr1788
    @pvlr1788 2 years ago

    Is the babyplots library still supported? It does not work for me in any of the envs I've tried... :(

    • @DerPylz
      @DerPylz 2 years ago +4

      Hi! I'm the creator of babyplots. Yes, the library is still actively supported. If you're having issues with getting started, please join the babyplots discord server, which you'll find on our support page: bp.bleb.li/support or write an issue on one of the github repositories. I'll be sure to help you there.

  • @Skinishh
    @Skinishh 2 years ago

    How do you judge the performance of UMAP on your data? In PCA you can look at the explained variance, but what about UMAP?
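
    There is no explained-variance analogue, but scikit-learn's trustworthiness score is one common proxy: it measures how well local neighbourhoods survive the projection (1.0 = perfectly preserved). A minimal sketch:

      import umap
      from sklearn.datasets import load_digits
      from sklearn.manifold import trustworthiness

      X, _ = load_digits(return_X_y=True)
      emb = umap.UMAP(n_components=2, random_state=0).fit_transform(X)

      # Compares the k-nearest neighbours in the original and embedded spaces
      print(trustworthiness(X, emb, n_neighbors=15))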

  • @luck3949
    @luck3949 3 years ago

    You can't say that PCA "can be put in company with" SVD. SVD is one of the available implementations of PCA. PCA means "a linear transformation that transforms the data into a basis whose first component is aligned with the direction of maximum variance, whose second component is aligned with the direction of maximum variance of the data projected onto the hyperplane orthogonal to the first component, etc." SVD is a matrix factorization method. It turns out that when you perform SVD you get PCA. But that doesn't mean that SVD is a dimensionality reduction algorithm - SVD is a way to represent a matrix. It can be used for many different purposes (e.g., for quadratic programming), not necessarily reduction of dimensionality. The same goes for PCA: it can be performed using SVD, but other numerical methods exist as well.
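
    The relationship in code, for the curious: a minimal NumPy sketch of PCA computed via SVD (centre the data, then project onto the right singular vectors):

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(100, 5))   # stand-in data

      Xc = X - X.mean(axis=0)         # PCA requires centred data
      U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

      scores = Xc @ Vt[:2].T          # first two principal components
      explained_var = S[:2] ** 2 / (len(X) - 1)
      print(scores.shape, explained_var)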

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +6

      You make some good observations, but we do not entirely agree; we think there are important differences between SVD and PCA. In any case, by "put into company" we did not mean to go into the specific details of the relationship between these algorithms. It was meant more like "if you think about PCA, you should think about matrix factorization like SVD or NMF" - that is what we understand by "put into company", as we do not say "it is" or that it is "absolutely and totally *equivalent*".

  • @search_is_mouse
    @search_is_mouse 3 years ago

    Bookmarking this.

  • @joelwillis2043
    @joelwillis2043 1 year ago

    I saw no proof of "best", so you failed to answer your own question.

  • @MrChristian331
    @MrChristian331 3 years ago

    that coffee bean looks like a "shit"