UMAP: Mathematical Details (clearly explained!!!)

  • Published: 26 Nov 2024

Comments •

  • @statquest
    @statquest  2 years ago

    To learn more about Lightning: github.com/PyTorchLightning/pytorch-lightning
    To learn more about Grid: www.grid.ai/
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @mohamedarebi7419
    @mohamedarebi7419 2 years ago +5

    Can't wait to get all the nitty gritty details :)

  • @nourelislam8565
    @nourelislam8565 2 years ago +1

    Amazing! ... We actually need a video explaining the different normalization methods used in scRNA-seq analysis, especially SCTransform.
    Appreciate your support,
    Thanks

  • @hansenmarc
    @hansenmarc 2 years ago +1

    I have to agree with you. Spectral embedding does sound very cool!

  • @henriquefantinatti4601
    @henriquefantinatti4601 2 years ago +1

    Please, make videos about time series. Would be amazing seeing you talk about it. Love the videos!

    • @statquest
      @statquest  2 years ago

      I'll keep that in mind.

  • @dataanalyticswithmichael8931
    @dataanalyticswithmichael8931 2 years ago +1

    Finally! I like mathematical things, and this video popped up in my recommendations

  • @utkarshtrehan9128
    @utkarshtrehan9128 2 years ago

    Level: God Level!
    Could we have a video on Factor Analysis!

    • @statquest
      @statquest  2 years ago

      I'll keep that in mind! :)

  • @mohamedamr8081
    @mohamedamr8081 2 years ago +4

    Hey Josh, thanks for the videos and the great content!
    I was wondering if you can make a video about causal inference, that would be great (for me lol), thanks.

    • @statquest
      @statquest  2 years ago +1

      I'll keep that in mind.

  • @QwakeRunner
    @QwakeRunner 2 years ago +4

    Hi Josh! Would you mind showing us how this can be done via Python? I would be very happy to buy a template script from you and support your great work!

    • @statquest
      @statquest  2 years ago +2

      That's a good idea and I'll keep that in mind.

    • @anitat9727
      @anitat9727 2 years ago +1

      @@statquest I'd love this too

  • @cahbe6108
    @cahbe6108 2 years ago +1

    Thank you for the great video! One question: What happens if there are multiple closest neighbors with the same distance? Then there will be multiple similarity scores = 1. Then changing sigma might not help to get close to log2(num_neighbors) for the sum of similarities.

    • @statquest
      @statquest  2 years ago

      I'm not sure what the technical details are exactly, but I would guess it simply finds the value for sigma that gets the sum closest to the ideal value. It doesn't have to be exact.
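
The sigma search discussed in this thread can be sketched as a simple binary search that nudges the sum of similarity scores toward log2(num_neighbors). This is a minimal illustration under stated assumptions, not UMAP's actual implementation; `find_sigma` and its arguments are hypothetical names:

```python
import math

def find_sigma(distances, n_neighbors, n_iter=64):
    """Binary-search a sigma so that the sum of similarity scores gets
    as close as possible to log2(n_neighbors).
    `distances` are one point's distances to its nearest neighbors."""
    target = math.log2(n_neighbors)
    rho = min(distances)  # distance to the closest neighbor (its score is 1)
    lo, hi = 0.0, 1e6
    for _ in range(n_iter):
        sigma = (lo + hi) / 2.0
        # similarity score: exp(-(distance - rho) / sigma), clipped at rho
        total = sum(math.exp(-max(d - rho, 0.0) / sigma) for d in distances)
        if total > target:
            hi = sigma  # sum too large -> shrink sigma
        else:
            lo = sigma  # sum too small -> grow sigma
    return sigma
```

Note that if several neighbors are tied at the minimum distance, several scores are pinned at 1 and the search simply converges to the sigma whose sum is closest to the target, consistent with the reply above.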

  • @venkateshmunagala205
    @venkateshmunagala205 2 years ago +1

    Great Video. Thanks Josh

  • @amalnasir9940
    @amalnasir9940 1 year ago +3

    Hi Josh, Spectral clustering is an interesting topic. I hope you cover it one day!

    • @statquest
      @statquest  1 year ago +1

      I'll keep that in mind! :)

  • @AHMADKELIX
    @AHMADKELIX 1 year ago +1

    Hi Josh, thanks for your explanation

  • @mariapelaez3758
    @mariapelaez3758 1 year ago

    Hi Josh,
    I saw both of your videos for UMAP, but I have a doubt regarding how you did the low-dimension graph. I know you mention spectral embedding, and what I know of it is that you calculate the Laplacian of the graph, then get the eigenvalues and eigenvectors, and the coordinates in the new dimension will be the values of the eigenvector for the lowest eigenvalue (ignoring zero eigenvalues). But when I try that on your data, I am not able to get the values that you showed, so I wanted to know if you did something different.
    Also, I wanted to confirm that for more than 1 dimension I would use more than 1 eigenvector, right? For the 2-D case, the x-y coordinates would be the values of the eigenvectors for the two lowest (nonzero) eigenvalues.
    Thanks

    • @statquest
      @statquest  1 year ago

      To be honest, I just drew the low-dimensional graph in a way that I thought would best highlight how UMAP works, rather than stay faithful to how spectral embedding would have projected the points. In other words, I completely ignored spectral embedding when I drew the low-dimensional graph and only took pedagogical aspects into consideration. I'm sorry if this caused confusion. :(
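
For readers who want to try the recipe the commenter describes, here is a minimal sketch using the unnormalized graph Laplacian. Real implementations (e.g. scikit-learn's SpectralEmbedding, UMAP's spectral initialization) typically use a normalized Laplacian and other refinements, so exact coordinates may differ:

```python
import numpy as np

def spectral_embedding(W, n_components=1):
    """Embed a graph given by a symmetric similarity matrix W, using the
    eigenvectors of the graph Laplacian for the smallest nonzero eigenvalues."""
    D = np.diag(W.sum(axis=1))     # degree matrix
    L = D - W                      # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # skip the trivial constant eigenvector (eigenvalue ~ 0 for a connected
    # graph) and keep the next n_components columns as coordinates
    return eigvecs[:, 1:1 + n_components]
```

For a 2-D embedding, the x and y coordinates are the two columns for the smallest nonzero eigenvalues, matching the commenter's description.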

  • @AyushSharma-tg5co
    @AyushSharma-tg5co 2 years ago +1

    Sir, I'm looking to get into the data analyst domain, so please tell me which playlist of yours I should follow to get the stats part done?

    • @statquest
      @statquest  2 years ago

      You can just start at the top of this page and work your way down: statquest.org/video-index/

  • @grinps
    @grinps 2 years ago +2

    waiting for the BAM!

  • @ass_im_aermel
    @ass_im_aermel 2 years ago +1

    Hi Josh! Is it possible that you could make some videos on time-series-related topics like serial correlation or the Box-Jenkins method?
    And thank you for all the videos you made in the last years. They are awesome :-)

    • @statquest
      @statquest  2 years ago +1

      I'll keep those topics in mind.

  • @muriloaraujosouza462
    @muriloaraujosouza462 1 month ago

    In the "making the scores symmetrical" part, the operation is done even if the scores are already symmetrical. For example, the score from A->B was the same as the score from B->A. And the score from A->C was the same as the score from C->A. Any ideas why it is necessary to perform this operation on already symmetrical scores?

    • @statquest
      @statquest  1 month ago +1

      Probably not, and it is possible that efficient implementations avoid the extra math.
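
The symmetrization step behind this thread is the fuzzy-set union (the probabilistic t-conorm): score(a,b) = P(a,b) + P(b,a) - P(a,b)·P(b,a). A small sketch, with `symmetrize` as an illustrative helper name, shows that even an already-symmetric pair of scores is still transformed unless the scores are exactly 0 or 1:

```python
import numpy as np

def symmetrize(P):
    """Fuzzy-union symmetrization of a directed similarity matrix:
    S[a, b] = P[a, b] + P[b, a] - P[a, b] * P[b, a] (elementwise)."""
    return P + P.T - P * P.T

# A pair with equal directed scores of 0.6 in both directions still changes:
# 0.6 + 0.6 - 0.6 * 0.6 = 0.84. Only scores of exactly 0 or 1 are fixed
# points, so symmetric inputs are not passed through unchanged in general.
```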

  • @ieserbes
    @ieserbes 1 year ago

    Hello Josh, amazing explanation. Thank you so much. Where does 2.1 (the lower-dimensional distance between a and b) come from? I am a bit lost there.

    • @statquest
      @statquest  1 year ago

      If you look at the number line that the points are on, you'll see that point 'b' is at about 1.8 and point 'a' is at about 3.9. Now we just do the math: 3.9 - 1.8 = 2.1. Bam.

    • @ieserbes
      @ieserbes 1 year ago +1

      @@statquest Hooray 😄 Thank you Josh.

  • @conlele350
    @conlele350 2 years ago

    Hi Josh, could you please take some time to explain the Markov chain decision process, its application in ML, and how we can code it in R? Thanks

    • @statquest
      @statquest  2 years ago

      I'll keep those topics in mind.

  • @alexmiller3260
    @alexmiller3260 2 years ago

    Hey, Josh! Can you make video(s) about likelihood and MLE for an unknown distribution, if it can't be easily approximated or it's impossible to approximate? Because everyone talks about well-known distributions, but says literally nothing about working with something unknown.
    Of course it would be better if you decided to make a playlist with everything about unknown distributions, but a couple of videos is also OK

    • @statquest
      @statquest  2 years ago +2

      I'll talk about this topic when we cover Bayesian statistics. That said, even when the distribution is unknown, the central limit theorem ( ruclips.net/video/YAlJCEDH2uY/видео.html ) results in a known (Gaussian) distribution. And that means it doesn't matter what the original distribution is.
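
The central-limit-theorem point in this reply can be demonstrated with a quick simulation: means of repeated samples from a skewed, non-Gaussian distribution (here, exponential) still pile up around the true mean with a spread of roughly sigma/sqrt(n). This is a toy sketch, not code from any video:

```python
import random
import statistics

random.seed(0)
# Exponential(rate=1) has mean 1 and standard deviation 1, and is very skewed.
# Take 2000 samples of size 100 and record each sample's mean.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(100))
    for _ in range(2000)
]

# The sample means cluster near the true mean (1.0), and their standard
# deviation is roughly 1 / sqrt(100) = 0.1 -- regardless of the skew of
# the original distribution.
grand_mean = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
```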

  • @harryhamjaya
    @harryhamjaya 1 year ago

    Hello, could you cover the spectral embedding topic? I also have a question regarding UMAP: how is it possible that UMAP is faster than t-SNE? In which part does it beat t-SNE?
    Since, logically, t-SNE moves more points at a time compared to UMAP(?)
    Correct me if I'm wrong,
    and BTW we are hoping for the new updates ❤

    • @statquest
      @statquest  1 year ago

      I talk about why UMAP is faster than t-SNE in my other UMAP video here: ruclips.net/video/eN0wFzBA4Sc/видео.html

  • @xavisolersanchis7145
    @xavisolersanchis7145 1 year ago

    Hi, thank you very much! Just one question: in your case you could compute the initial distances between the data points as Euclidean distances because you are only working with two features. How are they computed when you have many more features? Do you always start with Euclidean distances?

    • @statquest
      @statquest  1 year ago +1

      The Euclidean distance works for more than 2 features, en.wikipedia.org/wiki/Euclidean_distance so there's no problem adding more features. That said, if you wanted to use a different distance metric, it would probably be OK.

    • @xavisolersanchis7145
      @xavisolersanchis7145 9 months ago

      @@statquest Thank you very much! I also realized there is a little error in the video, in the part where you say the result seems strange to you. To compute the symmetrical score, they don't take what you call the "similarity scores". What they do is iterate over the y nearest neighbours and compute the distance between x and y as the maximum of 0 and the distance between x and y minus the distance to the nearest neighbour of x, all of this divided by the previously learnt sigmas. Then they take exp(-dist) to get a similarity score between x and y (saved as the probability of this element in the fuzzy set). Once you have this fuzzy set, you apply the t-conorm you mention over these similarity scores to get the symmetrical score. I hope this is helpful to you, and I also hope I'm not wrong hehe. Thank you very much!
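
On the question that opened this thread: the Euclidean distance formula extends unchanged to any number of features, since it is just the square root of the sum of squared per-feature differences. A small sketch (`euclidean` is a hypothetical helper, not library code):

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as equal-length
    sequences of feature values, in any number of dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# The same function works in 2 dimensions...
d2 = euclidean((0, 0), (3, 4))                     # 5.0
# ...and in 5 dimensions, with no change to the formula.
d5 = euclidean((1, 2, 3, 4, 5), (1, 2, 3, 4, 7))   # 2.0
```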

  • @ChocolateMilkCultLeader
    @ChocolateMilkCultLeader 2 years ago

    The day you do Hilbert Curves will be a lot of BAMMMMMSSSSSSS

    • @statquest
      @statquest  2 years ago

      I'll keep that in mind! :)

  • @Pedritox0953
    @Pedritox0953 2 years ago +1

    Great video!

  • @3Mus-cat-tears
    @3Mus-cat-tears 1 year ago

    Hi Josh! Thank you for the info! It's really helpful.
    What should I do if I have zeros or NAs in my dataset? I couldn't find anything on imputation before UMAP on Google :(

    • @statquest
      @statquest  1 year ago +1

      There might not be a UMAP-specific imputation method, so if you just search for imputation methods in general, you might find something that works.

  • @ouryly1541
    @ouryly1541 11 months ago +1

    Amazing!! Thank you

    • @statquest
      @statquest  11 months ago

      Thank you too!

  • @lukesimpson1507
    @lukesimpson1507 2 years ago

    Hi Josh, I was wondering how you feel about me using some stills from your channel to explain these types of plots prior to displaying them? This would be done in an educational setting, and I would credit the channel and provide a link, if that is okay?

    • @statquest
      @statquest  2 years ago +1

      As long as you provide the link, it's fine with me.

    • @lukesimpson1507
      @lukesimpson1507 2 years ago +1

      @@statquest Thank you! Keep up the great videos!

  • @miguelcampos867
    @miguelcampos867 2 years ago

    It would be great to talk about normalizing flows

    • @statquest
      @statquest  2 years ago

      I'll keep that in mind.

    • @miguelcampos867
      @miguelcampos867 2 years ago +1

      @@statquest Thanks. In fact, that's the main reason why I am watching all your videos on likelihoods, the Gaussian distribution, etc. jajaja (they are great btw)

  • @yeah6732
    @yeah6732 2 years ago +1

    Thank you!!

  • @meichendong3434
    @meichendong3434 2 years ago +1

    Love it!

  • @markmalkowski3695
    @markmalkowski3695 2 years ago +1

    Excellent! (Mr. Burns style)

  • @annaluizavicente
    @annaluizavicente 2 years ago

    Hi!! Could you, please, make a video about PLSDA? Thanks

  • @veenakumar2384
    @veenakumar2384 2 years ago

    How do I know which is the best curve plotted against similarity scores (y-axis) and the cluster points w.r.t. distances (x-axis)?

    • @statquest
      @statquest  2 years ago

      When the sum of the similarity scores = log2(num_nearest_neighbors).

  • @buffetCodes
    @buffetCodes 2 years ago +2

    BAM!

  • @ridwanwase7444
    @ridwanwase7444 2 years ago

    Hey, do you know Pranab K. Sen?
    He is Bengali and I'm Bengali too! I'm an undergrad in statistics

  • @naman4067
    @naman4067 2 years ago +1

    Nice

  • @THEMATT222
    @THEMATT222 2 years ago +1

    Noice 👍