StatQuest: t-SNE, Clearly Explained

  • Published: Nov 4, 2024

Comments • 747

  • @statquest
    @statquest 5 years ago +72

    Corrections:
    6:17 I should have said that the blue points have twice the density of the purple points.
    7:08 There should be a 0.05 in the denominator, not a 0.5.
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @linweitao6470
      @linweitao6470 4 years ago +1

      Thanks very much for the informative lecture and it is really helpful. UMAP is more and more popular now, could you explain it and compare with tSNE as well? Thanks in advance.

    • @statquest
      @statquest 4 years ago +6

      @@linweitao6470 I should have a UMAP StatQuest ready in a few weeks. I'm working on it right now.

    • @linweitao6470
      @linweitao6470 4 years ago +1

      @@statquest Thanks again!

    • @CompBioQuest
      @CompBioQuest 4 years ago +2

      @@statquest UMAP is great, but I don't know if it is more popular. There are more stringent reductions out there, like ICA. I wonder what Josh thinks about it?

    • @statquest
      @statquest 4 years ago +2

      @@CompBioQuest I guess it largely depends on the field. Right now, genetics and molecular biology are going bonkers over UMAP. However, ICA is very interesting. Thanks to your question, I found this article which is fascinating: gael-varoquaux.info/science/ica_vs_pca.html

  • @abdulgadirhussein2244
    @abdulgadirhussein2244 4 years ago +84

    I am always blown away by how you make statistics & machine learning algorithms so simple to understand and how you graciously share your knowledge. Keep up the great work man, you are awesome!

    • @statquest
      @statquest 4 years ago +2

      Thank you very much! :)

  • @잠꾸러기-g1s
    @잠꾸러기-g1s 3 years ago +21

    Whenever I find a statistics technique I have never seen before in a scientific article, I always visit your channel. Thanks a lot!!

  • @gustavomorais4489
    @gustavomorais4489 3 years ago +13

    I never leave comments, but I really feel the need to thank you for being able to explain this in such a simple way

  • @jjlian1670
    @jjlian1670 5 years ago +12

    Josh is so far my favorite YouTuber who is able to explain complex stats concepts so smoothly.

    • @statquest
      @statquest 5 years ago +1

      Thank you so much! :)

  • @veronikaberezhnaia248
    @veronikaberezhnaia248 3 years ago +13

    I regret I can't put 1000 likes! I read about 20 articles about t-SNE; they are similar to one another, almost identical, and they didn't get me closer to the point. But your video - I watched it 4 times (because the topic is hard, at least for me) while making some notes and drawing - and I finally understand how it works, up to the point that I can explain it to someone else. So many thanks to you!

    • @statquest
      @statquest 3 years ago

      HOORAY!!! TRIPLE BAM! I'm glad the video was helpful. BAM! :)

  • @RezaRob3
    @RezaRob3 4 years ago +4

    I'm writing this comment while having watched only half way into this video, which is pretty unusual for me!
    It is so clearly explained! I once glanced at the t-SNE paper and didn't understand it. If this is what it does then this is how things like this should be explained!
    Really, we need people explaining science like this! It's possible to read scientific papers, but what they fail to do is properly communicate the core idea to the reader so that the reader quickly grasps the big picture and the intent of the mathematical details without getting lost in the details.
    Frequently, even a missing definition can make reading papers much harder for non experts.

    • @statquest
      @statquest 4 years ago +1

      I'm glad you liked this video so much! :)

  • @atakanekiz
    @atakanekiz 5 years ago +270

    Great explanations! Can you please do a video explaining UMAP and potentially how it compares to t-SNE? Thanks!

  • @OnSightNoMore
    @OnSightNoMore 4 years ago +7

    It's impressive how you managed to explain the essential concepts of this chain of algorithms in such a clear way! I'm sharing this video with my beginner fellows, who normally flee as soon as I say words like nearest-neighbor or stochastic.
    Thank you very much!

    • @statquest
      @statquest 4 years ago

      Thank you very much! :)

    • @willykitheka7618
      @willykitheka7618 2 years ago +2

      🤣🤣🤣🤣it's that terrifying?!? Barbara Oakley in her book, "a mind for numbers" called them zombies🤣🤣🤣

  • @kass8036
    @kass8036 7 years ago +241

    I never knew machine learning could be as simple as... BAM

    • @thomasrad6296
      @thomasrad6296 4 years ago +1

      That's like the most important lesson.

    • @namimiable
      @namimiable 4 years ago +2

      Double bam 💥

    • @kalyanben10
      @kalyanben10 4 years ago +3

      Just a random comment so that someone can say triple bam

    • @kass8036
      @kass8036 4 years ago +5

      Triple bam 💥

    • @birenpatel894
      @birenpatel894 3 years ago +3

      hurayyyy we have made it to the END !!!

  • @gayathrikurada3315
    @gayathrikurada3315 4 years ago +5

    Josh.. Your explanation is always "simple and easy to understand", even for a layman. You are simply "The Life Saviour"!!!
    Thank you so much :)

    • @statquest
      @statquest 4 years ago +1

      Hooray! I'm glad my video was helpful. :)

  • @sarangak.mahanta6168
    @sarangak.mahanta6168 2 years ago +1

    The only educational channel which brings a smile to my face.

  • @douglasaraujo9763
    @douglasaraujo9763 4 years ago +107

    As entertaining as watching a Walt t-SNE movie!

    • @statquest
      @statquest 4 years ago +14

      You made me laugh out loud! BAM! :)

    • @arenashawn772
      @arenashawn772 8 months ago +1

      Best stat-word-play of the year! 😂

  • @nanopore-sequence
    @nanopore-sequence 4 years ago +7

    I am a student in Japan.
    I'm not good at English, but it was very easy to understand and I learned a lot:)

  • @Kmysiak1
    @Kmysiak1 4 years ago +1

    This explanation almost makes t-SNE sound like a clustering technique, not a reduction technique..... That said, this was by far the best explanation I've heard to date.

    • @statquest
      @statquest 4 years ago

      That's a good observation. In many ways t-SNE is a hybrid method that reduces dimensions by clustering.

    • @Kmysiak1
      @Kmysiak1 4 years ago +1

      @@statquest Now if only you could explain how to interpret a t-SNE plot. This would help immensely, as it's virtually impossible to determine the correct perplexity number without understanding how to interpret the plot. This seems like one of those "black box" methods which we just trust. Keep up the great work!

  • @edridgedsouza1170
    @edridgedsouza1170 4 years ago +55

    "This is Josh Starmer, and you're watching Tisney Channel!"

  • @DoanQuocHoan
    @DoanQuocHoan 4 years ago +3

    I was so confused about t-SNE until I watched this. It's clear and very easy to understand! Thank you! Like your BAM. :D

  • @snackbob100
    @snackbob100 4 years ago +4

    Josh, I literally love your videos, they are really helping me get through my ADV CS degree. I am going to buy one of your shirts and wear it on campus as a thank you!

    • @statquest
      @statquest 4 years ago +1

      That would be awesome!!! Thank you very much! :)

  • @thedrunkprogrammer1474
    @thedrunkprogrammer1474 4 years ago +1

    I really can't appreciate you enough for your videos.
    Books and blogs only make sense after I watch your videos!

    • @statquest
      @statquest 4 years ago

      Thank you very much! :)

  • @alvarovs89
    @alvarovs89 1 year ago +1

    Just heard about t-SNE and I did not quite understand how it works, so I crossed my fingers hoping that Josh did a video on this, and of course he did!! haha
    I have my popcorn ready to enjoy this video :)

  • @Ravi5ingh
    @Ravi5ingh 4 years ago +1

    It's rare to come across such a brilliant explanation.

  • @tuongminhquoc
    @tuongminhquoc 2 years ago +1

    Thank you. I am not sure if you remember me from the PCA video. I have a job now. My job does not have a high salary, but I can now support you by donating and thank you that way. 😊

    • @statquest
      @statquest 2 years ago +1

      WOW! Thank you so much. And congratulations on getting a job!!! HOORAY!!! TRIPLE BAM! :)

    • @tuongminhquoc
      @tuongminhquoc 2 years ago +1

      @@statquest Keep doing great work sir! Also, it would be great if you could make a video about the comparison between clustering methods. 😁

    • @statquest
      @statquest 2 years ago +1

      @@tuongminhquoc Thanks and I'll keep that in mind!

  • @shanthinagasubramanian2866
    @shanthinagasubramanian2866 2 years ago +1

    Very nice way of teaching! ML concepts CLEARLY EXPLAINED, and BAM adds a lot of curiosity to the videos :) Thanks for your videos. And not to forget, your songs are really nice :)

  • @ramnarasimhan1499
    @ramnarasimhan1499 7 years ago +1

    Fantastic video. I really appreciate all the slides that you made to get the animation effect. It really helped. Possibly the best explanation of t-SNE around. Keep up the good work.

  • @nikhilgoparapu8183
    @nikhilgoparapu8183 4 years ago +2

    Very clearly explained!
    Loved the way you explained such a complicated concept so intuitively.
    Thank you.

    • @statquest
      @statquest 4 years ago

      Glad it was helpful!

  • @vishnumuralidharan9858
    @vishnumuralidharan9858 1 year ago +2

    Hi Josh, I can't thank you enough for how much I have benefitted from your videos even though I do data science as part of my day job. Thank you so much for sharing your knowledge!
    One request for a video: could you do a video of when to use which methods / models in a typical data science problem? Much appreciated.

  • @pierrefoidart5368
    @pierrefoidart5368 4 years ago

    Thanks a lot!! These videos are much more clear than any article!
    A video explaining UMAP (related to t-SNE) would be awesome !

    • @statquest
      @statquest 4 years ago

      I'm working on UMAP. For now, however, know that it is almost 100% the same as t-SNE. The differences are very subtle.

  • @saiakhil4751
    @saiakhil4751 3 years ago +3

    Why I couldn't stop bamming the like button??!! You're the best Josh!!

  • @srishtikumar5544
    @srishtikumar5544 4 years ago +1

    Excellently explained! I really like your simple, clear, concise explanation - those 3 factors make a world of difference. And, great animations.

  • @goeCK
    @goeCK 5 years ago +2

    Came here to understand the t-SNE plots used in single-cell transcriptomics - which I finally did, thanks! Overall, you have helped me out plenty of times already!
    To display cells during cell fate transition/acquisition, e.g. at different time points during neurodevelopment, pseudo-temporal ordering is often used.
    Since scRNA-seq is becoming more and more popular, this might be a good next topic.

    • @erazael
      @erazael 5 years ago

      Same here, and I did not expect to understand so fast and clearly!

  • @axeleriksson8978
    @axeleriksson8978 7 years ago +45

    Hey, love your videos!
    Just a typo but it should be 0.05 on the values to the right at 07:19. Confused me for a second so might clear things up for others.

  • @soumitachel3844
    @soumitachel3844 4 years ago +1

    Hello Josh, thank you for coming with such incredible videos. Data scientist’s life becomes easy.😬

    • @statquest
      @statquest 4 years ago

      Thank you! :)

    • @soumitachel3844
      @soumitachel3844 4 years ago

      StatQuest with Josh Starmer Hi, a request: please do a tutorial on UMAP.

  • @jannelis2845
    @jannelis2845 5 years ago +2

    Very well explained ! Your video was recommended to us by our professors at ETH-Zürich.:)

  • @scifimoviesinparts3837
    @scifimoviesinparts3837 3 years ago +1

    The Best tutorial and explanation for TSNE so far! It's of great help! Thanks a lot!

  • @sagar_bro
    @sagar_bro 4 years ago +4

    I just love the way you start all your videos! Stat-Questtttttt :)

  • @abhaymathur9332
    @abhaymathur9332 5 years ago

    This is such an awesome explanation of t-SNE that I don't need to watch any other video or read any other website/book. I don't think there can be a better explanation. Superlike.

  • @camilaarcu2254
    @camilaarcu2254 3 years ago +1

    You are incredible, Josh Starmer!! I loved this

  • @carlosalfonso5829
    @carlosalfonso5829 6 years ago +1

    OH God, this is a great explanation. As Radel mentions below, it would be nice to have an extended video of the algorithm like the one for PCA!!

    • @statquest
      @statquest 6 years ago

      Thank you! Yes, one day I'll break the actual equations down and do "step-by-step" explanation of t-SNE.

    • @niteshturaga
      @niteshturaga 6 years ago

      Looking forward to this.

  • @imamalva5603
    @imamalva5603 4 years ago +1

    You are the hero, keep making complex things simple. Thanks!

  • @veeek8
    @veeek8 2 years ago +1

    Brilliant explanation, this has been bugging me all day, thank you!!

  • @bright1402
    @bright1402 6 years ago

    This is the best video for t-SNE that I have ever seen. Thanks a lot, man

  • @parvezrafi4098
    @parvezrafi4098 6 years ago

    Thanks a lot. I really struggled to understand the concept first time I came across it in a book. Your video helped a lot. Great job!

  • @ImmutableHash
    @ImmutableHash 6 years ago

    Awesome explanation, thank you so much! I read a few papers/books multiple times and barely have a clue, but with your vid I understand the concept just by watching it once!

  • @lilmoesk899
    @lilmoesk899 7 years ago

    Great as always. I've heard of t-SNE before, but this was my first real introduction to it. Definitely want to go look at some more resources now.

  • @HR-yd5ib
    @HR-yd5ib 7 years ago +19

    Excellent video! Perhaps you could add another video where you go through the actual algorithm and how the moves are actually computed.

  • @octour
    @octour 5 years ago

    Thanks for such a clear explanation. You know, your channel is already on my top list, and very soon I'll have watched all your videos.

  • @abhijitkumbhar1
    @abhijitkumbhar1 1 year ago +1

    Difficult concept made so simple. Just brilliant!!!!

  • @chauphamminh1121
    @chauphamminh1121 5 years ago

    You make a complex idea so simple and understandable! Great video. Thanks a lot

  • @sudortd
    @sudortd 1 year ago +1

    I need to watch 3 more times to fully understand. TRIPLE BAM!!!

  • @MrCEO-jw1vm
    @MrCEO-jw1vm 4 months ago +1

    Thank you so much for this great resource and how much investment you have made into it. I have understood this well.

    • @statquest
      @statquest 4 months ago

      Glad it was helpful!

  • @reedayoungblood
    @reedayoungblood 4 years ago +2

    Great video - thank you! One small insertion that I think would improve it: at ~2:07, right after showing what projecting on to the X or Y axis would look like, show one more example of projecting onto an arbitrary line to try to retain as much variance as possible (basically PCA). I think this could be done in 15-20 seconds, and would be helpful in comparing t-SNE to one of its most popular alternatives, which is helpful in deciding *when* to use an algorithm - one of the hardest things for beginners like myself.

  • @prateekyadav7679
    @prateekyadav7679 3 years ago

    I never thought I'd not understand a statquest video! :(

    • @statquest
      @statquest 3 years ago

      Bummer. What time point was confusing?

  • @redaaitouahmed8250
    @redaaitouahmed8250 4 years ago +2

    Super Mega BAM !! So great at what you do as always ... Tons of love sent your way ! Keep up the amazing work :D

  • @precisionimmunologyincubat2315
    @precisionimmunologyincubat2315 4 years ago +2

    Thank you so much! Right now everyone in our department (Systems Genetics at NYU Langone) is using UMAP. There aren't many great videos about it - it would be awesome if you could help us understand what all the hype is about!

    • @statquest
      @statquest 4 years ago +2

      UMAP is on the to-do list. I hope to get to it in the spring.

  • @thoniageo
    @thoniageo 3 years ago +1

    i am a huge fan of this channel! greetings from brazil ^^

  • @hulaalol
    @hulaalol 3 years ago +1

    thank you so much for this nice explanation. will help me a lot in my exams

  • @arenashawn772
    @arenashawn772 8 months ago +1

    t-SNE in concept is a little dense to me so I am watching this video multiple times to think about the nitty gritty of it… I have three perhaps very naive questions so far: 1) with really high dimensional feature space for some data, how do t-SNE algorithms decide how many dimensions to use for the simplified data? In PCA it can be specified by inspecting the variance of data in each of the components to decide that new feature’s “contribution” in grouping/separating the datapoints, is there a similar measure that is used to decide how many dimensions are used in t-SNE? 2) Why is it only used as a visualization technique and not a true dimension-reduction method for data pre-processing in machine learning pipelines? 3) is it possible that the data do not converge in low dimensional space (i.e., you just could not move the second matrix so that it is similar enough to the first one)?
    I dug out the original 2008 paper from the SkLearn citation and, as usual, was amazed by how you explained the fairly abstract idea in section 2 of the paper in a mere 20-minute, unhurried video, down to the analogy of the repelling and attraction of mapped data in the low dimensional space (the original paper interpreted the gradient descent method used to locate the low dimensional mapping of points as “springs between every point and all other points”) - no important detail is lost in your video, yet the details are organized in such a way that they follow a clear logic and do not overwhelm. That is mastery of the art of elucidation ❤
    Thanks as always for digesting these complicated items for the benefit of the students and present them in simplified yet informative ways, as always!

    • @statquest
      @statquest 8 months ago

      Thank you very much! For t-SNE, I'm pretty sure it's always used to generate a 2 (or at most 3) dimensional graph that can be visualized. This is because, unlike PCA, where the axes (or PCs) actually represent something (the directions of the most variance), the axes in t-SNE are completely arbitrary. So there's no way to quantify or rank the axes in order of importance. And it is probably possible to have the low dimensional graph fail to converge. That said, if you'd like more details on t-SNE, check out my videos on UMAP - a related technique that is a little more popular: ruclips.net/video/eN0wFzBA4Sc/видео.html and ruclips.net/video/jth4kEvJ3P8/видео.html

  • @chaitanyakulkarni243
    @chaitanyakulkarni243 3 years ago +2

    Wish I could *Triple Bam* like this video! Such a simple explanation. Thanks a lot Josh :-)

    • @statquest
      @statquest 3 years ago +1

      Glad you liked it!

  • @DumplingWarrior
    @DumplingWarrior 2 months ago +1

    Hi Josh, great videos as always! I'm not sure if there's a video about this already, but could you do one with all the clustering or classification or dimensionality reduction methods compiled together and then compare their differences and similarities and talk about situations when we should use which? For example, after looking at many of the videos, I think I'm already a little lost on if I should use PCA or MDS or t-SNE on my data. Ty.

    • @statquest
      @statquest 2 months ago

      Thanks! I'll keep that in mind.

  • @deepika3389
    @deepika3389 3 years ago +1

    Kudos, I understood so effortlessly....triple BAM!!!

  • @NirajKumar-hq2rj
    @NirajKumar-hq2rj 6 years ago

    Excellent explanation, this intuition helps to follow the maths behind t-SNE

  • @mic9657
    @mic9657 1 year ago +1

    Amazing work! perfectly explained!!!

  • @leixiao169
    @leixiao169 4 years ago +1

    your explanation is very very good! thanks!!!

  • @BusinessScience
    @BusinessScience 5 years ago +2

    Hey, love your videos! We are actually using it to help explain key concepts in our application-focused courses. I'd love to see UMAP (similar to t-SNE), which is a bit more scalable.

    • @statquest
      @statquest 5 years ago +3

      Thank you so much! It's on the to-do list. :)

    • @BusinessScience
      @BusinessScience 5 years ago +1

      @@statquest Awesome! I'm using your content in my courses - Students love it. PCA, K-Means, & t-SNE. Will be using your ML videos as well. Your explanations are the best!

  • @RajeshSharma-bd5zo
    @RajeshSharma-bd5zo 3 years ago +1

    One word reaction after watching this video --> AWESOME!!

    • @statquest
      @statquest 3 years ago

      Thank you so much 😀!

  • @benw4361
    @benw4361 5 years ago +1

    Love the vid. I was wondering how tsne works and you broke it down great and the explanation for the t distribution was short and to the point.

  • @Tony-Man
    @Tony-Man 7 months ago

    Hi Josh, quality content! This channel continuously helps me understand the underlying ideas so that the dry textbook explanations actually make sense. I still have a question. When you calculate the unscaled similarity score, how exactly do you determine the width of your Gaussian? I get it in the example, where we already know the clusters. If I only want to visualize the data without having pre-defined clusters, what happens then?

    • @statquest
      @statquest 7 months ago

      I talk more about the details of t-SNE and how it works in my videos on UMAP: ruclips.net/video/eN0wFzBA4Sc/видео.html and ruclips.net/video/jth4kEvJ3P8/видео.html

  • @debajitbhowmick7079
    @debajitbhowmick7079 3 years ago

    1. In flow cytometry we use the median for almost all data analysis because it best describes the central tendency of the data. Does the geometric mean describe flow cytometry data any better, or is it better only for some types of flow cytometry experiments?
    2. What are the drawbacks of downsampling? Is there any way to identify when to avoid downsampling?
    3. What is the batch effect? How do you identify and remove it? What is the basic principle of identification? What are the strategies to avoid it to begin with?

  • @UxJoy
    @UxJoy 4 years ago +1

    Dude this is super clear. Love the content! BAM

    • @statquest
      @statquest 4 years ago

      Thank you very much! :)

  • @somethingandapie
    @somethingandapie 6 years ago +1

    Subscribed because that intro gave me life!

  • @DaniTeba
    @DaniTeba 4 years ago +1

    Thanks a lot for the video, Josh.
    Let me point something out: at minute 10:40, it looks like t-SNE sorts the matrix instead of minimizing the loss function by gradient descent.

    • @statquest
      @statquest 4 years ago +1

      Good point. I represented it as a matrix because, internally, all of the similarity scores are maintained that way.
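
For readers who want to see what "making the matrices similar" by gradient descent could look like, here is a minimal sketch, not the video's or any library's implementation. It assumes the formulation from the original t-SNE paper: low-dimensional scores come from a t-distribution kernel, and the points move along the gradient of the KL divergence between the two matrices. The function names, the toy matrix `p`, and the learning rate are all illustrative assumptions.

```python
import math

def low_dim_scores(y):
    """Unscaled (w) and scaled (q) similarity scores for 1-D positions y."""
    n = len(y)
    w = [[0.0 if i == j else 1.0 / (1.0 + (y[i] - y[j]) ** 2)
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))  # scale so the whole matrix sums to 1
    q = [[w[i][j] / total for j in range(n)] for i in range(n)]
    return w, q

def kl_divergence(p, q):
    """How different the high-dimensional matrix p is from the low-dimensional q."""
    n = len(p)
    return sum(p[i][j] * math.log(p[i][j] / q[i][j])
               for i in range(n) for j in range(n)
               if i != j and p[i][j] > 0)

def gradient(p, y):
    """Gradient of the KL divergence with respect to each position in y."""
    w, q = low_dim_scores(y)
    n = len(y)
    return [4.0 * sum((p[i][j] - q[i][j]) * w[i][j] * (y[i] - y[j])
                      for j in range(n)) for i in range(n)]

def step(p, y, learning_rate=0.5):
    """Move every point a little bit against its gradient."""
    g = gradient(p, y)
    return [yi - learning_rate * gi for yi, gi in zip(y, g)]

# Toy symmetric matrix (sums to 1): points 0 and 1 are similar, point 2 is not.
p = [[0.00, 0.40, 0.05],
     [0.40, 0.00, 0.05],
     [0.05, 0.05, 0.00]]
y = [0.0, 2.0, 1.0]  # arbitrary starting positions on the number line

print("KL before:", kl_divergence(p, low_dim_scores(y)[1]))
for _ in range(50):
    y = step(p, y)
print("KL after: ", kl_divergence(p, low_dim_scores(y)[1]))
```

Where `p[i][j]` is larger than `q[i][j]`, the term pulls the two points together; where it is smaller, it pushes them apart, which matches the attraction and repulsion described in the video.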

  • @cyrilbaudrillart9690
    @cyrilbaudrillart9690 4 years ago

    Great explanation! Thank you so much... I think there is a typo @7:08. Oh oh... In the upper part, the sum of all scores is 0.24+0.5 instead of 0.24+0.05. BAM. Same mistake in the other equation with the same denominator. Double BAM. Results are correct. Triple BAM :-)

    • @statquest
      @statquest 4 years ago +1

      Thanks! I added that note to the pinned comment.
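
The corrected arithmetic can be checked with a few lines of Python. This is only a sketch of the scaling step using the two unscaled scores quoted in this thread (0.24 and 0.05); the function name `scale_scores` is made up for illustration.

```python
def scale_scores(unscaled):
    """Divide each unscaled similarity score by the sum of all the scores."""
    total = sum(unscaled)
    return [score / total for score in unscaled]

# Unscaled scores from the corrected slide at 7:08.
scaled = scale_scores([0.24, 0.05])
print([round(s, 2) for s in scaled])  # [0.83, 0.17]
print(sum(scaled))                    # the scaled scores add up to 1 (up to rounding)
```

With the mistyped 0.5 in the denominator, the first score would come out as 0.24 / 0.74, or about 0.32, so the typo is easy to spot against the results shown in the video.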

  • @abcdefghi2650
    @abcdefghi2650 2 years ago +1

    Great videos! Great channel! Big thumbs UP!

  • @vikramreddy3699
    @vikramreddy3699 4 years ago

    Thank you Josh. I love the way you present concepts with simple examples.
    Could you please explain how you decided to put the red dots on the left and the orange ones on the right at 5:30?

    • @statquest
      @statquest 4 years ago

      It doesn't matter what side of the curve the points are on, since the distance from the y-axis values on the curve will be the same (normal curves are symmetrical). However, in order for the points to be easily seen, I spread them out on different sides rather than piling them all up on top of each other.
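
A quick way to convince yourself of the symmetry argument is to evaluate a normal curve at the same distance on either side of its center. The helper below is a hypothetical illustration, not code from the video.

```python
import math

def normal_curve(x, center=0.0, sigma=1.0):
    """Height of a normal curve at x; this height is the unscaled similarity."""
    return (math.exp(-((x - center) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)))

left = normal_curve(-1.5)   # a point drawn 1.5 units to the left of the center
right = normal_curve(1.5)   # the same distance, drawn on the right
print(left == right)  # True
```

Only the distance enters the formula (it is squared), so the side a point is drawn on has no effect on its similarity score.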

    • @vikramreddy3699
      @vikramreddy3699 4 years ago +1

      @@statquest Thank you again

  • @petersu4869
    @petersu4869 3 years ago +2

    "Bam, I made that terminology up" :D :D , great vid, keep up the good work.

  • @rgarthwood3881
    @rgarthwood3881 5 years ago +20

    "Clearly Explained" indeed!

  • @daaronr
    @daaronr 3 years ago

    Love it! A few things could still be clarified (please?):
    At 07:40, which vector of distances must add up to 1 after scaling? The sum of distances from each point to all other points (regardless of cluster)?

  • @teresitaeyzaguirre4741
    @teresitaeyzaguirre4741 2 years ago

    hey Josh! great video as always. Is it necessary to normalize or scale the data before applying this algorithm?

    • @statquest
      @statquest 2 years ago

      I'm not sure. In theory, no, but in practice, PCA is usually used as a first pass to remove noise, and PCA requires things to be on the same scale.
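
As a rough illustration of "on the same scale", the sketch below standardizes each variable to mean 0 and standard deviation 1 before it would be handed to PCA. This is one common choice, shown here as an assumption rather than something the reply prescribes; the function name and the numbers are invented.

```python
import statistics

def standardize(column):
    """Rescale one variable to have mean 0 and standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]

# Two made-up variables measured on very different scales.
big_units = [10.0, 20.0, 30.0, 40.0]
small_units = [0.1, 0.2, 0.3, 0.4]

print(standardize(big_units))
print(standardize(small_units))
```

After standardizing, both variables carry the same weight, so neither dominates the distances just because of its units.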

  • @rrrprogram8667
    @rrrprogram8667 6 years ago +2

    One sincere request.... Can you please make one consolidated video (it could be a long one) with one or two examples of each machine learning concept you have explained on your channel, also comparing why we use that particular concept to solve the problem and what the issues with other algorithms would be?
    A comparison video would surely help to further enhance understanding....

    • @statquest
      @statquest 6 years ago

      That's a good idea, a worked out machine learning example from start to finish, and I'll put it on the to-do list.

    • @rrrprogram8667
      @rrrprogram8667 6 years ago +1

      StatQuest with Josh Starmer thanks a lot Joshh... Waiting for it

  • @nabeelhasan6593
    @nabeelhasan6593 3 years ago

    I always found t-SNE difficult to understand, but this video felt like a cakewalk. Thank you for this amazing content. Also, can you please make a video on UMAP?

    • @statquest
      @statquest 3 years ago +1

      I'm working on a UMAP video. However, for now just know that they are almost the exact same. The only differences are very subtle in how the matrices are made similar.

    • @nabeelhasan6593
      @nabeelhasan6593 3 years ago +1

      @@statquest BAM !!!

  • @p.b.3697
    @p.b.3697 4 years ago +1

    Thank you very much Josh . You made it easier to understand.

    • @statquest
      @statquest 4 years ago

      Hooray! I'm glad the video was helpful! :)

  • @flossenking
    @flossenking 2 years ago +1

    Hey, great explanation! So, do the x and y values in a 2D t-SNE plot mean anything exactly? (Or, for that matter, the value of the position on the one axis in the video.) Our professor told us that they don't because it's reduced from higher dimensions.

    • @statquest
      @statquest 2 years ago +1

      They don't mean anything, but for a different reason than your professor told you. It's not that they are reduced from higher dimensions, it's the method of how they were reduced. PCA, in contrast, is a good example of dimension reduction where the axes have meaning. For details, see: ruclips.net/video/FgakZw6K1QQ/видео.html

  • @Bedivine777angelprayer
    @Bedivine777angelprayer 1 year ago +1

    Thanks really great videos understood concepts so well

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago +1

    Great explanation, especially for beginners. Thanks

  • @MathPhysicsFunwithGus
    @MathPhysicsFunwithGus 1 year ago +1

    This is a great explanation thank you!

  • @abhijeetsinghbarath4248
    @abhijeetsinghbarath4248 4 years ago

    Do we get slightly different tSNE plots if we make them from the same data multiple times?
    Referring to what you mentioned- the points are initially placed randomly on the line, before the analysis moves them a little bit at a time. Thanks for the great video! Grabbing complex concepts with a BAM!

    • @statquest
      @statquest 4 years ago +1

      Yes, every time you make a t-SNE plot you will get a new graph. Thus, if you want to reproduce your graph, you have to set the random number seed before you run t-SNE.
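
To show why the seed matters, here is a small sketch of only the random starting placement, written with Python's standard library. The function `random_start` is hypothetical; in practice you would set the seed through your tool of choice, for example `set.seed()` in R or the `random_state` parameter of scikit-learn's `TSNE`.

```python
import random

def random_start(n_points, seed):
    """Place n_points at random positions on a line, using a fixed seed."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_points)]

a = random_start(5, seed=42)
b = random_start(5, seed=42)
c = random_start(5, seed=7)

print(a == b)  # True: same seed, same starting positions
print(a == c)  # False: different seed, different starting positions
```

Because every later move depends on where the points start, fixing the seed is what makes the final graph reproducible.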

    • @abhijeetsinghbarath4248
      @abhijeetsinghbarath4248 4 years ago +1

      @@statquest Thank you!

  • @jamesang7861
    @jamesang7861 5 years ago

    The only info that's stuck clearly in my head is BAM..

  • @davidm7765
    @davidm7765 3 years ago +1

    Excellent work, thank you !!

  • @andyn6053
    @andyn6053 1 year ago

    Please do videos about density estimation techniques such as GMM and KDE. I would also like to see anomaly detection algorithms explained, e.g. isolation forest.

    • @statquest
      @statquest 1 year ago

      I'll keep those topics in mind.

  • @Underscore_1234
    @Underscore_1234 3 years ago

    Super clear. Does the small move correspond to some learning rate multiplying the gradient of some distance between the expected distance matrix and the one we have?

  • @carlmemes9763
    @carlmemes9763 3 years ago +1

    Thanks for this wonderful video❤️

    • @statquest
      @statquest 3 years ago

      Glad you enjoyed it!

  • @nathalychicaizacabezas3055
    @nathalychicaizacabezas3055 3 years ago +1

    I am at the intro and love it already!

  • @sau002
    @sau002 5 years ago +1

    Excellent intro to tSNE

  • @markcoffer9290
    @markcoffer9290 5 years ago +1

    Well done! I would love to see videos on handling data outliers for regressions. Thanks!

  • @khaikit1232
    @khaikit1232 1 year ago

    Hi Josh
    Thanks for the amazing video. I just have 2 questions that popped up in my mind:
    1) Is my understanding correct that t-SNE does not actually know which points are in a cluster (yellow, red, blue)? t-SNE merely looks at the 2 matrices of scaled similarity scores and at each step tries to make the matrices more similar.
    2) Regarding why the t-distribution is used, you explained that without it the clusters would all be clumped up and be difficult to see. I don't really understand why the clusters would be clumped up.

    • @statquest
      @statquest 1 year ago

      1) That is correct. t-SNE does not take existing cluster information into account. It just looks at the matrices.
      2) The t-distribution has fatter tails compared to the normal distribution, so things can be further away and still give y-axis coordinates significantly greater than 0.

    • @Red_Toucan
      @Red_Toucan 1 year ago

      @@statquest I was wondering the same thing.
      If I understand correctly, using a t-distribution means that your new low-dimension similarity matrix will have larger similarity values than it would if we'd used a normal distribution, at least for points that are far away from each other.
      But I'm still not sure why this results in clusters that are farther from each other. Maybe it has to do with how the points "move" in order to get matching similarity matrices?

    • @statquest
      @statquest 1 year ago

      @@Red_Toucan I think I should have clarified - that the t-distribution allows for the points within a cluster to not pile up on each other and make a single dot. The t-distribution spreads them out so each point can be easily seen. At least, that is my understanding of how this works.
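
The "fatter tails" point from this exchange can be seen numerically. The sketch below compares two unnormalized curves as a function of distance: a normal-style kernel and the t-distribution kernel with one degree of freedom, which is the form used for the low-dimensional scores in the original t-SNE paper. The function names and the distances are arbitrary choices for illustration.

```python
import math

def normal_kernel(d):
    """Unnormalized normal curve height at distance d (sigma = 1)."""
    return math.exp(-(d ** 2) / 2)

def t_kernel(d):
    """Unnormalized t-distribution (1 degree of freedom) height at distance d."""
    return 1.0 / (1.0 + d ** 2)

for d in (0.5, 1.0, 3.0, 5.0):
    print(f"distance {d}: normal={normal_kernel(d):.6f}  t={t_kernel(d):.6f}")
```

At small distances the two curves give similar heights, but at large distances the normal curve drops to almost 0 while the t curve stays well above it, so a given score corresponds to a larger distance under the t curve.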

  • @jarosawbachnio3753
    @jarosawbachnio3753 4 years ago +1

    Hi Josh, great video, many thanks! Anyway, I still don't get how you determine the distribution properties (like the standard deviation) for calculating the unscaled similarity between two points. When you introduced a cluster half as dense as the others, you used a normal distribution with the standard deviation doubled, which is quite intuitive. But you knew that this cluster was just half as dense as the others. The question is, how do you know the properties of these distribution curves?

    • @statquest
      @statquest 4 years ago

      You estimate it from the data.
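
One way to read "estimate it from the data": in the original t-SNE paper, each point gets its own curve width, chosen by binary search so that the resulting similarity scores have a user-set perplexity. The sketch below is a simplified version of that idea under stated assumptions; the function names, the search bounds, and the example distances are all made up for illustration.

```python
import math

def conditional_probs(dists, sigma):
    """Scaled similarity scores from one point to its neighbors."""
    sq = [d * d for d in dists]
    nearest = min(sq)  # shift by the nearest neighbor to avoid underflow to all zeros
    weights = [math.exp(-(s - nearest) / (2 * sigma * sigma)) for s in sq]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    """2 to the power of the entropy: roughly the effective number of neighbors."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

def find_sigma(dists, target, lo=1e-3, hi=1e3, steps=100):
    """Binary search for the width whose perplexity matches the target."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if perplexity(conditional_probs(dists, mid)) < target:
            lo = mid  # curve too narrow: too few effective neighbors
        else:
            hi = mid  # curve too wide: too many effective neighbors
    return (lo + hi) / 2

dense = [1.0, 1.5, 2.0, 5.0, 6.0]    # distances from one point to its neighbors
sparse = [2 * d for d in dense]      # same layout, spread out twice as far

sigma_dense = find_sigma(dense, target=3.0)
sigma_sparse = find_sigma(sparse, target=3.0)
print(sigma_dense, sigma_sparse, sigma_sparse / sigma_dense)
```

In this toy setup, spreading the neighbors twice as far apart doubles the width that is found, which agrees with the doubled standard deviation used for the less dense cluster in the video.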

  • @janiobachmann5029
    @janiobachmann5029 6 years ago

    Thank you Joshua for this amazing video! I just want to make sure I understand: basically, t-SNE determines the clusters in the high-dimensional data, then places the points "randomly" in the lower-dimensional space and tries to follow the patterns found in the high-dimensional data. The only thing I was a bit confused about at the end is what the algorithm does at each step when it moves a point closer to the other clusters (where you show the two matrices). The matrix on the left was all mixed up and the one on the right was organized due to the scaled scores we computed previously. So how does the matrix on the left (lower dimension, 1-D) learn from the matrix on the right (higher dimension, 2-D)? I hope you understood me, but anyway, Joshua, this is the best video on t-SNE on YouTube. Thanks for sharing!

    • @janiobachmann5029
      @janiobachmann5029 6 years ago

      Thanks Joshua! Now it makes perfect sense to me thanks for giving the explanation. I really find this algorithm useful for finding clusters and I just find fascinating how it determines groups of clusters. Again, thanks for your quick response to my question!

  • @parthgupta1562
    @parthgupta1562 3 years ago +1

    Beautiful explanations! Please make a video on Locally Linear Embedding too.

    • @statquest
      @statquest 3 years ago

      I'll keep that in mind.

  • @Elmirgtr
    @Elmirgtr 6 years ago +6

    You speak like Kevin from The Office. Great explanation, thanks a lot :)

  • @simonandrews5604
    @simonandrews5604 5 years ago

    Incredibly helpful and well presented. Thank you.