StatQuest: PCA main ideas in only 5 minutes!!!

  • Published: Sep 28, 2024

Comments • 737

  • @statquest
    @statquest 6 years ago +110

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
    Special thanks to PROTIST for the Russian subtitles!!! :)

    • @falaksingla6242
      @falaksingla6242 2 years ago +1

      Hi Josh,
      Love your content. Has helped me to learn a lot & grow. You are doing an awesome work. Please continue to do so.
      Wanted to support you but unfortunately your Paypal link seems to be dysfunctional. Please update it.

  • @tuongminhquoc
    @tuongminhquoc 4 years ago +524

    I appreciate the effort you put into these videos. Sadly, since I am still a student, I have no money to support you, even a little. So I have put a lot of effort into translating your videos into my native language as a thank-you. I hope you will accept my thanks.

    • @statquest
      @statquest 4 years ago +67

      Thank you very much!!!! :)

    • @tuongminhquoc
      @tuongminhquoc 4 years ago +43

      @StatQuest with Josh Starmer I didn't think you would check this that soon :))) thanks for accepting my contribution!. I'll translate more of your videos whenever I have free time (and wifi haha :D )

    • @statquest
      @statquest 4 years ago +34

      @@tuongminhquoc I really appreciate it! :)

    • @leonguyen1204
      @leonguyen1204 3 years ago +9

      No wonder the subtitle was spot on! Great work mate, thanks for that! Also thanks @StatQuest with Josh Starmer, nice video with simple explanation, I'm trying to make sense of it.

    • @ngoctandang9307
      @ngoctandang9307 3 years ago +4

      Really excited when watching this video in Vietnamese subtitle, thank you!!!

  • @fartissimo
    @fartissimo 4 years ago +12

    I enjoy your videos and you are performing a valuable service. The few things I would mention that would be helpful are that PCA is really a measure of covariance in a sample and that PCA does NOT provide ANY indication of statistical significance. Understanding Covariance is helpful to really understand PCA. Also, PCA is particularly useful when patterns emerge between experimental and non-experimental parameters. If patterns associated with experimental parameters are observed (i.e. treatment conditions) it indicates that there may be changes between samples/populations that are of interest; in cases where there are patterns associated with non-experimental parameters (such as collection date or incubation conditions) it indicates that the date of collection resulted in more variance than experimental parameters. In such a case, it points to a possible flaw in experimental design so that it would be of benefit to re-evaluate sample collection/preparation/incubation etc... in the workflows to minimize the influence on the studied populations.

  • @chintanchawda9725
    @chintanchawda9725 6 years ago +12

    Every maths prof. should be like this: the explanation is as simple as possible. Thank you.

  • @liranzaidman1610
    @liranzaidman1610 4 years ago +2

    2:03 People should stop here and listen very carefully because this is a really important concept, and I mean - Really important!
    When analyzing data and the parameters affecting the outcome of something - this must be the way to think.
    Great work

  • @Nicblocks
    @Nicblocks 1 year ago +2

    I had to wrap my head around PCA plots as part of a presentation and just could not understand it. This was really well done and I'll be taking this knowledge in with me. Thank you!

  • @verdasney
    @verdasney 3 years ago +1

    It is more understandable than my 1.5 h lecture and a good start to a PCA class.
    Thank you for the video. Very well created.

    • @statquest
      @statquest 3 years ago

      Glad it was helpful!

  • @eemmmaaadd
    @eemmmaaadd 5 years ago +2

    I always come across your videos when looking for stat information. And always your videos are the best.

    • @statquest
      @statquest 5 years ago

      Awesome! :) Thank you! :)

  • @Lastmomenttuitions
    @Lastmomenttuitions 5 years ago +70

    Very nicely explained thank you so much for Putting this

    • @statquest
      @statquest 5 years ago +4

      Hooray! I'm glad you like it. :)

  • @boundlessbrittany
    @boundlessbrittany 3 years ago +2

    Holy moly. I finally understand the concept of PCA plots :O THANK YOU SO MUCH

  • @timlofton6595
    @timlofton6595 2 years ago +2

    Good stuff Josh. Going to the lengthier version to further blast this through my thick skull. 😃 Appreciate your efforts with this!

  • @UKFishCam
    @UKFishCam 1 year ago +2

    Thank you so much for making this video! I've got my final year project due soon and Id lost the plot before this video!

  • @ghatshilagogol
    @ghatshilagogol 3 months ago +1

    awesome pca for dim reduction with vertical+horizontal+depth all in one 3-d rotates

  • @edupazz
    @edupazz 2 years ago +1

    Great!! Your Calm and crystal pronunciation makes the concept very clear to understand. Thanks

  • @ohromujici
    @ohromujici 6 years ago +1

    Thanks! This was a great overview. I am in big data for a pharma company and we added PCA to one of our data tools. The documentation we received was a little "dry" so thank you for putting this into easy to understand key concepts. This helped a lot. Also, I did my original graduate work in mRNA decay so bonus points for dragging mRNA into this. :) :) :)

  • @noneoftheabove1589
    @noneoftheabove1589 9 months ago +1

    I love your humor! What a lovely way to present and explain. ahem.. what could be daunting to some lol (such as myself!) Grateful for the work and the passion! Keep up the good work!
    A new subscriber!

    • @statquest
      @statquest 9 months ago

      Thanks so much!

  • @KingstonUponThames
    @KingstonUponThames 4 years ago +1

    Best explanations of PCA in layman terms. Great work. Thank you!

  • @ashutoshdongare5370
    @ashutoshdongare5370 1 year ago +1

    I am a fan of your videos and the way you explain. For most of this video, and more prominently after 3:52, it starts to look like the plots are of cells, whereas it is really a plot of genes, and cells are the dimensions of those genes.

    • @statquest
      @statquest 1 year ago

      The video is correct - if you do PCA based on correlations, you start with plots of genes and end up with plots of cells. This is confusing and one of the reasons I don't think it's a good idea to teach PCA from the perspective of correlations. However, people still do it, so I have this video. However, in my opinion, an easier to understand method (because we plot cells the entire time) is the more modern approach that uses SVD and is explained in this video: ruclips.net/video/FgakZw6K1QQ/видео.html
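The relationship between the two routes mentioned above (eigendecomposition of a covariance or correlation matrix versus SVD) can be sketched as follows. This is only an illustration under stated assumptions: it uses NumPy, made-up random data, and rows as samples with columns as variables, none of which come from the video. It uses the covariance matrix; standardizing each column first would give the correlation-based version.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 6 samples (rows) measured on 4 variables (columns).
X = rng.normal(size=(6, 4))
Xc = X - X.mean(axis=0)          # center each column
n = X.shape[0]

# Route 1: eigendecomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # sort largest variance first
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]
scores_eig = Xc @ eigvecs                # samples projected onto the PCs

# Route 2: SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = s**2 / (n - 1)                # variance explained by each PC
scores_svd = U * s                       # same projections, up to sign

print(np.allclose(eigvals, svd_vals))
print(np.allclose(np.abs(scores_eig), np.abs(scores_svd)))
```

Both routes should give the same variances and the same sample coordinates; only the sign of each component is arbitrary, which is why the comparison uses absolute values.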

  • @naturaldisastertv3298
    @naturaldisastertv3298 4 years ago +1

    i watched your 20 min video too. But this was easier to understand. Thank you so much.

  • @DrATIF-ij9hy
    @DrATIF-ij9hy 2 years ago +1

    Thank you for the Arabic subtitling, as I have always recommended your channel to my students; best wishes.

  • @chriscollins5320
    @chriscollins5320 6 years ago +1

    What a great video that clearly and concisely explains PCA. Great job, keep these up.

  • @trela47.
    @trela47. 2 years ago +1

    Just want to let you know, the 'Awesome song' just won you a subscriber.

  • @vimalakirtilotus4611
    @vimalakirtilotus4611 2 years ago

    You have saved me from the sea of formulas. Thank you!

  • @arkadipbasu828
    @arkadipbasu828 2 years ago +1

    Clearly explained Josh. Thank you

  • @alaa9613
    @alaa9613 10 months ago +1

    this saved my life thank you i hope you're doing well sir

  • @MeditaBrasil
    @MeditaBrasil 2 years ago +1

    So well explained it!!! AMAZING!!! Thank you very much for making this video!!!

    • @statquest
      @statquest 2 years ago

      Glad you enjoyed it!

  • @anapeleteirovigil3199
    @anapeleteirovigil3199 4 months ago +1

    Came for the explanations and definitely stayed for the openings

  • @enzoaievola5372
    @enzoaievola5372 3 years ago

    Thank you for your help! Your videos are amazing! If I can suggest you another topic to discuss about, one of your clear description of UMAP method would be very helpful :)

    • @statquest
      @statquest 3 years ago +1

      UMAP is very, very similar to t-SNE. The only differences are very subtle, so if you understand the main ideas of t-SNE, then you understand the main ideas of UMAP. Here's my video on t-SNE: ruclips.net/video/NEaUSP4YerM/видео.html

    • @enzoaievola5372
      @enzoaievola5372 3 years ago +1

      @@statquest thank you for the answer and the link!

  • @MrUmban
    @MrUmban 3 years ago +1

    Excellent explanation! Thank you

  • @putriestimandasari4173
    @putriestimandasari4173 5 years ago +2

    I looovee your voice and your explanation... Great job, Sir.. Thank you !!!

    • @statquest
      @statquest 5 years ago

      Hooray! I’m glad you like the video!! :)

  • @andersonbessa9044
    @andersonbessa9044 3 years ago +2

    Great explanation as always! Thanks a lot for your effort!

  • @winniewu8581
    @winniewu8581 4 years ago +1

    Thanks again. Good as always. Thanks for the weight and height example!

  • @beshr1993
    @beshr1993 2 years ago +1

    This is EXCELLENT! Thank you good sir!

  • @ahmedelbechir4870
    @ahmedelbechir4870 1 year ago +1

    Great explanation as always👍

  • @ameliadiazdepascual9127
    @ameliadiazdepascual9127 5 years ago +1

    Congratulations. It is an excellent example of PCA.

  • @96unicorns
    @96unicorns 3 years ago +1

    This was such a great explanation and so entertaining!

    • @statquest
      @statquest 3 years ago

      Glad you enjoyed it!

  • @tomoleusz
    @tomoleusz 2 years ago +2

    I was dumb before watching this video. Now I am still dumb but at least I understand PCA.

  • @makefriendsnotmoney9623
    @makefriendsnotmoney9623 1 year ago +2

    Lol who made the korean subtitles?? the subtitles are well interpreted with sense of humor as well

    • @statquest
      @statquest 1 year ago

      A good friend made them. Thanks!

  • @Mochikoo
    @Mochikoo 4 years ago +1

    Super helpful! Thank you!

  • @dilshangsj634
    @dilshangsj634 2 years ago +1

    That Intro 🔥🔥🔥!

  • @baharehbehrooziasl9517
    @baharehbehrooziasl9517 1 month ago

    Thank you for your absolutely amazing explanations. I have a question. Should the axes be the genes and each point one cell? At the end of the day, we aim to cluster our cells based on the information from our genes.

    • @statquest
      @statquest 1 month ago

      I think this video might do a better job explaining what the axes represent: ruclips.net/video/FgakZw6K1QQ/видео.html

  • @akhtaruzzamanlimon7270
    @akhtaruzzamanlimon7270 3 years ago +1

    You are a genius.

  • @rikblomson2045
    @rikblomson2045 3 years ago +1

    nice ukulele chords you got there in the intro

  • @brucewayne000
    @brucewayne000 1 year ago +1

    BBBBBaaaaaaaaaaaammm!! Good song and good video!!! Please do one on factor analysis!!!

  • @MM-qq6xp
    @MM-qq6xp 5 years ago

    Excellent demonstration

  • @ZawmyoHtet-lg7jn
    @ZawmyoHtet-lg7jn 1 year ago +1

    Thank you very much, sir.

  • @sadiyashaikh8511
    @sadiyashaikh8511 5 years ago +2

    its mah 1st video.. u r really awesome ,i thought i was actually sitting infront of u..i paused the video just to say u...Hellloo JOSH ,,enjoying ur videooo..

  • @tymothylim6550
    @tymothylim6550 3 years ago +1

    Thank you very much for this video! Really great video :)

  • @shubham900100
    @shubham900100 2 years ago +3

    Who else here says BAM! after Josh says it?

  • @kanefoster8780
    @kanefoster8780 4 years ago +1

    Great vid! Thanks

  • @vivo-xz8di
    @vivo-xz8di 3 years ago +1

    Awesome!!! Thanks for the explanation.

  • @ILoveAvatarShow
    @ILoveAvatarShow 2 years ago +1

    Thank you statquest :') i have subscribed

  • @jasonyimc
    @jasonyimc 4 years ago +1

    love this content. Thanks!

  • @MrKostas336
    @MrKostas336 2 years ago +1

    Great video!

  • @gabrielperalta8378
    @gabrielperalta8378 4 years ago +1

    Statquest is the best!

  • @JamesSCavenaugh
    @JamesSCavenaugh 4 years ago

    I think that using the transpose of this would make a lot more sense. It doesn't really make sense to put a cell on its own axis, as if "Cell1"-ness were some sort of continuum. A cell is a discrete event with continuous values of gene expression. Perhaps a better example would be flow cytometry, where each cell is an event and the dimensions are the columns - the transpose of the convention for microarrays.

    • @statquest
      @statquest 4 years ago

      This video may seem backwards, but it attempts to show how PCA was done traditionally - by calculating the variance/covariance matrix of all of the "cells" and using that information to calculate the eigenvectors, etc.

  • @nadhirahbaharin.
    @nadhirahbaharin. 6 years ago

    you need more subscribers!!! Thank you so much your videos are life saver

    • @nadhirahbaharin.
      @nadhirahbaharin. 6 years ago

      I didn't skip the ad to support you (it's the least I can do haha. )

  • @Diegoblismartmedpengar
    @Diegoblismartmedpengar 3 years ago +1

    oh man!! thank you, I needed this so bad

  • @ashwinpatel2260
    @ashwinpatel2260 3 years ago +1

    Concise, and clear!

  • @kenanmorani9204
    @kenanmorani9204 3 years ago

    Thank you, very nice, useful and simple! Do you share the slides that you prepare Sir?

    • @statquest
      @statquest 3 years ago

      I have PDF study guides for some of my videos, including PCA, here: statquest.org/studyguides/

  • @rezaheydari3661
    @rezaheydari3661 6 years ago +1

    thanks Josh your videos are amazing!!!

    • @statquest
      @statquest 6 years ago

      Hooray! If you have time, check out the new PCA video that I made. It's longer, but it goes way deeper and it's just as easy to understand: ruclips.net/video/FgakZw6K1QQ/видео.html

  • @NuNiia95
    @NuNiia95 4 years ago +1

    thanks, this helped me much!!!!!!!!

  • @fereidoon82
    @fereidoon82 5 years ago

    well done, short and clean, thank u

  • @riansyahtohamba8215
    @riansyahtohamba8215 1 year ago +1

    thanks josh!

  • @celiahan3787
    @celiahan3787 3 years ago

    I like your video very well! Can you have more contents about time series?

    • @statquest
      @statquest 3 years ago

      I'll keep that in mind.

  • @locqaj
    @locqaj 3 years ago +1

    Thank you!

  • @hassanrevel
    @hassanrevel 3 years ago +1

    Thanks a lot buddy.

  • @dhirgajbhiye06
    @dhirgajbhiye06 2 years ago +1

    You are the best!!

  • @gopia.r7370
    @gopia.r7370 4 years ago

    👍🏻 great .... very neatly and slowly explained concepts of PCA.. but I would also like to see what if, we have PC1,PC2 & PC3 for an instance..! I would thank #statquest team for this video 👌🏻

    • @statquest
      @statquest 4 years ago

      Check out the PCA, step-by-step video. It has many more details and explains how things work when there are more PCs. ruclips.net/video/FgakZw6K1QQ/видео.html

  • @pedrohenriquecarneiro904
    @pedrohenriquecarneiro904 3 years ago +1

    Very nice!

  • @AdinanJuma
    @AdinanJuma 1 year ago +1

    great job

  • @MrKingoverall
    @MrKingoverall 1 year ago +1

    Bruh !! I LOVE YOU !!

  • @jeetsharma9892
    @jeetsharma9892 4 years ago +1

    Thank you

  • @葉芯雅
    @葉芯雅 2 years ago +1

    thank you so much😍

  • @majfbi
    @majfbi 5 years ago

    Thanks for the priceless videos, Can you make a video on FPCA, please?

  • @shimonatunde
    @shimonatunde 3 years ago +1

    I love the intro 😂

  • @nehatevathia3350
    @nehatevathia3350 5 years ago

    Can we have a video comparing MLE vs Gradient Descent vs OLS? When to use what and how one benefits over the other. Can we have MLE vs Gradient descent approach for Logistic or other models? Thank you :)

  • @sisialg1
    @sisialg1 2 years ago +1

    you're amazing thank you

  • @nuclearcrow28
    @nuclearcrow28 10 months ago

    The meaning of the PCA plot is well explained, but there is one aspect that I find a bit odd about how the concept is approached...
    At the beginning, we consider plots where the axes correspond to different data samples, and the points on the graph correspond to different features. This kind of plot is useful in its own right, as it allows you to compare two or three data samples across all of their features and reason about their correlation.
    HOWEVER, this kind of plot does not truly illustrate the issue that PCA solves. The problem with this kind of plot is that you cannot compare many data samples at once; but there is no problem with the amount of features - you can have as many as you want. PCA, however, solves a different problem - the problem of having too many features, rather than too many data samples. When you switch to the PCA plot, this is quickly noticeable - now the axes represent features (or rather linear combinations of features), and the points represent data samples.
    This is not to say that the video is poorly structured - in fact it is structured really well. You present a certain problem that one can run into with data visualisation, and then you show a different approach that avoids the problem. All I'm saying is the following: The initial approach used to present the problem does not quite show the same problem that PCA solves. A better illustration of the problem would be the case where there are many cells regardless, but a varying amount of features. This can better bring across the meaning of "dimensionality reduction".

    • @statquest
      @statquest 10 months ago

      The example is based on the way PCA was originally developed (calculating the eigenvalues and vectors from a correlation matrix), which, as you observed, seems a little backwards - however, a lot of people are taught PCA from this perspective and, as a result, this example resonates with their existing understanding. To see PCA from a more modern perspective - one that doesn't seem as backwards - see: ruclips.net/video/FgakZw6K1QQ/видео.html

    • @nuclearcrow28
      @nuclearcrow28 10 months ago +1

      @@statquest Thank you for the response - it clarified your intent to me. I see now why this approach makes sense as well. Also your other video on the topic is really great, thanks for linking it!

  • @ylazerson
    @ylazerson 5 years ago +1

    Great video - thanks!

  • @dandanzhang2891
    @dandanzhang2891 5 years ago +1

    Great job! like your lovely voice!

  • @elvazhang3866
    @elvazhang3866 4 years ago +1

    thank you👍👍👍

  • @reregad590
    @reregad590 1 year ago +1

    Thanks 🙏

  • @herberthsieh66
    @herberthsieh66 4 years ago

    You mentioned the heatmap as one of the dimension reduction methods. Could you instruct how it could be used as a dimension reduction method? I tried to find the answer in your Heatmaps video but still not sure how to do it. Could you explain it a bit? Thank you!

    • @statquest
      @statquest 4 years ago

      Say I measured the expression of 500 genes in 6 different tumors. That means each tumor has 500 dimensions. I can then use the gene expression measurements to calculate the Euclidean distances among all 6 tumors and use those distances to draw a 2-dimensional heatmap. Thus, I reduced the dimensions from 500 to 2.
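The distance step described above might look something like this. It is a rough sketch, assuming made-up random expression values (not real tumor data) and only the Python standard library. The resulting 6 x 6 matrix is what a heatmap would then color.

```python
import math
import random

random.seed(0)

n_tumors, n_genes = 6, 500

# Made-up expression values: one list of 500 gene measurements per tumor.
expr = [[random.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_tumors)]

# Pairwise Euclidean distances: each tumor is a point in 500-dimensional space.
dist = [
    [math.dist(expr[i], expr[j]) for j in range(n_tumors)]
    for i in range(n_tumors)
]

# Print the 6 x 6 table that a heatmap would display as colors.
for row in dist:
    print(" ".join(f"{d:6.2f}" for d in row))
```

Each tumor started with 500 numbers, and the heatmap only needs the 6 x 6 table of distances, with zeros on the diagonal and symmetric entries.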

  • @CessTenn
    @CessTenn 3 years ago +1

    Thanks!

  • @secondaccountadlinaluthfia6099
    @secondaccountadlinaluthfia6099 2 years ago +1

    much love!!

  • @vincenzo4259
    @vincenzo4259 2 years ago +1

    Thanks

  • @mahmoudel-elbouhy8879
    @mahmoudel-elbouhy8879 5 years ago

    It is a very easy illustration. Is there a video for ARIMA, ARMA, AR and MA?

  • @henryyao4860
    @henryyao4860 4 years ago +1

    thank you so much

  • @JohnVandivier
    @JohnVandivier 2 years ago

    Can you do a PCA vs MCA vid? I don’t understand why PCA is bad for categorical data, e.g. with one-hot encoding.

    • @statquest
      @statquest 2 years ago +1

      I'll keep that in mind. However, to learn more about PCA, and why it works best with continuous data, see: ruclips.net/video/FgakZw6K1QQ/видео.html and ruclips.net/video/oRvgq966yZg/видео.html

  • @chibearsfan17
    @chibearsfan17 4 years ago +1

    I’ve watched your PCA videos but I still have major questions around 2 things:
    1. What is PCA used for? In what cases would I want to perform PCA instead of other statistical analyses?
    2. How can I develop my intuition on what each dimension “means”. I understand the meaning isn’t clear/obvious since they are linear combinations, but any help would be good.

    • @statquest
      @statquest 4 years ago

      Can you repost your comment on this video: ruclips.net/video/FgakZw6K1QQ/видео.html Then I can use timepoints within that video to answer your questions.

  • @sridharmaloth87
    @sridharmaloth87 5 years ago

    Thanks for keeping this video

  • @zapy422
    @zapy422 5 years ago +1

    Can we do PCA combining numerical and categorical variable?
    Let’s say for example that we add a column for cell location in this example

    • @statquest
      @statquest 5 years ago +1

      I don't think you can. However, you could use a random forest to calculate distances among all of your samples (Random Forests can use discrete and continuous variables) and then use multi-dimensional scaling to plot them.

    • @zapy422
      @zapy422 5 years ago

      Thank you for the reply, will explore sklearn

  • @mohitrathore15
    @mohitrathore15 5 years ago +1

    BAM! Can you share link to the dataset or was it synthetic?

    • @statquest
      @statquest 5 years ago +2

      I have two other PCA videos with code and data that are better than this one. If you use Python, try this video: ruclips.net/video/Lsue2gEM9D0/видео.html If you use R, try this video: ruclips.net/video/0Jp4gsfOLMs/видео.html

  • @kseniiafaiskanova4888
    @kseniiafaiskanova4888 3 years ago

    Thanks for the video. It is really helpful.
    I am curious which program was used to plot it. I have downloaded Excel with XCLStat but I cannot get something similar from there...

    • @statquest
      @statquest 3 years ago

      If you want to replicate the results in my videos, I recommend PCA in Python: ruclips.net/video/Lsue2gEM9D0/видео.html or PCA in R: ruclips.net/video/0Jp4gsfOLMs/видео.html

    • @kseniiafaiskanova4888
      @kseniiafaiskanova4888 3 years ago +1

      @@statquest , Thank you!

  • @sherifgerges9316
    @sherifgerges9316 6 years ago

    Amazing, another method that is becoming very popular is ICA. Any chance of a video on that? :)

    • @sherifgerges9316
      @sherifgerges9316 6 years ago

      A couple of big single-cell papers came out using those methods :). ICA will now take off haha.

  • @MrMattie
    @MrMattie 10 months ago

    Are PC1 and PC2 in this example genes? I'm a little confused as to what the components are. Thanks!

    • @statquest
      @statquest 10 months ago

      They are combinations of genes. To be honest, I have a newer video on PCA that might make more sense to you (it makes more sense to me): ruclips.net/video/FgakZw6K1QQ/видео.html The reason that this video isn't super clear is that it is based on how PCA is usually taught (which is a little backwards). My newer video uses a more modern approach.

  • @aborucu
    @aborucu 3 years ago

    I just want to make a rough comparison to probabilistic sampling : In statistical sampling cells would be outcomes of experiment, Genes are attributes of outcomes which we take measurements on each cell. And thus Gene measurements are random variables. So in classical stat analysis we would be calculating correlations between different Gene types since they are the random variables. But here in a ML context we are modelling/conducting an experiment the opposite way round. By calculating corr between different cells. As if cells are random variables this time which we measure on each Gene. Am I missing something ?

    • @statquest
      @statquest 3 years ago +1

      I'm not sure your explanation of correlation is correct in this context. If I pick two different stocks (from the financial market), I can collect a bunch of measurements, the price of each stock, one per day, and end up with 365 measurements per stock. I can then calculate the correlation between stock A and stock B using both sets of 365 measurements. If there is a high correlation, stock A will tell us a lot about stock B and the other way around. This is exactly the same thing we are doing here with cells (the stocks) and genes (the prices of the stock each day). For more details on how PCA works, see: ruclips.net/video/FgakZw6K1QQ/видео.html
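The stock analogy above can be tried out with a few lines of code. This is just a sketch with simulated prices; the numbers, the `pearson` helper and the way stock B is generated from stock A are all assumptions made for illustration.

```python
import math
import random

random.seed(1)

days = 365

# Simulated daily prices; stock B is built to follow stock A plus some noise.
stock_a = [100 + random.gauss(0, 5) for _ in range(days)]
stock_b = [0.8 * a + random.gauss(0, 2) for a in stock_a]


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


r = pearson(stock_a, stock_b)
print(f"correlation between stock A and stock B: {r:.3f}")
```

In the video's setup, the two stocks play the role of two cells and the 365 daily prices play the role of the genes measured in each cell.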

    • @aborucu
      @aborucu 3 years ago

      @@statquest Thank you, I thought that gene1, gene2 ... gene n were separate attributes (i.e. price, volume of trading etc.). But you made it clear that they are all measurements from the same attribute (genes) ~ there's one price measurement.

  • @deltas4820
    @deltas4820 4 years ago

    Anyone know the advantage of using PCA over NMF? Probably not asking this question in the right place but thought I'd give it a try :)

  • @arjunkadam71
    @arjunkadam71 4 years ago

    Hello sir, can you please explain how PLS works differently from PCA?

    • @statquest
      @statquest 4 years ago

      I'll keep that topic in mind.