Principal Component Analysis (PCA) - easy and practical explanation

  • Published: 23 Jan 2025

Comments • 107

  • @busyshah
    @busyshah 9 months ago +59

    I am now convinced that there are no tough subjects, only ineffective tutors. I have been struggling to understand this concept for over 3 years, and here I am, within 11 minutes things have fallen into place.
    An expert is not necessarily a great teacher. There may be great experts assigned to teach such concepts at educational institutions.
    But someone like you is what we need in our schools and colleges (expert and well articulated).
    Simplicity is the utmost form of sophistication.
    Thanking you from the bottom of my heart.
    Keep on helping people like us.
    Perhaps another video on how to do it in R would be a great hit.

    • @biostatsquid
      @biostatsquid  8 months ago +3

      Thank you so much for your kind words, I'm really flattered! I'm glad it was useful and it cleared up concepts for you:) Great idea about an R tutorial, will definitely add it to my todo list!

    • @jay_wright_thats_right
      @jay_wright_thats_right 7 months ago +3

      100% CORRECT

    • @lisabiiri404
      @lisabiiri404 A month ago +1

      Hi there. Yes, please add a video where you show us how to do this analysis on R. Thank you :)

  • @sandracrnipolsek2289
    @sandracrnipolsek2289 2 years ago +38

    You are for sure principal component #1! You're the best at describing information ;)

  • @joelsabiti4828
    @joelsabiti4828 3 months ago +4

    i have suffered with PCA for two years and this video just made it so easy for me

  • @jumazevick5143
    @jumazevick5143 11 months ago +29

    This video just solved half of my problem in understanding PCA stats. To solve the other half, I need to translate the info to my actual research.

  • @tarathetortoise
    @tarathetortoise 10 months ago +5

    Was going insane looking for an understandable explanation of "what" a PCA is, until I found this video! Thank you very much!

    • @biostatsquid
      @biostatsquid  10 months ago

      Thank you for your kind words!! Glad it helped:)

  • @RUBIKCUKEN
    @RUBIKCUKEN 2 months ago +3

    This is the best tutorial I have ever seen. Thank you so much.

  • @saver_x_28
    @saver_x_28 4 months ago +3

    A big thank you!! I have this topic in my semester exams, and everyone around me is mugging up; when I ask what it actually is, they give me definitions that do not satisfy me. This single video clears up a lot about PCA. I wish I had a teacher like you in my college.

  • @bringonautumn
    @bringonautumn 5 months ago +2

    I have been struggling to understand PCA for days 🤯, despite reading many articles and watching countless videos, but this is by far the best and easiest to understand explanation, thank you!! 🥳

  • @cuby4942
    @cuby4942 8 months ago +3

    I have watched so many videos trying to understand PCA, and this by far is the most interesting, with the fundamentals fully explained

  • @leixiao169
    @leixiao169 A month ago +1

    Very good teacher! Thanks!

  • @antonrosenfeld6861
    @antonrosenfeld6861 6 months ago +1

    A very clear and engaging introduction to PCA. It was new to me, and I came away with a good impression of how it would be used. Thanks very much!😀

  • @RafaelRabinovich
    @RafaelRabinovich A year ago +4

    It would be great to have PCA explained conceptually, mathematically, as well as programmatically. When push comes to shove, we'll need to do it in a computer, running an algorithm that either we have to put in, or call from a Python library.
    Thank you for all the work you do educating us!

  • @suraali7624
    @suraali7624 2 months ago +1

    Great video. It explained PCA in a very simple and clear way. Thank you!

  • @pranee31
    @pranee31 4 months ago +1

    Wonderfully explained! Keep up the great work, Biostatsquid.

  • @nusratafrin898
    @nusratafrin898 4 months ago

    This is the best video I found on PCA!
    Can't thank you enough Biostatsquid!❤

  • @tubihemukamamethodius6952
    @tubihemukamamethodius6952 A year ago +2

    You really understand what you're talking about, big up

  • @nibirsaadman8214
    @nibirsaadman8214 A year ago +2

    Wow! Best PCA video on YouTube.

  • @serendipitum1694
    @serendipitum1694 2 years ago +3

    super elegant and clear explanations, thank you!

    • @biostatsquid
      @biostatsquid  2 years ago

      Thank you, I'm happy you found it useful:)

  • @elmoelmo6505
    @elmoelmo6505 8 months ago

    Hi thank you so much for explaining PCA in such a clear way. I've been really stressed about understanding it for my uni stats exam, but now I feel much more confident :)

  • @kyaw94
    @kyaw94 8 months ago

    I'm currently watching without logging into my Google account. 😊 However, halfway through, I made the decision to log in, hit the like button, and subscribe to your channel. 🎉 Thank you for your valuable content - it's truly helpful, and I encourage you to keep up the great work! 👍

  • @michaellewandowski5489
    @michaellewandowski5489 11 months ago

    This was excellent. Some people just know how to explain things

  • @vylam1521
    @vylam1521 10 months ago

    Thanks for making an amazing video that helped me explain things I had been researching for days.

  • @plqify
    @plqify 3 months ago

    Thanks!

  • @alkakumari3771
    @alkakumari3771 5 months ago

    Thank you for such informative, easy-to-understand content!

  • @pratimam5807
    @pratimam5807 2 months ago

    So simple and clear. U are awesome.

  • @moniquebrasilbaptista1989
    @moniquebrasilbaptista1989 9 months ago

    Loved it! It's a really comprehensive explanation!😍

  • @MatthieuGG_
    @MatthieuGG_ 4 months ago

    Incredibly clear! Thank you, and congratulations!

  • @edwardhitti3422
    @edwardhitti3422 A year ago +4

    Amazingly well explained

  • @teddyperera8531
    @teddyperera8531 2 months ago

    Perfect explanation for Principal Component Analysis

  • @SamsuriAW
    @SamsuriAW 5 months ago +2

    Although I don’t use PCA in my workday, I think this is the best video out there explaining how PCA works. Good job 👍

  • @ycyftdtf5d6fycyfyf
    @ycyftdtf5d6fycyfyf 12 days ago

    This was brilliant, thank you

  • @FaysalHamid-p4k
    @FaysalHamid-p4k A year ago

    Best explanation in RUclips, awesome.

  • @cgpivot
    @cgpivot 4 months ago

    Absolutely brilliant explanation.

  • @IqbalPrawira
    @IqbalPrawira 10 months ago

    the best explanation. easy to understand.

  • @ricardoveiga007
    @ricardoveiga007 8 months ago

    Amazing content, clearly explained! :)

  • @antaraghoshal4012
    @antaraghoshal4012 A year ago

    Best video to understand PCA plot 😊

  • @aditisharma9369
    @aditisharma9369 A year ago

    wow!!! that was explained so nicely by you..... thank you!

  • @wobby7055
    @wobby7055 8 months ago

    So well explained. Thanks a bunch!

  • @ruthgyereh1052
    @ruthgyereh1052 A year ago +1

    Fantastic presentation.

  • @AW12
    @AW12 A year ago +1

    Great I benefited a lot!

  • @zdkr_4ii
    @zdkr_4ii 9 months ago

    Lovely video, thank you for explaining!

    • @biostatsquid
      @biostatsquid  9 months ago

      Glad it was helpful! You're very welcome:)

  • @gaku95
    @gaku95 27 days ago

    I have a question:
    Let's say I want to calculate the PCA of children's grades at school to know how it impacts the final grade average of each child. For that I have my sample, which would be all the children in a class, for example. Then I have the grades of those children in different subjects such as Math, History, Physical Education, etc. And I also have the average. Do I add the average as another variable to the PCA analysis? Or should I make a correlation, for example, between PC1 and the average and see the PCA loadings?
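The second option in the question above can be sketched in Python. This is a minimal, hypothetical example (made-up grades driven by a shared "ability" factor): since the average is an exact linear combination of the subject grades, one common approach is to run PCA on the subjects only and then correlate the PC1 scores with the average, rather than feeding the average in as another variable.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 30 children, 4 subjects (Math, History, PE, Art),
# driven by a shared 'general ability' factor plus subject-specific noise
ability = rng.normal(0, 1, size=30)
grades = 70 + 10 * ability[:, None] + 3 * rng.normal(size=(30, 4))
average = grades.mean(axis=1)              # each child's final grade average

# Run PCA on the subject grades only - the average is already a linear
# combination of them, so adding it as a fifth variable is redundant
X = (grades - grades.mean(axis=0)) / grades.std(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pc1_scores = X @ Vt[0]                     # each child's score on PC1

# Then correlate PC1 with the average, as the question suggests
r = np.corrcoef(pc1_scores, average)[0, 1]
print(abs(r) > 0.9)
```

With a strong shared factor like this, PC1 tracks the average closely; with weakly related subjects, the correlation would be lower, and the PC1 loadings would tell you which subjects drive it.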

  • @harshaharod2076
    @harshaharod2076 A year ago

    just on point!Loved it!

  • @ThompsoniusIV
    @ThompsoniusIV 2 months ago

    Thanks for the video, this really explains it in a nutshell! This may not be in your alley, but what would PC2 be if you plotted geochemical data with each variable being an element (PC1 seems to be rock type)? Thanks in advance!

  • @huyquach5044
    @huyquach5044 2 years ago

    Thank you very much for your clear explanation.

  • @coder-c6961
    @coder-c6961 A year ago

    Great example, this was exactly what I'm doing too!

  • @onatovonatovic526
    @onatovonatovic526 A year ago

    i wish i could hug you, thank you so much

  • @Dr.AmulyaPanda
    @Dr.AmulyaPanda 9 months ago

    Simply excellent !

  • @mohammedy.salemalihorbi1210
    @mohammedy.salemalihorbi1210 10 months ago

    Good explanation, that's great!

  • @zeeshanazam5104
    @zeeshanazam5104 A year ago

    I have one question: if I have 60 variables (A1-A60) with a 2k sample size,
    A1 is the first and A60 the last, with A10, A20, A30, A40, A50 in between and the confirmed output; but for some of the samples the A19, A29 output doesn't exist, as A20 was reached earlier - the data is of this type for some reason.
    Will PCA work in the same way as explained?

  • @dannggg
    @dannggg 8 months ago

    Very good high level video!

  • @tathagatasharma
    @tathagatasharma A year ago

    Thank you very much, very well explained

  • @nicthofer
    @nicthofer 9 months ago

    How do I obtain the loadings? Are they the same as the eigenvectors, or the scaled coordinates? In my geochemical software ioGAS, the PCA report contains these items: Correlation - Eigenvectors - Eigenvector Plots - Eigenvalues - Scree Plot - Scaled Coordinates - PC1 vs PC2 - PC1 vs PC3 - PC1 vs PC4 and so on (the last is PC3 vs PC4). My input was 32 chemical elements previously transformed with CLR.

    • @nicthofer
      @nicthofer 9 months ago

      Here is the ioGAS description for Scaled Coordinates:
      "Created by scaling the length of the eigenvector to the eigenvalue. All eigenvectors have a length of 1 so scaling by the eigenvalue changes the lengths so that the length is proportional to the variance (eigenvalue) accounted for by that eigenvector.
      Click on a PC header column to sort the scaled coordinates from lowest to highest or vice versa."
      And for Eigenvectors:
      "Eigenvectors are PCA coordinate values that correspond to the projected location of the original input variables onto the calculated PCA axes. PC1, or the first eigenvector, is a calculated line of best fit through the maximum direction of variation for the selected variables. The PC1 eigenvectors represent the value of each input put along this line. PC2 is a line of best fit through the maximum variation at right angles to PC1 so the PC2 eigenvalues are the original input variable values projected onto this axis, and so on for each of the number of principal components.
      An eigenvector may be in either of two opposite directions. ioGAS will always choose the eigenvector whose first element is positive. Click on a PC header column to sort the eigenvectors from lowest to highest or vice versa."

    • @nicthofer
      @nicthofer 9 months ago

      Ahhh, I think the Loadings are equal to the Scaled Coordinates 😅
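That reading matches the most common statistical convention, where loadings are eigenvectors scaled by the square root of the eigenvalue. Note that the ioGAS text quoted above describes scaling by the eigenvalue itself, so the per-PC scale factor may differ between software packages and is worth checking. A minimal numpy sketch with random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for a 100-sample x 5-element (e.g. CLR-transformed) matrix
X = rng.normal(size=(100, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigendecomposition of the covariance (here: correlation) matrix,
# sorted so PC1 explains the most variance
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Common convention: loadings = eigenvectors * sqrt(eigenvalue)
loadings = eigvecs * np.sqrt(eigvals)

# Every eigenvector has length 1; each loading vector's squared length
# equals its eigenvalue (the variance that PC accounts for)
print(np.allclose(np.linalg.norm(eigvecs, axis=0), 1.0))
print(np.allclose((loadings**2).sum(axis=0), eigvals))
```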

  • @biogfp9340
    @biogfp9340 A year ago

    I'd love to have a tutorial on how to perform this on R. This was very well explained.

    • @biostatsquid
      @biostatsquid  A year ago

      Great suggestion! I cover it a bit in the preprocessing video but maybe a specific video for PCA in R would be good - I'll keep it in mind! Thanks!

  • @nathanieldanielrabo2423
    @nathanieldanielrabo2423 5 months ago

    You are exceptional 🤩

  • @julenekenyon3278
    @julenekenyon3278 A year ago

    Very well explained, thank you!

  • @nchimunyamuloongo4436
    @nchimunyamuloongo4436 10 months ago

    Woow.. this is so helpful

  • @monicaaelavarthi5637
    @monicaaelavarthi5637 A year ago

    Well explained. Thank You.

  • @Ms1Unique
    @Ms1Unique A year ago

    Thank you. Well explained!

  • @xdmoaz7909
    @xdmoaz7909 A month ago

    good work. thanks

  • @mariamontero5651
    @mariamontero5651 A year ago

    really nice, congratulations on your video! I'm following you now :)

  • @tomnewman9306
    @tomnewman9306 11 months ago

    At ~3:58 you say the principal components explain 85% of the variance in life expectancy. I don't think that's right. I think it's 85% of the variance in the predictor variables. Or am I totally confused?

  • @ranjanpal7217
    @ranjanpal7217 A year ago

    Amazing explanation

  • @changliu7553
    @changliu7553 2 months ago

    How does someone find out the linear combination for PC1?
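For anyone wondering the same: each PC's coefficients come straight out of the eigendecomposition of the covariance matrix, and the PC score is the corresponding weighted sum of the (standardised) variables. A minimal numpy sketch with random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise first

# PC1's coefficients are the eigenvector of the covariance matrix
# with the largest eigenvalue
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
w = eigvecs[:, np.argmax(eigvals)]

# PC1 is exactly the weighted sum w1*x1 + w2*x2 + w3*x3
pc1 = X @ w
print(np.allclose(pc1, X[:, 0]*w[0] + X[:, 1]*w[1] + X[:, 2]*w[2]))
```

Most PCA software (e.g. `prcomp` in R or `PCA` in scikit-learn) reports these coefficients directly, so you rarely compute them by hand.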

  • @shivavyavahare
    @shivavyavahare 8 months ago

    How can we explain which factors contribute to PC1 and PC2? Using a biplot graph?

  • @程王-t9s
    @程王-t9s 9 months ago +2

    It's well explained for beginners to understand the plot, but if you want to know how to do it, this video can't help you

  • @jackdawson7385
    @jackdawson7385 8 months ago

    Please can you tell me how we can calculate the principal loadings? I am a bit confused about this part.

  • @MrDISSxD
    @MrDISSxD 3 months ago

    the best video i found regarding loadings, but you don't mention the scores

  • @brettlidbury4110
    @brettlidbury4110 A year ago

    Thank you for your video. After you have assigned PC1 to PC5 ..., you show the PC matrix in order reflecting the amount of variation explained, where there are a variety of values listed under each PC from -6 to +6. What do these values represent?

    • @biostatsquid
      @biostatsquid  A year ago +1

      Hi! Thanks for your question! So the values are just an example, they don't necessarily go from -6 to +6. Basically, the values represent the 'contribution' of that variable to a specific PC. Since PCs are ranked by the variation of the dataset they explain (PC1 explains more than PC2, which in turn explains more than PC3...), variables with higher (more positive) or lower (more negative) scores for lower PCs (i.e., PC1) are 'more important', in other words, they explain more variability in the dataset. Hope this helped!
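The reply above can be illustrated in code: variables with the largest absolute coefficients on PC1 contribute most to it. A hedged sketch using scikit-learn, with made-up variable names in the spirit of the video's example (the correlation between "exercise" and "diet" is injected artificially so PC1 has a clear driver):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Hypothetical variables, just for illustration
names = ["exercise", "diet", "smoking", "sleep"]
X = rng.normal(size=(40, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=40)   # make 'diet' track 'exercise'

pca = PCA().fit(StandardScaler().fit_transform(X))
pc1_coeffs = pca.components_[0]                 # one coefficient per variable

# A larger |coefficient| means a bigger contribution to PC1
ranked = sorted(zip(names, pc1_coeffs), key=lambda t: -abs(t[1]))
print(ranked[0][0])
```

Here the top-ranked variable is one of the correlated pair, since that pair dominates the direction of maximum variance.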

    • @brettlidbury4110
      @brettlidbury4110 A year ago

      Thank you very much for your rapid reply and explanation. I thought that this was the case, but was not certain. As an extension of my question, do these + or - values under each PC align with a tick mark on the x:y and -x:-y axes? (for reference the axes you use to demonstrate these concepts around 5:10 to 5:30 minutes into your presentation). If "yes", and by way of feedback, having a scale on these axes would be helpful. I have watched 3 separate presentations on PCA today, and I have found yours the most useful. Thank you again, and in particular for responding to my question so quickly. Best wishes.

    • @biostatsquid
      @biostatsquid  A year ago +1

      Hi thanks so much for your feedback! No, they're not! The tick marks represent increments of 1 (so 1, 2, 3, 4...) and I think my intention was to make them match the PC scores, but I must have changed the labels around to make it make sense with the biology and forgot to update the table. But they should match, so thanks for pointing that out! Will correct it if I ever do a part 2 on this:) Cheers @@brettlidbury4110

    • @brettlidbury4110
      @brettlidbury4110 A year ago

      @@biostatsquid My pleasure and looking forward to the next installment (o:

  • @ahmadebrahem4611
    @ahmadebrahem4611 9 months ago

    Very well explained

  • @andrefsr00
    @andrefsr00 10 months ago

    Nice, video, thanks!

  • @mdmahmudulhasanmiddya9632
    @mdmahmudulhasanmiddya9632 A year ago

    Very good explanation, ma'am

  • @kishranai6262
    @kishranai6262 A year ago +1

    Hi
    Good presentation on PCA. Can we apply PCA to a dataset that has both numeric and categorical data? Also, do we need to ensure that each variable follows a normal distribution, and if it does not, what should we do? And do we need to normalise each of the variables? Appreciate your comments.

    • @biostatsquid
      @biostatsquid  A year ago +3

      Hi, great questions. PCA is not recommended for categorical data - even if you one-hot encode it. For mixed data types, there are better alternatives in the FactoMineR R package, such as Factor Analysis of Mixed Data (FAMD()) or Multiple Factor Analysis (MFA()). I haven't got experience with either, but you can check the thread here: stats.stackexchange.com/questions/5774/can-principal-component-analysis-be-applied-to-datasets-containing-a-mix-of-cont
      Yes, it is necessary to standardise data before performing PCA, because PCA basically maximises the variance. So if you have some variables with a very large variance and some with little variance, it will give more importance to the variables with large variance. If you change the scale of one of your variables, e.g., weight of mice, from kg to g, the variance increases, and the variable 'weight' will go from having little impact to being the main feature that explains variance in your dataset. Standardising will do the trick, since it makes the SD of all the variables the same (normalisation does not make all variables have the same variance). Hope this was clear!

  • @UzmanNöropsikolog
    @UzmanNöropsikolog 2 months ago

    SUPERB THANKS

  • @Aoffyfeefy
    @Aoffyfeefy A year ago

    Nice video😊

  • @nikitrianta9896
    @nikitrianta9896 A year ago

    Very helpful video, but I'm not sure I understand when to use PCA - do variables need to be correlated or not?

    • @biostatsquid
      @biostatsquid  A year ago

      Hi Niki, not sure if I understand your question, could you rephrase it, please?

    • @nikitrianta9896
      @nikitrianta9896 A year ago

      @@biostatsquid Sorry it was not clear... I just wonder if there is a limitation in applying PCA only to data where there is some correlation among the factors - for example, height and weight are correlated, etc.

    • @biostatsquid
      @biostatsquid  A year ago +1

      @@nikitrianta9896 Oh I see ! No, not at all, actually PCA allows you to gather insights about features describing our data - by looking at the coefficients of the features/variables for each PC you can find out if they are positively, negatively or not correlated.
      If you want to visualise this, you can draw a plot of the coefficients for PC1 vs PC2 (for example) for all features. For each feature, imagine (or draw) a vector from the origin (0, 0) to the point (coefficient PC1, coefficient PC2). Features that are positively correlated with each other have an angle between their vectors close to 0 degrees; if they are negatively correlated, the angle between them is close to 180 degrees; and if they are not correlated, the angle is close to 90 degrees.
      Does this answer your question? :)
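The angle rule described in the reply above can be verified with a small numpy sketch (synthetic data with one positively and one negatively correlated pair; the plotted vectors here are eigenvectors scaled by the square root of their eigenvalues, i.e. loadings):

```python
import numpy as np

rng = np.random.default_rng(5)
base = rng.normal(size=200)
pos = base + 0.1 * rng.normal(size=200)    # positively correlated with base
neg = -base + 0.1 * rng.normal(size=200)   # negatively correlated with base
X = np.column_stack([base, pos, neg])
X = (X - X.mean(axis=0)) / X.std(axis=0)

# PCs from the correlation matrix, sorted by explained variance
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loading-plot vectors: each variable's (PC1, PC2) loadings
L = eigvecs[:, :2] * np.sqrt(eigvals[:2])

def angle(u, v):
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

print(angle(L[0], L[1]) < 30)    # near 0 deg: positively correlated
print(angle(L[0], L[2]) > 150)   # near 180 deg: negatively correlated
```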

    • @josephbalamaze4093
      @josephbalamaze4093 6 months ago

      how do we interpret the data, including hierarchical?

  • @rd10718
    @rd10718 10 months ago

    Looking for a response from the author - What is the significance of a low PCA for a large biological data set? - Does a PC1 of

    • @biostatsquid
      @biostatsquid  10 months ago +1

      If PC1 is 20%, it means it explains 20% of the variability of the dataset. You can then check which are the top contributing variables of PC1 to figure out which features of your dataset explain most variability. In complex scenarios you might be happy with 20% of variability. For example, say you are studying height in the human population, and want to figure out which genes contribute to height. You 'take' a sample of people with different heights and do RNAseq to figure out gene expression (this is a very simple example, but let's go with it). You do PCA on the gene expression counts of all genes in the human genome. PC1 explains 20% of variability (i.e., differences in height in the sample you took). Then you check, and the top PC1-contributing genes are X, Y, Z. So you know that genes X, Y, Z most probably play an important role in height. But of course this is only 20% of the variability of your data. What about the other 80%? Well, you forgot about other important factors that contribute to height, like diet, gender, genomic variability (not only transcriptomics - epigenetics and genomics might play an important role!)... etc. Hope this made it a bit easier to understand!
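The 20%-of-variability reading in the reply above corresponds to the per-PC explained variance ratio that most PCA software reports (in scikit-learn, `explained_variance_ratio_`). A hedged sketch with a random stand-in 'expression' matrix:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
# Stand-in 'expression' matrix: 100 samples x 10 genes (random, for illustration)
X = StandardScaler().fit_transform(rng.normal(size=(100, 10)))

pca = PCA().fit(X)
ratios = pca.explained_variance_ratio_   # fraction of total variance per PC

# All PCs together account for 100% of the dataset's variance, so if PC1
# explains only ~20%, the rest sits in later PCs (or in factors you never
# measured, as the reply points out)
print(np.isclose(ratios.sum(), 1.0))

# Top contributors to PC1: variables with the largest |coefficient|
top_genes = np.argsort(-np.abs(pca.components_[0]))[:3]
print(top_genes.shape[0])
```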

    • @damilarechosen2779
      @damilarechosen2779 4 months ago

      @@biostatsquid Oh myyy. What a good teacher you are! Thumbs up🎉

  • @lizheltamon
    @lizheltamon A year ago +1

    Hi! I really love your explanation! Would it be possible to get a copy of the dataset? I need to teach PCA and I think this is a nice example because the relationships between the factors are easy to understand! Would definitely point them to this video!

    • @biostatsquid
      @biostatsquid  A year ago

      Hi Liz, thanks for your feedback! Unfortunately I cannot share my dataset, not because I don't want to, but because there is no dataset! I just made up the categories and figures for illustration purposes, just cause it is easier to understand when the factors are 'obvious'. So sorry to disappoint you...
      However, you can check out my post here in case it is helpful: biostatsquid.com/pca-simply-explained/

    • @lizheltamon
      @lizheltamon A year ago

      @@biostatsquid no worries thanks so much!

  • @ruthgyereh1052
    @ruthgyereh1052 A year ago

    Do you by chance make time for appointments? I would be grateful. Thanks

    • @biostatsquid
      @biostatsquid  A year ago

      Hi Ruth! Just send me an email describing your issue and I'll tell you if I can help:)

  • @basuumer501
    @basuumer501 11 months ago

    Nice, very nice

  • @JoanaPadillo-f5t
    @JoanaPadillo-f5t A year ago

    very nice

  • @Ale-dy8jc
    @Ale-dy8jc 3 months ago

    Very well done!
