Principal components analysis in R

  • Published: 21 Aug 2024

Comments • 209

  • @rebecai.m.6670
    @rebecai.m.6670 6 years ago +44

    OMG, this tutorial is perfection, I'm serious. You make it sound so easy and you explain every single step. Also, that is the prettiest plot I've seen. Thank you so much for this.

    • @hefinrhys8572
      @hefinrhys8572  6 years ago +1

      You're very welcome! If you like pretty plots, check out my video on using ggplot2 ;) ruclips.net/video/1GmQ5BdAhG4/видео.html

  • @sadian3392
    @sadian3392 6 years ago +11

    I had listened to several other lectures on this topic, but the pace and the detail covered in this video are simply the best.
    Please keep up the good work!

  • @maitivandenbosch1541
    @maitivandenbosch1541 4 years ago +9

    Never has a tutorial about PCA been so clear and simple. Thanks

  • @PhinaLovesMusic
    @PhinaLovesMusic 5 years ago +3

    I'm in graduate school and you just explained PCA better than my professor. GOD BLESS YOU!!!!

  • @vplougoboy
    @vplougoboy 3 years ago +1

    No one explains R better than Hefin. Give this man a medal already!!

  • @HarmonicaTool
    @HarmonicaTool 2 years ago

    5-year-old video, still one of the best I've found on the topic on YT. Thumbs up

  • @WatchMacro16
    @WatchMacro16 5 years ago +11

    Finally a perfect tutorial for PCA in RStudio. Thanks mate!

  • @jackiemwaniki1266
    @jackiemwaniki1266 4 years ago

    How I came across this video a week before my final-year project due date is a miracle. Thank you so much Hefin Rhys.

    • @mohamedadow8153
      @mohamedadow8153 4 years ago

      Jackie Mwaniki doing?

    • @jackiemwaniki1266
      @jackiemwaniki1266 4 years ago

      @@mohamedadow8153 my topic is on Macroeconomic factors and the stock prices using the APT framework.

  • @Rudblattner
    @Rudblattner 3 years ago

    I never comment on videos, but you really saved me here. Nothing was working on my dataset and this came smoothly. Well done on the explanations too, everything was crystal clear.

  • @user-kb6ui2sh5v
    @user-kb6ui2sh5v 1 year ago

    Really useful video, thank you. I've just started my MSc project using PCA, so thank you for this. I will be following subsequent videos.

  • @chinmoysarangi9399
    @chinmoysarangi9399 4 years ago +1

    I have my exam in 2 days and your video saved me tons of effort in combing through so many other articles and videos explaining PCA. A BIG Thank You! Hope you do many more videos and impart your knowledge to newbies like me. :)

  • @timisoutdoors
    @timisoutdoors 4 years ago +2

    Quite literally, the best tutorial I've ever seen on an advanced multivariate topic. Job well done, sir!

  • @fabriziomauri9109
    @fabriziomauri9109 4 years ago +6

    Damn, your accent is hypnotic! The explanation is good too!

  • @lisakaly6371
    @lisakaly6371 1 year ago

    In fact I found out how to overcome the multicollinearity, by using the eigenvalues of PC1 and PC2! I love PCA!

  • @Axle_Tavish
    @Axle_Tavish 2 years ago

    Explained everything one might need. If only every tutorial on RUclips were like this one!

  • @tylerripku8222
    @tylerripku8222 3 years ago

    The best run through I've seen for using and understanding PCA.

  • @shantanutamuly6932
    @shantanutamuly6932 4 years ago +1

    Excellent tutorial. I have used this for analysis of my research. Thanks a lot for sharing your valuable knowledge.

  • @jackpumpunifrimpong-manso6523
    @jackpumpunifrimpong-manso6523 4 years ago

    Excellent! Words cannot show how grateful I am!

  • @lilmune
    @lilmune 4 years ago +1

    In all honesty this is the best tutorial I've seen in months. Nice job!

  • @ditshegoralefeta1315
    @ditshegoralefeta1315 4 years ago +1

    I've been going through your tutorials and I'm so impressed. Legend!!!

  • @HDgamesFTW
    @HDgamesFTW 4 years ago +1

    Best explanation I’ve found so far! Thanks mate, legend!

    • @HDgamesFTW
      @HDgamesFTW 4 years ago

      Uploaded the script as well, what a guy

  • @glenndejucos3891
    @glenndejucos3891 3 years ago

    This video gave a major leap in my study. Thanks.

  • @johnmandrake8829
    @johnmandrake8829 3 years ago

    It's so funny, I don't think you realize, but myPr ("my pyaar") in Urdu/Hindi means my love. Thank you for an amazing and extremely helpful video

  • @johnkaruitha2527
    @johnkaruitha2527 3 years ago

    Great help, been doing my own work following this tutorial step by step... the whole night

  • @chris-qm2tq
    @chris-qm2tq 1 year ago

    Excellent walkthrough. Thank you!

  • @0xea31c0
    @0xea31c0 2 years ago

    The explanation is just perfect. Thank you.

  • @brunopiato
    @brunopiato 6 years ago +1

    Great video. Very instructive. Please keep making them

  • @siktrading3117
    @siktrading3117 2 years ago

    This tutorial is outstanding. Excellent explanation! Thank you very much!!!

  • @vagabond197979
    @vagabond197979 1 year ago

    Added to my stats/math playlist! Very useful.

  • @nrlzt9443
    @nrlzt9443 1 year ago

    Really love your explanation! Thank you so much for your video, really helpful and I can understand it! Keep it up! Looking forward to your many more upcoming videos

  • @brunocamargodossantos5049
    @brunocamargodossantos5049 2 years ago

    Thanks for the video, it helped me a lot!! Your explanation is very didactic!

  • @em70171
    @em70171 3 years ago

    This is gold. I absolutely love you for this

  • @elenavlasenko5452
    @elenavlasenko5452 6 years ago

    I can say for sure that it's the best explanation I've ever seen!! Go on, and I would be really grateful if you made one on Time Series and Forecasting :)

    • @hefinrhys8572
      @hefinrhys8572  6 years ago

      Thanks Elena! Thank you also for the feedback; I may make a video on time series in the future.

  • @tankstube09
    @tankstube09 6 years ago

    Very nice tutorial, nicely explained and really complete. Looking forward to learning more in R with your other vids, thank you for the tremendous help!

  • @rVnikov
    @rVnikov 6 years ago +5

    Excellent tutorial Hefin. Hooked and subscribed...

    • @hefinrhys9234
      @hefinrhys9234 6 years ago +1

      Vesselin Nikov thank you! Feel free to let me know if there are other topics you'd like to see covered.

  • @harryainsworth6923
    @harryainsworth6923 4 years ago +1

    this tutorial is slap bang fuckin perfect, god bless you, you magnificent bastard

  • @andreamonge5025
    @andreamonge5025 2 years ago

    Thank you so much for the very clear and concise explanation!

  • @himand11
    @himand11 2 years ago +1

    Thank you so so much!! You just saved the day and helped me really understand my homework for predictive analysis.

  • @florama5210
    @florama5210 6 years ago +1

    It is a really nice and clear tutorial! Thanks a lot, Hefin~

  • @biochemistry9729
    @biochemistry9729 4 years ago

    Thank you so much! This is GREAT! You explained very clearly and smoothly.

  • @shafiqullaharyan261
    @shafiqullaharyan261 4 years ago

    Perfect! Never seen such an explanation

  • @kevinroberts5703
    @kevinroberts5703 1 year ago

    thank you so much for this video. incredibly helpful.

  • @mustafa_sakalli
    @mustafa_sakalli 3 years ago

    Finally understood this goddamn topic! Thank you dude

  • @blackpearlstay
    @blackpearlstay 3 years ago

    Thank you so much for this SUPER helpful video. (P.S. The explanation with the iris dataset was especially convenient for me as I'm working on a dataset with dozens of recorded plant traits:D)

  • @murambiwanyati3607
    @murambiwanyati3607 2 years ago

    Great teacher you are, thanks

  • @Fan-vk9gx
    @Fan-vk9gx 3 years ago +1

    You are really a life saver! Thank you!

  • @arunkumarmallik9091
    @arunkumarmallik9091 4 years ago

    Thanks for the nice and easy way of explanation. It really helped me a lot.

  • @EV4UTube
    @EV4UTube 3 years ago

    Can I confess something that baffles me? Because I see this all the time. You, personally, are motivated to share your knowledge with the world, right? You took time, effort, energy, focus, planning, equipment, software, etc. to prepare this explanation and exercises. You screen-captured it, you set up your microphone, you edited the video, you did all this enormous amount of work. You're clearly motivated. Yet, when it actually comes time to deliver that instruction, you think it is 100% acceptable to place all your code into an absolutely minuscule fraction of the entire screen. Pretty close to 96% of the screen is 'dead space' from the perspective of the learner. The size of the typeface is minuscule (depending on your viewing system). It would be like producing a major blockbuster film, but then publishing it at the size of a postage stamp. Surely it would be possible to 'zoom into' that section of the IDE to show people what it was you were typing - the operators, the functions, the arguments, etc. I'm not really picking on you individually, per se. I see this happen all the time with instructors of every stripe. I have this insane idea that instruction has much, much less to do with the instructor's ability to demonstrate their knowledge to an uninformed person and much, much more to do with the instructor's ability to 'meet' the student 'where' they are and to carry the student from a place of relative ignorance (about a specific topic) to a place of relative competence. One of the best tools for assessing whether you're meeting that criterion is to PRETEND that you know nothing about the topic - then watch your own video (stripping out all the assumptions you would automatically make about what is going on based on your existing knowledge). If you didn't have a 48" monitor and excellent eyesight, would you be able to see what was being written? Like... why would you do that?
    If writing the code IS NOT important - don't bother showing it. If writing the code IS important, then make it (freaking) visible and legible. This really baffles me. I guess instructors are so "in their own head" when they're delivering content that they don't take time to realize that no one can see what is happening. It just baffles me how often I see this.

    • @EV4UTube
      @EV4UTube 3 years ago

      If 'zooming in' is not easily achieved, the least instructors could do is go into the preferences of the IDE and jack up the size of the text so that it would be reasonably legible on a screen typical of, say, a laptop or tablet. It just seems like such low-hanging fruit, an easy fix to facilitate learning and ensure legibility.

    • @Pancho96albo
      @Pancho96albo 2 years ago +1

      @@EV4UTube chill out dude

  • @metadelabegaz6279
    @metadelabegaz6279 6 years ago +2

    Sweet baby Jesus. Thank you for making this video!

  • @SUMITKUMAR-hj8im
    @SUMITKUMAR-hj8im 4 years ago

    a perfect tutorial for PCA... Thank you

  • @sandal-city-pet-clinic-1
    @sandal-city-pet-clinic-1 5 years ago +1

    Simple and clear. Very good

  • @blessingtate9387
    @blessingtate9387 4 years ago +3

    You "R" AWESOME!!!

  • @fatimaelmansouri9338
    @fatimaelmansouri9338 3 years ago

    Super well-explained, thank you!

  • @Jjhukri
    @Jjhukri 5 years ago +2

    Amazing video Hefin, there are a lot of details covered in a 27-min video; we just have to be careful not to miss a second of it. I have a question: how are the scores calculated for each PC? Why do we have to check the correlation between the variables and PC1 & PC2? What value does it add practically?

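There is no in-thread reply, but the correlation question can be sketched in base R. This is an editor's sketch, not the video's code; it assumes the iris data used in the tutorial and rebuilds the PCA object (named `myPr` as in the video):

```r
# Sketch: correlating the original variables with the PC scores shows
# which variables each component "carries", which is what makes PC1 and
# PC2 interpretable on the biplot.
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)
ctab <- cor(iris[, -5], myPr$x[, 1:2])
round(ctab, 2)
```

A variable with a correlation near ±1 with a PC dominates that component; near 0 means it contributes little to it.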
  • @tonyrobinson9046
    @tonyrobinson9046 1 year ago

    Outstanding. Thank you.

  • @mativillagran1684
    @mativillagran1684 4 years ago

    thank you so much! you are the best, very clear explanation.

  • @timothystewart7300
    @timothystewart7300 3 years ago

    Fantastic video Hefin! thanks

  • @testchannel5805
    @testchannel5805 4 years ago

    Very nice. Guys, hit the subscribe button - the best explanation so far.

  • @Actanonverba01
    @Actanonverba01 5 years ago +1

    Clear and straight forward, good work!
    Bully for you! Lol

  • @aliosmanturgut102
    @aliosmanturgut102 3 years ago

    Very informative and clear. Thanks.

  • @galk32
    @galk32 5 years ago +2

    amazing video, thank you

  • @kmowl1994
    @kmowl1994 2 years ago

    Very helpful, thanks!

  • @esterteran2872
    @esterteran2872 4 years ago

    Good tutorial! I have learnt a lot. Thanks!

  • @heartfighters2055
    @heartfighters2055 5 years ago +2

    just brilliant

  • @Badwolf_82
    @Badwolf_82 3 years ago

    Thank you so much for this tutorial, it really helped me!

  • @mario17-t34
    @mario17-t34 2 years ago

    Thanks much Hefin!!!

  • @lindseykoper761
    @lindseykoper761 2 years ago +1

    Thank you so much for your videos!! Your videos are the best I have seen hands down :) All of your explanations and step by step through R are what I needed to work on my research.
    One area I am having trouble with (since I am not a statistician) is making sure I run my data through all the necessary statistical tests before running the PCA. My data is similar to the iris dataset (skull measurements categorized by family and subfamily levels) but I am seeing different sources run different tests before the PCA (ANOVA vs non-parametric tests). If anything, would you be able to recommend some good sources for me to refer to? Thank you! I really appreciate it!

  • @Emmyb
    @Emmyb 6 years ago +1

    this video is fab thank you!

    • @hefinrhys8572
      @hefinrhys8572  6 years ago

      Thank you Emily! Happy dimension reduction!

  • @OZ88
    @OZ88 4 years ago +1

    OK, so Sepal.Width contributes over 80% to PC2 and the other three contribute more to PC1 (14:32), and so Sepal.Width is fair enough as info to separate setosa in the next plot. Isn't it also advisable to apply PCA to linear problems?

    • @hefinrhys8572
      @hefinrhys8572  4 years ago

      You're correct about the relative contributions of the variables to each principal component. The Setosa species is discriminated from the other two species mainly by PC1, to which Sepal.Width contributes less than the other variables. As PCA is a linear dimension reduction technique, it will best reveal clusters of cases that are linearly separable, but PCA is still a valid and useful approach to compress information, even in situations where this isn't true, or when we don't know about the structures in the data. Non-linear techniques such as t-SNE and UMAP are excellent at revealing non-linearly-separable clusters of cases in data, but interpreting their axes is very difficult/impossible.

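The "over 80%" contribution figure discussed in this thread can be checked directly. An editor's sketch (not from the video), assuming the standard `prcomp` loadings: since each loading column is a unit vector, the squared loadings times 100 give per-variable contribution percentages that sum to 100 per component.

```r
# Sketch: per-variable contribution (%) of each variable to each PC,
# computed from the squared loadings of a standard prcomp fit.
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)
contrib <- 100 * myPr$rotation^2   # each column sums to 100
round(contrib[, 1:2], 1)           # Sepal.Width carries most of PC2
```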
  • @yayciencia
    @yayciencia 3 years ago

    Thank you! This was very helpful to me

  • @aminsajid123
    @aminsajid123 2 years ago

    Amazing video! Thanks for explaining everything very simply. Could you please do a video on PLS-DA?

  • @andrewh8747
    @andrewh8747 2 years ago

    Fantastic!

  • @bitanbasu1965
    @bitanbasu1965 5 years ago +2

    Thanks Hefin :)

  • @maf4421
    @maf4421 3 years ago +1

    Thank you Hefin Rhys for explaining PCA in detail. Can you please explain how to find the weights of a variable by PCA for making a composite index? Is it the rotation values for PC1, PC2, etc.? For example, if I have (I = w1*X + w2*Y + w3*Z), how do I find w1, w2, w3 by PCA?

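An editor's sketch of how the loadings answer this (assuming the iris data and the `myPr` object name from the video): the column `myPr$rotation[, "PC1"]` holds the weights that combine the centred/scaled variables into the PC1 score, so it can serve as the weight vector w1, w2, ... for a composite index.

```r
# Sketch: rotation values ARE the weights. Rebuild the PC1 score by
# hand for the first observation and compare to prcomp's own score.
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)
w  <- myPr$rotation[, "PC1"]     # w1..w4
x1 <- scale(iris[, -5])[1, ]     # first flower, centred and scaled
index1 <- sum(w * x1)            # hand-built composite index
all.equal(unname(index1), unname(myPr$x[1, "PC1"]))
```

Note that the sign of a loading column is arbitrary, so the index may come out flipped between runs or software.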
  • @anjangowdas2541
    @anjangowdas2541 3 years ago

    Thank you, it was very helpful.

  • @evidenceandlogic6936
    @evidenceandlogic6936 4 years ago

    Top notch. Thank-you.

  • @tiberiusjimbo9176
    @tiberiusjimbo9176 3 years ago

    Thank you. Very helpful.

  • @abhiagni242
    @abhiagni242 7 years ago +1

    Thanks for the video..helped a lot :)

    • @hefinrhys9234
      @hefinrhys9234 7 years ago

      ABHI agni Glad it helped :) Feel free to give feedback on other topics that would be useful.

  • @DesertHash
    @DesertHash 3 years ago +1

    At 5:50, don't you mean that if we measured sepal width in kilometers then it would appear LESS important? Because if we measured it in kilometers instead of millimeters, our numerical values would be smaller and vary far less, making it less important in the context of PCA.
    Thank you for this video.

    • @hefinrhys8572
      @hefinrhys8572  3 years ago +1

      Yes, you're absolutely correct! What I meant to say was that if that length was in kilometers, but we measured it in millimeters, then it would be given greater importance. But yes, larger values are given greater importance.

    • @DesertHash
      @DesertHash 3 years ago

      @@hefinrhys8572 Alright, thanks for the reply and for the video!

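The point settled in this thread is easy to demonstrate. An editor's sketch (assumption: iris data, base R): shrinking one variable's units makes unscaled PCA nearly ignore it, because PCA on the covariance matrix follows raw variance; `scale. = TRUE` restores parity.

```r
# Sketch: units drive importance in unscaled PCA.
d <- iris[, -5]
d$Sepal.Width <- d$Sepal.Width / 1e6          # pretend cm were recorded as km

p_raw    <- prcomp(d, center = TRUE, scale. = FALSE)
p_scaled <- prcomp(d, center = TRUE, scale. = TRUE)

abs(p_raw$rotation["Sepal.Width", "PC1"])     # near 0: drowned out
abs(p_scaled$rotation["Sepal.Width", "PC1"])  # comparable to the others
```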
  • @danieldavieau1517
    @danieldavieau1517 6 years ago +1

    Damn good job!

  • @brunopiato
    @brunopiato 6 years ago +1

    If I may suggest something, please make the script available for download. It would be very helpful for people beginning to learn the R language.
    Once again, congrats on the video.

    • @hefinrhys8572
      @hefinrhys8572  6 years ago +1

      Hi Bruno, this is a great suggestion thank you. I agree it would be worthwhile; I just need to find the time to upload the scripts haha.

    • @hefinrhys8572
      @hefinrhys8572  6 years ago

      The script is now available to download from the video description. Sorry it took me so long!

  • @rockcandy28
    @rockcandy28 5 years ago +2

    Hello! Thanks for the video. Just a question: how would you modify the code if you have NA values? In advance, thank you!

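This question went unanswered in-thread. An editor's sketch of two common options, since `prcomp()` errors on missing values (the injected NAs below are purely illustrative):

```r
# Sketch: handling NAs before prcomp().
d <- iris[, -5]
d[c(3, 7), "Sepal.Width"] <- NA               # fake some missing values

# Option 1: drop incomplete rows before the PCA
pca_drop <- prcomp(na.omit(d), center = TRUE, scale. = TRUE)

# Option 2: crude mean imputation (dedicated packages do this better)
d_imp <- as.data.frame(lapply(d, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}))
pca_imp <- prcomp(d_imp, center = TRUE, scale. = TRUE)

nrow(pca_drop$x)   # two rows lost
nrow(pca_imp$x)    # all rows kept
```

Dropping rows is safest when few values are missing; imputation keeps the sample size but can distort the covariance structure.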
  • @samuelokt
    @samuelokt 5 years ago

    Thanks for the tutorial!!

  • @yuvenmuniandy8202
    @yuvenmuniandy8202 6 years ago

    Amazing tutorial. Very simple and straight to the point. Already subscribed. I have some questions. PCA is an unsupervised method, isn't it? Is it possible to further decompose the data for Versicolor and Virginica to find further grouping? I have read before there are supervised methods. Do you have some tutorial for those?

    • @hefinrhys8572
      @hefinrhys8572  6 years ago

      Thanks enthiran! Yes, PCA is unsupervised because we don't give it any information about group membership; we give it unlabelled data and let it find the optimal projection of the data into a lower-dimensional space that maximises the explained variance. If you wanted to build a model to predict group membership, then you would need to use a supervised algorithm, where you supply a training dataset with grouping labels (this is what makes it supervised). The algorithm will then learn which features in the data associate with each group, such that when you give the model unlabelled data, it will predict group membership. I have a video on various clustering algorithms here: ruclips.net/video/PX5nSBGB5Tw/видео.html

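An editor's sketch of the distinction on iris (not from the video): the PCA and k-means steps below never see the Species labels, which is what makes them unsupervised; the labels are only used afterwards to check how well the found clusters line up with the species.

```r
# Sketch: unsupervised clustering on PC scores, checked against labels.
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)
set.seed(42)                               # k-means starts are random
km <- kmeans(myPr$x[, 1:2], centers = 3, nstart = 25)
table(cluster = km$cluster, species = iris$Species)
```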
  • @salvatoregiordano2511
    @salvatoregiordano2511 3 years ago +1

    Hi Hefin,
    Thanks for this tutorial. What do we do if PC1 and PC2 can only explain around 50% of the variation? Do we also include PC3 and PC4? If so, how?

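There is no in-thread reply; an editor's sketch of one common approach (assumptions: iris data, an 80% threshold chosen for illustration): inspect the cumulative proportion of variance and keep however many PCs reach your threshold. Pairs of retained PCs can then be viewed with `biplot`'s `choices` argument.

```r
# Sketch: choose how many PCs to keep from cumulative explained variance.
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)
cum_var <- cumsum(myPr$sdev^2) / sum(myPr$sdev^2)
n_keep  <- which(cum_var >= 0.80)[1]   # smallest count covering 80%
n_keep
# e.g. biplot(myPr, choices = c(3, 4)) would show PC3 vs PC4
```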
  • @mohammadtuhinali1430
    @mohammadtuhinali1430 1 year ago

    Many thanks for your efforts to make this complex issue much easier for us. Could you enlighten me on understanding group similarity and dissimilarity using PCA?

  • @tiffanyd6543
    @tiffanyd6543 2 years ago

    THANK YOU SO MUCH

  • @rifathasnat3495
    @rifathasnat3495 2 years ago

    Thank you!

  • @MiloLabradoodle
    @MiloLabradoodle 4 years ago

    Thanks for the link to the R code.

  • @federicogarland272
    @federicogarland272 2 years ago

    thank you very much

  • @aiitr
    @aiitr 6 years ago

    awesome tutorial

  • @AcademicActuary
    @AcademicActuary 3 years ago

    Great presentation! However, why did you not binarize the categorical variable first, and then do the subsequent analysis?
    Thanks!

  • @ajakarok8036
    @ajakarok8036 6 years ago +1

    Nice !!!!!!!!!!!

  • @balaji.r2735
    @balaji.r2735 3 years ago

    Thanks

  • @simonjds4960
    @simonjds4960 3 years ago

    Very cool Hefin. I'm trying to run a data reduction for panel data (220 countries, about 25 years of data, and about 100 different variables). Could PCA be used for this?

    • @hefinrhys8572
      @hefinrhys8572  3 years ago

      Hi Simon, it will depend on what kind of data you have and what your goal is. All the variables will need to be numeric as PCA can't handle categorical variables (check out independent correspondence analysis for this). If you want to find linear combinations of variables to explain most the variation in the data, then PCA is a good choice. If you're just interested in seeing whether there are subgroups of subjects in your dataset, you might want to try a non-linear dimension reduction algorithm like t-SNE or UMAP :)

  • @anamika406
    @anamika406 4 years ago

    This is a very comprehensive video, great job!!
    I have one query: you have shown how to create a biplot with PC1 and PC2. How can I make a biplot using PC1 and PC3?
    I would really appreciate it if you reply back.

    • @hefinrhys8572
      @hefinrhys8572  4 years ago +1

      Try: biplot(myPr, choices = c(1, 3)) 😀

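As an editor's aside (a sketch, not the video's code): besides the `choices` argument, any pair of PCs can be plotted directly from the score matrix, whose columns are named "PC1", "PC2", and so on.

```r
# Sketch: plot PC1 vs PC3 straight from the scores with base graphics.
myPr <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)
scores <- as.data.frame(myPr$x)
plot(scores$PC1, scores$PC3,
     col = iris$Species, pch = 19,   # colour points by species
     xlab = "PC1", ylab = "PC3")
```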
  • @personaldevelopment1802
    @personaldevelopment1802 6 years ago +1

    Good video, but due to the background color and small font size it's not very clear. Thanks for considering this in your next videos.

    • @russsm
      @russsm 5 years ago

      It was fine for me once I switched to full screen and changed the quality to 1080p HD.
      Also, you can download his R code (link in description) and run it in your own R program along with him, so you can see it better there.

  • @sandracuadros3787
    @sandracuadros3787 4 years ago +1

    Hi! I have a question: does it make sense to run a PCA on discrete data? I am trying something using your tutorial as a guide, but I get a weird result in the plot, and I am wondering if it is because of the nature of my data. Thanks

    • @hefinrhys8572
      @hefinrhys8572  4 years ago

      Great question! If your data are not ordinal, you may get some use out of PCA if you numerically encode your discrete variables, but you may get more out of Multiple Correspondence Analysis (MCA) than PCA. Have a look here: www.rpubs.com/piterii/dimension_reduction

  • @jackiemwaniki1266
    @jackiemwaniki1266 4 years ago

    Thanks again. Quick one... Would you mind also doing the Fama and MacBeth analysis without using the KenFrench dataframe?

  • @XxRoos898xX
    @XxRoos898xX 3 years ago

    Hey Hefin,
    Amazing video! I am working on my own PCA at the moment.
    If at all possible would you be able to go over a few questions with me?
    I would appreciate any advice you would be able to give.