Principal Component Regression in R

  • Published: 15 Jul 2024
  • Likes: 152, Dislikes: 0 (100.0% positive), updated on 01-21-2023 11:57:17 EST
    Understand Principal Component Analysis? Cool! So, how do I use PCA for Machine Learning purposes? Well, look no further! I go into depth on how to utilize the linear transformations of PCA for any machine learning model!
    Github:
    github.com/SpencerPao/Data_Sc...
    PCA:
    • Applied Principal Comp...
    XGBoost Regression: Theory and Application!
    • Understanding and Appl...
    XGBoost Classification: Theory and Application!
    • Understanding and Appl...
    Linear Regression: Understanding Linear Regression!
    • HOW TO: Linear Regress...
    Neural Networks: Understanding and Applying Neural Networks!
    • Understanding and Appl...
    Data Imputation: Wondering what to do with NA observations?
    • Dealing with MISSING D...
    0:00 - Principal Component Regression Summary
    1:14 - Cleaning Data for Machine Learning Models
    3:40 - Linear Regression Model (Base)
    5:35 - Principal Component Regression
    9:40 - Using Results of PCA for other Machine Learning Models (train/test)
  • Science

Comments • 39

  • @fawn0213
    @fawn0213 2 years ago +2

    Super clear and very helpful. I am so glad to find this video. Thank you!

  • @shahrizalmuhammadabdillah3127
    @shahrizalmuhammadabdillah3127 8 months ago

    Thanks for the insight... This was amazing...

  • @rafaelguimaraes1424
    @rafaelguimaraes1424 1 year ago

    Very good. I am from Brazil.

  • @lilmikeytheskater
    @lilmikeytheskater 2 years ago +2

    Hey Spencer, I love your videos! Your channel is among the most insightful in all of data science YouTube. My favorite video of yours is the pairs trading one from a few months back. You mentioned future videos on seasonality and other finance related topics at the end of that video. Do you still have plans to cover more financial topics?

    • @SpencerPaoHere
      @SpencerPaoHere 2 years ago +1

      Thanks for watching! :)
      Oh for sure! If there is a demand for it, then by all means, I can make future videos surrounding that material. I can note down some additional financial applications around the idea of financial trading. I'll make a note for future content.

    • @lilmikeytheskater
      @lilmikeytheskater 2 years ago

      @@SpencerPaoHere looking forward to it!

  • @kexinni6864
    @kexinni6864 2 years ago +1

    Hi Spencer, your video is super helpful! Could you perhaps explain more about what PC1 and PC2 capture in the final bit of the video?

    • @SpencerPaoHere
      @SpencerPaoHere 2 years ago

      Glad you like it!
      The components you are referring to represent the newly transformed features in a different feature space. Those two components explain some percentage of the variance of the original data and can be used in place of the original features for classification or regression type problems.
      I hope that helped!
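
To make the reply above concrete, here is a minimal sketch in base R of what PC1 and PC2 hold and how much variance they explain. It assumes the built-in `mtcars` data and an arbitrary choice of five columns as stand-ins; the video's own data set is not reproduced here.

```r
# Illustrative only: mtcars stands in for the video's data set.
X <- mtcars[, c("disp", "hp", "wt", "drat", "qsec")]

# Centre and scale so that no single variable dominates the variance.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Share of the total variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores: each observation expressed in the new feature space (PC1, PC2).
scores <- as.data.frame(pca$x[, 1:2])
print(head(scores))

# Loadings: how strongly each original variable contributes to PC1 and PC2.
print(pca$rotation[, 1:2])
```

The `scores` columns are what would replace the original features in a downstream model, and the loadings are the place to look when asking what each component "captures".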

  • @baeksudream7964
    @baeksudream7964 1 year ago

    amazing voice

  • @ishtardory
    @ishtardory 2 years ago

    Hi Spencer, great video thanks! I just had a question. I know that with PCA you can also visualize the correlation of supplementary variables (not used in building the dimensions) with the dimensions. So if you find that your dependent variable (i.e. Life expectancy) is highly and significantly correlated with a subset of the PCA dimensions...why would you need to do a regression with the principal components in addition?
    I would really appreciate this clarification, thanks a lot!

    • @SpencerPaoHere
      @SpencerPaoHere 2 years ago +1

      If you have an independent variable that is essentially a cofactor to your dependent variable, I'd say that is highly suspicious.
      But the overall idea is that you'd want to have predictive capabilities using PCA. So, if you ever want to place this model in production, you will have to follow a succinct pattern:
      1. Transform features using the PCA model.
      2. Plug the PCA output into the regression model (assuming that model has already been trained).
      3. Get predictions for whatever you are trying to do.
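
The pattern described in the reply above (transform, plug in, predict) can be sketched in base R as follows. This is a sketch under stated assumptions, not the video's code: `mtcars`, the response `mpg`, the five predictors, the 24/8 split and `k = 2` components are all illustrative choices.

```r
set.seed(1)

# Illustrative data: mtcars with mpg as the response (an assumption).
predictors <- c("disp", "hp", "wt", "drat", "qsec")
idx   <- sample(nrow(mtcars), 24)
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]
k     <- 2   # number of components kept (arbitrary here)

# Fit the PCA on the training predictors only, so no test information leaks in.
pca <- prcomp(train[, predictors], center = TRUE, scale. = TRUE)

# Train the regression on the first k component scores.
train_pcs <- data.frame(mpg = train$mpg, pca$x[, 1:k])
fit <- lm(mpg ~ ., data = train_pcs)

# Step 1: transform new data with the stored centring, scaling and rotation.
test_pcs <- as.data.frame(predict(pca, newdata = test[, predictors])[, 1:k])

# Steps 2 and 3: feed the scores to the trained model and get predictions.
pred <- predict(fit, newdata = test_pcs)
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)
```

The key design point is that `predict()` on the `prcomp` object reuses the training means, standard deviations and rotation, so new observations land in the same component space the regression was trained on.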

    • @ishtardory
      @ishtardory 2 years ago

      @@SpencerPaoHere Thank you !

  • @fabios5524
    @fabios5524 2 years ago

    Hi Spencer!
    Great video. I have a question:
    Can I fit the results of a FAMD from the package FatMineR into this model?
    If it is possible, do you know of any example of how to do it?

    • @SpencerPaoHere
      @SpencerPaoHere 2 years ago

      I am not familiar with that particular package. However, I can imagine that you can utilize the predictions of PCR with any other package. You can save the predictions as a data frame (for example) and use it as an input for another function.

    • @nosaosawe3158
      @nosaosawe3158 8 months ago

      FactoMineR, you mean? Yeah, it should work.

  • @amanrastogi5184
    @amanrastogi5184 2 years ago

    What would you suggest if you have categorical variables in your dataset? I mean, how does PCA deal with them?

    • @SpencerPaoHere
      @SpencerPaoHere 2 years ago

      You’d want to one-hot encode your categorical variables! Then you can run PCA on the dataset.
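
As a rough illustration of the suggestion above, base R's `model.matrix()` can do the one-hot encoding before `prcomp()` is called. The small data frame and its column names are made up for this sketch.

```r
# Hypothetical toy data: two numeric columns and one categorical column.
df <- data.frame(
  height = c(170, 165, 180, 175, 160, 185),
  weight = c(65, 59, 82, 77, 52, 90),
  group  = factor(c("a", "b", "c", "a", "b", "c"))
)

# model.matrix() expands the factor into 0/1 indicator columns.
# "- 1" drops the intercept so every level of `group` gets its own column.
X <- model.matrix(~ . - 1, data = df)
print(colnames(X))

# Run PCA on the fully numeric matrix.
pca <- prcomp(X, center = TRUE, scale. = TRUE)
print(summary(pca))
```

Because the indicator columns of one factor always sum to 1, they are linearly dependent, so the last component carries essentially no variance. That is expected and harmless for this purpose.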

  • @jaredgreathouse3672
    @jaredgreathouse3672 2 years ago

    Hey Spencer, are you familiar with something called the synthetic control method? It's a technique from econometrics that's become pretty popular over the years for causal inference.

    • @SpencerPaoHere
      @SpencerPaoHere 2 years ago

      I am just reading about it, and this is a fascinating subject!

    • @jaredgreathouse3672
      @jaredgreathouse3672 2 years ago

      @@SpencerPaoHere reason I asked, is cuz apparently.... you can use PCR to de-noise an outcome matrix, and then impute counterfactuals from it using SCM. I don't know if you'd have access to it, but you should look up a paper called "Using Synthetic Controls" by Alberto Abadie, published in the Journal of Economic Literature.

    • @SpencerPaoHere
      @SpencerPaoHere 2 years ago

      @@jaredgreathouse3672
      I believe the paper is linked here: economics.mit.edu/files/17847
      I'll dig a little deeper on this subject, but yes! This is an intriguing topic. I didn't realize that this method was commonly used in many areas. Might be an interesting video topic!

  • @MELVINBRO100
    @MELVINBRO100 8 months ago

    Hi Spencer,
    I wonder if using this approach as opposed to the princomp() function and package would be sufficient enough to find the number of principal components in PCA?
    Thank you!

    • @SpencerPaoHere
      @SpencerPaoHere 4 months ago

      Yep! Both are fine methods. You can use one or the other.
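
A quick way to check the reply above is to compare `princomp()` with `prcomp()` on the same standardised data and see that both suggest the same number of components. This sketch assumes `mtcars` as stand-in data and a 90% cumulative-variance cutoff, which is an arbitrary rule of thumb rather than anything stated in the video.

```r
X <- mtcars[, c("disp", "hp", "wt", "drat", "qsec")]

# prcomp() works through an SVD of the (scaled) data matrix.
p1 <- prcomp(X, center = TRUE, scale. = TRUE)

# princomp() works through an eigen decomposition; cor = TRUE uses the
# correlation matrix, which matches scaling the data first.
p2 <- princomp(X, cor = TRUE)

# Cumulative proportion of variance explained under each function.
cum1 <- cumsum(p1$sdev^2) / sum(p1$sdev^2)
cum2 <- cumsum(p2$sdev^2) / sum(p2$sdev^2)

# Smallest number of components reaching the (assumed) 90% threshold.
k1 <- which(cum1 >= 0.90)[1]
k2 <- which(cum2 >= 0.90)[1]
print(c(prcomp = k1, princomp = unname(k2)))
```

On standardised data the two functions give the same variance proportions, so the choice between them does not change how many components are retained.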

  • @FrancisNgoma-vr7nj
    @FrancisNgoma-vr7nj 8 months ago

    Hi. Thank you for your video. It is very informative.
    I do have one concern though. When performing principal component regression, how can we calculate the regression coefficients for the original variables? If possible, could you send me the script file or the do file for implementing these estimates?

    • @SpencerPaoHere
      @SpencerPaoHere 4 months ago

      I have a github that hosts the code:
      github.com/SpencerPao/Data_Science/tree/main/Principal%20Components/PCR
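
Independently of the linked repository (which is not reproduced here), one common way to recover coefficients on the original variables is to multiply the component coefficients by the loadings and undo the scaling. If the scores are T = Z V_k with Z the standardised predictors, and the regression on the scores gives coefficients gamma, then the standardised coefficients are V_k gamma, and dividing by each variable's standard deviation puts them on the original scale. A minimal sketch, assuming `mtcars`, `mpg` as the response and `k = 2`:

```r
predictors <- c("disp", "hp", "wt", "drat", "qsec")
X <- as.matrix(mtcars[, predictors])
y <- mtcars$mpg
k <- 2   # number of components kept (arbitrary here)

pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Regress the response on the first k component scores.
fit   <- lm(y ~ pca$x[, 1:k])
gamma <- coef(fit)[-1]              # coefficients on PC1..PCk

# Back-transform: loadings times gamma gives coefficients on the
# standardised variables; dividing by the sd gives the original scale.
beta_std  <- drop(pca$rotation[, 1:k] %*% gamma)
beta      <- beta_std / pca$scale
intercept <- unname(coef(fit)[1]) - sum(beta * pca$center)

print(round(c(intercept = intercept, beta), 4))

# Sanity check: both parameterisations give the same fitted values.
pred_pcs  <- unname(fitted(fit))
pred_orig <- unname(drop(intercept + X %*% beta))
print(all.equal(pred_pcs, pred_orig))
```

The final comparison confirms that predicting from the raw variables with `beta` is equivalent to predicting from the component scores with `gamma`.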

  • @cooookieraider
    @cooookieraider 2 years ago

    Hi Spencer! I am a beginner at R and have to use a PCA for my school project, hoping you can help :)
    I have parental language proficiency scores in 4 domains (understanding, speaking, reading, writing) --> I have done a PCA on these and it resulted in 2 factors. PC 1 --> reading, writing. PC2 --> understanding, speaking.
    Now I would like to check if PC1 and PC2 are correlated with another variable, language use.
    How should I proceed?

    • @SpencerPaoHere
      @SpencerPaoHere 2 years ago +1

      It seems that your variables are categorical? If so, try to run the chi-square test to see correlation between variables.
      If it's a categorical vs numerical, try running the one-way anova test and analyze from there.

  • @kakabudi
    @kakabudi 2 years ago

    Hey Spencer, Thanks for making and sharing this, it is much appreciated!
    I have a question:
    I have 180 variables on human body movement. I want to reduce the size of the dataset while keeping as much variability as possible, hence me using PCA. However, I have no dependent variable! What does this mean? As far as I know I can't use the same methodology you used in this video, since you used life expectancy as your dependent.
    Is PCA still applicable here?

    • @SpencerPaoHere
      @SpencerPaoHere 2 years ago +1

      For Principal Component Regression, you will need a dependent variable since it is a regression.
      For Principal Component analysis, nope! You don't need a dependent variable. You can check out the PCA video here:
      ruclips.net/video/uNJBBpyss50/видео.html

    • @kakabudi
      @kakabudi 2 years ago

      @@SpencerPaoHere thank you!

    • @kakabudi
      @kakabudi 2 years ago

      @@SpencerPaoHere Therefore, can I not test the validity of my PCA transformation compared to the original dataset?

    • @SpencerPaoHere
      @SpencerPaoHere 2 years ago

      @@kakabudi You most definitely can. But you'd just need to follow the same data transformation process and compare with the original dataset.

    • @kakabudi
      @kakabudi 2 years ago

      @@SpencerPaoHere Sorry, but what method could I use to compare the transformed data to the original data? I am mostly only familiar with comparing regarding linear regression methods, and without that I am admittedly lost as to how to compare them.
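
One option for the question above, when there is no dependent variable, is to measure reconstruction error: project the data onto the first k components, map it back to the original space, and see how far it is from the original. This is a sketch under assumptions (five `mtcars` columns standing in for the 180 movement variables), not a method taken from the video.

```r
# Standardise first, so the error is in units of standard deviations.
X <- scale(mtcars[, c("disp", "hp", "wt", "drat", "qsec")])
pca <- prcomp(X, center = FALSE, scale. = FALSE)   # already standardised

# Mean squared difference between X and its rank-k reconstruction.
reconstruction_error <- function(k) {
  scores   <- pca$x[, 1:k, drop = FALSE]
  loadings <- pca$rotation[, 1:k, drop = FALSE]
  X_hat    <- scores %*% t(loadings)
  mean((X - X_hat)^2)
}

errors <- sapply(seq_len(ncol(X)), reconstruction_error)
print(round(errors, 4))
```

The error shrinks as components are added and reaches zero (up to rounding) when all of them are kept, so the curve shows how much information a given k gives up.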

  • @surpriseworld1662
    @surpriseworld1662 2 years ago

    Hi Spencer, I tried to look for your video on the topic of scree plots in R, with no luck. Would you be happy to send me a copy, please? Thank you muchly.

    • @SpencerPaoHere
      @SpencerPaoHere 2 years ago

      Hi! Check out my PCA video here: ruclips.net/video/uNJBBpyss50/видео.html
      I go over the screeplot topic more in depth there.
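
For readers who only want the plot itself, a scree plot can be drawn in base R with `screeplot()`. This sketch assumes `mtcars` as stand-in data; the dashed line at 1 is the common eigenvalue-above-one rule of thumb, which is an added assumption and not necessarily what the linked video uses.

```r
X   <- mtcars[, c("disp", "hp", "wt", "drat", "qsec")]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Eigenvalues of the correlation matrix: the variance of each component.
eigenvalues <- pca$sdev^2
print(round(eigenvalues, 3))

# Variance per component; look for the "elbow" where the curve flattens.
screeplot(pca, type = "lines", main = "Scree plot")
abline(h = 1, lty = 2)   # rule-of-thumb reference line (assumption)
```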