You know what, starting with this video I was... why is he so loud?
But at the end you made my day, dude... and I subscribed to your channel. Thanks a lot... bring more videos... keep the learning on 😃
I am so sorry... my "hi" has received quite a lot of criticism. I recorded these videos over a period of time, hence it took a while for me to mellow my "hi". I shall ensure that the upcoming videos have no loud, scary "hi" :P haha
@@easyml1234 waiting 😃
The mean of my cor(data) is 0.45. Is it not eligible for PCA? What should I use for variable selection then?
Thank you, Sir. This video is very helpful and excellent.
Thanks very much. Well explained and very useful.
There's no link down below 😅🤣
I get the error "'x' must be numeric"... any solution?
Hi Diana, thank you for your comment. At times, even though a column holds numeric data, its data type in R might be different. To change the data back to numeric you can use the as.numeric() function. Once the data type is numeric, you can execute PCA seamlessly. Hope this helps.
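In case it helps, here is a minimal sketch of that fix, assuming a hypothetical data frame called mydata with an offending column x1 (both names are made up):

    str(mydata)                                        # see which columns are not numeric
    mydata$x1 <- as.numeric(as.character(mydata$x1))   # safely convert a factor/character column
    pca <- princomp(mydata, cor = TRUE)                # runs once every column is numeric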
@@easyml1234 thank you!
Thank you very, very much! It is certainly one of the best explanations. Very helpful! But I have some questions. 1) I cannot understand what we do next with the PCs. Should I use them for multiple regression along with other variables, or for clustering? 2) Can I re-evaluate the impact of variables using the loadings? For instance, I use 5 variables to build the PCs. My PC1 and PC2 describe about 85% of the variability, but neither connects to, let's say, the 3rd variable. Maybe I should delete the 3rd variable and rerun the analysis with only the other 4? Will this improve the outcome? 3) Why are some spaces blank in the loadings (minute 6:28 of the video), like Sepal.Width vs Comp.1? 4) And finally, body mass index is given as an example of a PCA outcome. Does that mean we can retrieve PC1 and name it as a sort of new stable variable? Or is BMI just an example of dimension reduction that does not correspond to PCA directly? That's a lot of questions, but I really wonder...
Hey, thank you for such a detailed comment. At times I wish YouTube allowed voice notes, because typing is rather ineffective for this.
BMI is indeed just an example. The main application of PCA is to capture the essence of a large list of variables in fewer newly generated variables.
Assume you have bank data: there is a list of 120 variables and you need to predict loan delinquency. Inputting 120 variables and eventually tuning the model would be cumbersome. In such instances you can deploy PCA to reduce the list of variables from 120 to, let us say, 12, and rest assured, if you have executed PCA well, these 12 new variables will have captured the essence of the original 120. The model you then build with this new set of PCs will be lighter and faster. You can deploy PCA before unsupervised learning as well!!
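A rough sketch of that workflow, assuming a hypothetical data frame bank_data whose first 120 columns are numeric predictors and whose last column is a 0/1 delinquency flag (all names invented for illustration):

    pca <- prcomp(bank_data[, 1:120], center = TRUE, scale. = TRUE)
    summary(pca)                                   # cumulative proportion of variance explained
    scores <- as.data.frame(pca$x[, 1:12])         # keep, say, the first 12 principal components
    scores$delinquent <- bank_data$delinquent
    model <- glm(delinquent ~ ., data = scores, family = binomial)   # lighter model on 12 PCs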
@@easyml1234 Thanks! My point is: maybe I don't need all 120. Maybe I should extract 40 for one PCA and another 50 for a second one, and discard the remaining 30, and I'd get a better sketch of my "Eiffel tower" - that was a great example)) At the very least, I might get better values for my PC1 and PC2 equations. Do we do that - reduce the data for several PCAs? And once I get the results of several PCAs, how do I interpret them? As variables for regression, or as a basis for clustering?
@@easyml1234 Can this method be used for categorical data?
us data analysts learning more than necessary about plant terminology..
It was like a bomb. Thank you bro.
:P
Useful. Thank you. I will try this.
Awesome video... kudos. You said the variables are eligible for PCA as they have a mean(cor(mydata)) value of 0.46 (you said it was high and good enough) - what is the acceptable value range for PCA?? :)
Firstly, thanks so much. To answer your query: yes, in my previous videos I have mentioned a threshold of >0.3 and
Tqsm 😀
Hi, I want to ask: if I use 2 components from my PCA, how do I know which variable is in Comp.1 and which is in Comp.2?? Do I just need to look in PCA$loadings for which one has the larger value among all of them for each Comp?? Thank you! ^^
Hey Leyya, thank you for your comment. There are definitely other methods, such as the biplot, but the loadings matrix is a very good place to start and a good indicator to begin with. Hope this helps!
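For instance, on the iris data used in the video (the object name pca is just an assumption):

    pca <- princomp(iris[, 1:4], cor = TRUE)
    pca$loadings                      # larger absolute values = stronger contribution to that component
    print(pca$loadings, cutoff = 0)   # blanks are just small loadings hidden by the default cutoff of 0.1
    biplot(pca)                       # arrows show which variables pull on Comp.1 vs Comp.2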
excellent
Thanks :)
While computing the mean correlation, shouldn't you get rid of the diagonal and the upper triangle (?)
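If that is the concern, something like this would do it (mydata stands for the numeric data frame used in the video):

    cm <- cor(mydata)
    mean(cm[lower.tri(cm)])   # average only the unique pairwise correlations
    mean(cm)                  # naive version: inflated by the 1s on the diagonal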
Why the f do you scream this much...