Principal Component Analysis 2 Theory (1/2)

  • Published: 11 Sep 2024

Comments • 20

  • @CarlosContrerasZunig · 4 years ago · +1

    This is what I was looking for. A masterful and very pedagogical lesson.
    Thank you.

  • @srkajolfan1 · 12 years ago · +1

    I think he's referring to the end when you look at the sugar samples from the 3 different factories. You show that there is better separation between the 3 classes when you use PC1 and PC3 than when you use PC1 and PC2. Why is this the case? Shouldn't one expect better separation with the first 2 principal components?
    Thank you for posting these videos by the way. Very good explanations.

  • @RasmusBroJ · 11 years ago

    Hi Jolly
    Sorry for not answering before. It is perhaps puzzling at first that a minor component can separate classes, but it is very common. In spectroscopy (as we have here), the first component typically explains a lot (50-95%, depending on the type of data and other things). With a suitable signal-to-noise ratio, small components can be absolutely valid, and this third component with less than 0.5% variance is absolutely fine. This can be verified, e.g., by the loading vector and its appearance.

  • @timerwentoff · 11 years ago · +1

    Very illuminating, and thanks for explaining these concepts in layman's terms.

  • @QualityAndTechnology · 11 years ago · +1

    Thanks so much. We are truly happy the videos are helpful.

  • @QualityAndTechnology · 12 years ago

    I think you are referring to the use of the scores obtained by PCA in another model (e.g. ordinary regression or even multivariate regression). We usually don't normalize them, since for a single score vector it is not really needed. Nevertheless, if you use more than one score vector (let's say, the first four PCs), normalizing them is advisable for further analysis.
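
    A minimal sketch of that advice, assuming Python/NumPy and PCA computed via the SVD; the data and the response y are random stand-ins, not the sugar data from the video:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
X = X - X.mean(axis=0)           # mean-center before PCA

# PCA via the SVD: X = U S Vt, scores T = U * S
U, S, Vt = np.linalg.svd(X, full_matrices=False)
T = U * S

T4 = T[:, :4]                    # keep the first four PCs
# The columns of T4 have very different variances (S[0] >> S[3]),
# so scale each score vector to unit variance before further modelling.
T4_norm = T4 / T4.std(axis=0, ddof=1)

y = rng.normal(size=50)          # hypothetical response variable
coef, *_ = np.linalg.lstsq(T4_norm, y, rcond=None)
print(coef)                      # regression coefficients on normalized scores
```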

  • @anibalamaya · 11 years ago · +1

    Thank you very much for these videos. I am trying to do a QSAR project, and I think this will help a lot :D

  • @RasmusBroJ · 11 years ago

    Dear Luk
    I do not disagree with you at all. I do not think, though, that this is a critical aspect at the level this slide show is aiming at. It is meant only as a brief introduction that can give an intuitive understanding, so I consider the visualizations, and the intuition they can provide, more important. I hope that sounds reasonable.

  • @RasmusBroJ · 11 years ago

    Actually, you should not expect separation between classes in any particular component. The components reflect the chemistry of the samples (as measured by fluorescence), and that chemistry can relate to level differences between classes, to the processing of the sugar, to the composition of the beets used for production, and many other things. All these things come out indirectly in the PCA, and in no 'god-given' order, so to speak. So the class separation could be in any of the valid components.
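
    To make this concrete, here is a small simulation (random, invented numbers, not the video's fluorescence data) where the between-class difference deliberately sits in a direction with tiny variance; PCA then places the separation in PC3 even though PC1 and PC2 carry almost all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two high-variance directions shared by both classes, plus one
# low-variance direction that carries the class difference.
shared = rng.normal(scale=[10.0, 5.0], size=(n, 2))
labels = np.repeat([0, 1], n // 2)
class_axis = 0.5 * labels[:, None] + rng.normal(scale=0.2, size=(n, 1))
X = np.hstack([shared, class_axis])
X = X - X.mean(axis=0)           # mean-center, as PCA requires

# PCA via the SVD: scores T = U * S
U, S, Vt = np.linalg.svd(X, full_matrices=False)
T = U * S
print("variance per PC:", np.round(S**2 / np.sum(S**2), 4))

# Class means on each PC: PC3 separates the classes, PC1/PC2 do not.
for k in range(3):
    m0, m1 = T[labels == 0, k].mean(), T[labels == 1, k].mean()
    print(f"PC{k + 1}: class means {m0:+.2f} vs {m1:+.2f}")
```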

  • @DHPPhDPhD · 11 years ago

    Nice presentation. Thanks.

  • @RasmusBroJ · 11 years ago

    No; it is the first column of the loading matrix P that you see, e.g., at time point 2:10. You can find P as P' = (T'*T)^(-1)*T'*X. More on that, e.g., in the Wikipedia article on PCA.
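
    A quick numeric check of that formula, assuming NumPy, with random stand-in data and scores taken from the SVD:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 8))
X = X - X.mean(axis=0)           # mean-centered data

# Reference decomposition: X = U S Vt, scores T = U * S, loadings P = Vt.T
U, S, Vt = np.linalg.svd(X, full_matrices=False)
T = U * S
P_svd = Vt.T

# The formula from the reply: P' = (T'*T)^(-1) * T' * X
Pt = np.linalg.solve(T.T @ T, T.T @ X)
print(np.allclose(Pt.T, P_svd))  # True
```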

  • @Gogargoat · 12 years ago

    After you perform PCA and you get a new lower-dimensional data set, do you typically normalize each component before using them as inputs to a model, or should you leave them as they are?

  • @RomanticSoul02 · 11 years ago

    Nice & clear presentation. But I am disappointed about the implicit definition of the score matrix by X = T.P' at 1:18. This is only valid if standard normalization of the loading matrix is assumed (P'.P = I). Several other normalizations are used, each with their own advantages when performing orthogonal rotation (e.g., maintaining orthogonality of PCs). For these, your formula takes a different form. I suggest using instead the explicit definition of scores, i.e., T = X.P, to avoid this caveat.
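
    A small check of this point under the standard normalization, assuming NumPy and random stand-in data: with P'P = I the explicit definition T = X.P and the reconstruction X = T.P' agree, which is exactly the assumption the commenter highlights.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 5))
X = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt.T                         # orthonormal loadings: P' P = I
T = X @ P                        # explicit definition of the scores

print(np.allclose(P.T @ P, np.eye(5)))  # True: standard normalization
print(np.allclose(T @ P.T, X))          # True: X = T P' holds because of it
```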

  • @QualityAndTechnology · 12 years ago

    What is the exact point in the video where you see that?

  • @hampshireoak · 11 years ago

    Please explain the 1st loading at time point 9:17. Is it the coefficient of variance? If not, how is it calculated, in other words? Thank you.

  • @barsgunsurmeli6935 · 6 years ago

    Hello, one thing to remark: you first referred to the "new dimensions" as "Principal Components", which consist of one row of the score matrix and one of the loading matrix. Then you refer to the same thing (the lines that form the plane to project onto, which are supposed to be the new dimensions) as "Loadings". Now, which one is true: are the lines loadings or PCs?

    • @RasmusBroJ · 6 years ago · +1

      I am afraid I don't completely follow your question/notation. One component is made up of a score vector (a column vector) and a loading vector (also a column vector in our notation). You may also sometimes refer to the component through only its scores or only its loadings, if convenient. I hope this is what your concern was about. (See the short numeric sketch at the end of this thread.)

    • @barsgunsurmeli6935 · 6 years ago

      Rasmus Bro Hello, thanks for the very fast answer; you completely understood my question. This is actually a big ambiguity, at least for me, since all these terms correspond to different matrices, which should have different notations in the formulas.
      I am not criticizing you, though; after some more research I see that this ambiguity/confusion between loadings/scores/PCs/eigenvectors is actually present throughout the literature, differing especially across the different application areas of data science.
      Below are two questions from StackExchange where this is discussed, which might help people who had similar confusions:
      stats.stackexchange.com/questions/88118/what-exactly-is-called-principal-component-in-pca
      stats.stackexchange.com/questions/143905/loadings-vs-eigenvectors-in-pca-when-to-use-one-or-another

    • @RasmusBroJ · 6 years ago · +1

      Hi there. Indeed it is confusing, but it does actually make sense. Think of, e.g., pH. There are two aspects to it: how you measure it (take an electrode, calibrate, etc.), and then there is the actual value for a given beer sample. When you say the word pH, you may refer to both aspects or to one of them. Same with a PC: you may talk about the definition (loading), the actual values (scores), or both! It's not wrong, just confusing indeed. And different fields lean towards different common practices.

    • @barsgunsurmeli6935 · 6 years ago

      Rasmus Bro Thanks for the example, Prof. Bro; I definitely see your point.
      Could you recommend a book on chemometrics?
      I was working in machine learning before, but I am new to chemometrics/food science, so I am sometimes struggling and more open to confusion. The notation, keywords, and implementations are all different from what I studied; I must say I feel like I have learned PCA from the beginning recently :)
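
A minimal numeric sketch of the terminology discussed in the thread above (assuming NumPy, with random stand-in data): one component is the pair of a score column and a loading column, and the rank-1 products they form add up to X.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(15, 6))
X = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
T, P = U * S, Vt.T               # score matrix and loading matrix

t1 = T[:, [0]]                   # score vector of component 1 (a column)
p1 = P[:, [0]]                   # loading vector of component 1 (a column)
X1 = t1 @ p1.T                   # the rank-1 part of X that component 1 explains

# Summing the rank-1 parts over all components recovers X exactly.
X_rebuilt = sum(T[:, [k]] @ P[:, [k]].T for k in range(T.shape[1]))
print(np.allclose(X_rebuilt, X))  # True
```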