Use-cases for inverted PCA

  • Published: 19 Aug 2024
  • A lot of people know PCA for its ability to reduce the dimensionality of a dataset. It can turn a wide dataset into a thin one while, hopefully, losing only a limited amount of information. But what about doing it the other way around as well? Can you turn the thin representation into a wide one again? And if so, what might be a use-case for that? (A minimal round-trip sketch in code follows right after the description.)
    If you're interested in the notebook for this video, it can be found here:
    github.com/pro...
    This whiteboard video is part of the open efforts over at probabl. To learn more you can check out our website or reach out to us on social media.
    Website: probabl.ai/
    LinkedIn: / probabl
    Twitter: x.com/probabl_ai
    We also host a podcast called Sample Space, which you can find on your favourite podcast player. All the links can be found here:
    rss.com/podcas...
    If you're keen to see more videos like this, you can follow us over at @probabl_ai.
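
A minimal sketch of the round trip described above, assuming scikit-learn and its 8x8 digits dataset as a stand-in for the data used in the video:

```python
# Compress 64-dimensional digit images down to 2 numbers, then map them back.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                  # shape (1797, 64): flattened 8x8 images
pca = PCA(n_components=2).fit(X)

X_thin = pca.transform(X)               # "thin" representation, shape (1797, 2)
X_hat = pca.inverse_transform(X_thin)   # back to shape (1797, 64)

# The reconstruction is only approximate; the information lost on the way
# down cannot be recovered on the way back up.
print("mean squared reconstruction error:", np.mean((X - X_hat) ** 2))
```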

Comments • 20

  • @guilhermeduarte1932
    @guilhermeduarte1932 1 month ago +2

    Really Cool! I never thought of PCA in this way. Thanks for showing!

  • @alexloftus8892
    @alexloftus8892 1 month ago +1

    Cool! The way I'm thinking about this is that \hat{X_w} is at most rank 2 (in this case), since it is the result of a linear transformation of a rank 2 matrix. Therefore, since it is the matrix that minimizes the MSE loss to X_w, the span of \hat{X_w} defines the best-fitting (by MSE) 2D plane for the 64-dimensional datapoints. So an 'outlier', in this case, is 'any point which is far away from the plane'. Because we're doing PCA, the axes of that plane are the two directions of highest variance within those 64 dimensions.
    One thing to bear in mind here is that as you increase dimensionality, distances in the space increase. So you're going to have to change whatever threshold you use if you go up to, for example, 64x64 images instead of 8x8, and things might get weird.
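
A sketch of that outlier idea in code: score each point by its distance to its reconstruction and flag the points that reconstruct worst. The dataset, the quantile threshold, and the variable names are illustrative assumptions, not the notebook's code.

```python
# Reconstruction error as an outlier score: points far from the best-fitting
# 2D plane are reconstructed poorly by a 2-component PCA.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data
pca = PCA(n_components=2).fit(X)
X_hat = pca.inverse_transform(pca.transform(X))

# Per-sample Euclidean distance between a point and its reconstruction,
# i.e. its distance from the rank-2 plane.
errors = np.linalg.norm(X - X_hat, axis=1)

# Flag the 1% of points that are hardest to reconstruct (threshold is arbitrary).
threshold = np.quantile(errors, 0.99)
outliers = np.where(errors > threshold)[0]
print(len(outliers), "candidate outliers")
```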

  • @user-eg8mt4im1i
    @user-eg8mt4im1i 1 month ago +1

    Another fun video, thanks! 🎉 I believe this compression/decompression could be used in fraud detection, couldn't it? The visualization is great!

  • @gilad_rubin
    @gilad_rubin 1 month ago +1

    Super cool!

  • @coopernik
    @coopernik 1 month ago +2

    That’s how encoders and decoders are used to spot anomalies, no?

    • @probabl_ai
      @probabl_ai 1 month ago +1

      There's certainly a similar thing happening in neural network land. You could (somewhat handwavingly) look at PCA as an autoencoder with a single linear hidden layer of smaller dimension and no activation function, trained to output its input.
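
A rough sketch of that analogy, assuming scikit-learn's MLPRegressor as a stand-in for a one-hidden-layer linear autoencoder (the model choice and hyperparameters are assumptions; at the MSE optimum the 2-unit hidden layer spans the same subspace as the top two principal components, though a small run like this may not converge all the way):

```python
# A 64 -> 2 -> 64 "autoencoder" with identity activations, trained to
# reproduce its own input. With no non-linearity it can only learn a
# linear map, much like projecting onto a 2D subspace and back.
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_digits().data)

linear_ae = MLPRegressor(hidden_layer_sizes=(2,), activation="identity",
                         max_iter=2000, random_state=0)
linear_ae.fit(X, X)  # the target is the input itself

print("reconstruction R^2:", linear_ae.score(X, X))
```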

  • @zb9458
    @zb9458 1 month ago +1

    Amazing video! You really explained this in such a nice and easy step-by-step process; I feel my mental model of PCA is better now! I wonder, could you do a video on the discrete cosine transform for encoding MNIST digits? I know patch-wise encoding is how JPEG works, but I always got kind of lost in the linear algebra there...

  • @Melle-sq4df
    @Melle-sq4df 1 month ago +1

    I never thought of PCA this way, but it's clever :) Would you say that you basically re-invented the autoencoder? Or do you expect subtle behavioural differences compared to this method?

    • @probabl_ai
      @probabl_ai 29 days ago +2

      PCA existed well before autoencoders did, so it feels strange to call it a reinvention. But there is a relationship for sure. If you remove the activation functions and just have a single hidden layer, then it feels pretty much identical to me.

  • @ilearnthings123
    @ilearnthings123 3 days ago

    genius

  • @denniswatson4326
    @denniswatson4326 6 days ago

    Is it my imagination or do a lot of datasets exhibit that rotated square or diamond shape you see around 3:22 when reduced to two dimensions?

  • @denniswatson4326
    @denniswatson4326 6 days ago

    Great video, by the way. Back in the day, before generative AI for images, I thought about trying this on something like the Olivetti faces dataset. Reduce the dimensions to some latent space and then draw from there randomly to make "PCA faces"? I wonder if there is a distribution over the dimensions of the latent space? That is to say, each individual dimension has some kind of non-flat distribution that would have to be drawn from in order to make a good face. I suppose if the chosen values were outliers then the face would be distorted? (A toy sketch of this idea appears after the reply below.)

    • @probabl_ai
      @probabl_ai 6 days ago

      There will for sure be 'a distribution' in the latent space, but the question remains whether this distribution translates back to reality. PCA really is losing a bunch of information, and there is no guarantee that you can recover it.
      If you really want to go and reconstruct, why not just use a neural network? You may appreciate this old blogpost of mine:
      koaning.io/posts/gaussian-auto-embeddings/
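
A toy sketch of the "PCA faces" idea from the comment above, assuming the Olivetti faces dataset from scikit-learn and a simple per-dimension Gaussian over the latent coordinates (both assumptions; this is not code from the video):

```python
# Fit PCA on face images, sample new latent codes from a per-dimension
# Gaussian, and decode them back to pixel space.
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA

faces = fetch_olivetti_faces().data        # (400, 4096); downloads on first use
pca = PCA(n_components=50).fit(faces)
codes = pca.transform(faces)               # latent coordinates, shape (400, 50)

# Estimate a mean and spread per latent dimension. For PCA the means are
# close to zero and the variances are the explained variances.
mu, sigma = codes.mean(axis=0), codes.std(axis=0)

rng = np.random.default_rng(0)
new_codes = rng.normal(mu, sigma, size=(5, codes.shape[1]))  # 5 random draws
new_faces = pca.inverse_transform(new_codes)                 # back to pixel space
print(new_faces.shape)                                       # (5, 4096)
```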

  • @carlomotta2742
    @carlomotta2742 1 month ago +1

    Hey, cool video, you show a simplified and linearized version of an autoencoder! Btw, I was checking the notebook you provide, and when you import `from wigglystuff import Slider2D`, what is wigglystuff actually? I could not find any reference on the web. Thanks!

    • @probabl_ai
      @probabl_ai 1 month ago +1

      (Vincent here)
      It is a super new, mostly personal library that I use for these demos. It's a collection of UI elements made with Anywidget to help explain stuff in Jupyter.
      Hence the name: widgets that wiggle to explain stuff. Wigglystuff!

    • @carlomotta2742
      @carlomotta2742 1 month ago

      @probabl_ai that's really awesome! Any plan to release it to make the notebooks reproducible, or can you recommend an alternative to get the interactive PCA graph? Thanks!

    • @probabl_ai
      @probabl_ai 1 month ago

      @carlomotta2742 The show notes of the video link to the notebook. Wigglystuff is on PyPI, so it's just a pip install away.
      pypi.org/project/wigglystuff/

    • @carlomotta2742
      @carlomotta2742 1 month ago

      @probabl_ai couldn't find it before, thank you!

  • @kanewilliams1653
    @kanewilliams1653 1 month ago

    One part I don't understand is 5:38 - why is it x_w hat instead of just x_w? There is no epsilon/error term in PCA. And assuming T is invertible, wouldn't it just return exactly the same original matrix x_w?

    • @probabl_ai
      @probabl_ai 1 month ago

      There is information loss when you go from the original input to the low-dimensional representation. When you then go back, that loss is still there, which makes it impossible to fully reconstruct the original array. Note also that the transformation isn't invertible to begin with: the components matrix maps 64 dimensions down to 2, so it isn't square, and the reverse step can only project back onto the plane spanned by the components.
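
A tiny numpy check of that point, under the same 2-component setup (variable names are illustrative): the components matrix is 2x64, so there is nothing to invert, and the round trip multiplies by a rank-2 projection rather than the identity.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data
pca = PCA(n_components=2).fit(X)
W = pca.components_                 # shape (2, 64): not square, not invertible

# transform() applies W (after centering) and inverse_transform() applies W^T,
# so the round trip multiplies by W^T W: a rank-2 64x64 projection, not I.
P = W.T @ W
print(np.linalg.matrix_rank(P))     # 2, far from 64
print(np.allclose(P, np.eye(64)))   # False: the dropped directions are gone
```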