StatQuest: MDS and PCoA

  • Published: 18 Nov 2024

Comments • 195

  • @statquest
    @statquest  2 years ago +3

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @taoyang563
    @taoyang563 4 years ago +20

    This is such a great video.
    To answer a student's question in one sentence demonstrates the teacher's complete understanding of the knowledge.
    The more the teacher talks to answer, the less the teacher knows what you are asking and the more confused you become.

    • @statquest
      @statquest  4 years ago +2

      Thank you very much! :)

  • @lade_edal
    @lade_edal 10 months ago +1

    Ran around all over the internet none the wiser, then came across this channel and bam! It all fits together so easily. Why do some people overcomplicate such simple things? Thanks Josh!

    • @statquest
      @statquest  10 months ago

      Happy to help!

  • @dsagman
    @dsagman 1 year ago +13

    Honestly the best machine learning and stats videos available. How did we live before StatQuest?

  • @son681
    @son681 4 years ago +7

    Thank you so much for such easy, bite-sized content that I can understand to the fullest. It's much better visualized and more informative than other videos I've seen!!!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

    • @MSuriyaPrakaashJL
      @MSuriyaPrakaashJL 4 years ago

      @@statquest This is a great video, but where can I find the maths behind it?

    • @statquest
      @statquest  4 years ago

      @@MSuriyaPrakaashJL Start here: en.wikipedia.org/wiki/Multidimensional_scaling

  • @Ivaniushina
    @Ivaniushina 6 years ago +5

    Brilliant! So clear. Now I understand (at last!) the relationship between PCA and MDS.

  • @초롱초록
    @초롱초록 4 years ago +3

    Thank you so much! I was confused about the difference between PCA and MDS. Thanks to your explanation, I could understand.

  • @AlonKedem1000
    @AlonKedem1000 9 months ago +1

    I love your videos. Just want to mention that at 4:18 you calculated the Euclidean distance for gene 2 twice while saying it's gene number 3. :)

  • @nikhiljoyappa687
    @nikhiljoyappa687 2 years ago +1

    Very helpful in a world of people who are always helpfool.

  • @nittygritty8161
    @nittygritty8161 1 month ago

    When I searched, I found many explanations insisting that PCoA and MDS are different, but I couldn't quite get it. In this video you said both are exactly identical; can you explain the reason more, please...

    • @statquest
      @statquest  1 month ago

      I talk about the differences very early in the video, at 0:42

  • @takethegaussian7548
    @takethegaussian7548 4 years ago +3

    Thank you very much! This is a really really good explanation.

    • @statquest
      @statquest  4 years ago

      Glad it was helpful!

  • @ahmetlacin5748
    @ahmetlacin5748 2 years ago +1

    I just have no idea how to thank you. Viva Josh!

  • @alejandrotenorio2327
    @alejandrotenorio2327 4 years ago +3

    In MDS, where does the minimization of the raw stress go? I'm not getting how you can do that while performing EVD to reduce the dimensions.

  • @Stephanbitterwolf
    @Stephanbitterwolf 6 years ago +1

    Very helpful. Not sure if this has been pointed out yet, but at around 4:17 you talk about the distance for gene 3 and the numbers aren't accurate for that gene's difference.

  • @poojakunte6865
    @poojakunte6865 6 years ago +7

    The difference for Gene 3 should be (2.2 - 1)^2, right?

    • @statquest
      @statquest  6 years ago +4

      Yes! That's just a typo in the video.
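
To make the corrected arithmetic concrete, here is a tiny sketch of the Euclidean distance between two cells. Only the Gene 3 values (2.2 and 1) come from the video; the other numbers are made up for illustration.

```python
import math

# Hypothetical expression values for three genes in two cells.
# Only Gene 3's values (2.2 and 1.0) come from the video; the rest are invented.
cell1 = [1.5, 2.7, 2.2]  # Gene 1, Gene 2, Gene 3
cell2 = [0.9, 1.3, 1.0]

# Euclidean distance: square each per-gene difference, sum them, take the root.
# The Gene 3 term is (2.2 - 1)^2, matching the correction above.
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(cell1, cell2)))
```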

  • @simonhunter-barnett6616
    @simonhunter-barnett6616 4 years ago

    If MDS and PCA have the same outputs, why would you choose one over the other? What's the importance of correlation vs distance? P.S. I've been trying to understand PCA and MDS for months now and this was so much easier than reading articles and books :D

    • @statquest
      @statquest  4 years ago +1

      Starting at 4:48 I give examples of using MDS with different distance metrics, which result in outputs that are different from PCA's.

  • @sofiagreen9742
    @sofiagreen9742 5 years ago +2

    Hello Josh and thank you for your videos, they are really helpful. Would you mind making a video on Canonical Correlations please?

  • @MrZanvine
    @MrZanvine 7 years ago +2

    Brilliant video, you're awesome! Thanks for taking the time to make these :)

  • @rlh4648
    @rlh4648 2 years ago +1

    Thanks Josh
    You're feckin awesome.

  • @madihamariamahmed8727
    @madihamariamahmed8727 2 years ago

    Please make videos on Deep clustering methods!

    • @statquest
      @statquest  2 years ago

      I'll keep that in mind.

  • @medazzouzi2649
    @medazzouzi2649 1 year ago +1

    Hey Josh, I'm confused by the PCA statement "correlations among samples". Isn't it supposed to be correlation among variables? Since we are reducing the dimension of the variables, in this case the genes, not the samples?

    • @statquest
      @statquest  1 year ago +1

      The goal of the plot is to show correlations among the samples - each sample has a lot of gene measurements, and correlation among samples would mean that a lot of those measurements are similar (or the exact opposite of similar), and we want to preserve those relationships. We want things that are highly correlated to appear close to each other in the graph.

    • @medazzouzi2649
      @medazzouzi2649 1 year ago +2

      @@statquest ahhh okayyyy i gettt itt 😍😍😍

    • @medazzouzi2649
      @medazzouzi2649 1 year ago +2

      @@statquest thanks josh
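
The reply above can be sketched numerically. This is a hypothetical example (the gene values are made up) showing that two samples whose gene measurements rise and fall together have a correlation near 1, which is why they would end up close together on the plot:

```python
import numpy as np

# Two hypothetical samples, each with measurements for the same 5 genes.
sample1 = np.array([2.1, 0.5, 3.3, 1.8, 0.2])
sample2 = np.array([4.0, 1.1, 6.5, 3.9, 0.6])  # roughly double sample1

# Pearson correlation between the two samples' gene measurements.
r = np.corrcoef(sample1, sample2)[0, 1]
# r is close to 1, so these two samples would plot near each other.
```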

  • @abcd123456789zxc
    @abcd123456789zxc 3 years ago +2

    Thanks so much for your video, but I still have a question: I really don't understand what the difference is between PCoA and MDS.
    It would be a great help if anyone could explain the difference.

    • @statquest
      @statquest  3 years ago +2

      MDS has two versions: "Classical" and "Non-Metric". This video shows how "Classical" MDS works. Classical MDS is the exact same thing as PCoA. There is no difference. However, there is a difference between PCoA and "Non-Metric" MDS. Maybe one day I'll make a video on "Non-Metric" MDS.

    • @abcd123456789zxc
      @abcd123456789zxc 3 years ago +1

      @@statquest Thank you so much for your time and consideration.

  • @HOMESTUDY247
    @HOMESTUDY247 3 years ago +1

    Great video

  • @alexlee3511
    @alexlee3511 7 months ago

    Thank you for the effort! But I am wondering: if we are going to reduce the dimension of genomic data, do people prefer PCA or PCoA?

    • @statquest
      @statquest  7 months ago

      MDS with log fold change is the default for DESeq2 and possibly other programs. However, I feel like PCA is more commonly used.

  • @marchino1981
    @marchino1981 6 years ago +1

    Very nice and clear! Thank you!

  • @dist321
    @dist321 5 years ago +1

    Hi Josh! I've been here many times and love your channel. I have a question about the axes. I understand that each one accounts for some percentage of the variation in the dataset, with axis one having the highest percentage. However, if I look at samples along PC1, can I assume any biological meaning for those samples far to the right or far to the left?

  • @trinh123456
    @trinh123456 4 years ago +1

    Your videos are amazing!

    • @statquest
      @statquest  4 years ago +1

      Thank you very much! :)

  • @DungPham-ai
    @DungPham-ai 7 years ago

    Best video. Can you make a video explaining Non-negative Matrix Factorization (NMF)?

  • @Mako0123
    @Mako0123 7 years ago

    Nice explanation as always!

  • @liranzaidman1610
    @liranzaidman1610 4 years ago +1

    Hi Josh,
    have you ever encountered a clustering model with more than 3-4 clusters? I've done it many times, and it looks like the optimal number of clusters (3-4) is "natural".

    • @statquest
      @statquest  4 years ago

      Very interesting. I'll try to remember to keep track of these things in the future to see if I get similar results.

  • @kaynkayn9870
    @kaynkayn9870 1 year ago

    I like to learn using videos (mainly from your channel) and GPT for the math equations. I checked Wikipedia just to be sure, but it looks like you skipped the step about "Double Centering and Matrix Transformation" entirely.

    • @statquest
      @statquest  1 year ago

      I talk about that in my PCA videos: ruclips.net/video/FgakZw6K1QQ/видео.html and ruclips.net/video/oRvgq966yZg/видео.html

    • @kaynkayn9870
      @kaynkayn9870 1 year ago

      @@statquest I must have missed it, I'll review it again. Thank you.

    • @statquest
      @statquest  1 year ago +1

      @@kaynkayn9870 Those videos specifically talk about the centering of the data - how and why we need to do that. I don't talk about matrix transformations explicitly because those are just one of several ways to perform PCA.

  • @manueltiburtini6528
    @manueltiburtini6528 3 years ago +1

    Hi Josh from Italy! Are the assumptions of these methods always the same? (Normality, independence, homoscedasticity, linearity)

    • @statquest
      @statquest  3 years ago +1

      The same as PCA? I'm not sure. However, I do know that whatever assumptions there are are often ignored and people just try PCA or MDS and see what happens.

    • @manueltiburtini6528
      @manueltiburtini6528 3 years ago

      @@statquest Couldn't this lead to false interpretations? I'm using this technique and LDA to analyze taxonomic data, and I'm scared that my dataset is not independent due to common phylogenetic origin.

    • @statquest
      @statquest  3 years ago +1

      @@manueltiburtini6528 I don't really think that's a big problem for MDS or PCA. These methods are just designed to reduce dimensionality for drawing graphs or to plug into some other analysis (like regression).

  • @adelutzaification
    @adelutzaification 7 years ago

    Wow. PCA and MDS really are very similar, just like the videos describing them (clearly explained and overall awesome ;) It seems to me that PCA is just a particular case of MDS, since with MDS one can adjust the distance metric to get various outputs, including the one given by PCA. If that is the case, why don't people use MDS more? It seems under-utilized. Is it trickier to implement?

  • @malteneumeier3274
    @malteneumeier3274 5 years ago +1

    @Josh Starmer: at 4:14 there is a tiny mistake in the formula: the difference for gene 3 should be (2.2 - 1)². Instead, the distance for gene 2 was repeated.

    • @statquest
      @statquest  5 years ago

      Thanks a lot for pointing that out. I've added this to the "Errata" page that I maintain so that one day, when I create new editions of these videos, I can correct all the little mistakes.

  • @khajariazuddinnawazmohamme3092
    @khajariazuddinnawazmohamme3092 6 years ago +2

    Hi Josh, I really like your videos and they are very intuitive. Could you do a StatQuest video on Partial Least Squares if possible? Thanks in Advance :)

    • @statquest
      @statquest  6 years ago +3

      Partial Least Squares is on the to-do list, so, with your vote, I'll bump it up a notch so that it is closer to the top.

    • @khajariazuddinnawazmohamme3092
      @khajariazuddinnawazmohamme3092 6 years ago

      @@statquest thank you so much Josh 😊

    • @melaniee467
      @melaniee467 5 years ago +1

      @@statquest Can't wait for your Partial Least Squares explanation!

    • @statquest
      @statquest  5 years ago

      @@melaniee467 Sounds good! I'll bump it up another notch!

  • @rrrprogram8667
    @rrrprogram8667 6 years ago

    Great video... Actually, I am elevating myself from Excel data analysis to machine learning... Right now I am at the stage of grabbing everything I can... What is your advice for Excel users becoming machine learning enthusiasts?

  • @YooToobins
    @YooToobins 5 years ago +9

    Recommend speeding this up to 1.25x while viewing

  • @Emily-Bo
    @Emily-Bo 2 years ago

    Hi Josh, how do you choose among PCA, LDA and MDS methods?

    • @statquest
      @statquest  2 years ago +1

      LDA is supervised, so you can only use it when you know what groups you want to supervise. MDS is useful when you want to change the distance metric. And if you don't want to change the distance metric, MDS and PCA are the same.

    • @Emily-Bo
      @Emily-Bo 2 years ago +1

      @@statquest Thank you, Josh! very helpful!

  • @jxaskcijiaxhsic9943
    @jxaskcijiaxhsic9943 3 months ago

    How exactly do you find the axes of MDS? What do you do after you calculate the distances?

    • @statquest
      @statquest  3 months ago

      To get a sense of how it works, see: ruclips.net/video/FgakZw6K1QQ/видео.html

    • @jxaskcijiaxhsic9943
      @jxaskcijiaxhsic9943 3 months ago

      @@statquest Is it the same idea as calculating the PCs when finding the axes of MDS, like finding the best-fitting line by minimizing the SSR? If it is, what role does calculating the distances between points play?

    • @statquest
      @statquest  3 months ago

      @@jxaskcijiaxhsic9943 It's a related technique. It's not the same, but related. Based on the distances we can calculate variances and covariances and from those we can find the directions that there is the most variation in the data.

    • @jxaskcijiaxhsic9943
      @jxaskcijiaxhsic9943 3 months ago

      @@statquest Okay, so it is still finding the best-fitting line, but keeping the distances between the points the same after dimension reduction.

  • @ranitchatterjee5552
    @ranitchatterjee5552 3 years ago

    To plot the data, do we select the cells with the maximum distances? For example, if cells 1 & 2 and cells 3 & 4 have the maximum distances, do we plot with respect to them?

    • @statquest
      @statquest  3 years ago

      To get a better understanding of how it works, check out the StatQuest on PCA: ruclips.net/video/FgakZw6K1QQ/видео.html

  • @bitsajmer
    @bitsajmer 3 years ago

    Hi Josh,
    1. How do we plot the values of MDS on the graph? With distances we only have a single value.
    Do we plot it on a number line? But you showed a graph with 2 axes.

    • @statquest
      @statquest  3 years ago

      MDS converts a matrix of distances into different axes in much the same way that we do it for PCA. For details, see: ruclips.net/video/_UVHneBUBW0/видео.html

  • @ketalesto
    @ketalesto 3 years ago +1

    Day 40 of #66DaysOfData
    Yeah baby! Let's go!

  • @KayYesYouTuber
    @KayYesYouTuber 5 years ago

    Are you saying we compute eigenvalues and eigenvectors on the distance matrix instead of the covariance matrix? Is that the only difference between PCA and MDS?

    • @statquest
      @statquest  5 years ago +1

      And you get your choice of distance metrics.

  • @CWunderA
    @CWunderA 6 years ago +2

    Good video, but it was not very clear to me why you would choose one over the other (MDS vs PCA).

    • @statquest
      @statquest  6 years ago

      If you're working with distances, then MDS is the way to go.

    • @CWunderA
      @CWunderA 6 years ago +1

      My question was more why someone would choose to cluster/reduce dimensionality using distances rather than correlations?

    • @statquest
      @statquest  6 years ago +2

      At 6:20 in the video I mention that a Biologist might choose to use MDS to show clustering using log-fold changes because, traditionally, gene measurements are analyzed in terms of log-fold changes.
      Alternatively, it could be you want to cluster locations in a city based on how far they are away via taxi (so blocks and one-way streets are a factor) - MDS can do this.

    • @CWunderA
      @CWunderA 6 years ago +2

      Ah I see, so it is more that MDS allows you to cluster via any distance metric of interest, whereas PCA limits you to correlation/Euclidean distance. Thanks for taking the time to help me out!

    • @statquest
      @statquest  6 years ago +2

      You are correct - MDS lets you cluster stuff using any distance metric. The coolest thing about that, which I forgot to mention, is that, via Random Forests, you can use MDS to cluster any data, regardless of type. Check it out in "Random Forests Part 2:" ruclips.net/video/nyxTdL_4Q-Q/видео.html
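
The taxi-cab idea above can be sketched with made-up grid coordinates: Manhattan (city-block) distance counts blocks along the streets, and the resulting matrix can be handed to MDS in place of Euclidean distances.

```python
import numpy as np

# Hypothetical locations on a city grid: (blocks east, blocks north).
locations = np.array([[0, 0], [3, 4], [1, 1], [6, 2]])

diffs = locations[:, None, :] - locations[None, :, :]
euclid = np.sqrt((diffs ** 2).sum(axis=-1))  # straight-line, "as the crow flies"
manhattan = np.abs(diffs).sum(axis=-1)       # taxi-cab distance along the grid

# Either symmetric matrix can be fed into MDS; only the distance metric changed.
```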

  • @whatyouwantyouare
    @whatyouwantyouare 3 years ago

    Hi Josh, thanks so much... Confusion: the new table with distances will have columns d12, d13, d14, ..., d23, d24, ..., so when we plot stuff, why would we still have clusters corresponding to cell 1, cell 2, ...? Wouldn't the colours correspond to d12, d13, etc.?

    • @statquest
      @statquest  3 years ago

      The first column in the distance matrix will be cell1, the second will be cell2, etc., and likewise the first row will be cell1, the second cell2, etc. The distances are then the values in the matrix. The distance between cell1 and cell1 (in the upper left hand corner of the matrix) is 0, etc.

  • @darkredrose7683
    @darkredrose7683 2 years ago

    Thank you! And how about the CAP analysis? I'm so confused >< Thank you in advance!

    • @statquest
      @statquest  2 years ago

      I'll keep that topic in mind.

  • @mohsenvazirizade6334
    @mohsenvazirizade6334 5 years ago +1

    Hi, thank you so much for such a good explanation. Do you mind if I ask for the reference book/paper for the terminology? I am a little confused, since I assume the same methods are described a little differently in various reference books. Thank you.

    • @statquest
      @statquest  5 years ago +2

      To be honest, I can't remember what my original sources are for this video. More recently I've been putting the sources in the description below the video, but this video is too old for that.

  • @swarnimkoteshwar
    @swarnimkoteshwar 2 years ago +1

    Thank you!

  • @oliseh2285
    @oliseh2285 5 years ago

    Hi Josh, thanks a lot for your amazing videos!!!
    I have a question: with molecular markers (SSRs or SNPs), what would you personally choose?
    PCA or PCoA?

    • @statquest
      @statquest  5 years ago +1

      If you use the Euclidean distance, then they are the same.

    • @oliseh2285
      @oliseh2285 5 years ago

      Yes, I got that from watching the video. But I'm not sure which kind of distance I should use in case I want to perform a PCoA with microsatellites in R, and also whether PCoA is better than PCA when you use a specific distance for microsatellites.
      It's weird, because when I used the Adegenet function [dudi.pca()] for my df of 5 SSRs with 23 alleles, the function, instead of considering 5 variables (the 5 SSRs), took 23 variables (the 23 alleles), and for this reason the explained variance of PC1 and PC2 is quite low.
      Hope you can suggest something based on your experience as a geneticist.
      Thanks a lot.

    • @statquest
      @statquest  5 years ago +1

      PCA is the most commonly used method in genetics.

    • @oliseh2285
      @oliseh2285 5 years ago +1

      Thanks a lot for the answer, and for making statistics accessible and fun. Please continue your terrific work. We love you!!!

  • @drzun
    @drzun 5 years ago

    Hi Josh, thanks for the video. I'm a bit confused: when you said "PCA starts by calculating the correlations among samples", did you mean the plotting of each sample on multiple dimensions like in your previous PCA video? If so, how about PCoA? Do we also "plot" the distances among samples first and then try to get the top 2 PCs as well? If that's true, then how is the number of dimensions determined in the case of PCoA? I watched all of your PCA videos and I can understand how to get a PCA, but somehow I still don't know how a PCoA is done... thank you!

    • @statquest
      @statquest  5 years ago +1

      There are two ways to do PCA - an old method that is based on covariances and correlations (described in this ruclips.net/video/HMOI_lkzW08/видео.html and this ruclips.net/video/_UVHneBUBW0/видео.html ) and a new method that uses Singular Value Decomposition (described in this ruclips.net/video/FgakZw6K1QQ/видео.html ) . This video on PCoA/MDS references the older method (using covariances and correlations). To calculate the covariances and correlations among the samples, you follow the steps outlined in these videos on covariance statquest.org/2019/10/08/covariance-and-correlation-part-1-covariance/ and correlation statquest.org/2019/10/08/covariance-and-correlation-part-2-pearsons-correlation/ . That gives you a single number for every pair of samples. We then do Eigen Decomposition of those numbers to get the PCs. With PCoA, we calculate distances (using the euclidian distance or some other metric) between each pair of samples and do Eigen Decomposition of those numbers to get the PCs.
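
The two pipelines described above can be sketched side by side in numpy (with made-up random data). Assuming Euclidean distances, eigendecomposing the double-centered squared-distance matrix gives the same sample coordinates as PCA, up to sign flips of the axes:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))  # 8 samples with 5 made-up measurements each

# PCA route: center the data, then use SVD; U*s gives the sample coordinates.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = U * s

# PCoA route: Euclidean distances -> double centering -> eigendecomposition.
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
w, V = np.linalg.eigh(B)
order = np.argsort(w)[::-1]
pcoa_coords = V[:, order] * np.sqrt(np.maximum(w[order], 0))

# With the Euclidean metric, the two coordinate sets match up to sign flips.
```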

  • @shahbazsiddiqi74
    @shahbazsiddiqi74 5 years ago +4

    Unlike PCA, where we compared gene variation in order to weight the calculation of the value for each cell and then mapped the cells onto PC1 and PC2, here we are calculating the distance between cells with reference to each gene. What is the calculation for MDS1 and MDS2? I am confused because we are taking 2 cells at a time instead of one. Are we plotting the difference of each gene with respect to cell 1 along the x axis and cell 2 along the y axis? Could you please explain what to consider for MDS1 and MDS2? Thanks a ton.

    • @chrisjfox8715
      @chrisjfox8715 4 years ago

      If this is in reference to the log-fold change graph, then I too agree that it isn't explained what those two axes distinctly represent. I get how the LFC was calculated before then (between every single pair of data points), but those axes could theoretically be anything at the discretion of the investigator... and what they are here hasn't been made clear.

  • @YasmineNazmy
    @YasmineNazmy 3 years ago +1

    Brilliant, thank you!

    • @statquest
      @statquest  3 years ago

      Wow! You're going through them all! BAM! :)

  • @sathsarawijerathna9325
    @sathsarawijerathna9325 5 years ago

    Hi Josh. Do you have any videos for NMDS?

    • @statquest
      @statquest  5 years ago

      Not yet. You can find an organized listing of all of my videos here: statquest.org/video-index/

  • @Retko85
    @Retko85 3 years ago

    Hi Josh, I am a little confused regarding features and samples. For example, here at 6:56 you say that PCA creates plots based on correlations among samples. The only concept of correlation that I know is between features: when 2 features change together, the correlation is big. But I got confused here. I tried to search for sample correlations, and what I found was correlations on samples as part of a population, but here samples should be like rows/instances/observations. Also, your computation of the Euclidean distance got me confused, since you have features as rows (gene1, gene2) and samples as columns (cell 1, cell 2). Can you please confirm my understanding: does PCA create the plot based on correlations among FEATURES, like a person's age, weight, etc., where each person is a sample? Thank you :)

    • @statquest
      @statquest  3 years ago

      To get a better sense of how PCA works, see: ruclips.net/video/FgakZw6K1QQ/видео.html

  • @mahdimohammadalipour3077
    @mahdimohammadalipour3077 2 years ago

    Where can I find a numerical example? I googled but couldn't find anything :(

    • @statquest
      @statquest  2 years ago

      See: ruclips.net/video/pGAUHhLYp5Q/видео.html

  • @jihadrachid9044
    @jihadrachid9044 4 years ago

    Thank you for this great video, but I want to understand: for an nMDS graph, should I transform my values from percentages to square roots?
    I have about 28 species. Your help will be highly appreciated.

    • @statquest
      @statquest  4 years ago

      Unfortunately this video only covers classical MDS.

    • @jihadrachid9044
      @jihadrachid9044 4 years ago

      @@statquest Can I contact you by email to explain my case in more detail?

  • @yudiherdiana4979
    @yudiherdiana4979 3 years ago +1

    Thank you!!

  • @SophieLemire
    @SophieLemire 1 year ago +1

    Thanks!

    • @statquest
      @statquest  1 year ago

      Hooray! Thank you so much for supporting StatQuest! TRIPLE BAM! :)

  • @hannahnelson4569
    @hannahnelson4569 5 months ago

    Ok, I'm going to admit it: I don't understand what this video is saying. It says to just replace the dot product with other distance metrics, and that sounds fine. But it doesn't make sense that we are using the same mathematical computations for a distance matrix and a correlation matrix. The correlation matrix (dot product distance) makes sense because its special properties allow it to have a decomposition with a diagonal component, which we can sort and then reduce in dimension to produce our PCA plot. It is not at all clear to me why an arbitrary distance matrix of the predictors will be diagonalizable in the same way, so the rest of the mathematical interpretation breaks down from there.
    Basically, the math and the interpretation feel a bit off to me. I'll have to do more research on the topic.

  • @jamesayukayuk1151
    @jamesayukayuk1151 6 years ago

    Hey Joshua. I have not found anything on the non-metric version of MDS. Any videos, please?

    • @jamesayukayuk1151
      @jamesayukayuk1151 6 years ago

      Thank you. Will keep an eye out for it when done. Thanks for the good work.

  • @trinh123456
    @trinh123456 4 years ago

    Hi Josh, it's me again. Thanks for the great video! I am wondering if you have a video on nMDS, because I see it quite often in biological studies, but it's still quite a blur...

    • @statquest
      @statquest  4 years ago

      Unfortunately I don't have a video on non-metric MDS.

    • @trinh123456
      @trinh123456 4 years ago

      No worries. Are you going to do it any time soon? I am quite looking forward to it, because it is quite common in biology. Thanks Josh!

    • @statquest
      @statquest  4 years ago +2

      @@trinh123456 Unfortunately, I do not have plans to do it anytime soon. My to-do list is huge (it has 100s of items on it) and I can only make a few videos each month. I work as fast as I can, and I work all the time, but it's not enough to keep up with the requests.

    • @datenfritz9860
      @datenfritz9860 4 years ago

      Hi Tien, maybe I can provide some help for nMDS based on Josh's triple BAM video! (As always, amazing job Josh!) To my knowledge, NMDS is a rank-based approach. Like MDS, you start by computing the distances between samples. These distance values then get ranked. After the ranking, you perform the "fancy math" thing to get the coordinates for a graph. Be aware that you lose quantitative information when clustering on ranks.
      You can check this website for more details: mb3is.megx.net/gustame/dissimilarity-based-methods/nmds
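
A tiny numpy sketch of the ranking step described above (the dissimilarity values are made up). NMDS keeps only the rank order of the dissimilarities, so any monotonic transformation of the distances leaves the ranked input unchanged, which is exactly the quantitative information you give up:

```python
import numpy as np

# Hypothetical pairwise dissimilarities between samples (condensed form).
d = np.array([0.9, 0.2, 0.6, 0.4, 0.8, 0.3])

# Rank the dissimilarities: 1 = most similar pair, 6 = least similar pair.
ranks = np.argsort(np.argsort(d)) + 1

# Squaring the dissimilarities (a monotonic transform) changes the values
# but not the ranks, so NMDS would see exactly the same input.
ranks_squared = np.argsort(np.argsort(d ** 2)) + 1
```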

  • @siddheshb.kukade4685
    @siddheshb.kukade4685 1 year ago +1

    Thanks😊

  • @bibinkalirakath
    @bibinkalirakath 4 years ago

    I have seen PCoA graphs with 3 dimensions; is there any video explaining them?

    • @statquest
      @statquest  4 years ago

      That would be a lot like seeing a 3-dimensional PCA plot. For more details, see: ruclips.net/video/FgakZw6K1QQ/видео.html

    • @bibinkalirakath
      @bibinkalirakath 4 years ago +1

      @@statquest Thank you very much. This helped me a lot.

  • @thourayaaouledmessaoud9223
    @thourayaaouledmessaoud9223 6 years ago

    Thanks for this video. I just have one question: does MDS only accept a symmetric (square) matrix as input?

  • @chrischoir3594
    @chrischoir3594 5 years ago

    Hi, what software do you use here?
    Thanks

  • @糜家睿
    @糜家睿 7 years ago

    Hi, Joshua. I noticed that you mention "the data is not linear" in a reply to a comment. I have been really confused about this concept for some time. What does non-linear data mean? (I guess it is not the same kind of concept as a linear model, right, haha.) A bioinformatician told me that single-cell data is non-linear and we'd better use t-SNE rather than PCA. How can we say that bulk RNA-seq data is linear and single-cell RNA-seq data is non-linear? I really, really hope you can answer my question because it has confused me for quite a long time.

    • @糜家睿
      @糜家睿 7 years ago

      Haha, thank you Joshua. The spiral pattern is the so-called "Swiss roll" model, I think. Some say that linear dimensionality reduction focuses more on global patterns (like distance), while non-linear dimensionality reduction methods focus more on local patterns.
      Why not talk about zero-inflation in single-cell next time, and the normalization methods used in single-cell data analysis?

  • @kartikmalladi1918
    @kartikmalladi1918 1 year ago

    What value is plotted exactly on MDS?

    • @statquest
      @statquest  1 year ago

      It depends on what metric you use.

    • @kartikmalladi1918
      @kartikmalladi1918 1 year ago

      @@statquest If MDS is plotted between 2 genes, then the distance itself becomes a single variable. Any combination and its distance can be placed on a number line. So if this is the x coordinate of the plot, what is the y coordinate for a point?

  • @DaisyKB123
    @DaisyKB123 5 years ago

    What does it mean by the "percentage of variation each axis accounts for"?

    • @杨明-r6i
      @杨明-r6i 5 years ago

      It's the percentage of variance explained by principal component axes 1, 2, 3, 4, 5... in the PCA plot.

  • @rekhasharma4962
    @rekhasharma4962 1 year ago

    How do you adjust overlapping labels in a PCA biplot???

  • @jcb0trashmail
    @jcb0trashmail 4 years ago

    I still don't get why you would choose MDS over PCA or the other way around...

    • @statquest
      @statquest  4 years ago

      MDS can work with any distance metric, not just Euclidean. Here's a great example: ruclips.net/video/sQ870aTKqiM/видео.html

  • @yulinliu850
    @yulinliu850 6 years ago +1

    Excellent!

  • @urjaswitayadav3188
    @urjaswitayadav3188 7 years ago

    Great video!

  • @ninakoch1799
    @ninakoch1799 1 year ago +1

    THANK YOUU❤️

  • @rncg0331
    @rncg0331 5 years ago

    Do you have a Python version for MDS?

  • @doremekarma3873
    @doremekarma3873 8 months ago

    Can someone please explain how we calculate MDS1 and MDS2 after obtaining the distance between each pair of cells?

    • @statquest
      @statquest  8 months ago

      You use eigendecomposition.
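
As a sketch of that eigendecomposition step (the distance matrix below is made up): square the distances, double-center them, eigendecompose, and the top two scaled eigenvectors are the MDS1 and MDS2 axes.

```python
import numpy as np

# Made-up symmetric distance matrix for 4 cells (zeros on the diagonal).
D = np.array([[0.0, 2.0, 6.0, 7.0],
              [2.0, 0.0, 5.0, 6.5],
              [6.0, 5.0, 0.0, 1.5],
              [7.0, 6.5, 1.5, 0.0]])

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
B = -0.5 * J @ (D ** 2) @ J           # double-centered squared distances
eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order

order = np.argsort(eigvals)[::-1]     # largest eigenvalues first
coords = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))
mds1, mds2 = coords[:, 0], coords[:, 1]  # the two axes that get plotted
```

Each axis's share of the variation is its eigenvalue divided by the sum of the positive eigenvalues, which is where the percentages on the plot come from.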

  • @psyferinc.3573
    @psyferinc.3573 2 months ago +1

    The ukulele at the beginning

  • @kathik595
    @kathik595 5 years ago +1

    Do complete statistical predictive modeling using Python & R.

  • @nutzanut9817
    @nutzanut9817 5 years ago

    How can we draw a 2D graph after calculating the distances of all pairs?
    We've got nC2 values for n features.
    Thanks.

    • @statquest
      @statquest  5 years ago

      You do it just like PCA. For more details on how PCA does it, check out this video: ruclips.net/video/FgakZw6K1QQ/видео.html

  • @raquelpurpleboxes
    @raquelpurpleboxes 5 years ago +1

    You're amazing!!!

  • @BeateSukray
    @BeateSukray 5 years ago +2

    I love you, man

  • @Diegocbaima
    @Diegocbaima 5 years ago +1

    Great, dude!

  • @jameelahharbi2714
    @jameelahharbi2714 1 year ago

    I need more details for PCA.

    • @statquest
      @statquest  1 year ago

      For more details about PCA, see: ruclips.net/video/FgakZw6K1QQ/видео.html

  • @lalala90348
    @lalala90348 6 years ago +1

    “Reduce them to a 2-D graph”? How exactly?

  • @noobshady
    @noobshady 6 years ago +1

    Where can we read about the fancy math involved?

    • @statquest
      @statquest  6 years ago +3

      Wikipedia is always a great place to start: en.wikipedia.org/wiki/Multidimensional_scaling

  • @adelutzaification
    @adelutzaification 7 years ago

    One more comment. The fact that MDS uses a precomputed distance matrix reminds me of hierarchical clustering. Does that mean MDS is a 2D representation of hierarchical clustering?

    • @adelutzaification
      @adelutzaification 7 years ago

      That would be cool. I am brewing something. I might have an idea; not sure how good it is at this moment. I need to write it up. I'll keep you posted to see if it is worth anything. Ta ta

    • @adelutzaification
      @adelutzaification 7 years ago

      I went down in flames :) It turns out I was re-inventing the wheel :) My inclination was to further dissect the PCA results/"clouds" and see the relationships between the comprising data points. I was deflated to see that this problem was solved many years ago by clustering (either k-means or hierarchical). ;(
      On the good side, I found a few useful things. A paper that confirms the relatedness between PCA and k-means, as you were anticipating: ranger.uta.edu/~chqding/papers/KmeansPCA1.pdf
      I also found out about the HCPC package in R, which can do hierarchical clustering after factor analysis. It seems kinda cool: on the graphical side it does pseudo-3D hierarchical clustering. Imagine the first 2 PCs as a horizontal plane and the cluster roots coming from the top... www.r-project.org/conferences/useR-2009/slides/LeRay+Molto+Husson.pdf . In the usual 1D hierarchical clustering, I don't like the fact that some less-related points end up adjacent. This HCPC plotting isn't perfect either, as it obscures some data points.
      I was also thinking that 2D density could be used to further "cluster" the PC plot, for example with geom_density_2d()/stat_density_2d() in ggplot2. With the right arguments and aesthetics it might be able to pick up some "clusters", but not the relationships between the points inside a contour. Maybe connecting the dots somehow on a zoomed-in plot (by adjusting the axes) may help to show further details...
      What other ways of showing relatedness besides hierarchical clustering and correlation matrices do people use?

  • @neckar6006
    @neckar6006 1 year ago

    4:15, maybe the distance for gene3 is wrong?

  • @marahakermi-nt7lc
    @marahakermi-nt7lc 1 year ago

    Hmmm, I guess the covariance matrix in this case is a matrix with 0 distances on the diagonal

    • @statquest
      @statquest  1 year ago +1

      That would mean the variance was 0.

    • @marahakermi-nt7lc
      @marahakermi-nt7lc 1 year ago

      @@statquest yesss, since subtracting the same distance = 0

  • @alecvan7143
    @alecvan7143 4 years ago +1

    awesome :)

  • @fatihbaltac1482
    @fatihbaltac1482 6 years ago +1

    BAAAM !!

  • @Cuicui229
    @Cuicui229 3 years ago

    Hi Josh! Thanks for the video! I still don't get how we can do the same thing on the distance matrix as we do in PCA (ruclips.net/video/FgakZw6K1QQ/видео.html). I watched that video, and thanks to your wonderful explanation I can imagine that, for several samples with 2 genes, we can draw the dots on a 2-D plot (gene1 vs gene2), find the best-fit line, which is PC1, and then a line perpendicular to it as PC2, both maximizing the distances of the projected points from the origin. But when it comes to a distance matrix, how can we draw the dots? There are no genes, only sample1, sample2, etc. I'm really confused. Truly thankful!

    • @statquest
      @statquest  3 years ago +1

      There are two methods for doing PCA - the one I present in that video is called "Singular Value Decomposition" and it works the way I presented in that video. Alternatively, we can do something called "Eigenvalue Decomposition", which is based on using the covariance or correlation matrix of the data. It is through this second way that PCA ends up giving us results similar to MDS. Unfortunately, I don't have a good video for explaining how this second way works. :(
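For anyone who wants to see the second route mentioned in the reply above in action: the eigenvectors of the covariance matrix point in the same directions as the principal components that SVD finds on the centered data. A small numpy sketch (random made-up data, not from the video):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))           # 6 samples, 3 genes (made-up data)
Xc = X - X.mean(axis=0)               # center each gene

# Route 1: eigendecomposition of the covariance matrix
cov = np.cov(Xc, rowvar=False)
vals, vecs = np.linalg.eigh(cov)
pc1_eig = vecs[:, np.argmax(vals)]    # eigenvector with largest eigenvalue

# Route 2: SVD of the centered data (what the PCA video shows)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1_svd = Vt[0]                       # direction of largest singular value

# both unit vectors, same direction up to sign
print(abs(float(pc1_eig @ pc1_svd)))
```

Because the covariance route only needs pairwise quantities, it is the one that generalizes to a precomputed distance matrix, which is where MDS/PCoA comes in.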