Two-step Cluster Analysis in SPSS

  • Published: 11 Sep 2024

Comments • 221

  • @Gaskination · 3 years ago · +1

    Here's a fun pet project I've been working on: udreamed.com/. It is a dream analytics app. Here is the YouTube channel where we post a new video almost three times per week: ruclips.net/channel/UCiujxblFduQz8V4xHjMzyzQ
    Also available on iOS: apps.apple.com/us/app/udreamed/id1054428074
    And Android: play.google.com/store/apps/details?id=com.unconsciouscognitioninc.unconsciouscognition&hl=en
    Check it out! Thanks!

  • @duran099 · 10 years ago · +11

    Thank you! I enjoyed the back and forth of your troubleshooting at the start about which variables to use. It made it more real and gave some context from a theory perspective.

  • @blackchallice · 11 years ago · +1

    WOW, you have explained cluster analysis very clearly. This is the first time I'm learning CA and I totally get it. Thank you!

  • @vide0gameCaster · 8 years ago

    Dude, you don't understand how much this vid helped me with my statistics exam. I aced my test thanks to you!
    You just gained a subscriber!

  • @lefkiospaik · 10 years ago

    Great presentation! Moreover, the "not suitable" variables you chose at the beginning really helped me understand cluster analysis better. Thanks

  • @TulioMaia · 12 years ago

    Thank you so much!
    I'm a beginner with SPSS. I'm an R user, but I'm going to start using SPSS from now on!
    Thanks again!

  • @Gaskination · 12 years ago

    Funny you should ask! I was just considering doing this yesterday. I will probably do a K-means cluster, and also show how to segment the data and explore clusters for sub-populations. This is definitely on my to-do list.

  • @Gaskination · 11 years ago

    Not a stupid question because I had to look up the answer :) The SPSS help manual says that the two-step cluster analysis assumes normally distributed data for all continuous variables, but that tests have shown it to be robust enough to handle non-normal data fairly well.

  • @yuriveneziani8029 · 6 years ago · +1

    Amazing explanation... clear and direct! Thank you!

  • @samirsarsamss · 10 years ago

    Many thanks, dear James Gaskin, for this helpful video. Please go ahead with other aspects, or even other tools.

  • @Xirukah · 11 years ago

    You're a great guy!! I study SPSS in college at three levels: Introduction to Data Analysis, Univariate Data Analysis, and Multivariate Data Analysis at the 3rd level. At the moment I'm on the 3rd, and this process is really useful!
    Thank You!!

  • @krismatthews7550 · 8 years ago · +2

    You seriously just saved my Quantitative Analysis project :] THANK YOU!

  • @JohnParavantis · 5 years ago

    If I may, at 9:01 I would like to correct your reference to the boxplot: the middle line does indeed represent the median, but the left and right edges of the box lie at the first and third quartiles, respectively. So, rather than representing one standard deviation below and above the mean, the box represents the middle 50% of the observations. Thank you very much for the video: a very lucid explanation of swamping variables, still very useful in 2019!
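To make the correction concrete, here is a minimal Python sketch (standard library only; the `box_stats` helper and the sample scores are hypothetical) of what the box in a standard boxplot marks. It uses `statistics.quantiles` with `method="inclusive"`; SPSS and other packages may use slightly different quartile definitions, so the box edges can differ a little on small samples.

```python
import statistics


def box_stats(values):
    """Return what a standard boxplot draws for one group.

    The box runs from the first quartile (Q1) to the third quartile (Q3),
    so it covers the middle ~50% of observations; the inner line is the
    median. Neither edge is the mean plus or minus one standard deviation.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
print(box_stats(scores))  # {'q1': 3.0, 'median': 5.0, 'q3': 7.0, 'iqr': 4.0}
```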

  • @vshapoval · 11 years ago

    I don't have any questions, but I found your video extremely helpful, with a very good explanation, so I just wanted to say thank you. Your video was a great help. =)

  • @Gaskination · 11 years ago

    You can certainly try k-means. It just depends on what your research intentions are. I actually prefer k-means over two-step. I just learned two-step first, so that's what I made the video for. I should probably make one for k-means sometime...

  • @snailbby6664 · 7 years ago · +9

    "These are the ones you'll probably punish by making them managers" 😂

  • @user-qy6mx4oq9i · 4 years ago

    Thank you for the awesome explanation! I wish you good luck! I've found all your videos very, very, very helpful.

  • @TheCopginger · 12 years ago

    That's great indeed! Well, I also have some ideas on how you could make it better from a learner's point of view:
    1. Explaining why to use a specific methodology for clustering
    2. Progressing from basic to advanced methodology
    3. Possibly using data across industries/sectors
    I don't know how much time you have to spend on these, or whether you would want to; however, I can provide you with data that would enhance the quality of your analysis (and, of course, your self-marketing value).

  • @AlbertGavino · 9 years ago

    Great, simple video on 2-step clustering (great for categorical or binary variables) combined with some continuous variables. I like 2-step since it creates its own clusters, which I don't have to specify (unlike in K-means).

  • @talhelmt · 11 years ago

    Thanks! I appreciate the time you put into making this.

  • @ildilovasz2982 · 4 years ago

    Thank you for this video, very clear and it helped me write my thesis.

  • @MarcRodrigues10 · 5 years ago

    Thank you! This video helped me a lot, especially with the results analysis.

  • @Gaskination · 11 years ago

    Look at the Sig. value. If it is less than 0.05, then the groups are significantly different on that comparison variable. If the cluster quality is poor, then you might try a three-factor model. I'm not sure you can rely on the cluster groups when quality is poor. Poor quality means that the membership assignment was inconsistent based on the indicators used for the clustering; e.g., sometimes males went into cluster 1, sometimes into cluster 2.
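The comparison described above is a one-way ANOVA with cluster membership as the factor. Here is a minimal pure-Python sketch of the F statistic it reports, using a hypothetical `one_way_anova` helper and made-up scores. The Sig. value SPSS prints is the upper-tail probability of this F with (df_between, df_within) degrees of freedom. If you want to reproduce it outside SPSS, SciPy exposes that probability as `scipy.stats.f.sf(F, df1, df2)`.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores.

    Each inner list holds one variable's values for the members of one cluster.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of cluster means around the grand mean, weighted by cluster size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual scores around their own cluster mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores on one comparison variable for three clusters.
clusters = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(one_way_anova(clusters))  # (27.0, 2, 6)
```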

  • @tekonen · 9 years ago · +2

    Thanks for sharing your knowledge!

  • @petradubajovamarinakova9268 · 10 years ago · +1

    Your video helped me. Thank you very much :)

  • @mxm001 · 9 years ago

    Thank you SO much, James. This was very helpful.

  • @chrisnahm · 7 years ago

    Really enjoyed this and was very helpful. Thank you!

  • @Gaskination · 12 years ago

    Thanks for the ideas. I just do these when the need arises or when I have the time. I'll probably have some time to do a couple next week. I have some data that has grouping variables, so no need to send me yours. Thank you though.

  • @koenovisch · 11 years ago

    Thank you for your reply! I will continue looking for it!

  • @Gaskination · 11 years ago

    That is what I meant, but those are undesirable sample sizes. You might also look at indicator importance to see if one variable is swamping out the others. If so, you might consider removing it. Or you can try K-means clustering... I haven't made any video for that yet...

  • @Gaskination · 11 years ago

    Glad to be helpful. Hope you'll subscribe and tell your friends. :)

  • @Gaskination · 11 years ago

    1. No references come to mind. When you run comparisons later on between clusters, if one cluster is much larger than another, then this will affect the critical ratio (t, F, or z statistic), since critical ratios are sensitive to sample size. Thus, working with similar sizes is ideal when making comparisons.
    2. SPSS makes n+1 groups, where the extra 1 is those who did not fit in anywhere else. To figure out which clusters are which, look at the cluster output number in the output window.
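The sizing heuristics discussed in this thread are roughly 30 or more cases in the smallest cluster and a largest-to-smallest ratio of about 2 to 3. Gaskin describes them elsewhere in the thread as practical rules of thumb rather than cited standards. They can be checked on the saved cluster-membership variable. Here is a minimal Python sketch; `check_cluster_sizes` and its default thresholds are illustrative assumptions:

```python
from collections import Counter


def check_cluster_sizes(labels, min_size=30, max_ratio=3.0):
    """Summarise cluster sizes from a list of cluster-membership labels.

    min_size and max_ratio are rule-of-thumb thresholds, not formal criteria.
    """
    counts = Counter(labels)
    smallest = min(counts.values())
    largest = max(counts.values())
    ratio = largest / smallest
    return {
        "sizes": dict(counts),
        "smallest": smallest,
        "ratio": ratio,
        "ok": smallest >= min_size and ratio <= max_ratio,
    }


# Hypothetical membership variable: 40, 60 and 100 cases in clusters 1-3.
labels = [1] * 40 + [2] * 60 + [3] * 100
print(check_cluster_sizes(labels))
# {'sizes': {1: 40, 2: 60, 3: 100}, 'smallest': 40, 'ratio': 2.5, 'ok': True}
```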

  • @Gaskination · 11 years ago

    Did you double click it? You have to double click it to make it show up.

  • @zhexiongtao2167 · 10 years ago

    Really interesting and helpful! I hope you can also make one for K-means.

  • @DaDonnyZhang · 10 years ago

    Great video! Thank you so much!

  • @Gaskination · 11 years ago

    I don't yet, but people keep asking for one, so I should probably do one.

  • @123canuckfan · 11 years ago

    God I wish you were my stats teacher!

  • @Zopzuita · 11 years ago

    I can't double-click, since the model viewer doesn't show up at all. It writes the clusters to the column, but that's it, even though I activated the option... Any ideas what could be wrong? Thanks a lot in advance!

  • @Jemoeder86 · 10 years ago

    Very informative! Thanks

  • @sticky924 · 10 years ago

    Thank you for this video, it is very helpful

  • @Gaskination · 11 years ago

    I have not. Best of luck. But basically, it is like an R-squared analysis. It shows how much of the variance is being explained by each indicator.
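This sketch illustrates the R-squared analogy only. For a continuous indicator, the share of its variance explained by cluster membership is eta-squared: the between-cluster sum of squares divided by the total sum of squares. It is a hedged sketch with hypothetical names and data. SPSS computes its own predictor importance measure, which is not necessarily this quantity, so these numbers are not expected to match the model viewer.

```python
def eta_squared(values, labels):
    """Share of a variable's variance explained by cluster membership (0 to 1)."""
    grand_mean = sum(values) / len(values)
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # a constant variable cannot separate clusters
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


labels = [0, 0, 0, 1, 1, 1]  # hypothetical cluster membership
indicators = {
    "tenure": [1, 2, 3, 7, 8, 9],  # differs strongly between clusters
    "noise": [1, 5, 9, 1, 5, 9],   # same distribution in both clusters
}
for name, col in indicators.items():
    print(name, round(eta_squared(col, labels), 3))
# tenure 0.931
# noise 0.0
```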

  • @alfonspriessner8556 · 8 years ago

    Hi James! Very helpful video; you saved me a lot of time. :-) I have two additional questions, and it would be great if you could help me. I am sure you are the expert who can help me!
    1) Let's assume the SPSS program proposes 3 clusters based on a set of variables. What statistical tests are used in the background to select 3 clusters instead of 2 or 4? I read in some papers that, e.g., the likelihood ratio (L2) and its p-value, the Bayesian Information Criterion (BIC), and the number of parameters (Npar) could be examples of these statistical tests (there are surely others). And if some of these tests are conducted by SPSS in the background, is there a way I can create an output chart of these statistical parameters in SPSS? In other words, since SPSS tells me 3 clusters, I would like to show why 3 clusters and not 4 based on a few statistical tests.
    2) Let's assume we still have the 3 clusters from question 1, which were created based on a set of variables. But I have another variable (e.g., age) which I did not use for the cluster analysis. How (if there is any option in SPSS) can I calculate the mean of age for each of the 3 identified clusters and show it in an output table (ideally for more than one additional variable)?
    I hope you understand my questions. I would appreciate your help and guidance!! Thanks a lot in advance!
    Regards, Alfons

    • @Gaskination · 8 years ago · +1

      1. SPSS lets you choose the AIC or the BIC as the clustering criterion, or you can use the silhouette measure that shows in the output. The silhouette is considered fairly robust. You can force it to 2 or 4 clusters as well to see what the silhouette score is for those.
      2. Watch this video at the 2:16 mark. It will show how to do this using the Output button.
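For reference, the textbook silhouette coefficient compares two distances for each case: its mean distance to the other members of its own cluster (a) and its mean distance to the nearest other cluster (b). Each case scores s = (b - a) / max(a, b), and the overall measure is the average over cases. Here is a minimal Python sketch, assuming Euclidean distance on already-standardized numeric data. SPSS may compute its silhouette measure of cohesion and separation with a different distance or shortcut, so treat this as illustrative. Running it on the memberships saved from 2-, 3- and 4-cluster solutions is one way to compare them as suggested above.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette over all cases; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # Mean distance to the other members of p's cluster (p itself adds 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Mean distance to the closest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]  # hypothetical 1-D data
print(round(mean_silhouette(points, [0, 0, 1, 1]), 3))  # 0.9 (well separated)
print(round(mean_silhouette(points, [0, 1, 0, 1]), 3))  # -0.45 (poor split)
```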

  • @jeromeboissel2793 · 3 years ago

    Dear James,
    What references did you use for this? Also, which would be most appropriate: K-means or two-step? In the paper I am working on, I have used both analyses, and although the number of clusters remains the same, the number of respondents in each cluster differs quite significantly depending on which technique I use. Any tips?

    • @Gaskination · 3 years ago · +1

      I'm not much of an expert on cluster analysis. I've just used the Hair et al. (2010) book. As for which approach to use, I think two-step is considered the most useful and valid, since it combines hierarchical and non-hierarchical methods.

  • @yifanli4312 · 5 years ago

    Thank you! This video is very helpful!

  • @mcole6234 · 11 years ago

    James,
    Very informative. You mention the need for over 30 in the smallest cluster and a largest:smallest ratio of between 2 and 3. I am doing a PhD and wondered where these numbers came from. Do you have an academic reference(s) I could cite?
    Also, at the end of the video you ran an ANOVA on the newly formed variables in SPSS. I ran several different analyses and never had more than 4 clusters, but there were 5 new variables, all with uninformative names. How do I know which ones to use?

  • @MrMustav · 11 years ago

    Good tutorial

  • @MrMustav · 11 years ago

    What if one of the items shows a non-significant p-value after applying a post hoc test? E.g., you differentiate clusters on a variable and then find that two of the clusters do not significantly differ on one item.

  • @wassdepp1 · 8 years ago

    Thank you, it made my day

  • @emindeger. · 4 years ago

    Hi, thank you very much for this video series.
    I have a question; I would appreciate it if you answered.
    Do we need to normalize the data in SPSS?

  • @juliaworldwide · 8 years ago

    Thank you very much for that !

  • @koenovisch · 11 years ago

    James, do you know a video in which the IPA (importance/performance analysis) is being explained? Have you made such a video?

  • @TheCopginger · 12 years ago

    By the way, I was performing cluster analysis based on your video. However, I have a few questions to ask you:
    1. Is it possible to assign weights to individual records while performing segmentation?
    2. If weights are already available for individual records (based on another criterion), how can I make use of them in the segmentation process?

  • @polisherci · 8 years ago · +1

    Hey, can you run a regression clustered by a certain variable in SPSS? Like the regress ... cluster (.. ) command in Stata?

    • @Gaskination · 8 years ago · +1

      I'm not sure. I haven't used Stata much. You can run a cluster analysis, and then use those clusters as grouping variables when running regressions.

  • @olofreichenberg6885 · 11 years ago

    Very helpful!

  • @cynthiagallagher75 · 8 years ago

    Is there a video that provides more detail on interpreting the clusters themselves? It would be helpful to understand how the clusters are selected and developed.

    • @Gaskination · 8 years ago

      The only other two-step cluster analysis video that I have is part of the Rosen College SEM Boot Camp: ruclips.net/video/2Lz2bU-sBGA/видео.html

  • @TheCopginger · 12 years ago

    Thanks, Mr. Gaskination! Would you also show much more complicated segmentation (both in terms of data and procedure)?

  • @DisconnectHack · 9 years ago

    Hi James, did you say "swarming variable" or "swapping variable"? I couldn't figure it out. I tried looking up definitions for both but only found "swapping variable" in computer science. Were you talking about the same thing?

    • @Gaskination · 9 years ago · +1

      +DisconnectHack Swamping. I don't know what the technical term would be (or if there is one).

    • @DisconnectHack · 9 years ago

      +James Gaskin Thanks James, it appears there isn't one.

  • @mldsg72 · 9 years ago

    James, nice job, very well done! Would you mind making a brief comment about AIC and BIC in 2-step clustering?

    • @Gaskination · 9 years ago

      Marcelo Gabriel I was not aware you could generate AIC and BIC in SPSS during a 2-step cluster analysis. I've gone back to it to fiddle with it, but I can't figure out whether it is possible.

    • @mldsg72 · 9 years ago

      James, thanks for your reply. At least on versions 20 and 22, you must check the "Clustering Criterion" by choosing BIC or AIC. I'm more inclined to consider AIC than BIC due to its characteristics. Your comment would be nice. Regards

    • @Gaskination · 9 years ago

      Marcelo Gabriel Thanks for pointing me to that. I played with it and looked into it and it appears that the results are often the same (with my data), but that in general, AIC is preferred to BIC. Here is an informative explanation of why as well as some useful references: en.wikipedia.org/wiki/Akaike_information_criterion#Comparison_with_BIC
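For readers comparing the two criteria, the standard definitions are AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L. Here L is the maximized likelihood, k is the number of estimated parameters and n is the sample size; lower values are better. BIC's per-parameter penalty ln(n) exceeds AIC's 2 once n is 8 or more, so BIC tends to favor fewer clusters. Here is a minimal Python sketch. The log-likelihoods and parameter counts below are invented purely to show that the two criteria can disagree; SPSS computes its own values internally.

```python
import math


def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * log_likelihood


n_obs = 200
# Hypothetical fits: number of clusters -> (log-likelihood, parameter count)
fits = {2: (-120.0, 10), 3: (-112.0, 16)}

best_aic = min(fits, key=lambda k: aic(*fits[k]))
best_bic = min(fits, key=lambda k: bic(*fits[k], n_obs))
print(best_aic, best_bic)  # 3 2
```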

  • @ntaalya · 10 years ago

    Thank You very much!

  • @Zopzuita · 11 years ago

    Great video! I only have a problem with the model viewer - it doesn't show up. The results are written in the column in my table but the output misses the interactive graphics. Does anybody else have the same problem? Any ideas how to fix this? Thanks!!!

  • @thomasbulitta3817 · 8 years ago

    Hi James, thank you for that video. It was very helpful. Do you know what actually happens "inside" SPSS when you run this "Two-Step Cluster"? Which forms of clustering are used? Single linkage and hierarchical cluster analysis?

    • @Gaskination · 8 years ago

      +Thomas Bulitta It performs a hierarchical and a non-hierarchical step. I'm not sure which specific algorithms, but I bet the SPSS manual says.

  • @brandonknettel545 · 11 years ago

    Hi there, thanks for the informative video. I ran this analysis for my data in two different ways, and each time I got a single-cluster solution. I'm assuming that is an indication that my participants are homogeneous on the variables being studied, but when I run ANOVAs I get significant group differences. Is my best bet to run a k-group cluster analysis and force a distinction?

  • @hem135 · 5 years ago

    Hi James, this video is very helpful, thank you! Within the model viewer, I can see the average silhouette statistic for the cluster result. My understanding is that this number is the average fit across items in the cluster. Is there a way to find the silhouette data for each item separately?
    For context, I'm using cluster analysis to identify exemplar scenarios for different types of behavior. I'm clustering scenarios based on participant ratings (e.g., this scenario represents X behavior, yes/no). I'd like to compare fit across a few different types of participant groups using an ANOVA of the silhouettes for each item. Thanks in advance!

    • @Gaskination · 5 years ago

      If there is a way, I'm not sure how to do it.

  • @researcher53 · 11 years ago

    Thanks, very helpful

  • @nihonbunka · 7 years ago

    Is it possible to analyse clusters NOT around central concepts like intelligence or years on the job, but based on family relationships (binary relationship closeness in a network, in the absence of commonalities, as is the case in real families)?

    • @Gaskination · 7 years ago

      That's an interesting idea, but I don't know how to do it in a two-step. You might be able to do it with multiple alignment algorithms, but I'm not sure if SPSS has those...

    • @nihonbunka · 7 years ago

      Thank you very much indeed.
      I have found a partial solution in the software here
      socnetv.org/downloads
      which has a network community detection algorithm that can be used on the correlation matrix produced by SPSS factor analysis.
      Others have had the idea before
      journals.plos.org/plosone/article?id=10.1371/journal.pone.0051558
      using a different community detection algorithm
      Full statement of problem and partial solution
      www.talkstats.com/showthread.php/69145-Family-Relationship-version-of-Factor-analysis-for-Japanese-Groups?p=199672&highlight=#post199672

    • @Gaskination · 7 years ago

      Cool! Thanks!

  • @SharonaTLevy-nl4dc · 8 years ago

    thank you, very helpful

  • @rajeshpandit3634 · 8 years ago

    Great video. I just want to check: for the variables you entered, both continuous and categorical, do you standardize them? By standardize I mean converting to z-scores, since you are putting scale, binary, and categorical variables together.

    • @Gaskination · 8 years ago

      +Rajesh Pandit SPSS automatically standardizes all continuous variables when doing a 2-step cluster analysis. You can see this in the options area when doing the 2-step.
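This sketch shows only what that standardization does to each continuous variable, not SPSS's internal code. It subtracts the mean and divides by the standard deviation, so every continuous input has mean 0 and SD 1. No variable then dominates the distance just because of its units. Categorical inputs are not transformed this way. The `standardize` helper and the data are hypothetical, and this version uses the sample (n - 1) standard deviation.

```python
import statistics


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


years_on_job = [2, 4, 6]        # hypothetical
salary = [40000, 50000, 60000]  # hypothetical, very different scale
print(standardize(years_on_job))  # [-1.0, 0.0, 1.0]
print(standardize(salary))        # [-1.0, 0.0, 1.0]
```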

  • @OPaixao13 · 8 years ago

    Hi James
    How can I get the Cubic Criterion values for the different numbers of clusters under consideration? I think it's also a good way to justify choosing X clusters instead of Y, right?

    • @Gaskination · 8 years ago

      I'm not sure. I've never heard of the cubic criterion. Best of luck to you.

  • @medosman23 · 9 years ago

    Great video, thank you

  • @tomh3675 · 10 years ago

    Thanks for the video, do you have an example of doing a cluster analysis as a way of illustrating factor analysis/factor scores?

    • @Gaskination · 10 years ago

      No, but I do have several videos about how to do factor analysis and extract factor scores.

  • @roxy629 · 9 years ago

    Awesome! So clear and informative :) James, what would be the major differences between cluster analysis and factor analysis? Is it the profiling aspect? Can CA do things that FA cannot?
    Thanks again!

    • @Gaskination · 9 years ago

      roxy629 Cluster analysis clusters rows. Factor analysis "clusters" columns.

    • @roxy629 · 9 years ago

      James Gaskin ahhh!!! That's why it's called "profiling". Makes so much sense, thanks James :)

  • @joseedupont2409 · 10 years ago

    Very helpful! What version of SPSS are you using?

    • @Gaskination · 10 years ago

      Probably v20 or 21 in this video. Maybe 19...

  • @sureshpatel3992 · 3 years ago

    Hello James, can Two-step Cluster Analysis handle mixed variable types? E.g., some variables that are outputs of factor analysis (which will have negative values too), and some binary variables?

    • @Gaskination · 3 years ago · +1

      Yes. The two-step method can handle all types of variables. The only things you need to watch out for are highly skewed or kurtotic variables, or discrete (categorical/nominal) variables without adequate representation from each group/category.
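Here is a quick pre-check along those lines, as a minimal Python sketch. It computes moment-based skewness and excess kurtosis for continuous inputs and category counts for discrete inputs. These are the simple population-moment formulas, so they may differ slightly from the small-sample-adjusted values that statistics packages typically report. The `min_count` cutoff is a hypothetical placeholder, not a published threshold.

```python
from collections import Counter


def skew_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("variable is constant")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def sparse_categories(values, min_count=10):
    """Categories with fewer than min_count cases (hypothetical cutoff)."""
    return {cat: c for cat, c in Counter(values).items() if c < min_count}


print(skew_kurtosis([1, 1, 1, 1, 10]))  # skewness ~1.5, excess kurtosis ~0.25
status = ["single"] * 50 + ["married"] * 45 + ["widowed"] * 3
print(sparse_categories(status))  # {'widowed': 3}
```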

    • @sureshpatel3992 · 3 years ago

      @@Gaskination thanks so much for your reply, this would really help!

  • @harsin009 · 6 years ago

    Can these profiles really be used as a moderator in SEM analysis? I thought SEM only uses continuous variables, since it analyzes relationships between multiple variables through regression analysis.
    For a while, I thought you were referring to Hierarchical Regression Analysis.
    Thank you!

    • @Gaskination · 6 years ago

      It can be used as a multigroup moderator for multigroup analysis, which is a form of moderation.

  • @azianwacko · 8 years ago

    Hello again James, can you explain how the analysis actually creates the clusters? I've tried using it for categorical variables and I'm not fully understanding just how it determines the clusters. Thank you

    • @Gaskination · 8 years ago

      Here are some resources to help you understand 2 step cluster analysis better:
      1. www.ibm.com/support/knowledgecenter/SSLVMB_21.0.0/com.ibm.spss.statistics.help/idh_twostep_main.htm
      2. www.spss.ch/upload/1122644952_The%20SPSS%20TwoStep%20Cluster%20Component.pdf
      3. www.ryerson.ca/~rmichon/mkt700/SPSS/TwoStep%20Cluster%20Analysis.htm
      4. ruclips.net/video/2Lz2bU-sBGA/видео.html

  • @spss-for-research6518 · 9 years ago

    I have a dumb problem and I wonder if someone could help me. The SPSS shows the cluster comparisons only for the inputs, but NOT for the descriptive variables. It just shows a message: "the cluster comparison view encountered a problem and cannot display correctly" or something like that. Why? I can't figure out.

    • @Gaskination · 9 years ago

      spss-for-research I'm not sure. It may have something to do with the variables included. Try removing one variable at a time to see if you can identify which one is causing the problem. If it isn't that, then it may be a conflict in one of the libraries being utilized to run the analysis. If that is the case, then you might need to reinstall SPSS, or you might need to update your java or .NET version (not sure which one SPSS uses).

  • @souksomphoneanothay1149 · 11 years ago

    good video

  • @sugun1993 · 5 years ago

    Thank you for the quick tutorial. I am performing two-step clustering on data from a recent study, but I want to somehow fit this new data into the clusters generated from past data. It's kind of like supervised learning, but unfortunately neither the past data nor the coefficients of the model built on it are available. Is there a way to solve this, or is this case hopeless?
    P.S. To get the project done in time, without access to any tools, I manually assigned the new records to clusters, respecting the features/characteristics of the previously generated clusters. Since time is my major constraint and the data is just 40 new entries, I have already done this (could you give me some idea of my options for justifying the job done this way?). But I am just curious to know the right way.

    • @Gaskination · 5 years ago

      If the new data is using the exact same variables as the original data, then you can simply add the new rows to the dataset and re-run the cluster analysis. That is the easiest way. If the new data is not using the same variables, then there is no statistical way to cluster them along the same lines.

  • @Gaskination · 11 years ago

    Oh. That's bizarre... I'm not sure. I would google it, or email IBM.

  • @TheAce0 · 6 years ago

    You mention that when having SPSS determine clusters automatically, Euclidean distance measurement is more appropriate but when specifying the number of clusters, Log-likelihood is preferred. Could you perhaps elaborate on why this is the case? Would you know any papers that go into a bit of detail about this?

    • @Gaskination · 6 years ago

      oooh, this has been a while. The literature I read at the time suggested these things, but I can't remember which articles and books I read, or what they had to say about it. Sorry about that. If cluster analysis was something I did more often, I would have a better answer for you. But I haven't done a cluster analysis again since making this video...

    • @TheAce0 · 6 years ago

      Ah, okay, fair enough. I'm dealing with cluster analysis right now and need to figure out which parameters are appropriate and why :)

  • @azianwacko · 8 years ago

    Hello James, can you explain evaluation fields and whether something like a scale of mental health would go in there?

    • @Gaskination · 8 years ago

      +Thomas Chan Evaluation fields are used to see differences in evaluation variables based on cluster membership. It is sort of like doing an ANOVA on those variables, using the cluster membership as the factoring variable. The evaluation variables will not be used to determine cluster membership.
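Outside SPSS, the same profiling idea amounts to summarising each evaluation variable by cluster. Here is a minimal Python sketch with a hypothetical `profile_by_cluster` helper and made-up ages and cluster labels. It only reports per-cluster counts and means; a one-way ANOVA on the same grouping would supply the significance test.

```python
from collections import defaultdict


def profile_by_cluster(values, labels):
    """Return {cluster: (n, mean)} for a variable NOT used to form the clusters."""
    groups = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    return {lab: (len(g), sum(g) / len(g)) for lab, g in sorted(groups.items())}


cluster = [1, 1, 2, 2, 2]   # hypothetical saved membership variable
age = [30, 40, 20, 25, 30]  # hypothetical evaluation variable
print(profile_by_cluster(age, cluster))  # {1: (2, 35.0), 2: (3, 25.0)}
```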

  • @MrMustav · 11 years ago

    Do you have a tutorial on logistic regression? That would be great!

  • @user-pz9rz7jr2j · 5 years ago

    HELP! I am using SPSS v17 and I don't get the model index... What is going wrong?

    • @Gaskination · 5 years ago

      I'm not sure what you mean by model index. Do you mean you are not getting the silhouette index? I'm not sure what might be causing that either way though... Sorry about that.

  • @Thanh-ThaoTPham · 7 years ago

    Hi James, thanks for your valuable sharing. However, is there any source for the acceptable size of the smallest cluster and the threshold for the ratio of cluster sizes? Thanks in advance.

    • @Gaskination · 7 years ago

      I'm not sure. I'm really not an expert on cluster analysis. Those numbers just "feel" right, which I realize is not very scientific of me. I guess they feel right because they are practically useful - i.e., clusters of those sizes are usable in subsequent analyses and cluster ratios of that proportion break the data up into roughly equivalent groups.

    • @Thanh-ThaoTPham · 7 years ago

      Thanks so much for your reply. Anyway, I really love your tutorial series ^^

  • @Byzantic · 10 years ago

    I get 'predictor importance' instead of 'variable importance'. Is there a difference?

  • @shahzadfarid6446 · 7 years ago

    Sir, please upload detailed lectures on optimal scaling in SPSS (i.e., MCA, CATPCA, and non-linear canonical correlation). These lectures are not available on YouTube. I searched your channel with hope..., but unfortunately....

    • @Gaskination · 7 years ago

      I have never done those, so I cannot make videos on them. Any time I learn a new analysis, I make a video for it. If I ever have occasion to do these, I'll make videos for them. Best of luck to you.

  • @MrMustav · 11 years ago

    Great!

  • @JustMe-pt7bc · 11 years ago

    Good inspiration for something new!!!

  • @xiaoyanggong2006 · 8 years ago

    Thanks!

  • @Sari2024m · 5 years ago

    I think you treat some variables as continuous when they are actually categorical.

  • @nassimfard867 · 9 years ago

    Thanks for the videos. Can you please tell me whether a set of data can be clustered by only one variable? And if yes, is two-step clustering or k-means clustering more appropriate? I want to categorize a set of data into three groups based on one variable, and I don't know how to define the cutoff or range for each category. I would be glad if you could help me.

    • @Gaskination · 9 years ago · +1

      Nassim Fard If it is just one variable, then clustering algorithms won't help. If the variable is categorical, then just group the cases by the category values. For example, if the variable is religion, then group them by which religion they affiliate with. If the variable is continuous or ordinal, then choose logical cutoff points to split it into low, med, and high.
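For the continuous case, one simple (but not the only) way to choose those cutoffs is by tertiles, so that each group holds roughly a third of the cases. Theory-based cutoffs are often more meaningful. Here is a minimal Python sketch using `statistics.quantiles`; the `three_groups` helper and the scores are hypothetical.

```python
import statistics


def three_groups(values):
    """Split a continuous variable into low/med/high at its tertiles."""
    low_cut, high_cut = statistics.quantiles(values, n=3, method="inclusive")

    def label(x):
        if x <= low_cut:
            return "low"
        if x <= high_cut:
            return "med"
        return "high"

    return (low_cut, high_cut), [label(x) for x in values]


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical
cuts, groups = three_groups(scores)
print([round(c, 2) for c in cuts])  # [3.67, 6.33]
print(groups)
```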

  • @MrNicks86 · 10 years ago

    Thanks for the great video - very useful! I was just wondering if you could explain (in a nutshell) the difference between this Two-Step cluster analysis and k-means? Thanks

    • @Gaskination · 10 years ago · +1

      The main difference is that two-step allows you to distinguish between categorical and continuous variables, and it processes them differently, whereas k-means treats them all the same. So, if you have categorical variables, two-step would give a more accurate clustering.
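To see why k-means treats every input the same way, here is a minimal sketch of the standard k-means (Lloyd's) algorithm in Python. Every column, including any numerically coded category, enters the same Euclidean distance, and the number of clusters k is fixed in advance by the starting centroids. The function name, data and starting centroids are hypothetical. SPSS's K-means procedure has its own initialization and stopping rules.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and mean updates."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]  # hypothetical
labels, centres = kmeans(data, centroids=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```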

    • @MrNicks86 · 10 years ago

      Thanks for your reply. So with continuous data like domestic energy use, would k-means be more appropriate? And is it right to say that k-means treats each variable as independent of the next, which in the case of domestic energy use is not quite the case?
      Many thanks again!

    • @Gaskination · 10 years ago

      Nicholas Samson Unfortunately, I'm not an expert in cluster analyses. So your question surpasses my immediate knowledge. I would just have to look it up. I know that there are some good documents and articles that discuss the differences between two-step and k-means. I just googled it. Best of luck to you.

    • @MrNicks86 · 10 years ago

      Thanks James!

  • @Gaskination · 11 years ago

    That's just the way the cookie crumbles... Sometimes our theories work, sometimes they do not. Back to the drawing board...

  • @AdrienneDequina
    @AdrienneDequina 8 years ago

    Thanks a lot! I will use this in my PhD die-ssertation lol

  • @larasgilangrahmany289
    @larasgilangrahmany289 2 years ago

    Hi, sir. I hope you are doing well, and have a wonderful holiday and merry Christmas! I want to ask a question about the steps of a two-step cluster analysis: do we have to use the CF tree first for the pre-clustering phase before doing the final clustering using BIC/AIC? I really hope you can answer me this time, sir :') Thank you so much!

    • @Gaskination
      @Gaskination  2 years ago

      I do not think so. You should be able to jump straight to BIC/AIC as long as the solution has some face validity.
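For readers wondering how a BIC criterion picks the number of clusters, here is a minimal sketch: cluster the data for k = 1..k_max and keep the k with the lowest BIC. It uses a simple one-variable k-means and a hard-assignment BIC for spherical Gaussian clusters with one shared variance (additive constants dropped). That formula is an assumption chosen for illustration; SPSS's TwoStep computes BIC from its own log-likelihood distance, so its numbers will differ. All function names are hypothetical.

```python
import math


def kmeans_1d(xs, k, iters=100):
    """Lloyd's k-means on one variable; returns (SSE, cluster sizes)."""
    s = sorted(xs)
    # Start centroids at evenly spaced positions in the sorted data (an assumption of this sketch)
    cents = [s[(i * (len(s) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda c: (x - cents[c]) ** 2)].append(x)
        new = [sum(g) / len(g) if g else cents[c] for c, g in enumerate(groups)]
        if new == cents:
            break
        cents = new
    sse = sum((x - cents[c]) ** 2 for c, g in enumerate(groups) for x in g)
    return sse, [len(g) for g in groups]


def approx_bic(sse, sizes, d=1):
    """Hard-assignment BIC (lower is better); not SPSS's TwoStep formula. Requires sse > 0."""
    n = sum(sizes)
    k = sum(1 for m in sizes if m > 0)
    # -2 log-likelihood up to a constant: shared-variance part plus mixing-proportion part
    neg2ll = n * d * math.log(sse / (n * d)) - 2 * sum(m * math.log(m / n) for m in sizes if m > 0)
    n_params = k * d + (k - 1) + 1  # centroids, mixing weights, shared variance
    return neg2ll + n_params * math.log(n)


def choose_k(xs, k_max):
    """Return the k in 1..k_max with the smallest approximate BIC, plus all scores."""
    scores = {k: approx_bic(*kmeans_1d(xs, k)) for k in range(1, k_max + 1)}
    return min(scores, key=scores.get), scores
```

The `n_params * log(n)` penalty is what stops BIC from always preferring more clusters; AIC uses `2 * n_params` instead and so tends to allow more clusters.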

    • @larasgilangrahmany289
      @larasgilangrahmany289 2 years ago

      @@Gaskination Thank you! Can I ask what factors affect how much a variable contributes to forming the clusters? I'm talking about the purple table in Predictor Importance.

    • @Gaskination
      @Gaskination  2 years ago

      @@larasgilangrahmany289 It is determined by the shared variance among all variables, and can be influenced strongly by discrete values, such as binary (e.g., single/married) or multinomial (e.g., age group: child, young adult, adult, middle-aged, post-middle-aged).

    • @larasgilangrahmany289
      @larasgilangrahmany289 2 years ago

      @@Gaskination Thank you so much, sir! :') Thanks a lot

    • @larasgilangrahmany289
      @larasgilangrahmany289 2 years ago

      And it is also determined by how varied the responses are across the options, right? I observed that if all of the clusters choose almost the same option (e.g., woman), the importance is less than 0.5, but if the clusters choose different options (e.g., woman and man), the value is more than 0.5. Is that right?
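The intuition in this question can be checked with a standard association measure between cluster membership and one categorical variable. The sketch below computes Cramér's V from a chi-square statistic: it is 0 when every cluster has the same mix of categories and 1 when each cluster has its own category. This is only an illustration; it is not the formula SPSS uses for its predictor-importance chart, and `cramers_v` is a hypothetical helper name.

```python
import math
from collections import Counter


def cramers_v(clusters, values):
    """Cramér's V between cluster labels and one categorical variable (0 = no association)."""
    n = len(clusters)
    rows, cols = sorted(set(clusters)), sorted(set(values))
    if len(rows) < 2 or len(cols) < 2:
        return 0.0  # a single cluster or a single category cannot discriminate anything
    observed = Counter(zip(clusters, values))
    row_tot, col_tot = Counter(clusters), Counter(values)
    # Pearson chi-square: sum of (observed - expected)^2 / expected over the contingency table
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))
```

If both clusters are three-quarters women, V is 0; if one cluster is all women and the other all men, V is 1.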

  • @gs19921
    @gs19921 8 years ago

    Thank you for this video. I have done 4 different k-means clusterings and I need a method to choose the best cluster analysis. Can I do it with two-step or another method?

    • @Gaskination
      @Gaskination  8 years ago

      +gs19921 Two-step will provide a "fit" measure to let you know whether the clustering solution was good. You can also examine the AIC (try to minimize it).
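One way to compare several already-computed solutions by AIC is sketched below, assuming you saved each run's cluster-membership variable and can pass it in as a list of labels alongside the original data. The AIC here is a hard-assignment approximation for spherical Gaussian clusters with one shared variance, chosen for illustration; it is not SPSS's internal formula, and the helper names are hypothetical.

```python
import math


def approx_aic(points, labels):
    """Hard-assignment AIC for one saved solution (lower is better). Requires nonzero SSE."""
    n, d = len(points), len(points[0])
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    # Within-cluster sum of squares around each cluster's mean
    sse = 0.0
    for members in groups.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        sse += sum(sum((a - b) ** 2 for a, b in zip(p, centroid)) for p in members)
    k = len(groups)
    # -2 log-likelihood up to a constant: shared-variance part plus mixing-proportion part
    neg2ll = n * d * math.log(sse / (n * d))
    neg2ll -= 2 * sum(len(m) * math.log(len(m) / n) for m in groups.values())
    n_params = k * d + (k - 1) + 1  # centroids, mixing weights, shared variance
    return neg2ll + 2 * n_params


def best_solution(points, solutions):
    """Index of the saved solution (a list of labels) with the lowest approximate AIC."""
    scores = [approx_aic(points, labels) for labels in solutions]
    return scores.index(min(scores))
```

AIC only ranks solutions fitted to the same data and variables, so all four runs should use the same cases.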

  • @JessicaRodrigues-wz3xo
    @JessicaRodrigues-wz3xo 7 years ago

    Hi! How can I choose which variables are significant enough to use in it? Is there a statistical test to help? I have a lot of variables and I want to know how I should choose them, if there is a criterion.

    • @Gaskination
      @Gaskination  7 years ago

      Usually it is chosen theoretically, rather than statistically.

    • @JessicaRodrigues-wz3xo
      @JessicaRodrigues-wz3xo 7 years ago

      Thank you for responding! I have several variables to draw a social and demographic profile of my population. Theoretically all these variables are important, but when I do the analysis with all of them, the results are not good. In other versions of SPSS there was a cut in those variables, a critical value, but I do not know how to identify this in SPSS 22. Can you help me, please?

    • @Gaskination
      @Gaskination  7 years ago

      Jéssica Rodrigues, you can look at the cluster quality or at the variable importance graph. These will give you indications of the overall value of the variables for clustering into groups.
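The cluster-quality chart in SPSS's TwoStep output is, as I understand it, based on a silhouette measure of cohesion and separation. The sketch below computes the standard average silhouette coefficient with Euclidean distance, which may differ in detail from SPSS's version. Values near 1 mean tight, well-separated clusters, and values near 0 or below mean poor separation. It requires at least two clusters, and `mean_silhouette` is a hypothetical name.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette coefficient over all cases (needs at least two clusters)."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of this case's own cluster (cohesion)
        same = [math.dist(p, q) for j, (q, other) in enumerate(zip(points, labels))
                if other == lab and j != i]
        if not same:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        a = sum(same) / len(same)
        # b: mean distance to the members of the nearest other cluster (separation)
        b = min(
            sum(math.dist(p, q) for q, other in zip(points, labels) if other == c)
            / labels.count(c)
            for c in set(labels) if c != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Rerunning this after dropping one variable at a time is one rough way to see which variables help or hurt the overall cluster quality.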

  • @educationalconsultant9880
    @educationalconsultant9880 4 years ago

    Can I use cluster analysis in a stepwise classification, e.g., first classify cases as asymptomatic or symptomatic, then classify the asymptomatic cases in terms of symptoms?

    • @Gaskination
      @Gaskination  4 years ago +1

      I think it should be possible. You could do the classification and save the cluster membership number. Then, filter the dataset so that only the rows belonging to the asymptomatic clusters remain. Then, cluster again to see whether they cluster by symptom. Another route would be to just use evaluation variables in the two-step clustering. These variables aren't used to determine cluster membership, but each cluster is evaluated post hoc by symptoms.
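The first route (cluster, save membership, filter, cluster again) can be sketched as below. It uses a small one-variable k-means with two clusters at each stage instead of SPSS's two-step procedure. The variable names `symptom_score` and `lab_value` are hypothetical stand-ins, and treating the cluster with the lower mean score as "asymptomatic" is an assumption of this sketch.

```python
def kmeans_1d(xs, k, iters=100):
    """Small one-variable Lloyd's k-means; returns one cluster label per value."""
    s = sorted(xs)
    cents = [s[(i * (len(s) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - cents[c])) for x in xs]
        new = []
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            new.append(sum(members) / len(members) if members else cents[c])
        if new == cents:
            break
        cents = new
    return labels


def two_stage(rows, first_var, second_var):
    # Stage 1: cluster every row on first_var and "save" the membership labels
    scores = [r[first_var] for r in rows]
    stage1 = kmeans_1d(scores, 2)

    def cluster_mean(c):
        vals = [v for v, lab in zip(scores, stage1) if lab == c]
        return sum(vals) / len(vals)

    # Filter (like Select Cases in SPSS): keep only the cluster with the lower first_var mean
    low = min(set(stage1), key=cluster_mean)
    subset = [r for r, lab in zip(rows, stage1) if lab == low]
    # Stage 2: recluster only the remaining rows on second_var
    stage2 = kmeans_1d([r[second_var] for r in subset], 2)
    return subset, stage2
```

Calling `two_stage(rows, "symptom_score", "lab_value")` returns the filtered rows and their second-stage labels.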

    • @educationalconsultant9880
      @educationalconsultant9880 4 years ago

      @@Gaskination Thank you very much for your reply

  • @TheAnthology09
    @TheAnthology09 12 years ago

    Thank you very much for the video. I have a specific question about using cluster analysis in my data. Can I contact you via email?

  • @olfabenarfa3790
    @olfabenarfa3790 10 years ago

    Very informative video and extremely helpful as usual. My only concern is that when I ran it the first time it gave me 3 groups, and when I ran it again it gave me 2 groups… I ran it many times and noticed that the results are not stable! How can the same steps and the same algorithm give different results? Has anyone faced this issue with two-step cluster analysis? Thanks.

    • @Gaskination
      @Gaskination  10 years ago

      That is bizarre... I'm not sure what would be causing that. It should be the same every time, I think.
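One possible explanation: IBM's documentation for TwoStep notes that the CF tree, and therefore the final solution, can depend on the order of cases in the file, and suggests randomly ordering the cases (and comparing solutions from several random orders) to check stability. So if the file was sorted differently between runs, results could change. The sketch below does not reproduce SPSS's CF tree. It swaps in a much simpler one-pass "leader" clustering to show how case order alone can change cluster memberships. `leader_cluster` and its `threshold` are hypothetical, and the demo assumes distinct values.

```python
def leader_cluster(values, threshold):
    """One-pass sequential clustering: each case joins the nearest cluster whose running
    mean is within `threshold`, otherwise it starts a new cluster. Order-dependent."""
    clusters = []  # each entry is the list of member values
    for v in values:
        best = None
        for members in clusters:
            center = sum(members) / len(members)
            d = abs(v - center)
            if d <= threshold and (best is None or d < best[0]):
                best = (d, members)
        if best is None:
            clusters.append([v])
        else:
            best[1].append(v)
    # Return the partition as a set of frozensets so different orders can be compared
    return {frozenset(m) for m in clusters}
```

With `threshold=1.5`, the order `[1.0, 2.0, 3.0, 4.0]` gives {1, 2, 3} and {4}, while `[4.0, 1.0, 2.0, 3.0]` gives {1, 2} and {3, 4}: same data, different clusters.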