Cluster analysis

  • Published: 2 Dec 2024

Comments • 103

  • @Margoth195 2 years ago

    Sir, you are a saint! Thank you thank you thank you!!! Not only did you make this easy but you gave me peace of mind. If we ever meet in person, I hope you will give me the honor of buying you a drink.

  • @oberoiHimanshu 5 years ago +12

    Best video I ever saw on clustering algorithms. Great work. Thanks for posting!

  • @yashgourav 3 years ago

    Just to like this video and add a comment, I logged in to my Google account. Awesome work, Hefin. I really appreciate your efforts. :)

  • @WarmHeartedP 6 years ago +1

    Very helpful! I've found the k-means and hierarchical algorithms most useful for my specific data. Thumbs up for this video!

    • @hefinrhys8572 6 years ago

      Thanks! I'm glad you've been able to apply the algorithms and get meaningful results from your data.

  • @jackpumpunifrimpong-manso6523 4 years ago +1

    Wonderful! I'm impressed. You're very bright! God has used you to bless me. Thank you!
    Keep on making more videos. Congrats & Cheers!

  • @liviaaraujo94 6 years ago

    Please do not stop posting videos! Congratulations on the explanations. Brazil here

    • @hefinrhys8572 6 years ago

      Thank you! I'm glad you enjoyed it!

  • @gssmytube 2 years ago

    Dr Hefin Rhys, your sessions on clustering are truly amazing and well explained, thank you. Please bring some on PAM and CLARA.

  • @Insanesibak 5 years ago +3

    This is a great video on Clustering. Thank you for putting it together.

  • @Legogostar456 3 years ago +2

    wow! very well explained, thank you so much.
    I appreciate the details and the beginner-friendliness of your tutorial!

  • @sophielong8937 3 years ago

    Hi, I keep getting the error:
    "Error in xy.coords(x, y, xlabel, ylabel, log) :
    'x' and 'y' lengths differ"
    when I try to plot the graph to see the number of k with this code:
    plot(1:10, betweenss_totss, type = "b",
         ylab = "Between SS / Total SS", xlab = "Clusters (k)")
    Do you know how I can solve this? I have tried looking online and have found how to make the x and y axes the same; however, for this graph we don't need them to be the same. Any information you could spare would be great!!

    • @hefinrhys8572 3 years ago +1

      Hi Sophie, the first two arguments to the plot() function need to be vectors representing the x and y axes, respectively. At the moment, you only supply a vector for the x axis (1:10) that only contains values 1 through 10. What are you trying to plot here? The easiest way to plot the clusters is to create a new column in the data.frame indicating cluster membership, plotting two variables against each other and colouring by this cluster variable.
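One plausible cause of the "'x' and 'y' lengths differ" error in this thread is that betweenss_totss does not hold exactly one value per candidate k. The sketch below is not the script from the video; it assumes that irisScaled is the scaled numeric part of the built-in iris data and that the ratio is built from the betweenss and totss components returned by kmeans(). The seed and the nstart value are arbitrary choices.

```r
# Assumption: irisScaled holds the four scaled numeric columns of iris.
irisScaled <- scale(iris[, 1:4])

set.seed(42)  # k-means starts from random centres; fix the seed for repeatability
ks <- 1:10    # candidate numbers of clusters

# Compute one Between SS / Total SS ratio per k, so that the y vector
# has the same length as the x vector passed to plot().
betweenss_totss <- vapply(ks, function(k) {
  fit <- kmeans(irisScaled, centers = k, nstart = 10)
  fit$betweenss / fit$totss
}, numeric(1))

plot(ks, betweenss_totss, type = "b",
     ylab = "Between SS / Total SS", xlab = "Clusters (k)")
```

If length(betweenss_totss) is not 10 before the call to plot(), the loop that fills it is the place to look.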

  • @gabrieleinguglia2314 5 years ago

    Congratulations on how you made these tutorials! They are really clear and helpful! Thank you

  • @nausheenfatima6523 4 years ago +3

    So clear and concise. Thank you!

  • @mikeybratkovic 2 years ago

    Wow! Absolutely great video, well explained, with really helpful tips and tricks! Thank you for that!

  • @muhammedwalugembe7142 4 years ago

    Best video and explanation on clustering

  • @StockSpotlightPodcast 4 years ago +1

    Fantastic video! The only thing that would have been nice to see is how do you take these clustering solutions and create a new column with the cluster number for each observation that could then be exported to Excel and used to create a PPT slide. This would have really been helpful for work situations.
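A minimal sketch of the workflow asked about here, assuming a k-means solution on the built-in iris data (the video's own objects may differ). The names irisClustered and out_file, the choice of three clusters, and the output location are illustrative assumptions. The cluster component of a kmeans() fit holds one cluster number per row, so it can be attached as a new column and written to a CSV file that Excel can open.

```r
# Assumption: cluster the scaled numeric columns of iris into 3 groups.
irisScaled <- scale(iris[, 1:4])

set.seed(42)
fit <- kmeans(irisScaled, centers = 3, nstart = 25)

# Attach the cluster number of each observation as a new column.
irisClustered <- iris
irisClustered$cluster <- fit$cluster

# Write the result to a CSV file; a temporary folder is used here only
# so the example does not clutter the working directory.
out_file <- file.path(tempdir(), "iris_clusters.csv")
write.csv(irisClustered, out_file, row.names = FALSE)
```

For real work, replace out_file with a path of your choosing.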

  • @reshmirajeev5770 4 years ago +1

    Hello dear, how can I find the gene cluster of a secondary metabolite if the genome of the organism is not sequenced? Can you help me?

  • @amourlafleur1762 2 years ago

    You are a great teacher. Thanks a million!

  • @thejuhulikal6290 3 years ago +1

    Thank you for the information. Please help me understand how the results of PCA are used in clustering, because the PCA and clustering videos do not follow on from each other. If we want to continue to clustering with the results of PCA, how should we do that? Again, thank you for the detailed information.

    • @hefinrhys8572 3 years ago +2

      If you wish to cluster the results of a PCA, simply select the number of principal components you wish to retain (the first few that explain most of the variance), and use the data projected onto these components as the input data to your clustering algorithm. But I would suggest you compare the clusters based on the original data, and the clusters based on the reduced dimensional data, to see which performs better.

    • @thejuhulikal6290 3 years ago

      Thank you so much
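The approach described in the reply above (keep the first few principal components, cluster the projected data, then compare with clusters from the original data) could be sketched as follows. It uses base R's prcomp() and kmeans() on the built-in iris data. The 90% variance threshold, the choice of three clusters, and all object names are assumptions made for illustration.

```r
# PCA on the scaled numeric columns of iris.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component.
varExplained <- pca$sdev^2 / sum(pca$sdev^2)

# Assumption: retain the fewest components explaining at least 90% of variance.
nComp <- which(cumsum(varExplained) >= 0.90)[1]
scores <- pca$x[, seq_len(nComp), drop = FALSE]  # data projected onto those PCs

set.seed(42)
fitPca  <- kmeans(scores, centers = 3, nstart = 25)               # reduced data
fitFull <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 25)   # original data

# Cross-tabulate the two solutions to see how far they agree.
# Cluster labels are arbitrary, so look for one dominant cell per row.
table(pca = fitPca$cluster, full = fitFull$cluster)
```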

  • @user-xc9ih8gv4h 5 years ago +1

    This is high quality content. Thanks.

  • @nolevel433 2 years ago

    Wonderful explanation. Thank you so much!!!!

  • @Olivia-rd6ce 3 years ago

    Is there a way that you could figure out what metrics contribute to the hierarchical break?

  • @edwart83 4 years ago +1

    At 30:37: in theory, the lower the BIC, the better the model fits, not the other way round as you said.

    • @hefinrhys8572 4 years ago

      Nice spot. Using the usual definition of BIC, yes, lower is better. The mclust package rearranges the equation to be:
      BIC = L - 0.5 * p * ln(n)
      Such that the expression should be maximised instead of minimised.

    • @edwart83 4 years ago

      @hefinrhys8572 OK, I didn't know that definition; maybe they should not call it BIC, to avoid confusing people. For anyone reading this reply who doesn't know what we are talking about: BIC = k ln(n) - 2 ln(L), where k is the number of parameters estimated by the model, L is the maximized value of the likelihood function of the model, and n is the number of observations (sample size). The lower the BIC, the less information we lose.
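The two conventions in this thread can be put side by side. The block below is a sketch under two assumptions: that the L in the first reply denotes the maximized log-likelihood, and that p there is the same parameter count as k in the second reply. To the best of my understanding, the mclust documentation states its BIC as twice the log-likelihood minus the penalty, which is the standard quantity with the sign flipped; the expression quoted in the first reply is half of that, so it ranks models identically.

```latex
\begin{align*}
% Standard definition (lower is better):
\mathrm{BIC}_{\text{standard}} &= k \ln(n) - 2 \ln(\hat{L}) \\
% Sign-flipped form, maximized rather than minimized:
\mathrm{BIC}_{\text{flipped}} &= 2 \ln(\hat{L}) - k \ln(n) = -\,\mathrm{BIC}_{\text{standard}} \\
% Expression quoted in the first reply, with p = k:
\ln(\hat{L}) - \tfrac{1}{2}\, p \ln(n) &= \tfrac{1}{2}\,\mathrm{BIC}_{\text{flipped}}
\end{align*}
```

Because the forms differ only by a sign and a positive constant factor, the model that minimizes the standard BIC is the same one that maximizes the other two.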

  • @NAVEENKUMARS12 4 years ago

    Very good introductions to clustering!!

  • @blaeandblack547 3 years ago

    Best on the web re clustering, thanks.

  • @hannahredders4442 4 years ago

    How can I perform cluster analysis on data that I specify as survey data first (with svydesign)? thanks!

  • @hitoshinishizawa1868 3 years ago

    Thank you for the wonderful tutorial!!! I have one question. Is there any way to create different data frames based on the clusters identified? For instance, having a new column for 'cluster' with the cluster number for each row. I am doing the following but am not sure if it is right.
    kc
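The code in this comment is cut off, so the following is not a correction of it, only one possible way to get a data frame per cluster. It assumes a k-means fit on the built-in iris data; irisClustered and byCluster are illustrative names. Base R's split() divides a data frame into a list of data frames, one per level of the grouping variable.

```r
# Assumption: a 3-cluster k-means solution on the scaled iris measurements.
set.seed(1)
fit <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 25)

# New 'cluster' column holding the cluster number of each row.
irisClustered <- cbind(iris, cluster = factor(fit$cluster))

# One data frame per cluster, collected in a named list.
byCluster <- split(irisClustered, irisClustered$cluster)

# Number of observations that ended up in each cluster.
sapply(byCluster, nrow)
```

Individual data frames can then be pulled out with byCluster[["1"]], byCluster[["2"]], and so on.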

  • @albertcardoso1383 6 years ago +1

    What a brilliant tutorial, thank you!

  • @Rafael.a.f 3 years ago

    Is there a clustering method recommended for a certain number of variables? I have 32 variables in my study, and I'm wondering whether there is a specific procedure for that many variables. Thanks for sharing your knowledge.

  • @tomsteffen1882 4 years ago

    Do you know about Ward's method for cluster analysis?

  • @s.m.m8006 4 years ago

    Thank you for this great video. I need to use the GMM clustering algorithm with them; could you please help me do that?

  • @jacobthomsen2248 3 years ago

    Incredible, so well explained, thanks!

  • @rounakagarwal2134 4 years ago

    Very nice explanation.
    Learnt a lot of things :)

  • @charithkrish 4 years ago

    Hi sir, I'm new to R as I'm from a different background. Is clustering available for panel data?

  • @parth1211 2 years ago

    Thank you for the quality content, brother. I am a beginner and this has been very helpful. Can you please provide more videos? 🙂 Thank you

  • @lingzhao242 4 years ago

    You are amazing! really helpful tutorials! Thank you!

  • @angushenderson2020 3 years ago

    ❤️ Thank you so much! this was brilliant!

  • @leandanielvillareal3352 4 years ago +1

    This is the best video I have watched about cluster analysis. I subscribed immediately after watching this. Here's my question.
    I don't know if I skipped this in the video but how do I extract the vector containing the specific cluster each observation belongs to? I tried doing the model-based method.

    • @hefinrhys8572 4 years ago +2

      I didn't show this in the video sorry, but you can extract the vector of most probable clusters by accessing the $classification component of your mclust model. If you want more detailed information, i.e. the matrix of probabilities for each datum belonging to each class, extract the $z component. In R, it's always useful to call str() on your model objects so you can understand and inspect their structure. Using ?Mclust and reading the Value section, also shows you what each component of the model object means.

    • @leandanielvillareal3352 4 years ago

      @hefinrhys8572 Oh okay, thanks. I didn't think of using ?Mclust. I'm going to use your videos as a reference. Very informative and understandable. Thanks again.

  • @topfundus1093 4 years ago

    Hello Hefin,
    another great tutorial. Allow me a question: how can I assign the appropriate cluster to a value (or value pair x, y) from the data set?
    One-dimensional example:
    Values
    1 - 10 = cluster 1
    11 - 20 = cluster 2
    21 - 30 = cluster 3
    Which cluster does Var = 19 belong to?
    Answer: Var (= 19) belongs to cluster 2.
    How do I calculate this with kmeans or R (ggplot2)?
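One way to answer this with base R, sketched under the assumption that "belongs to" means "is closest to the centroid of": a kmeans() fit stores its centroids in the centers component, so a value can be assigned to whichever centroid is nearest in Euclidean distance. The helper assign_cluster() is a hypothetical function written for this example, not part of any package. Note that k-means numbers its clusters arbitrarily, so the label returned for 19 need not be 2.

```r
# Hypothetical helper: assign each row of newdata to the nearest centroid
# of a fitted kmeans object (squared Euclidean distance).
assign_cluster <- function(fit, newdata) {
  newdata <- as.matrix(newdata)
  centers <- fit$centers  # one row per cluster, one column per variable
  unname(apply(newdata, 1, function(obs) {
    # t(centers) has one column per cluster; subtract the observation
    # from every column and sum the squared differences.
    which.min(colSums((t(centers) - obs)^2))
  }))
}

# One-dimensional example from the comment: values 1 to 30, three clusters.
values <- matrix(1:30, ncol = 1)
set.seed(42)
fit <- kmeans(values, centers = 3, nstart = 25)

# Cluster label for the value 19.
assign_cluster(fit, 19)

# Members of that cluster, to see which range of values it covers.
values[fit$cluster == assign_cluster(fit, 19), 1]
```

The same helper works for value pairs (x, y) if the model was fitted on a two-column matrix and newdata has the same two columns.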

  • @nssSmooge 6 years ago

    I am just curious: how would I go about removing a trend in my data and then using all the observations in the clustering? I have several countries and several variables, but observed over time (annually). Picking a single year to cluster on is an option, but not a great one, since the interpretation can be faulty if it's based on only one year, if I am not mistaken xD

    • @hefinrhys8572 5 years ago +1

      Hi Wildfox, sorry for the late reply. So I presume you are looking for clusters of countries? One approach would be to model the relationship between time and a dependent variable (if you have one) for each country using a linear model. Then, use the estimated marginal means (also called least squares means) to cluster the countries. Estimated marginal means are the predicted means of variables in a linear model after accounting for all the other variables (i.e. after removing the effect of time).

  • @ejiet-igolatemmanuel7376 4 months ago

    Excellent study material, but it became blurred from 6:45 to about 16:05. How can I study the entire video clearly? Thank you for such a summary.

  • @fernandoguerrerozurita4716 5 years ago

    What a useful tutorial video!!! Thank you so much!!

  • @djangoworldwide7925 1 year ago

    Fantastic. Subscribed!

  • @muhammadazharnadeem2682 4 years ago

    Excellent job. Please provide the sample data file publicly so that we can arrange our data easily. Thanks in advance.

  • @OgunCakr 6 years ago

    Great explanation 👍 you have such a soothing voice btw 😘

  • @vishnunath1524 6 years ago

    Thank you for this excellent tutorial !

  • @tarkatirtha 3 years ago

    Great video!

  • @johnnybravo86 6 years ago +1

    Absolutely amazing!! Thank you! Please do more. I'm trying to learn R to do CFA to fit theory of planned behavior models. Can you do one on this?

    • @hefinrhys8572 6 years ago

      You're very welcome. I'm sorry for the late reply to this; I'm not familiar with theory of planned behaviour models, but if you describe the problem you're trying to solve, I may be able to help.

  • @nikhiljamisetti7139 6 years ago

    Can anybody tell me how to find the defect clusters in the above dataset?

    • @hefinrhys8572 6 years ago

      Hi Nikhil, I'm not quite sure I understand what you are trying to do. Do you mean defect clustering in application testing? This isn't a statistical computing application per se, but here is some information which may be of use: www.pitsolutions.ch/blog/defect-clustering-and-pesticide-paradox/

  • @jaaadeeeeful 4 years ago

    Does anyone else think his voice and accent sound like the man at Headspace? I cannot help deepening my breath when learning this...

  • @ghdoia 5 years ago

    Can you provide me with the data you used (iris)?

    • @hefinrhys8572 5 years ago +1

      Hi ghdoia, so R and its packages come with datasets built in. To list them all, simply run data(), to load one (such as the iris dataset), run data(iris). Then, you can access the iris dataset by referring to it by name. I hope that helps.

    • @ghdoia 5 years ago

      @hefinrhys8572
      Thank you very much. I learnt a lot from you.
      I've just had an issue when I ran the code below:
      plot(1:10, betweenss_totalss, type = "b",
           ylab = "Between SS/Total SS", xlab = "Cluster(K)")
      It shows this message:
      Error in plot.window(...) : need finite 'ylim' values
      In addition: Warning messages:
      1: In min(x) : no non-missing arguments to min; returning Inf
      2: In max(x) : no non-missing arguments to max; returning -Inf
      Please can you tell me what that means and how I can solve this problem?

  • @EUrunner 4 years ago

    Thanks, you've helped me a lot 👍

  • @123gregery 6 years ago

    Congratulations for your videos. One question: how can I get the means of the columns of irisScaled? (irisScaled is a list).

    • @hefinrhys8572 6 years ago

      Thank you! Sorry for the late reply. So irisScaled is a matrix (which I think I omitted to mention in the video), and a succinct way to get the means of each column would be to use the apply() function:
      apply(irisScaled, 2, mean)
      where the first argument is the data, the second is an index (either 1 to iterate over rows or 2 for columns) and the third argument is the function to apply. You could also find the mean for each column individually like this:
      mean(irisScaled[, 1])
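A short check of the points in this reply, assuming irisScaled was created by calling scale() on the four numeric iris columns (the exact call in the video may differ). It also shows colMeans(), a base R shortcut that gives the same result as the apply() call above.

```r
# Assumption: irisScaled is the output of scale() on the numeric iris columns.
irisScaled <- scale(iris[, 1:4])

is.matrix(irisScaled)  # scale() returns a matrix, not a list or data frame

# Column means via apply(): the 2 means "iterate over columns".
meansApply <- apply(irisScaled, 2, mean)

# Same result with the dedicated base R function.
meansDirect <- colMeans(irisScaled)

all.equal(meansApply, meansDirect)

# After scaling, every column mean is zero up to floating-point error.
round(meansDirect, 10)
```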

  • @delt19 6 years ago

    Is there an easy way to output the Mclust cluster assignments to a CSV file?

    • @hefinrhys8572 6 years ago

      Sorry for the slow reply to this. I hope you worked it out, but I would extract the assigned groups using something like this:
      # ADD NEW COLUMN TO DATAFRAME WITH MCLUST GROUP ASSIGNMENTS
      irisScaled$Group

  • @hannahredders4442 4 years ago

    Another question: what's a recommended number of variables for a cluster analysis?

    • @navjotsingh2251 11 months ago

      Honestly, it depends on what you are clustering. There is a risk that too many variables will cause problems for the clustering, though. I think this comes down to trial and error, along with domain knowledge about which variables to include.

  • @AnaCvejic 3 years ago

    You helped me :) Thanks for the video

  • @rima3088 4 years ago

    Big thumbs up ! Huge thumbs up... Really

  • @hersil2012 6 years ago

    Very well explained! Thanks

  • @sahaywrestling5497 5 years ago

    Thank you very much, extremely helpful

  • @SachinSingh-wr5yv 6 years ago

    Thank you sir for the videos...!!!!!

  • @Pankajjadwal 6 years ago

    Can I have the code, please?

    • @hefinrhys8572 6 years ago

      Hi Pankaj, sorry for the late reply. I have now added a link to the R script in the video description above.

  • @priyanwadaatapattu5900 6 years ago

    > betweenss_totss

    • @hefinrhys8572 6 years ago

      Hi Priyanwada, I'm reading this on my phone so cannot check in R right now, but it looks like you have 'list()' in front of 'for(i in 10)'. This 'list()' function is empty and I'm not sure you need it anyway.

  • @sahilraihan6247 6 years ago

    Nice one! Really helpful.
    Thanks, Hefin, for such an excellent video. Can you please upload some videos on time series?

    • @hefinrhys8572 6 years ago

      Thanks Sahil! Glad it was useful. Thanks for the feedback, I may do a video on time series in the future!

  • @anooshmitadas3597 6 years ago

    It helped a lot. well done.

  • @rakeshdayalan8049 6 years ago

    Well explained video, thanks!

  • @lazregmohamedlamine4847 6 years ago

    Thanks for the video, well done.

    • @hefinrhys8572 6 years ago

      You're very welcome. Happy clustering!

  • @lifeatjis 6 years ago

    thank you! super helpful!

  • @sophielong8937 3 years ago

    BTW GREAT video!!

  • @nerilozanosangabriel2391 3 years ago

    Subscribed, good video

  • @shaghayeghsoudi18 3 years ago

    Very nice tutorial, thanks a lot