How to perform clustering in R with the k-means algorithm - R for Data Science

  • Published: 10 Dec 2024

Comments • 48

  • @data.ninjas  3 years ago +1

    Download the scripts and data from Google Drive: dataninjas.ck.page/yt-files

  • @amiyabasak7096  1 year ago +1

    I have gained a comprehensive understanding of this topic, and sir, your explanations have been exceedingly clear to me.

    • @data.ninjas  1 year ago

      Thank you very much for your kind message. I'm happy to hear that you find my video helpful. Best regards

  • @juanbautista6766  2 years ago +1

    Wow. Great tutorial. Have seen many videos for generating “elbow plot”, but using the factoextra package as you noted here is GOLDEN! Thanks!!

    • @data.ninjas  2 years ago +1

      Thank you very much for your kind message! Yes, the factoextra package makes it easy to create an elbow plot. Glad to hear you find the video helpful. Kind regards
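      For readers without factoextra, here is a minimal base-R sketch of what an elbow plot computes. It assumes the built-in iris data as a stand-in for the video's dataset, and the seed, range of k, and nstart value are illustrative choices only.

```r
# Stand-in data: the four numeric columns of the built-in iris dataset, standardized
x <- scale(iris[, 1:4])

set.seed(123)  # k-means starts from random centers, so fix the seed
ks <- 1:10

# Total within-cluster sum of squares for each candidate number of clusters
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# Elbow plot: look for the k after which the curve flattens out
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# With factoextra installed, a comparable plot should come from:
# factoextra::fviz_nbclust(x, kmeans, method = "wss")
```

      The "elbow" is a visual judgment, so it is worth comparing it with another criterion before settling on a value of k.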

  • @snehaj3378  1 year ago

    You have no idea how much you helped me. God bless!!

    • @data.ninjas  1 year ago

      You're very welcome. Glad to know you find the video helpful. Kind regards

  • @gabrielp.40  6 months ago

    You are a lifesaver, thank you so much for the tutorial!

    • @data.ninjas  6 months ago

      You're very welcome! Thank you for watching my video

  • @AchiragChiragg  11 months ago

    Thank you for making this video!
    It was very informative and helpful

    • @data.ninjas  11 months ago

      Glad to hear you found the video helpful! Thanks for your kind comment

  • @DaliaAboelmakarm-un9ee  4 months ago

    Many thanks for this sufficient illustration. Really, thanks.

    • @data.ninjas  4 months ago

      You're very welcome, thank you for watching my video

  • @johneagle4384  2 years ago

    Thank you for the video, and also thank you for the scripts!

    • @data.ninjas  2 years ago

      You're very welcome! Thank you for watching and for commenting on my video

  • @JorgeRodriguez-mp1mt  3 years ago

    Aware of your contributions. Greetings from Mexico.

    • @data.ninjas  3 years ago +1

      Thank you very much. Best regards

  • @thelightofgod9151  2 years ago

    Wow. Very clear and precise. Thanks

    • @data.ninjas  2 years ago

      Thanks for your kind comment

  • @lehoangucduy1425  1 year ago +1

    Why did you choose a centers value of 3 in the kmeans function? Please explain.

  • @Lilian.Chidinma.Nwafor  6 months ago

    Thank you, sir. Can k-means be applied to an analysis with Likert scale data?

    • @data.ninjas  6 months ago +1

      You're welcome. You may need to do some data preprocessing before applying k-means to Likert scale data. First apply one-hot encoding so that each response category becomes a binary variable (0 or 1), then normalize the data to have a mean of 0 and a standard deviation of 1. Note, however, that k-means clustering uses Euclidean distance and assumes that distances between points are meaningful and comparable. This may not be appropriate for Likert scale data, which is ordinal, so the distances between responses may not be consistent. You may want to consider alternative clustering techniques that are better suited to ordinal data, such as hierarchical clustering or model-based clustering approaches.
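      The preprocessing described in this reply could be sketched roughly as follows. The survey data, item names (q1 to q3), and the choice of 3 clusters are all hypothetical, made up purely for illustration. The hierarchical alternative shown here still uses Euclidean distance, so it is only a starting point.

```r
set.seed(42)

# Hypothetical survey: 60 respondents, 3 Likert items scored 1 to 5
likert <- data.frame(
  q1 = factor(sample(1:5, 60, replace = TRUE), levels = 1:5),
  q2 = factor(sample(1:5, 60, replace = TRUE), levels = 1:5),
  q3 = factor(sample(1:5, 60, replace = TRUE), levels = 1:5)
)

# One-hot encode: one 0/1 column per item and response category
one_hot <- do.call(cbind, lapply(names(likert), function(v) {
  m <- model.matrix(~ f - 1, data = data.frame(f = likert[[v]]))
  colnames(m) <- paste0(v, "_", levels(likert[[v]]))
  m
}))

# Drop categories nobody chose (constant columns cannot be standardized),
# then scale to mean 0 and standard deviation 1
keep <- apply(one_hot, 2, sd) > 0
x <- scale(one_hot[, keep, drop = FALSE])

# k-means on the encoded data (3 clusters is an arbitrary choice here)
km <- kmeans(x, centers = 3, nstart = 25)

# Hierarchical clustering on the same matrix, cut into 3 groups
hc <- hclust(dist(x), method = "ward.D2")
groups <- cutree(hc, k = 3)
```

      For an ordinal-aware dissimilarity, one option to look into is Gower distance on ordered factors (for example via the cluster package's daisy function) passed to hclust in place of dist(x).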

    • @Lilian.Chidinma.Nwafor  6 months ago

      @data.ninjas Thank you. I think hierarchical clustering will be good.

  • @letsfly8654  9 months ago

    fviz_nbclust(data,kmeans,method='wss' is not working. Why?

  • @Pooh991  2 years ago

    Great video, I learned a lot from it, especially regarding the methods for choosing the optimal number of clusters. Quick question though: the clusters overlap in your plot, but I don't think they are supposed to overlap in the k-means method. Do you have any insight on this?

    • @data.ninjas  2 years ago

      Thanks for your kind comment. The clusters were created using 6 variables, but each plot shows only 2 variables at a time (2-dimensional plots), so some overlap can be seen. If it were possible to create a 6-dimensional plot, there would be no overlap.
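      A small sketch of this point, assuming the built-in iris data (4 variables) as a stand-in for the 6-variable dataset in the video: every observation gets exactly one cluster label, and apparent overlap shows up only when the clusters are projected onto two dimensions.

```r
x <- scale(iris[, 1:4])  # stand-in data with 4 variables

set.seed(1)
km <- kmeans(x, centers = 3, nstart = 25)

# Every observation carries exactly one label, so the clusters partition the data
sizes <- table(km$cluster)

# Distance from each observation to each of the 3 centers in the full 4-D space
d <- as.matrix(dist(rbind(km$centers, x)))[-(1:3), 1:3]
nearest <- apply(d, 1, which.min)

# Should be TRUE once the algorithm has converged: each point sits with its nearest center
all(nearest == km$cluster)

# Two views of the same clustering; points from different clusters can look mixed in 2-D
op <- par(mfrow = c(1, 2))
plot(x[, 1:2], col = km$cluster, pch = 19, main = "Two original variables")
pca <- prcomp(x)
plot(pca$x[, 1:2], col = km$cluster, pch = 19, main = "First two principal components")
par(op)
```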

  • @aysegulgunduz4292  2 years ago

    Hi, how can I find this data on the internet? Or where can I find an explanation of the dataset?

  • @rafipermana7734  9 months ago

    When I execute fviz_nbclust, this happens: Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning messages:
    1: In stats::dist(x) : NAs introduced by coercion
    2: In storage.mode(x)

    • @data.ninjas  9 months ago

      It may be because of NAs; kmeans cannot handle data that has NA values. See: stackoverflow.com/questions/36469671/error-in-do-onenmeth-na-nan-inf-in-foreign-function-call-arg-1
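      One possible way to check for and work around this, sketched on a small made-up data frame (the column names and values are hypothetical). The "NAs introduced by coercion" warning often points to a non-numeric column, so the sketch keeps only numeric columns before dropping incomplete rows.

```r
# Hypothetical data frame with a text column and some missing values
df <- data.frame(
  id     = c("a", "b", "c", "d", "e", "f"),
  height = c(1.0, 1.2, NA, 5.0, 5.1, 4.9),
  weight = c(10, 11, 10.5, 50, NA, 52)
)

# Count missing values per column to see where the problem is
colSums(is.na(df))

# Keep only numeric columns, then drop rows that contain any NA
num   <- df[sapply(df, is.numeric)]
clean <- na.omit(num)

# Standardize and confirm nothing non-finite is left before clustering
x <- scale(clean)
stopifnot(all(is.finite(x)))

set.seed(1)
km <- kmeans(x, centers = 2, nstart = 10)
km$cluster
```

      Dropping rows is the simplest option; imputing the missing values is an alternative when too many rows would be lost.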

  • @anteachmad  1 year ago

    Does cluster analysis have to start with a multicollinearity test?

    • @data.ninjas  1 year ago +1

      No, it does not. Multicollinearity does not directly influence the cluster analysis results

  • @HarpreetKaur-bx1ej  2 years ago

    Hi, I have a question.
    Perform a cluster analysis for 20 randomly selected Swiss bank notes.
    What is 20 in this case?

    • @data.ninjas  2 years ago

      Hi. That question is not clear. It may mean that from a given dataset select 20 observations (rows) randomly and perform a cluster analysis, or it may mean something else

    • @HarpreetKaur-bx1ej  2 years ago

      @data.ninjas
      Here is the full question.
      What is 20?
      Cluster analysis for 20 randomly selected data points from the Swiss bank dataset, with the following requirements:
      1. Set pseudo-random numbers for 20 randomly selected data points
      2. Write about accuracy, missing values and outliers
      3. What is the rationale for selecting k-means clustering with a distance function?
      4. Interpret and comment on the clustering output
      5. Is the cluster analysis technique used for the dataset good? Use cluster evaluation
      6. Visualize the 20 selected data points by plotting the result of principal components

    • @data.ninjas  2 years ago

      @HarpreetKaur-bx1ej The first interpretation was correct. Select 20 rows (data points) from the dataset randomly.
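      A minimal sketch of selecting 20 random rows reproducibly. The Swiss bank notes data is not part of base R, so the built-in iris dataset stands in for it here, and the seed value and the choice of 2 clusters are arbitrary.

```r
set.seed(2024)                        # fixes the pseudo-random numbers so the selection is reproducible
idx <- sample(nrow(iris), size = 20)  # 20 distinct row indices, drawn without replacement
subset20 <- iris[idx, 1:4]            # keep only the numeric measurement columns

# Cluster the 20 selected rows; nstart is the number of random restarts
# and is a separate setting from the sample size
km <- kmeans(scale(subset20), centers = 2, nstart = 20)
table(km$cluster)
```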

    • @HarpreetKaur-bx1ej  2 years ago

      @data.ninjas Does it mean I have to use nstart=20?

    • @HarpreetKaur-bx1ej  2 years ago

      Can you please help me with this question? I am stuck on it.

  • @vishalisharma3883  9 months ago

    Why is my mutate function not working?

  • @kharankumarr2119  3 years ago

    Is this the CURE algorithm?

    • @data.ninjas  3 years ago

      The kmeans() function in R uses the Hartigan-Wong algorithm by default. Other options are the Lloyd, Forgy and MacQueen algorithms
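      To illustrate, the algorithm argument of kmeans() can be set explicitly. This sketch assumes the built-in iris data as stand-in input; the seed, nstart, and iter.max values are arbitrary choices.

```r
x <- scale(iris[, 1:4])  # stand-in data

algos <- c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")

# Fit the same data with each algorithm, using the same seed for the random starts
fits <- lapply(algos, function(a) {
  set.seed(7)
  kmeans(x, centers = 3, nstart = 10, iter.max = 50, algorithm = a)
})
names(fits) <- algos

# Compare the total within-cluster sum of squares reached by each algorithm
sapply(fits, function(f) f$tot.withinss)
```

      In R's documentation, "Lloyd" and "Forgy" are alternative names for the same algorithm.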

    • @kharankumarr2119  3 years ago

      @data.ninjas Sir, now I need R code for the CURE algorithm.

    • @kharankumarr2119  3 years ago

      Can you please give me your email address?

    • @data.ninjas  3 years ago

      @kharankumarr2119 There may not be an implementation of the CURE algorithm in R yet (or at least I have not found one). There is a Python implementation of CURE: github.com/annoviko/pyclustering. You may run CURE in Python, or you may use the reticulate package to work with Python from R: rstudio.github.io/reticulate/

    • @kharankumarr2119  3 years ago

      @data.ninjas Sir, this is a project we have to do in R. I am a data analytics student at psgcas.

  • @what2605  4 months ago

    That one sample, no. 79, made me feel very unsatisfied.

  • @foziachoudhary9858  28 days ago

    Please provide your email.

  • @mehrananjum5501  8 months ago

    Can you please help me? I need your email.