Get access to download the scripts and data from GoogleDrive: dataninjas.ck.page/yt-files
I have gained a comprehensive understanding of this topic, and sir, your explanations have been exceedingly clear to me.
Thank you very much for your kind message. I'm happy to hear that you find my video helpful. Best regards
Wow. Great tutorial. Have seen many videos for generating “elbow plot”, but using the factoextra package as you noted here is GOLDEN! Thanks!!
Thank you very much for your kind message! Yes, the factoextra package makes it easy to create an elbow plot. Glad to hear you find the video helpful. Kind regards
You have no idea.. how u helped me.... God Bless!!
You're very welcome. Glad to know you find the video helpful. Kind regards
You are a lifesaver, thank you so much for the tutorial!
You're very welcome! Thank you for watching my video
Thank you for making this video!
It was very informative and helpful
Glad to hear you found the video helpful! Thanks for your kind comment
Many thanks for this thorough illustration, really, thanks
You're very welcome, thank you for watching my video
Thank you for the video, and also thank you for the scripts!
You're very welcome! Thank you for watching and for commenting on my video
Aware of your contributions greetings from Mexico
Thank you very much. Best regards
Wow. Very clear and precise. Thanks
Thanks for your kind comment
Why did you choose a centers value of 3 in the kmeans function? Please explain.
Thank you sir. Can k-means be applied to analysis with Likert scale data?
You're welcome. Applying k-means to Likert scale data needs some preprocessing. You would first apply one-hot encoding, so that each response category becomes a binary variable (0 or 1), and then standardize the data to a mean of 0 and a standard deviation of 1. Note, however, that k-means uses Euclidean distance and assumes that distances between points are meaningful and comparable. That may not hold for Likert scale data, which is ordinal: the distances between adjacent responses may not be consistent. You may therefore consider clustering techniques that are better suited to ordinal data, such as hierarchical clustering or model-based clustering.
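As a rough sketch of the hierarchical alternative mentioned above (not something shown in the video): the example simulates a small, made-up Likert survey, stores the responses as ordered factors, computes Gower dissimilarities with daisy() from the cluster package, and cuts an average-linkage tree into 3 groups. The data, the item names and the choice of k = 3 are assumptions for illustration only.

```r
library(cluster)  # provides daisy(); a recommended package in standard R installs

set.seed(42)
levels5 <- c("Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree")

# Hypothetical survey: 60 respondents answering 4 Likert items
likert <- as.data.frame(setNames(
  lapply(1:4, function(i) {
    factor(sample(levels5, 60, replace = TRUE), levels = levels5, ordered = TRUE)
  }),
  paste0("item", 1:4)
))

# Gower dissimilarity treats ordered factors as ordinal variables
d <- daisy(likert, metric = "gower")

# Average-linkage hierarchical clustering, cut into 3 groups (k is an assumption)
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 3)
table(groups)
```

With real survey data, the number of groups would normally be chosen by inspecting the dendrogram (plot(hc)) or a cluster validity measure rather than fixed in advance.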
@data.ninjas Thank you. I think hierarchical clustering will be good
fviz_nbclust(data,kmeans,method='wss' is not working. Why?
Great video, I learned a lot from it, especially in regards to the methods for choosing the optimal number of clusters. Quick question though: the clusters overlap in your plot, but I don't think they are supposed to overlap in the k-means method. Do you have any insight on this?
Thanks for your kind comment. The clusters were created using 6 variables. Each plot shows only 2 variables at a time (a 2-dimensional plot), so some overlap can be seen. If it were possible to create a 6-dimensional plot, there would be no overlap.
Hi, how can I find this data on the internet? Or how can I get access to an explanation of the dataset?
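A small sketch of that point, using simulated stand-in data rather than the data from the video: k-means is run on 6 variables, and the check confirms that every point lies closest to its own centroid when all 6 variables are used, even though a plot of only 2 of them can show the groups mixing. The data, the variable names and the use of Lloyd's algorithm are assumptions made for this illustration.

```r
set.seed(1)

# Simulated stand-in data: 3 groups, 6 variables
x <- rbind(
  matrix(rnorm(50 * 6, mean = 0),   ncol = 6),
  matrix(rnorm(50 * 6, mean = 1.5), ncol = 6),
  matrix(rnorm(50 * 6, mean = 3),   ncol = 6)
)
colnames(x) <- paste0("v", 1:6)

# Lloyd's algorithm stops once no point changes cluster
km <- kmeans(x, centers = 3, nstart = 5, iter.max = 100, algorithm = "Lloyd")

# Squared Euclidean distance from every point to every centroid, using all 6 variables
d2 <- sapply(1:3, function(k) rowSums(sweep(x, 2, km$centers[k, ])^2))
nearest <- max.col(-d2, ties.method = "first")

# Expected to hold once the algorithm has converged
all(nearest == km$cluster)

# Scatter plot of only 2 of the 6 variables: the colours may overlap here
plot(x[, 1:2], col = km$cluster, pch = 19)
```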
When I execute fviz_nbclust, this happens: Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In stats::dist(x) : NAs introduced by coercion
2: In storage.mode(x)
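This may be caused by NAs: kmeans() cannot handle data that contains NA values. The "NAs introduced by coercion" warning suggests that the data may also contain non-numeric columns. See: stackoverflow.com/questions/36469671/error-in-do-onenmeth-na-nan-inf-in-foreign-function-call-arg-1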
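One possible way to prepare data so that this error does not occur, as a minimal sketch: keep only the numeric columns, drop rows with missing values, and standardize before clustering. The data frame df and its columns are made up for illustration; whether dropping rows is acceptable depends on your own data.

```r
# Hypothetical data with a text column and some missing values
df <- data.frame(
  id = letters[1:8],
  x  = c(1, 2, NA, 1.5, 8, 9, 10, 9.5),
  y  = c(1.5, 1, 2, 1.2, 9, NA, 8.5, 9.2)
)

num    <- df[sapply(df, is.numeric)]  # drop non-numeric columns such as id
clean  <- na.omit(num)                # drop rows that contain NA
scaled <- scale(clean)                # mean 0, standard deviation 1 per column

anyNA(scaled)  # check that no NA values remain before clustering

set.seed(123)
km <- kmeans(scaled, centers = 2, nstart = 10)
km$cluster
```

The cleaned matrix (scaled) is what would then be passed to fviz_nbclust() in place of the raw data.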
Does cluster analysis have to start with a multicollinearity test?
No, it does not. Multicollinearity does not directly influence the cluster analysis results
Hi i have a question
Perform a cluster analysis for 20 randomly selected Swiss bank notes.
What is 20 in this case?
Hi. That question is not clear. It may mean that from a given dataset select 20 observations (rows) randomly and perform a cluster analysis, or it may mean something else
@data.ninjas
Here is the full question
What is 20?
Cluster analysis for 20 randomly selected Swiss bank dataset with following requirements
1. Set pseudo random numbers for 20 randomly selected data points
2.write about accuracy, missing values and outliers
3. what is the rationale for selecting a k-means clustering and with a distance function
4. interpret and make comment on clustering output
5. is cluster analysis technique used for dataset is good? Use cluster evaluation
6. visualize 20 selected datapoints by plotting the result of principal components
@HarpreetKaur-bx1ej The first interpretation was correct: randomly select 20 rows (data points) from the dataset.
@data.ninjas Does that mean I have to set nstart=20?
Can you please help me with this question? I am stuck on it.
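For what it is worth, nstart in kmeans() is the number of random sets of starting centers to try, not the number of rows to select. Picking 20 random rows is done separately, for example with set.seed() and sample(). The sketch below uses simulated stand-in data (the data frame notes is hypothetical, not the Swiss bank notes data), and the choices of 2 clusters and nstart = 25 are assumptions.

```r
set.seed(123)  # fixes the pseudo-random numbers so the selection is reproducible

# Stand-in data: 200 rows, 6 numeric measurements (replace with the real dataset)
notes <- as.data.frame(matrix(rnorm(200 * 6), ncol = 6))

idx      <- sample(nrow(notes), 20)  # 20 distinct row numbers chosen at random
subset20 <- scale(notes[idx, ])      # standardize the 20 selected rows

# nstart controls the number of random starts, independent of the sample size
km <- kmeans(subset20, centers = 2, nstart = 25)

# Plot the 20 selected points on the first two principal components
pca <- prcomp(subset20)
plot(pca$x[, 1:2], col = km$cluster, pch = 19, xlab = "PC1", ylab = "PC2")
```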
Why is my mutate function not working?
Is this the CURE algorithm?
The kmeans() function in R uses the Hartigan-Wong algorithm by default. Other options are the Lloyd, Forgy and MacQueen algorithms
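The algorithm is chosen through the algorithm argument of kmeans(). As a small sketch, the example below fits the same data with each of the four options and compares the total within-cluster sum of squares. The built-in iris data set is used only as a stand-in, and centers = 3 is an assumption.

```r
x <- scale(iris[, 1:4])  # built-in data set, used here only as a stand-in

algos <- c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")

fits <- lapply(algos, function(a) {
  set.seed(10)  # same seed for every run so the random starts are comparable
  kmeans(x, centers = 3, nstart = 10, iter.max = 50, algorithm = a)
})
names(fits) <- algos

# Total within-cluster sum of squares for each algorithm
sapply(fits, function(f) f$tot.withinss)
```

The R documentation for kmeans() notes that "Lloyd" and "Forgy" are alternative names for the same algorithm, so those two fits are expected to agree.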
@data.ninjas Sir, I now need R code for the CURE algorithm
Can you please give me your mail id
@kharankumarr2119 There may not be an implementation of the CURE algorithm in R yet (or at least I have not found one). There is a Python implementation of CURE: github.com/annoviko/pyclustering. You can run CURE in Python, or you can use the reticulate package to work with Python from within R: rstudio.github.io/reticulate/
@data.ninjas Sir, this is a project that we have to do in R. I am a data analytics student at psgcas
That one sample, no. 79, made me feel very unsatisfied...
Please provide your mail
please can you help me i need your email?