How to Perform K-Means Clustering in R Statistical Computing

  • Published: 27 Oct 2024

Comments • 95

  • @PsychTeacher100
    @PsychTeacher100 5 years ago +2

    Helped me for my data analytics class. Very new to R and this step-by-step tutorial was wonderful. My assignment was actually to use the Iris data. I learned so much.

  • @joes9110
    @joes9110 8 years ago +7

    You explained this in plain English, which I really appreciate!! Great video, thank you

  • @DWR447
    @DWR447 10 years ago +9

    Great Work. I appreciate your use of the Iris data set. I'm familiar with the data set. That means I can focus on what you have to say about K-Means clustering without having to learn the details of a new data set. Thank you.

  • @TheSandyKale
    @TheSandyKale 8 years ago

    Great video. Explained in an easy to understand manner, compared to some of the more cryptic R training material I have looked at.

  • @timothysorber5825
    @timothysorber5825 10 years ago

    Excellent video. It provided simple steps to follow. I was working with the faithful dataset in the R distribution. One's eye could see the two clusters. I applied your instructions but used the faithful data and was able to break out the two existing clusters. These are almost intuitive. Thanks

  • @juandavidcamargo5713
    @juandavidcamargo5713 3 years ago +1

    So easy for me, with your tutorial

  • @abuzarzia71
    @abuzarzia71 7 years ago +3

    Error in plot.xy(xy, type, ...) : invalid color name 'Iris-setosa'. That's the error which appears every time. Help me out with this

    • @samuelsephiri147
      @samuelsephiri147 6 years ago

      I'm experiencing a similar problem. Can anyone help with this, especially the part that plots a comparison with the original data? If you use colRamp, it does output the plot, but the colors disappear

  • @DCentFN
    @DCentFN 4 years ago

    When I try to check the actual data using "plot(Iris[c("petalLength","petalWidth")], col = Iris$class)" or "plot(Iris[c("sepLength","sepWidth")], col = Iris$class)", I get an error saying "Error in plot.xy(xy, type, ...) : invalid color name 'Iris-setosa'". Not sure how to fix this
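
This error usually means the class column holds character strings, so plot() tries to read 'Iris-setosa' as a color name. A minimal sketch of one likely fix follows. It assumes the built-in iris data as a stand-in for the video's Iris.csv, so the column names differ from those in the comments above.

```r
# Assumption: the built-in `iris` data stands in for the video's Iris.csv,
# so the column names (Petal.Length, Species, ...) differ from the CSV's.
data(iris)
flowers <- iris
flowers$Species <- as.character(flowers$Species)  # mimic a text column read from a CSV

# A character vector passed as `col` is looked up as color names, which
# fails for values such as "setosa". Convert to a factor first and use its
# integer codes (1, 2, 3) as color indices instead.
class_factor <- as.factor(flowers$Species)
class_colors <- as.integer(class_factor)

plot(flowers[c("Petal.Length", "Petal.Width")], col = class_colors)
legend("topleft", legend = levels(class_factor),
       col = seq_along(levels(class_factor)), pch = 1)
```

Reading the file with read.csv(..., stringsAsFactors = TRUE) should have a similar effect, since R 4.0.0 and later no longer convert strings to factors by default.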

  • @mayurgo10
    @mayurgo10 7 years ago +1

    The code isn't working for me; it shows an error. Can you help me?

  • @mugrad25
    @mugrad25 8 years ago

    Quick questions. 1) Would you standardize the data, turning all responses into z-scores?
    2) Is cluster generation not criterion-based? As it stands, the clustering finds the greatest differences and similarities simultaneously for the requested number of clusters; the user then has to compare the clustering results to the response variables. The closest match suggests a possible difference, which would most likely be tested with an ANOVA to establish significance?

    • @souravdas1983
      @souravdas1983 8 years ago +1

      use scaled_iris = scale(iris.features)
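
The scale() suggestion above can be sketched as follows. This assumes the built-in iris data rather than the video's CSV. The seed value is arbitrary, and whether scaling helps depends on the data.

```r
data(iris)
iris_features <- iris[, 1:4]           # numeric measurement columns only

# scale() centers each column to mean 0 and rescales it to standard deviation 1,
# so no single feature dominates the distance calculation because of its units.
scaled_features <- scale(iris_features)

set.seed(42)                            # arbitrary seed, only for repeatability
results_scaled <- kmeans(scaled_features, centers = 3)

round(colMeans(scaled_features), 10)    # approximately 0 for every column
apply(scaled_features, 2, sd)           # 1 for every column
```

Note that the cluster centers are then reported in scaled units (z-scores), not in centimeters.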

  • @WahranRai
    @WahranRai 4 years ago

    4:04 Why don't you normalize the data before kmeans?
    Are there any rules concerning the range of the attributes and extra relationship ...

  • @anishamariamthomas2981
    @anishamariamthomas2981 9 years ago

    Which similarity measure is used here to perform the k-means clustering? And how can this default measure be changed in RStudio?
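
On the similarity measure: kmeans() in R's stats package minimizes the within-cluster sum of squares, which corresponds to squared Euclidean distance, and it has no argument for switching to another distance. The sketch below uses the built-in iris data as an assumption and checks that reading by recomputing the objective by hand.

```r
data(iris)
features <- as.matrix(iris[, 1:4])

set.seed(1)                              # arbitrary seed
results <- kmeans(features, centers = 3)

# Squared Euclidean distance from every observation to its own cluster center.
own_centers  <- results$centers[results$cluster, ]
squared_dist <- rowSums((features - own_centers)^2)

# The total matches the objective value that kmeans() reports.
all.equal(sum(squared_dist), results$tot.withinss)
```

The `algorithm` argument ("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen") changes how that objective is optimized, not the distance. For other distances a different function is needed, for example pam() from the cluster package, which accepts a dissimilarity matrix.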

  • @abeersaxena3204
    @abeersaxena3204 6 years ago

    What do the cluster means describe for a particular observation? We have to consider a whole row as a complete record. So how should the cluster means be interpreted: as the coordinates of the centroid, or as the mean distance between the centroid and the data points?
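
The "Cluster means" in the printed output are the coordinates of each centroid: one row per cluster, one column per feature, each value being the average of that feature over the cluster's members. A small sketch, assuming the built-in iris data, recomputes them by hand.

```r
data(iris)
features <- iris[, 1:4]

set.seed(1)                              # arbitrary seed
results <- kmeans(features, centers = 3)

# Average each feature within each assigned cluster.
manual_means <- aggregate(features,
                          by = list(cluster = results$cluster),
                          FUN = mean)

results$centers   # the "Cluster means" section of the printed output
manual_means      # the same numbers, computed by hand
```

They are therefore positions in feature space, not distances. The distances show up separately in the within-cluster sums of squares.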

  • @fauziardi4985
    @fauziardi4985 7 years ago +1

    Hello, can you help me? I'm new to R. I followed your tutorial but I got an error.
    results = kmeans(Iris.features, 3)
    Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning message:
    In kmeans(Iris.features, 3) : NAs introduced by coercion

    • @yasiruddin9896
      @yasiruddin9896 7 years ago

      Initially, I too got this error. Then, I tried this and it worked fine.
      x

    • @subhashishkoirala9292
      @subhashishkoirala9292 7 years ago

      although a bit late try this
      results
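
The code in the replies above was cut off and cannot be recovered. As a separate sketch: this error, together with the "NAs introduced by coercion" warning, typically appears when the data passed to kmeans() still contains a non-numeric column such as the class label. The example below assumes the built-in iris data with the label turned into text to mimic a CSV import.

```r
data(iris)
flowers <- iris
flowers$Species <- as.character(flowers$Species)  # text label column, as in a CSV

# Keep only the numeric columns; kmeans() cannot use the text labels.
is_numeric_col <- vapply(flowers, is.numeric, logical(1))
features <- flowers[, is_numeric_col]

# Confirm nothing non-finite is left before clustering.
stopifnot(all(is.finite(as.matrix(features))))

set.seed(1)                                        # arbitrary seed
results <- kmeans(features, centers = 3)
results$size
```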

  • @alexmartino5949
    @alexmartino5949 9 years ago

    Nice video. You said that the algorithm reinitializes during each run. What does that mean?

  • @meshackamimo1945
    @meshackamimo1945 10 years ago

    God bless you for this ingenious and intuitive post. Keep up the good work. More videos please.

  • @BishSinhaExcelsior
    @BishSinhaExcelsior 9 years ago +1

    Simple and very good for beginners.

  • @hanspratyaksa8936
    @hanspratyaksa8936 4 years ago

    Thanks for the tutorial. Why didn't you normalize or scale the data first?

  • @lenwerksgt5906
    @lenwerksgt5906 4 years ago

    Where did you download the Iris.CSV? I don't see that anywhere on the website. Please assist.

  • @ASTRAGOVANM
    @ASTRAGOVANM 5 years ago

    What should I do when I rerun the code on the same data and the clusters end up with different observations? Help please
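
Results change between runs because kmeans() picks its starting centers at random. One common remedy, sketched here on the built-in iris data with an arbitrary seed value, is to fix the random seed and let kmeans() try several random starts.

```r
data(iris)
features <- iris[, 1:4]

# Fixing the seed makes the random starting centers repeatable;
# nstart = 25 runs 25 random starts and keeps the best solution.
set.seed(123)
run_a <- kmeans(features, centers = 3, nstart = 25)

set.seed(123)
run_b <- kmeans(features, centers = 3, nstart = 25)

identical(run_a$cluster, run_b$cluster)   # TRUE: same seed, same assignment
```

Without a fixed seed the cluster numbers (1, 2, 3) may also swap between runs even when the groups themselves are the same.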

  • @murtadhaal-sharuee3874
    @murtadhaal-sharuee3874 9 years ago

    Hi Influxity, really helpful, thank you very much.
    And I have a question, if you could answer it please: is there a way to specify the initial centroids and specify the type of distance we can use?
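
On the initial centroids: the `centers` argument of kmeans() accepts either a number of clusters or a matrix of starting centers, one row per cluster. In the sketch below, the choice of rows 1, 51 and 101 of the built-in iris data is purely illustrative. The distance itself is not configurable in stats::kmeans.

```r
data(iris)
features <- as.matrix(iris[, 1:4])

# Hypothetical choice of starting centers: rows 1, 51 and 101,
# i.e. one flower from each species block of the built-in data.
initial_centers <- features[c(1, 51, 101), ]

# With a matrix of centers there is no random initialization.
results <- kmeans(features, centers = initial_centers)
results$centers
```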

  • @Dr_Ali.Aljboury
    @Dr_Ali.Aljboury 7 years ago

    It's great, now I understand how it works. But there's one question: how would we make it work with semantic analysis and topological parameters?

  • @DBProductions12345-m
    @DBProductions12345-m 7 years ago

    How would this be done with the mnist_dataset? It's got 20,000 rows and 700+ columns

  • @Pokemonpets
    @Pokemonpets 8 years ago

    How can you calculate the F1-measure, precision, recall and entropy of the clustering result? Additionally, does it support sparse data? For text clustering, sparse data is a must

    • @bklamoreaux_old
      @bklamoreaux_old 8 years ago

      You can evaluate clustering algorithms with other techniques like: Calinski Harabasz Evaluation, Davies Bouldin Evaluation, Gap Evaluation (Distance from center), Silhouette Evaluation. See www.mathworks.com/help/stats/clustering.evaluation.clustercriterion-class.html

  • @Injektil_o
    @Injektil_o 10 years ago +3

    Hi Influxity. You mentioned that you can estimate how many clusters there should be. Can you cover this, or have you covered this elsewhere?
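
One widely used heuristic for estimating the number of clusters is the "elbow" plot: run kmeans() for several values of k and look for the point where the total within-cluster sum of squares stops dropping sharply. This is a sketch of that heuristic on the built-in iris data, not necessarily the method the video's author had in mind.

```r
data(iris)
features <- iris[, 1:4]

set.seed(123)                             # arbitrary seed
k_values <- 1:8

# Total within-cluster sum of squares for each candidate k.
total_wss <- sapply(k_values, function(k) {
  kmeans(features, centers = k, nstart = 20)$tot.withinss
})

plot(k_values, total_wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The bend in the curve is a judgment call, so it is best treated as a guide rather than a definitive answer.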

  • @sourishmukherjee2404
    @sourishmukherjee2404 7 years ago

    I am getting 3 clusters of the same size, 50 observations in each cluster. Something is wrong. Any comments?

  • @marcoantoniomirandahernand6819
    @marcoantoniomirandahernand6819 10 years ago

    Dear Influxity, I tried to do the same steps that you do, with the same DB, in R Gui (32 bits) , but when I execute: results

    • @influxity2694
      @influxity2694  10 years ago +2

      Can you check the file you're reading in? See if it has non-numeric or missing values. Look to see if you have an extra line, Null, or something like 1.2e+10. Let me know what you find and we'll go from there.

    • @NanaOnix23
      @NanaOnix23 10 years ago

      Influxity Hi!!
      If the file has missing values, what should I do to fix this?
      Thank you for this video, it helps me a lot!! :D

    • @michelleli8953
      @michelleli8953 9 years ago

      Hi Marco Antonio Miranda Hernández, I had this problem initially as well, and it turned out there was an error in my .csv file from copy+pasting.

    • @subhashishkoirala9292
      @subhashishkoirala9292 7 years ago

      although very late try this
      results
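
For the missing-values question in this thread, one simple option is to drop incomplete rows before clustering. The sketch below assumes the built-in iris data and injects two missing values purely for demonstration. Imputation is an alternative that is not shown here.

```r
data(iris)
flowers <- iris
flowers[c(3, 77), "Sepal.Width"] <- NA     # inject two missing values for the demo

# Keep only rows with no missing values in any column.
complete_rows <- complete.cases(flowers)
flowers_clean <- flowers[complete_rows, ]

set.seed(1)                                 # arbitrary seed
results <- kmeans(flowers_clean[, 1:4], centers = 3)

nrow(flowers_clean)                         # 148 rows remain
```

Dropping rows from the whole data frame, labels included, keeps the class column aligned with the cluster assignments for later comparison.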

  • @jeshrielpolancos5143
    @jeshrielpolancos5143 7 years ago +2

    Error in table(Iris$class, results$cluster) :
    all arguments must have the same length
    how to fix this?

    • @siddharthadas86
      @siddharthadas86 7 years ago +1

      Check whether class is a factor using levels(); if it is not, convert it to a factor.
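
Another possible cause, offered as a guess since the commenter's data is not shown: the label vector and the cluster vector really do have different lengths. This can happen when rows were dropped from the features only, or when the column name is misspelled so that Iris$class returns NULL. The sketch below reproduces the first case on the built-in iris data.

```r
data(iris)
kept_rows <- 6:150                        # demo: pretend rows 1-5 were dropped
features  <- iris[kept_rows, 1:4]

set.seed(1)                                # arbitrary seed
results <- kmeans(features, centers = 3)

length(iris$Species)       # 150 labels
length(results$cluster)    # 145 cluster assignments, so table() would fail

# Take the labels from the same rows that were clustered.
labels_used <- iris$Species[kept_rows]
table(labels_used, results$cluster)
```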

  • @kapamagicman
    @kapamagicman 10 years ago +2

    Thanks for doing this! I subscribed and liked. Please also cover some more similar functions in R

    • @influxity2694
      @influxity2694  10 years ago +1

      Thank you. I'll work to get some more videos up soon.

  • @dduttaroy
    @dduttaroy 6 years ago

    table(Iris$class, results$cluster) is not clear. Please explain.
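
The table() call is a cross-tabulation: it counts how many observations fall into each combination of known class and assigned cluster. A short sketch, assuming the built-in iris data, where the label column is Species rather than class:

```r
data(iris)
set.seed(1)                                # arbitrary seed
results <- kmeans(iris[, 1:4], centers = 3)

# Rows: the known species. Columns: the cluster number kmeans() assigned.
# Each cell counts the flowers with that species/cluster combination.
comparison <- table(species = iris$Species, cluster = results$cluster)
comparison
```

A row with all its counts in one column means that species landed in a single cluster; counts spread over two columns show where the clustering mixes species. The cluster numbers themselves are arbitrary labels.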

  • @urmayshah6863
    @urmayshah6863 7 years ago

    How can I train on some data using k-means and then test other data with it? And a bit of explanation about accuracy and all the other parameters, please!
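
k-means is unsupervised, so there is no training on labels. One way to approximate a train/test workflow is to fit the centers on part of the data and assign held-out points to the nearest center. The sketch below assumes the built-in iris data; `assign_cluster` is a hypothetical helper written for this example, not a function from R.

```r
data(iris)
set.seed(1)                                      # arbitrary seed
test_rows <- sample(nrow(iris), 30)              # hold out 30 flowers
train <- as.matrix(iris[-test_rows, 1:4])
test  <- as.matrix(iris[test_rows, 1:4])

fit <- kmeans(train, centers = 3, nstart = 20)

# Hypothetical helper: for each point, the index of the nearest center
# measured by squared Euclidean distance.
assign_cluster <- function(points, centers) {
  apply(points, 1, function(p) which.min(colSums((t(centers) - p)^2)))
}

test_clusters <- assign_cluster(test, fit$centers)

# Compare held-out assignments with the known species.
table(iris$Species[test_rows], test_clusters)
```

Because cluster numbers are arbitrary, an accuracy figure only makes sense after deciding which cluster corresponds to which class.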

  • @HonGoArtist
    @HonGoArtist 8 years ago

    Question: How can we add the kmeans results to the original data set? I want to compare reality to the predicted class for each entity. In other words, I'd like to see the predicted class results in the original spreadsheet "iris".

    • @janisgredzens7463
      @janisgredzens7463 8 years ago +2

      If it is still helpful, here is a version using data.table package:
      -------------------------------------------------------------------------------------------------------------------------------------------------
      require(data.table)
      iris.features

    • @souravdas1983
      @souravdas1983 8 years ago

      Also use 'confusionMatrix' to check how well the model has predicted.
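
The data.table code above was cut off. As a separate base-R sketch of the same idea, assuming the built-in iris data, the cluster assignments can be added as a new column because they come back in the same row order as the input. The output file name is hypothetical.

```r
data(iris)
set.seed(1)                                 # arbitrary seed
results <- kmeans(iris[, 1:4], centers = 3)

# results$cluster is in the same row order as the input, so it can be
# added as a new column directly.
iris_clustered <- iris
iris_clustered$cluster <- results$cluster

head(iris_clustered)

# To open the result in a spreadsheet (hypothetical file name):
# write.csv(iris_clustered, "iris_clustered.csv", row.names = FALSE)
```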

  • @zinmot5457
    @zinmot5457 2 years ago

    Very helpful! Thank you, god bless you sir.

  • @becarefull01
    @becarefull01 8 years ago

    Could you please provide an implementation of different versions of spectral clustering using R?

  • @shabnamtafreshi714
    @shabnamtafreshi714 9 years ago

    Simple and to the point. Thanks!

  • @vktonline
    @vktonline 8 years ago

    Could you please upload the implementation? I want to make changes to the algorithm and see the results

  • @hongngoctran1218
    @hongngoctran1218 5 years ago

    How to analyze Cluster means? I don't understand it.

  • @nikhiljamisetti7139
    @nikhiljamisetti7139 6 years ago

    Can anybody tell me how to find the defect clusters in the above dataset?

  • @anoojkvarghese9903
    @anoojkvarghese9903 9 years ago

    With k=3, I get clusters of size 50, 61 and 39. Is that correct?

    • @vineyshar1
      @vineyshar1 9 years ago

      kmeans does some random initialization behind the scenes, which results in a slightly different outcome every time you run it.

  • @betzthomas9693
    @betzthomas9693 4 years ago

    What is a vector in k-means clustering?

  • @XenomorphLV426
    @XenomorphLV426 4 years ago

    How do I scale the data?

  • @RobertCordrey
    @RobertCordrey 5 years ago

    Even though it's an old video, a few important parts are intentionally left out. He says that you need to 'normalize your data' and remove any rows that are missing values. Unfortunately, he doesn't show how to do either, which makes it harder to complete the task from start to finish, even though each takes only a couple of lines of code.

  • @sj8648
    @sj8648 6 years ago

    What a beautiful voice!

  • @srivathsesh
    @srivathsesh 7 years ago

    Thank you, very well illustrated

  • @sathyavel8046
    @sathyavel8046 10 years ago +52

    I am tired of seeing this iris data... lots of videos use the same one. Why are no real-world examples used?

    • @influxity2694
      @influxity2694  10 years ago +33

      I understand what you're saying and I'm sorry you feel that way. The point of the video is to understand k-means and see how it's used in R. The Iris dataset fits k-means well as it has nice clusters that are in lower dimensions. It's also a familiar dataset and easy to get and work with. This allows viewers to focus on the commands and outputs of the algorithm in R and get a better understanding of the algorithm in general without having to worry about more complicated datasets and edge cases. I'll work with some different datasets in future videos but want to pick them so they add to understanding the main point of the video.

    • @BishSinhaExcelsior
      @BishSinhaExcelsior 9 years ago +14

      mightyvel vel, iris data IS a real-world example, but it may not be in your area of work :)

    • @Pokemonpets
      @Pokemonpets 8 years ago

      +mightyvel vel Well, you are just at the beginning. You will be grossed out seeing how primitive the examples are, and algorithm equations without a single example :D

    • @nahid7499
      @nahid7499 7 years ago

      Bish Sinha ftv6

    • @aslogdahl4469
      @aslogdahl4469 7 years ago +5

      It is very sad to hear that Irises are not considered to be part of the real world.

  • @sarthakbiswas2201
    @sarthakbiswas2201 8 years ago

    Can someone do k-modes as well? For categorical data?
    Since k-means doesn't work for categorical data, I guess.

    • @souravdas1983
      @souravdas1983 8 years ago

      What is the issue with using the 'kmodes' function in R instead? For a dataset with both categorical and numerical variables, use K-prototype (kproto)

  • @harshalaharivaliveti5185
    @harshalaharivaliveti5185 7 years ago

    How do you download the data from archive.ics.uci.edu?

  • @adityanjsg99
    @adityanjsg99 4 years ago

    Nice video.. Voice fades (not often but can be avoided)

  • @revenez
    @revenez 6 years ago

    Very informative video. Thank you.

  • @HibaYahyaoui
    @HibaYahyaoui 9 years ago

    Thank you very much, it is a very helpful tutorial.

  • @muniseswar7526
    @muniseswar7526 7 years ago

    How do I download the iris data set?

  • @kaiyuwang2822
    @kaiyuwang2822 10 years ago +1

    fantastic mate!

  • @otomehusband
    @otomehusband 6 years ago

    Thanks for providing these good videos

  • @sarahlacroix2350
    @sarahlacroix2350 9 years ago +1

    This is awesome. Thank you !!!

  • @miguelguilherme4331
    @miguelguilherme4331 11 years ago

    Amazing job! Thanks

  • @utkucansa
    @utkucansa 8 years ago

    I liked the video. But the interpretation part could be more extensive. Thanks though.
    Cheers,

  • @vivekjoshi937
    @vivekjoshi937 7 years ago

    This is amazing learning

  • @arpitbhatnagar2154
    @arpitbhatnagar2154 7 years ago

    Thank you!
    It was helpful

  • @TheZvercica
    @TheZvercica 9 years ago

    I am not able to download the dataset!!!!! I'll go crazy

    • @yovanyluis
      @yovanyluis 9 years ago +1

      Hi, what I did was download "iris.data", open it in a text editor like Notepad++, and add a header: just write "sepal.length","sepal.width","petal.length","petal.width","class" on the first line, since the video's code expects those column names. Then save it as "iris.csv", and now you can follow the video :D

    • @ajufsd
      @ajufsd 9 years ago +1

      iris is included in R by default. Just key in "data(iris)" and you are in.
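
Both suggestions in this thread can be sketched in R. The column names are the ones quoted in the reply above. The two sample rows are inlined only so the example runs without a download. With the real file, the first argument to read.csv() would be the file path instead of `text =`.

```r
# Option 1: the copy that ships with R (columns Sepal.Length ... Species).
data(iris)
str(iris)

# Option 2: a headerless file such as "iris.data"; supply the column names
# while reading instead of editing the file by hand.
sample_lines <- "5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor"

Iris <- read.csv(text = sample_lines, header = FALSE,
                 col.names = c("sepal.length", "sepal.width",
                               "petal.length", "petal.width", "class"))
Iris
```

Note that the built-in data uses different column names and species labels ("setosa" rather than "Iris-setosa"), so code copied from the video needs the names adjusted.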

  • @rachidaitlhaj9176
    @rachidaitlhaj9176 8 years ago

    Good Job

  • @Eleni.314
    @Eleni.314 6 years ago

    Thank you sir!

  • @zhou6075
    @zhou6075 2 years ago

    thanks

  • @pepikkk10
    @pepikkk10 7 years ago

    thanks a mil :D

  • @kanikaswap
    @kanikaswap 7 years ago +1

    good

  • @vktonline
    @vktonline 8 years ago

    nice

  • @subhashishkoirala9292
    @subhashishkoirala9292 7 years ago

    To all of you who are having the "Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)" problem:
    instead of
    results

  • @mm_007
    @mm_007 2 years ago

    To keep your voice audible, do you have to pay extra taxes?