K-means clustering: how it works

  • Published: 1 Oct 2024
  • Full lecture: bit.ly/K-means
    The K-means algorithm starts by placing K points (centroids) at random locations in space. We then perform the following steps iteratively: (1) assign each instance to the cluster with the nearest centroid, and (2) move each centroid to the mean of the instances assigned to it. The algorithm continues until no instance changes its cluster membership.
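The loop described above can be sketched in Python with NumPy (an illustrative sketch, not the lecture's code; the function and variable names are my own):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-means: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Place K centroids at random locations (here: random data points).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # (1) Assign each instance to the cluster with the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # (2) Move each centroid to the mean of the instances assigned to it
        #     (an empty cluster keeps its old centroid).
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  if (labels == j).any() else centroids[j]
                                  for j in range(k)])
        # Stop when the centroids, and hence the assignments, no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```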

Comments • 386

  • @feraudyh · 9 years ago +598

    I have rarely seen a clearer presentation.

  • @sedgeleyp · 1 year ago +7

    Sounds just like Andrew Tate

    • @xxcodingtime · 6 months ago

      Thanks, I don't want to listen any more…

  • @undefined915 · 8 years ago +72

    Seriously, best k-means explanation.

  • @abail7010 · 6 years ago +56

    Thank you! My prof tried to explain this in 1.30h.
    You did it in 7 minutes and I understood it even better.

    • @induction7895 · 2 years ago

      This is because your brain is already primed to learn and knows what it understands and doesn't understand about the topic.

    • @abail7010 · 2 years ago

      @@induction7895 What are you trying to say?

  • @tomjuggles · 7 years ago +8

    So much better than the official university explanation. Thanks a lot for this work!

  • @christianeheiligers3629 · 9 years ago +27

    Fantastically clear and uncomplicated description of k means clustering.

  • @denizgursel · 10 years ago +14

    Finally, someone who explained it step by step and visually. Thanks a lot!

    • @vlavrenko · 10 years ago

      Thanks!

    • @Nuzee03 · 10 years ago

      Victor Lavrenko, if you have categorical data, what should you run if not k-means?

    • @vlavrenko · 10 years ago

      With categorical data the notion of a "mean" or "centroid" is not so straightforward. You could use the mode (most frequent attribute value) instead of the mean. Or use agglomerative clustering, which does not require you to instantiate centroids.
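The mode-based idea can be sketched in NumPy (illustrative only; the data, its integer encoding of categories, and the cluster assignment are made up):

```python
import numpy as np

# Toy categorical data: each attribute encoded as an integer code (hypothetical).
X = np.array([[0, 1],
              [0, 1],
              [0, 2],
              [1, 0],
              [1, 0]])
labels = np.array([0, 0, 0, 1, 1])  # an assumed cluster assignment

# For a categorical cluster, replace the mean with the per-attribute mode
# (most frequent value): the "centroid" of cluster 0 here.
mode_centroid = np.array([np.bincount(col).argmax()
                          for col in X[labels == 0].T])
```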

  • @GyManiac · 8 years ago +40

    Awesome explanation. Good and clear introduction with the theory and a perfect example afterwards.
    Many thanks to you sir.

  • @__enx__ · 5 years ago +2

    Thank you, you made it clear with an example.
    Let's cluster out those 108 downvotes.

  • @davidfield5295 · 8 years ago +2

    The example at the end was very helpful, thanks for the video

  • @wesleylondon5916 · 4 years ago +1

    Why couldn't my teacher just say this?

  • @sanshinron · 8 years ago +4

    Awesome video. Could you also explain how to decide how many centroids an algorithm should use?

    • @philippederome2434 · 5 years ago

      See the elbow point in the next proposed video: ruclips.net/video/4b5d3muPQmA/видео.html
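The elbow heuristic mentioned here can be sketched with scikit-learn (a sketch on synthetic data; the blob centers and counts are arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data with three well-separated blobs.
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0, 5, 10)])

# Fit K-means for a range of K and record the intra-cluster variance (inertia).
inertias = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# Plotting K against inertia, the "elbow" is where the curve flattens out;
# for this data the sharp drop stops at K = 3.
```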

  • @BloopComics · 3 years ago +1

    How do I add the 2 to the 4?

  • @abhinaik4u · 8 years ago +3

    Sir, you made it so simple. Thanks. After searching a lot, finally this video made my concepts clear.

  • @yasirhilal7814 · 9 years ago +20

    Thank you... very simple... very useful.

    • @vlavrenko · 9 years ago +4

      You're welcome! Thank you for the kind words.

    • @malusivanpandy9535 · 9 years ago +1

    • @lazyac_ · 8 years ago

      +Victor Lavrenko you are a great teacher! This is so clear! The formula on Wikipedia just confuses me; I thought I would never understand this. Thank you.

  • @saherios · 8 years ago +1

    Great and clear explanation. Could you possibly add a video of the formal proof of why this algorithm always converges?

  • @AI-HOMELAB · 4 months ago

    Great explanation, simple and visualized. Thanks! =)

  • @Duaj8 · 7 years ago +3

    This is the best K-means explanation i've ever seen!
    Great job, thank you!

  • @arsenalfan251 · 6 years ago +1

    1000% clearest k-means explanation out there

  • @renzocoppola4664 · 7 years ago +1

    This kind of stuff on youtube makes my life so much easier.

  • @shaisimonson3330 · 7 years ago +2

    How might one decide what the best K is? That is, if all you have are the points, can this be used to find a reasonable K and their members? Or do you just have to run the algorithm for K = 2 to ? and look at each result?

    • @shaahin6818 · 6 years ago

      Shai Simonson, several methods are available for determining K. Just search.

  • @boughariourabii9496 · 2 years ago

    7/ this one has errors:
    silhouette = []
    for i in range(2, 8):
        CHA_model = AgglomerativeClustering(n_clusters=i, affinity="manhattan", linkage="complete")
        labels = CHA_model.fit_predict(arr_h)  # AgglomerativeClustering has no predict(); use fit_predict()
        silhouette.append(silhouette_score(arr_h, labels, metric='manhattan'))

  • @Nissearne12 · 7 years ago

    Good that you also describe what Euclidean means. Thanks!

  • @DINO5551000 · 2 years ago

    Um, if my data set is 2D x,y coordinates, how exactly am I supposed to calculate the mean then?

  • @nateanderson4080 · 7 years ago

    I assign attributes all the time in my k-means clusters... for example, I will add a male-female categorical variable. If the data is scaled from 0 to 12, I will assign a 0 to males and 12 to females. If the resulting cluster averages 6, I would say the cluster is equally male and female; a result of 3 would be 'leans male', etc.

  • @rishbhardwaj1431 · 5 years ago +1

    Whaaaa. I got k-means in 2.5 minutes! This was amazing!

  • @prateekyadav7682 · 7 years ago

    This guy sounds exactly like Martin Scorsese. I'm not kidding, even the way of talking is the same to a tee!!

  • @iCore7Gaming · 4 years ago

    idk if it's just me but all these equations just confuse me.

  • @沈天鱼 · 3 years ago

    It's 2021 now, this video is still much much better than my prof's class

  • @payelbanerjee9192 · 7 years ago

    Dear sir, I have studied that there are two types of clustering algorithms, linear and non-linear. Under linear clustering there are k-means, fuzzy c-means, hierarchical, and quality threshold, whereas under non-linear clustering there are density-based, model-based, and graph-based algorithms. Will you please explain what we mean by a linear clustering algorithm? What is the main difference between linear and non-linear clustering algorithms?

  • @rodrigoperea8750 · 4 years ago

    Thanks... great explanation (tip: as you said, a quick way to visualize starts at 4:50)!

  • @joseluiz_bh · 8 years ago

    I mean, I must cluster considering more than one variable!

  • @andrewchristian6009 · 4 years ago

    Thank you for your explanation!!! You helped me solve this problem.

  • @paumasia475 · 3 years ago

    Amazing channel!! What did you study in university?

  • @joseluiz_bh · 8 years ago

    Thank you! I have a doubt: your example uses just one variable (one set of values)! How are clusters calculated when considering more than one set of values (for example, 2 or 3 variables, like MIN_TEMPERATURE, MAX_TEMPERATURE, MONTH)?
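For what it's worth, with several attributes the centroid is simply the per-attribute mean, so K-means works unchanged in any number of dimensions. A quick NumPy illustration (the temperature and month values below are made up):

```python
import numpy as np

# Three instances described by (MIN_TEMPERATURE, MAX_TEMPERATURE, MONTH);
# the values are hypothetical.
cluster = np.array([[2.0, 10.0, 1.0],
                    [4.0, 12.0, 2.0],
                    [3.0, 11.0, 3.0]])

# Centroid = the mean of each attribute; distances between instances and
# centroids are Euclidean over all attributes at once.
centroid = cluster.mean(axis=0)  # [3., 11., 2.]
```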

  • @tymofei8586 · 4 years ago

    What about marginal, extreme values? They may change the average vector dramatically.

  • @techienomadiso8970 · 3 years ago

    Wow, this is so well explained... 7 years later it's still🔥🔥🔥🔥

  • @xinyuechang6062 · 2 years ago

    I know the attributes can only be numeric, but for categorical data, is it possible to represent each category by a number and then run k-means?

  • @eyasabiedan6950 · 6 years ago

    OK, you sound like that scientist from The Simpsons, so you can't be wrong :D !!!

  • @rojitrojit6576 · 4 years ago

    My man sounding like Gale Boetticher from Breaking Bad.

  • @creedddzz · 9 years ago

    Very nice explanation. Thank you.

  • @JC-yf9mz · 4 years ago

    One question: what if, after the first assignment of all points to one single cluster and the centroid recalculation, we still have all points closer to that very first centroid? Then, I guess, it will fail.

  • @havel7661 · 2 years ago

    This is awesome! Thanks for your explanation!

  • @ZlobnyiSerg · 5 years ago +1

    Best 7 minutes spent to understand! Thank you

  • @taowei2336 · 1 year ago

    Very clear explanation. Thanks for your work

  • @gautamkarmakar3443 · 8 years ago

    I am basically covering all your classes and making notes, trying my best to make the most of these resources.

  • @randygrozario2453 · 5 years ago

    set the speed at 0.75 and thank me later :)

  • @bing13bong · 3 years ago

    That moment when you go "Ooooohhhhhhhh. My professor's an idiot"

  • @ApexPredator283 · 2 months ago

    What happens if one of the centroids does not get any data point assigned to it in the first clustering round?

  • @XDFcooler · 5 years ago

    Clusters R very nice 2 eat. That's why the internet is too fat, it eats too many cookies.

  • @BuckySeifert · 9 years ago

    Thanks, this was very helpful. Glad to have found a video that explained it in a manner that didn't take too long to get the message through.
    By the by, your voice reminds me of the character Pritchard from Deus Ex: Human Revolution. Sorry if you get that a lot.

  • @gregsadler968 · 6 years ago

    I had no idea Tom Hardy made videos about machine learning!!

  • @someaccount3438 · 5 years ago

    Could we, instead of choosing k completely random points, choose k points randomly from our data set? Also, instead of waiting for none of our points to change cluster before stopping the loop, wouldn't it be much faster (albeit less precise) to stop once we see that the difference between the new and old centroids is very small?
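Both ideas are common in practice; a minimal NumPy sketch (illustrative only; the function and variable names are my own):

```python
import numpy as np

def kmeans_early_stop(X, k, tol=1e-4, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # (a) Initialize the centroids as k points sampled from the data itself.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        # (b) Stop once the centroids barely move, rather than waiting for
        #     cluster memberships to freeze completely.
        shift = np.linalg.norm(new - centroids)
        centroids = new
        if shift < tol:
            break
    return centroids, labels
```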

  • @lokesh542 · 4 years ago

    Loved the way you explained the concepts.

  • @nO_d3N1AL · 10 years ago +2

    Best explanation I've ever come across

  • @BelyaevValera · 7 years ago

    I don't understand English well, but he explains better than some do in Russian :)

  • @arber10 · 4 years ago

    At 1:10, "...you assign this point x_j..." You mean x_i. Thank you for this video!

  • @PatrickBateman12420 · 7 years ago

    almost 250k views on a K-Means Clustering Algo Video ?!?! RESPECT !!!

  • @charlesdale2600 · 5 years ago

    What if you have a point that is the same distance to both the red and the yellow triangle?

  • @nsv.2344 · 9 years ago

    I have a doubt: why are we not computing sigma (the variance) and the prior probability, and recomputing just the means? I know that for GMM we do calculate means, variances, and priors, and if we don't recompute them, we get K-means... but my question is why are we NOT recomputing them in K-means? Thanks in advance, and awesome videos :D

  • @CoryBradley · 6 years ago

    Can anyone address this: if you have a mix of categorical, ordinal and nominal variables and you want to find the cluster, what analysis is that?

  • @Nissearne12 · 7 years ago

    Thanks! Especially the example you show helps a lot for me to understand.

  • @ilhamsafeek3167 · 4 years ago

    Poor explanation. Cannot be understood by everyone.

  • @tylermerlin8320 · 5 years ago

    I would start by what it accomplishes.

  • @incorectusername · 4 years ago

    I wish the lecturer I have at uni was as good as you, prof Victor.

  • @justpressstart · 8 years ago

    That was brilliant, I understood it straight away.

  • @admnemma381 · 7 years ago

    Thank you Victor, very clear presentation.

  • @lizravenwood5317 · 6 years ago

    Wow. Excellent explanation. Thanks.

  • @MarianSauter · 6 years ago

    Clear explanation, thanks!

  • @hoanguyenfutu · 6 years ago

    Is there any software for outlier detection using k-means?

  • @rodrigoloza4263 · 8 years ago

    Greetings, very nice video. I have a question: do we define the distortion equation in terms of the number of iterations that we set for the algorithm to converge? I mean, we may get stuck in some local optimum for some random starting points. Thus, we should run the algorithm from the very beginning (from the random initialization) several times in order to, hopefully, find the global minimum in one of the runs.

  • @jijithjeevan4310 · 2 years ago

    Thank you, very well explained.

  • @ericarnaud5062 · 10 years ago +2

    Thank you Victor, that's the best tutorial about K-means I've ever watched...

    • @vlavrenko · 10 years ago +1

      Thanks! Happy to know this is helpful

  • @mayurkulkarni755 · 8 years ago

    I've got a professor at university who's got a PhD in Machine Learning and an MS in whatever, but he doesn't seem to teach as well as you. Best explanation of k-means on the internet! Thank you very much.

  • @boughariourabii9496 · 2 years ago

    4/a = df_heart[df_heart['restecg'] == "normal"].index
    df_heart.drop(index=list(a), inplace=True)

  • @remilacan6192 · 10 years ago

    Thank you very much; after finding some pages with mathematical formulas, now I understand the implementation!

    • @vlavrenko · 10 years ago

      Very happy you find it useful, thanks!

  • @BeradinhoYilmaz · 1 year ago

    Hi sir, are the k-means and k-nearest-neighbour algorithms the same?

  • @SulacoCalico · 9 years ago

    Really useful, but man, that was some fast talking!

  • @DereC519 · 6 months ago

    ty

  • @1976turkish · 8 years ago

    Very succinctly explained. Well done

  • @mementovivere2 · 5 years ago

    Brilliant video! Many thanks!

  • @jamesang7861 · 5 years ago

    thanks this video is purrrfect!!

  • @tipums123 · 7 years ago

    Very good presentation!!!!

  • @tilakshenoy8440 · 8 years ago +1

    Thank you so much for the crystal clear and precise presentation Sir!! :)

  • @MrLazini · 2 years ago

    This was amazing and blazingly fast! Thanks!

  • @vladislavkrestinin235 · 4 years ago

    Thanks for the clear explanation!

  • @GamingPotatoHD · 3 years ago

    I thought he was talking in 1.5x lol.

  • @arnabthakuria2243 · 5 years ago

    Very nice explanation, very clear and concise. Thank you very much.

  • @valentincalomme · 7 years ago

    What happens if a centroid has no point assigned to itself?

  • @Ivan-cp2hn · 6 years ago

    I hope the video can start directly from 4:23.

  • @ShadyaTears · 3 years ago

    Awesome explanation, thank you Victor!

  • @AndyKong51 · 6 years ago

    Very clear illustrations

  • @nithinkumarrnk · 7 years ago

    Very well explained. Thanks

  • @Hounnomefigo · 7 years ago

    I understand more with you in English than with my teacher's book in Italian. Thank you!

  • @justintimekim · 10 years ago

    Best k-means explanation ever, thanks.
    One question I have:
    considering the randomly selected centroids,
    will the end results always be the same?

    • @vlavrenko · 10 years ago

      Justin Kim Thanks for the kind words.
      No, the end result will be different each time you run K-means from different starting points. The algorithm finds only a local minimum of the error (intra-cluster variance).
      If computation time is not an issue, you run the algorithm multiple times, and in the end pick the clustering that gives the lowest intra-cluster variance -- it is, in a way, the best fit of K clusters to your data.
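In scikit-learn this restart strategy is built in as the `n_init` parameter of `KMeans`; spelled out by hand it might look like this (a sketch on synthetic data with arbitrary blob centers):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(40, 2)) for c in (0, 6, 12)])

# Run K-means from 10 different random starting points and keep the
# clustering with the lowest intra-cluster variance (inertia_).
runs = [KMeans(n_clusters=3, init="random", n_init=1, random_state=s).fit(X)
        for s in range(10)]
best = min(runs, key=lambda km: km.inertia_)
```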

  • @p190188 · 9 years ago

    Doubt: if there are multiple attributes like (X1-Xn), (Y1-Yn), (Z1-Zn), etc., how do we compute the centroid?

  • @ArjunSK · 1 year ago

    Great tutorial!

  • @corvomichele · 6 years ago

    Incredible teaching skills!

  • @grace1735 · 2 years ago

    Clear explanation, thank you