K-means clustering: how it works
- Published: 1 Oct 2024
- Full lecture: bit.ly/K-means
The K-means algorithm starts by placing K points (centroids) at random locations in space. We then perform the following steps iteratively: (1) each instance is assigned to the cluster with the nearest centroid, and (2) each centroid is moved to the mean of the instances assigned to it. The algorithm continues until no instances change cluster membership.
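The two steps described above can be sketched in plain Python (a toy illustration with my own variable names; the deterministic "first k points" start is only for reproducibility, real K-means places the initial centroids randomly):

```python
def k_means(points, k):
    """Toy K-means on n-dimensional points given as tuples."""
    centroids = list(points[:k])  # real K-means starts from random locations
    while True:
        # Step 1: assign each instance to the cluster with the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            clusters[nearest].append(p)
        # Step 2: move each centroid to the mean of the instances assigned to it
        new_centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        # stop when the centroids (and hence the memberships) no longer change
        if new_centroids == centroids:
            return new_centroids, clusters
        centroids = new_centroids

points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8)]
centroids, clusters = k_means(points, 2)
```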
I have rarely seen a clearer presentation.
Thank you!
+Victor Lavrenko
No. Thank YOU
I totally agree!
agree
Sounds just like Andrew Tate
Thanks, I don't want to listen any more ….
seriously, best K means explained
Thank you! My prof tried to explain this in an hour and a half.
You did it in 7 minutes and I understood it even better.
This is because your brain is already primed to learn and knows what it understands and doesn't understand about the topic.
@induction7895 What are you trying to say?
So much better than the official university explanation. Thanks a lot for this work!
Fantastically clear and uncomplicated description of k means clustering.
finally, someone who explained it step by step and visually, thanks a lot!
Thanks!
Victor Lavrenko if you have categorical data what should you run if not k means??
With categorical data the notion of a "mean" or "centroid" is not so straightforward. You could use the mode (most frequent attribute value) instead of the mean. Or use agglomerative clustering, which does not require you to instantiate centroids.
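The mode-based idea from the reply above can be sketched like this (a toy illustration of my own, not any particular library's API; the matching-based distance is the one used by the k-modes family of algorithms):

```python
from collections import Counter

def mode_centroid(rows):
    """Categorical 'centroid': the most frequent value in each attribute (column)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def mismatches(a, b):
    """Categorical 'distance': how many attributes differ between two instances."""
    return sum(x != y for x, y in zip(a, b))

rows = [("red", "small"), ("red", "large"), ("blue", "small")]
print(mode_centroid(rows))                              # -> ('red', 'small')
print(mismatches(("red", "small"), ("blue", "small")))  # -> 1
```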
Awesome explanation. Good and clear introduction with the theory and a perfect example afterwards.
Many thanks to you sir.
Thank you. you made it clear with example
Let's cluster out those 108 downvotes
The example at the end was very helpful, thanks for the video
Why couldn't my teacher just say this?
Awesome video. Could you also explain how to decide how many centroids an algorithm should use?
See elbow point in next proposed RUclips video: ruclips.net/video/4b5d3muPQmA/видео.html
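The elbow idea mentioned above can be sketched on a tiny 1-D example (my own toy code; the spread-out initialisation is just for reproducibility, real runs use random restarts):

```python
def k_means_1d(xs, k, iters=20):
    """Tiny 1-D k-means; returns the final centroids."""
    s = sorted(xs)
    cents = [s[i * len(s) // k] for i in range(k)]  # spread the starts, for reproducibility
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda j: (x - cents[j]) ** 2)].append(x)
        cents = [sum(g) / len(g) if g else cents[j] for j, g in enumerate(groups)]
    return cents

def distortion(xs, cents):
    """Within-cluster sum of squared distances to the nearest centroid."""
    return sum(min((x - c) ** 2 for c in cents) for x in xs)

xs = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9, 9.0, 9.1, 8.9]
for k in range(1, 6):
    print(k, round(distortion(xs, k_means_1d(xs, k)), 2))
# the distortion collapses between k = 2 and k = 3, then barely improves:
# the "elbow" is at k = 3, the true number of groups in this data
```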
How do I add the 2 to the 4?
Sir, you made it so simple. Thanks. After searching a lot, finally this video made my concepts clear.
Thank you ...very simple....... very useful
You're welcome! Thank you for the kind words.
+Victor Lavrenko you are a great teacher! This is so clear! The formula on Wikipedia just confuses me; I thought I would never understand this. Thank you.
Great and clear explanation. Could you possibly add a video of the formal proof of why this algorithm always converges?
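For what it's worth, a sketch of the standard convergence argument (my summary, not a formal proof and not from the video):

```latex
% K-means never increases the distortion
\[
  J \;=\; \sum_{i=1}^{N} \bigl\lVert x_i - \mu_{c(i)} \bigr\rVert^2
\]
% Step 1 (reassigning instances) cannot increase J, since each x_i moves
% to its nearest centroid.
% Step 2 (recomputing centroids) cannot increase J, since the mean of a set
% of points minimizes the sum of squared distances to them.
% J >= 0, and there are only finitely many partitions of N instances into
% K clusters, so J cannot keep decreasing forever: the memberships must
% eventually stop changing, i.e. the algorithm terminates (at a local minimum).
```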
great explanation, simple and visualized. Thanks! =)
This is the best K-means explanation i've ever seen!
Great job, thank you!
1000% clearest k-means explanation out there
This kind of stuff on youtube makes my life so much easier.
+How might one decide what the best K is? That is, what if all you have are the points, can this be used to find a reasonable K and their members? Or do you just have to run the algorithm for K = 2 to ? and look at each result?
Shai Simonson several methods are available for determining K. Just search
7/ This one has mistakes in it
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

silhouette = []
for i in range(2, 8):
    CHA_model = AgglomerativeClustering(n_clusters=i, affinity="manhattan", linkage="complete")
    labels = CHA_model.fit_predict(arr_h)  # AgglomerativeClustering has no predict(); fit_predict returns the labels
    silhouette.append(silhouette_score(arr_h, labels, metric='manhattan'))
Good that you also describe what Euclidean means. Thanks!
Um, if my data set is 2-D x,y coordinates, how exactly am I supposed to calculate the mean then?
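On the 2-D question above: the mean is taken coordinate-wise, all x-coordinates averaged together and all y-coordinates averaged together. A minimal sketch (the helper name is my own):

```python
def centroid(points):
    """Coordinate-wise mean of a list of n-D points given as tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

print(centroid([(1, 2), (3, 4), (5, 6)]))  # -> (3.0, 4.0)
```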
I assign attributes all the time in my k-means clusters... for example I will add a male-female categorical. If the data is scaled from 0 to 12, I will assign a 0 to males and 12 to females. For example, if the resulting cluster, gives 6 average then I would say the cluster is equally male and female, a result of 3 would be 'leans male', etc.
Whaaaa. I got k-means in 2.5 minutes! This was amazing!
This guy sounds exactly like Martin Scorsese. I'm not kidding, even the way of talking is the same to a tee!!
idk if it's just me but all these equations just confuse me.
It's 2021 now, this video is still much much better than my prof's class
Dear sir, I have studied that there are two types of clustering algorithms: linear and non-linear. Under linear clustering algorithms there lie k-means, fuzzy c-means, hierarchical, and quality threshold, whereas under non-linear clustering algorithms there lie density-based, model-based, and graph-based algorithms. Will you please explain what we understand by a linear clustering algorithm? What is the main difference between linear and non-linear clustering algorithms?
thanks...great explanation (tip: as you said, quick way to visualize starts at 4:50 )!
I mean, I must cluster considering more than one variable!
thank you for your explanation!!! you helped me solve this problem
Amazing channel!! What did you study at university??
Thank you! I have a doubt: your example uses just one variable (one set of values)! How are clusters calculated when there is more than one set of values (for example, 2 or 3 variables, like MIN_TEMPERATURE, MAX_TEMPERATURE, MONTH)?
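On the multi-variable question above: nothing in the algorithm changes. Each instance becomes a vector of attribute values, the distance is computed over all attributes at once, and the centroid is the per-attribute mean. One caveat (my addition, not from the video): attributes on very different scales should usually be normalised first, or the largest-scale attribute dominates the distance. A sketch with hypothetical temperature features:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two instances with any number of attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# hypothetical (min_temp, max_temp, month) instances
print(euclidean((2.0, 10.0, 1.0), (4.0, 12.0, 2.0)))  # -> 3.0
```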
What about marginal, extreme values? They may change the average vector dramatically.
Wow, this is so well explained... 7 years later it's still🔥🔥🔥🔥
I know the attributes can only be numeric, but for categorical data, is it possible to represent each category by a number and then run k-means?
ok you sound like that scientist from the Simpsons so you can't be wrong :D !!!
my man sounds like Gale Boetticher from Breaking Bad.
Very nice explanation. Thank you.
One question: what if, after the first assignment of all points to one single cluster and the centroid recalculation, we still have all points closer to that very first centroid? Then, I guess, it will fail.
This is awesome! Thanks for your explanation!
Best 7 minutes spent to understand! Thank you
Very clear explanation. Thanks for your work
I am basically covering all your classes and making all the notes try my best to make the best of these resources.
set the speed at 0.75 and thank me later :)
That moment when you go "Ooooohhhhhhhh. My professor's an idiot"
What happens if one of the centroids does not get any data points assigned to it in the first clustering round?
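On the empty-cluster question above: a common fix (my sketch, not from the video) is to leave the centroid where it is, or to re-seed it, e.g. at a randomly chosen data point:

```python
import random

def update_centroid(cluster, old_centroid, points, rng=random):
    """Mean of the assigned points; if none were assigned, re-seed from the data."""
    if not cluster:
        return rng.choice(points)   # or simply: return old_centroid
    n = len(cluster)
    return tuple(sum(coord) / n for coord in zip(*cluster))

pts = [(0.0, 0.0), (2.0, 2.0)]
print(update_centroid(pts, (9.0, 9.0), pts))  # -> (1.0, 1.0)
```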
Clusters R very nice 2 eat. That's why the internet is too fat, it eats too many cookies.
Thanks. this was very helpful. Glad to have found a video that explained it in a manner that didn't take too long to get the message through.
By the by, your voice reminds me of the character Pritchard from Deus Ex: Human Revolution. Sorry if you get that a lot.
I had no idea Tom Hardy made videos about machine learning!!
Could we, instead of choosing k completely random points, choose k points randomly from our data set? Also, instead of waiting for none of our points to change clusters to stop the looping, wouldn't it be much faster (albeit less precise) to stop once we see that the difference between the new and old centroids is very small?
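Both suggestions in the comment above are standard practice, as far as I know: sampling the initial centroids from the data set is the classic "Forgy" initialisation, and stopping when the centroids move less than a tolerance is how, for example, scikit-learn's `tol` parameter behaves. A sketch of the two pieces (function names are my own):

```python
import random

def init_from_data(points, k, rng=random):
    """Forgy initialisation: pick k distinct data points as starting centroids."""
    return rng.sample(points, k)

def centroid_shift(old, new):
    """Total squared movement of the centroids between two iterations."""
    return sum((a - b) ** 2 for o, n in zip(old, new) for a, b in zip(o, n))

# inside the main k-means loop one would then stop early with something like:
#     if centroid_shift(centroids, new_centroids) < tol: break
print(centroid_shift([(0.0, 0.0)], [(0.1, 0.0)]))
```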
Loved the way your explained the concepts
Best explanation I've ever come across
Thanks!
I don't understand English well, but he explains it better than some do in Russian :)
At 1:10, "...you assign, this point x_j...". You mean, x_i. Thank you for this video!
almost 250k views on a K-Means Clustering Algo Video ?!?! RESPECT !!!
What if you have a point that is the same distance from both the red and the yellow triangle?
I have a doubt, why are we not computing sigma, the variance and the prior probability and recomputing just the means? I know that for GMM we do calculate means and variances and priors, and if we don't recompute them for GMM, we get K-means.... but my question is why are we NOT recomputing in K-means? Thanks in advance and awesome videos :D
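For what it's worth, the standard way to state the connection raised above (my summary, not from the video): K-means is the limit of EM for a Gaussian mixture with equal, fixed, spherical covariances and uniform priors, so there is simply nothing left to re-estimate except the means.

```latex
% Fix \Sigma_k = \sigma^2 I and \pi_k = 1/K for all k. The E-step responsibilities
\[
  r_{ik} \;\propto\; \exp\!\left( -\,\lVert x_i - \mu_k \rVert^2 / 2\sigma^2 \right)
\]
% become hard assignments to the nearest mean as \sigma \to 0, and the M-step
% reduces to recomputing only the means -- which is exactly K-means.
```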
Can anyone address this: if you have a mix of categorical, ordinal and nominal variables and you want to find the cluster, what analysis is that?
Thanks! Especially the example you show helps a lot for me to understand.
Poor explanation. Cannot be understood by everyone
I would start by what it accomplishes.
I wish the lecturer i have at uni was as good as you prof Victor.
That was brilliant, I understood it straight away.
Thank you Victor, very clear presentation
Wow. Excellent explanation. Thanks.
Clear explanation, thanks!
is there any software for outlier detection using k-means?
Greetings, very nice video. I have a question: do we define the distortion equation in terms of the number of iterations we set for the algorithm to converge? I mean, we may get stuck in a local optimum for some random starting points. Thus, we should run the algorithm from the very beginning (from the random initialization) several times in order to, hopefully, find the global minimum in one of those runs.
Thank you..very well explained
Thank you Victor, that's the best tutorial about k-means I've ever watched....
Thanks! Happy to know this is helpful
I've got a professor at university who has a PhD in Machine Learning and an MS in whatever, but he doesn't seem to teach as well as you. Best explanation of k-means on the internet! Thank you very much.
a = df_heart[df_heart['restecg'] == "normal"].index
df_heart.drop(index=list(a), inplace=True)
Thank you very much. After having found some pages of mathematical formulas, now I understand the implementation!
Very happy you find it useful, thanks!
Hi sir, are the k-means and k-nearest-neighbours algorithms the same?
Really useful, but man that was some fast talking !
ty
Very succinctly explained. Well done
Brilliant video! Many thanks!
thanks this video is purrrfect!!
Very good presentation!!!!
Thank you so much for the crystal clear and precise presentation Sir!! :)
this was amazing and blazingly fast ! thanks !
Thanks for clear explanation!
I thought he was talking in 1.5x lol.
very nice explanation, very clear and concise. thank you very much
What happens if a centroid has no points assigned to it?
i hope the video can start directly from 4:23
Awesome explanation, thank you Victor!
Very clear illustrations
Very well explained. Thanks
I understand more from you in English than from my teacher's book in Italian. Thank you!
best k-means explanation ever, thanks
one question i have is,
considering the randomly selected centroids,
will the end results always be the same?
Justin Kim Thanks for the kind words.
No, the end result will be different each time you run K-means from different starting points. The algorithm finds only a local minimum of the error (intra-cluster variance).
If computation time is not an issue, you run the algorithm multiple times, and in the end pick the clustering that gives the lowest intra-cluster variance -- it is, in a way, the best fit of K clusters to your data.
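The restart strategy described above, sketched (my own code; the `k_means_fn` argument stands for any K-means implementation, and the deliberately naive stand-in below just samples k starting points, to keep the sketch self-contained):

```python
import random

def distortion(points, centroids):
    """Intra-cluster variance: summed squared distance of each point to its nearest centroid."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
               for p in points)

def best_of(points, k, k_means_fn, restarts=10, seed=0):
    """Run k-means from several random starts; keep the lowest-distortion result."""
    rng = random.Random(seed)
    runs = [k_means_fn(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda cents: distortion(points, cents))

naive = lambda pts, k, rng: rng.sample(pts, k)   # stand-in for a real k-means
pts = [(0.0, 0.0), (0.1, 0.1), (4.0, 4.0), (4.1, 4.1)]
best = best_of(pts, 2, naive, restarts=20)
```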
Doubt: if there are multiple attributes, like (X1-Xn), (Y1-Yn), (Z1-Zn), etc., how do we compute the centroid?
Great tutorial!
Incredible teaching skills!
Clear explanation, thank you