Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
Would you please make a video to explain the difference between -
1) KNN and K-Means
2) Classification and Clustering
@@mohammadumarshaikh7787 Here's my video on K-means: ruclips.net/video/4b5d3muPQmA/видео.html
Whenever I search for a video tutorial, and you pop up in the search results, my heart fills with joy!!! ^^
Thank you once again!
Hooray!!!!! :)
Same here. I haven't started the video yet, and there's only one video on KNN... I don't know if I'll understand it as well as linear regression.
I'm taking a machine learning course at university, and I've been blessed with having found your channel. Keep up the great content!
Hooray! I'm glad the videos are helpful. :)
Five minutes here explains it better than some teachers do in an hour. :)
Thank you! :)
Better than a teacher spending a whole semester, for me.
hahahahaha
@@free_thinker4958 Wtf, really? Also, my teacher took 5 minutes, and that's why I understood nothing.
For real, this channel is a godsend.
INTRO IS LEGENDARY BRO : )
Yup, that's a good one. :)
Every time I see your videos I'm simply amazed how you manage to make things simple,it's like 1+1=2, respect
Thank you! :)
When a random RUclips channel explains it better than your University Professor....
Keep it up!
Wow, thanks!
This is by far the best video on KNN algo ! Thanks Josh
You are doing awesome work Sir..have watched your other videos as well..very intuitive and logically explained
This channel is salt of the Earth
Thanks!
I am brushing up on my ML terminology and StatQuest always comes to the rescue!! BAM!
bam!
it is good to listen to your music in your website after watching this clear-explained video. thanks a lot.
Thank you so much! :)
Thank you josh and the FFGDUNCCH (the friendly folks from the genetics department at the university of north carolina at chapel hill)
Triple bam! :)
This channel is GOD SENT. Period.
Thanks!
When I search for something and find it on StatQuest channel. Super BAM!!
YES!
I can't believe how good you are at explaining this. wow!!!
bam!
Man, you are a legend. If I pass the exam on Monday (which I'm pretty hopeless about), I will buy one of your shirts next month.
Hooray! Good luck with your exam! :)
@@statquest Hey, I failed :D but still, I learnt a lot, thanks!
@@eltajbabazade1189 Better luck next time! :)
@@eltajbabazade1189 I hope you graduated successfully 🙂.
Your videos are K-nearest perfection :)
Ha! Very funny.
@@statquest Noice 👍 Thanks 👍
Thank you so much for saving our time, sir❤ Love from Sri Lanka 🇱🇰
bam!
Thank you so much. So useful, honestly. I didn't get this from a 2-hour lecture.
Glad it was helpful!
Hey Josh! This is just a thank you note saying if I pass the upcoming exam, then it would be all because of you! ❤
Good luck!!! Let me know how it goes!
@@statquest It went well, thank you! Hopefully I get good grades. I was thinking of suggesting that it would be great if you could cover Markov Chain Monte Carlo and related topics. Thank you again! Your channel has been incredibly helpful!
@@suparnaroy2829 I'm glad it went well! And I'll keep those topics in mind.
I am so glad I found this channel.
Thanks!
Your videos are sooo great, I can't stop watching 💖💖 thank you
Hooray!!!!
StatQuest with Josh Starmer can you add an ICA as well?
It's on the to-do list, but it might be a while before I get to it.
StatQuest with Josh Starmer 😔😕 that's sad, but i look forward to it. You explain beautifully sir! 💪🏼👊🏼
Very well explained and loved your uke intro by the way :)
Thank you!
one video explained better than a whole semester
Awesome! :)
Very clear, I got the idea of this concept right away.
Well done, thanks!
Thanks!
WOWW! This was super helpful!
Thanks Josh!
Glad it was helpful!
Another exciting episode of statquest!
bam! :)
Where would I be without StatQuest? Luckily, I now have the statistical tool to estimate this!
bam!
you are the master of machine learning
:)
It is unfair that I can't give this video another like.
:)
Simple and Clear explanation. Thank you!
Thanks!
Thank you! This helped me so much in understanding KNN faster :D
Hooray!!! :)
00:10 K-nearest neighbors is a simple algorithm for classifying data.
00:50 Clustering data using PCA and classifying new cell type
01:29 K-nearest neighbors classifies new data based on nearest annotated cells.
02:12 K-nearest neighbors algorithm assigns a category based on the majority of nearest neighbors' votes.
02:59 K-nearest neighbors algorithm classifies unknown points based on nearest neighbors
03:40 K-nearest neighbors can avoid ties by using an odd K value.
04:22 Choosing the best value for K is crucial for K-nearest neighbors.
05:01 Categories with few samples are outvoted
You forgot the bam! :)
That opening banjo solo is pretty sweet.
Thanks!
Ohhh man, this is so simple.
Thank you for this type of explanation.
Most welcome 😊
You're a legend at explaining.
:)
Easy to understand and straightforward. Thanks.
Thanks!
Summarised in a very short video....just perfect
Thank you! :)
Thank you, very clear and to the point explanation !
These videos are just amazing and clearly are extremely successful in simplifying topics that are usually thought of as difficult. Can you please also make videos on its code in python/R..? and of naive bayes too maybe. That would be super useful. Thank you very much for this level of awesome content.
I'll keep that in mind.
Dang. Simple and to the point! Thank you!
Thanks!
Clear and concise explanation. Thank you :)
Thanks! :)
Thank you for your Clear explanation.
You're welcome! :)
I love you, sir! Your videos saved my life!
Happy to help!
Great explanation! BAM! Great illustrations! Double BAM!!
Thank you very much! :)
Great video man
Thanks!
Thank you. Very good explanation in such a short time.
Thanks! :)
awesome explanation ! thank you so much!
Thank you! :)
BAM! Amazing explanation!
Thanks!
You are amazing! Thank u so much.
Cheers from BRAZIL
Thank you very much! :)
Just wow, thanks Josh. You are great. One doubt, however: if K values are large, will outliers not affect my algorithm? What is the effect of outliers in KNN? Please answer.
I believe that large values for K will provide some protection from outliers.
BAM!!! That was great as usual.
Hooray! Thank you! :)
You're a legend ! Thank you :)
Thanks!
Wow! such a great explainer
Glad you think so!
Loved it.... Thank you 😊
Glad you enjoyed it!
Best explanation ever, thank you!!!
Thanks!
Thank you so much. This was well explained.
Thanks!
Josh was in Bengaluru, I saw him there!!
bam! :)
Your video is amazing as always... It would be great if you could include how to choose the value for 'k' and evaluation metrics for kNN. Also, if I understand it right, there is no actual "training" happening in kNN. It is about arranging the points on the Cartesian plane, and when a new data point comes, it will again be placed on the same plane and, depending on the value of "k", it will be classified. Correct me if I'm wrong.
Hi. Yes, you are right. KNN is easy to implement and understand and has been widely used in academia and industry for decades. You may utilise the cross-validation technique and the validation datasets to select the value for k.
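To make the "no actual training" point concrete, here's a minimal Python sketch (toy data and function names are made up for illustration): the "model" is just the stored points, and k can be picked by checking accuracy on a held-out validation set, a simple stand-in for full cross-validation.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.
    There is no training step: the 'model' is just the stored data."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def holdout_best_k(train, validation, candidate_ks):
    """Pick the k that classifies the held-out validation points best."""
    def accuracy(k):
        hits = sum(knn_predict(train, point, k) == label
                   for point, label in validation)
        return hits / len(validation)
    return max(candidate_ks, key=accuracy)

# Toy 2-D data: two well-separated clusters
train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((8, 8), "green"), ((9, 8), "green"), ((8, 9), "green")]
validation = [((2, 2), "red"), ((9, 9), "green")]

print(knn_predict(train, (1.5, 1.5), k=3))           # -> red
print(holdout_best_k(train, validation, [1, 3, 5]))  # -> 1 (all k tie here; max() keeps the first)
```

With real data you'd use proper k-fold cross-validation instead of a single hold-out split, but the idea is the same: try several values of k and keep the one that classifies held-out points best.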
Amazing explanation! Thank you!
Many thanks for the clear explanation
Thanks! :)
You are awesome man!!
Thanks!
THANK YOU JOSH!
Anytime! :)
It is awesome how you explain these topics. One suggestion: you could show how the 7 nearest are red, 3 nearest are orange, and 1 nearest is green for the point in the middle. To my eyes, the 1 nearest neighbour is still red! It confuses me about what "nearest" actually means :)
What time point, minutes and seconds, are you referring to?
@@statquest 02:36, it confuses me too
@@lowqualitydude8460 Thanks! Unfortunately, since the original comment, RUclips has discontinued the feature that let me make small changes to a video. However, if I ever update this one with something new, I'll be sure to make this more obvious.
Awesome! You should do a quadratic discriminant analysis video to go with your awesome one on LDA.
My 10-year-old humming the StatQuest song made me realise my new obsession with this.
bam!
Thanks sir, great explanation!
Glad you liked it!
Well explained, thank you good sir!
Glad it was helpful!
Bam! Smart and clear as usual.
Does considering this my favourite channel make me a nerd?
It makes you awesome! :)
BAM!!! You nailed it.
Thank you! :)
Great tutorial!
Thank you!
lifesaver! thank you!
Glad it helped!
Good job ! I loved the videooo :)
Thanks!
So much clearer than my lecturer fam
Thanks!
@@statquest no, thank you :)
Thank you so much
No problem!
When we have categorical variable like Yes/No or type of job (which can take four values: business, healthcare, engineering, or education), how can we calculate distances? Is knn useful at all?
If there is a distance metric, then KNN will work, and there are distance metrics for categorical variables. See stackoverflow.com/questions/2981743/ways-to-calculate-similarity/2983763#2983763
Thanks, you're great
Thanks!
Really great video(s) :-) One question about the heatmap at 4:00: on the x-axis you have data points (width?), and on the y-axis are elements (squares) with values (colors). What is the meaning of a color? And how do you plot a sample? Is a sample something like x_i = [v1, v2, ..., vn], where n is the number of squares on the diagonal?
To learn more about heatmaps, checkout: ruclips.net/video/oMtDyOn2TCc/видео.html and ruclips.net/video/7xHsRkOdVwo/видео.html
@@statquest Ah, I see - because the columns are similar, you can tell which samples are similar to each other.
Good stuff, thanks! Do you have any videos about survival analysis?
What exactly does the model store after being "trained"? Does it store all the features and labels in our training data, so that while predicting it can compute all the distances?
I believe so.
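That's why KNN is often called a "lazy" learner. As a rough Python sketch (the class and names are made up for illustration, not from the video), fit() does nothing but memorize the whole training set, and every prediction compares the query against all stored points:

```python
import math
from collections import Counter

class KNNClassifier:
    """A 'lazy' learner: fit() just memorizes the training set, and every
    predict() compares the query against all stored points (O(n) per query)."""

    def fit(self, points, labels):
        self.points = list(points)   # all features kept in memory
        self.labels = list(labels)   # all labels kept in memory
        return self

    def predict(self, query, k=3):
        order = sorted(range(len(self.points)),
                       key=lambda i: math.dist(self.points[i], query))
        votes = Counter(self.labels[i] for i in order[:k])
        return votes.most_common(1)[0][0]

model = KNNClassifier().fit([(0, 0), (0, 1), (5, 5), (6, 5)],
                            ["a", "a", "b", "b"])
print(len(model.points))      # 4 -- the entire training set is stored
print(model.predict((5, 6)))  # -> b
```

Real libraries often store the points in a k-d tree or ball tree instead of a flat list, which speeds up the neighbor search but still keeps all the training data.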
Nice video well done
Thanks!
Hello Josh, can you please explain why 3 nearest neighbours are Orange and 1 nearest neighbour is green (the red one looks closest to the black spot to me)?
I might have misunderstood the meaning of k nearest neighbours, though
PS: loved your explanation, thank you!
What time point, minutes and seconds, are you asking about?
@@statquest I'm sorry I forgot to mention it, it's at 2:29
@@sarvesh_7736 Of the 11 colored dots that are closest to the big black dot, 7 of them are red, 3 of them are orange and 1 one of them is green.
THANK YOU!
YOU HAVE SAVED ME :D
Awesome! :)
I like your bandcamp!
Hooray! Thank you! :)
Thank you!
You bet!
Low values of K (K=1 or K=2) can be noisy. But in your example, the cells are evenly spaced, so K=1 seems to be perfect and there are no outliers. Or do you mean that in real cases the data forms clusters and is not evenly spread like yours?
This is just an example of the principles behind k-nearest neighbors.
Thanks for the very informative video! I have a question, though: if my dataset is filled with just categorical string data, so no numerical data, is there a way I can still use KNN to predict? I heard about encoding the strings to numerical values, but that seems very complex with a big dataset.
If you use R, then you can use a Random Forest to cluster anything and then apply KNN to that clustering: ruclips.net/video/sQ870aTKqiM/видео.html If you don't use R, you can use target encoding: ruclips.net/video/589nCGeWG1w/видео.html
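The target-encoding idea is less scary than it sounds. Here's a bare-bones Python sketch (variable names invented for illustration): each category is replaced by the mean target value observed for it, giving a single numeric column regardless of how many categories there are.

```python
from collections import defaultdict

def target_encode(categories, targets):
    """Map each category to the mean target value observed for it.
    (A bare-bones sketch: real pipelines add smoothing and cross-validation
    to avoid leaking the target into the features.)"""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}

jobs    = ["business", "healthcare", "business", "education", "healthcare"]
churned = [1,           0,            1,          0,           1]

encoding = target_encode(jobs, churned)
print(encoding["business"])    # 1.0
print(encoding["healthcare"])  # 0.5
```

Once the strings are numbers, ordinary Euclidean-distance KNN applies; the videos linked above cover the leakage caveats in more depth.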
thank you so much for this video! i have my midterm tomorrow and im so scared :(
Good luck!!
Thanks a lot for this video.
Hooray! :)
I love this guy's shtick. Corny, slightly annoying music, although I'm sure he is a great musician. Slightly condescending voice when he goes over the material... like "I'm making this so fucking easy for you... you can't possibly not understand this". It's actually quite calming. He speaks slowly too. You don't have to constantly pause his videos. I understand every one of his videos. If I don't, it's because I didn't yet watch the prerequisite videos that he tells you at the beginning to watch.
He never takes for granted that you understand some detail. This is the BIGGEST freakin' mistake of educators. Some damn variable in a formula that they forget to explain. Also, he will use the simplest example possible so that you understand.
I am returning to school, grad school in the ML track for computer science. I don't remember much of the math that I took 20 years ago. This guy is a lifesaver. Wish I watched these when I started. I will be watching all of his videos.
After I graduate and make some money, I'm sending him some bucks thru Patreon.
Thanks man!
BAM! Thank you very much! I think I must have "resting condescending voice" - because several people have made the comment that I sound a little condescending - but trust me this is not intentional! :)
@@statquest It's actually reassuring. You know, when you are talking to someone who is freaking out? And you make it sound like "Dood, this not that hard."
@@Steve-3P0 Nice! :)
Your videos are really great! Clear and detailed explanation. Can you please make a similar detailed playlist for neural networks?
I'm working on it. I have 5 videos so far, and 5 more to go before I have the whole playlist. Here's the link to the first one: ruclips.net/video/CqOfi41LfDw/видео.html and the other links are here: statquest.org/video-index/
@@statquest Yes I have seen those videos, just wanted to know whether there are more videos to come. Eagerly waiting!
@@unnatinandrekar99 The next one comes out on Monday, and then the rest will come out, one or two per week, for the next month.
@@statquest BAM!!!! That's perfect!!!!!!!!
This video is good as usual, but I think a few more concepts should be explained, like distance metrics, the lazy-algorithm property of KNN, and the elbow method.
Thanks for the feedback.
Omg thank you so much
No problem!
Omg, thank you so much!!!!!
Happy to help!
Hi, thanks for the video. About the concept of KNN: how does the location of the unknown cell change in the scatter plot? Do you have to change its location? And a second question: should we change the value of k to reach the best k?
The location of the "unknown cell" is fixed - it does not change. Just the classification changes. I offer a few thoughts about how to pick 'k' at 4:12, but, other than that, you can use cross validation: ruclips.net/video/fSytzGwwBVw/видео.html
StatQuest with Josh Starmer I got it . Thank you 🙏 😀
BAM!
:)
Does pretending part of the training data is "unknown" mean we separate the data into training and test data, before using the test data to try our algorithm for the optimal K value?
What time point, minutes and seconds, are you asking about?
watch for the stats, stay for the intro songs
bam! :)
Is it actually based on just a vote, or are the votes weighted based on the distance to the data point?
You can do it either way. If you add weights, then it is "weighted K-NN"
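To show the difference, here's a small Python sketch of the weighted variant (toy data invented for illustration): each of the k nearest neighbors votes with weight 1/distance, so one very close neighbor can outvote several farther ones.

```python
import math
from collections import defaultdict

def weighted_knn(train, query, k):
    """Weighted K-NN: each of the k nearest neighbors votes with weight
    1/distance, so closer neighbors count for more."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        votes[label] += 1.0 / (math.dist(point, query) + 1e-9)  # epsilon avoids divide-by-zero
    return max(votes, key=votes.get)

# One very close "green" outvotes two farther "red"s (plain majority vote
# would have picked "red" here, 2 votes to 1)
train = [((0.0, 0.1), "green"), ((0.0, 2.0), "red"), ((2.0, 0.0), "red")]
print(weighted_knn(train, (0.0, 0.0), k=3))  # -> green
```

In scikit-learn, the same switch is the `weights` parameter of `KNeighborsClassifier`: `weights="uniform"` for the plain vote, `weights="distance"` for this weighted version.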