StatQuest: K-nearest neighbors, Clearly Explained

  • Published: Jan 4, 2025

Comments • 454

  • @statquest
    @statquest  2 years ago +13

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @mohammadumarshaikh7787
      @mohammadumarshaikh7787 23 days ago

      Would you please make a video to explain the difference between -
      1) KNN and K-Means
      2) Classification and Clustering

    • @statquest
      @statquest  22 days ago

      @@mohammadumarshaikh7787 Here's my video on K-means: ruclips.net/video/4b5d3muPQmA/видео.html

  • @alexanderpalm6407
    @alexanderpalm6407 4 years ago +176

    Whenever I search for a video tutorial, and you pop up in the search results, my heart fills with joy!!! ^^
    Thank you once again!

    • @statquest
      @statquest  4 years ago +5

      Hooray!!!!! :)

    • @siddheshbalsaraf1776
      @siddheshbalsaraf1776 3 years ago

      Same here... haven't started the video yet, but there's only one video on KNN... don't know if I can understand it as well as I did linear regression

  • @thinkalinkle
    @thinkalinkle 6 years ago +173

    I'm taking a machine learning course at university, and I've been blessed with having found your channel. Keep up the great content!

    • @statquest
      @statquest  6 years ago +11

      Hooray! I'm glad the videos are helpful. :)

  • @raytang1867
    @raytang1867 6 years ago +519

    Five minutes here explains it better than some teachers do in an hour. :)

    • @statquest
      @statquest  6 years ago +8

      Thank you! :)

    • @free_thinker4958
      @free_thinker4958 4 years ago +23

      Better than a teacher spending a whole semester, for me

    • @TeacherMarcus1010
      @TeacherMarcus1010 3 years ago

      hahahahaha

    • @spano1723
      @spano1723 3 years ago +4

      @@free_thinker4958 Wtf, really? My teacher also took 5 minutes, and that's why I understood nothing

    • @NerdByFate
      @NerdByFate 3 years ago +3

      For real, this channel is a godsend.

  • @mandarkulkarni4741
    @mandarkulkarni4741 4 years ago +66

    INTRO IS LEGENDARY BRO : )

    • @statquest
      @statquest  4 years ago +2

      Yup, that's a good one. :)

  • @rahulsadanandan5076
    @rahulsadanandan5076 4 years ago +26

    Every time I see your videos I'm simply amazed at how you manage to make things simple; it's like 1+1=2. Respect

  • @spacemeter3001
    @spacemeter3001 4 years ago +15

    When a random YouTube channel explains it better than your university professor....
    Keep it up!

  • @ayush612
    @ayush612 6 years ago +6

    This is by far the best video on the KNN algorithm! Thanks Josh

    • @ayush612
      @ayush612 6 years ago

      You are doing awesome work, Sir. I have watched your other videos as well; very intuitive and logically explained

  • @칸츄리혜올희
    @칸츄리혜올희 9 months ago +3

    This channel is the salt of the earth

  • @Guinhulol
    @Guinhulol 11 months ago +1

    I am brushing up on my ML terminology and StatQuest always comes to the rescue!! BAM!

  • @ycao6
    @ycao6 6 years ago +11

    It is nice to listen to your music on your website after watching this clearly explained video. Thanks a lot.

  • @laiscarraro9960
    @laiscarraro9960 3 years ago +6

    Thank you josh and the FFGDUNCCH (the friendly folks from the genetics department at the university of north carolina at chapel hill)

  • @qicai3682
    @qicai3682 1 year ago +2

    This channel is a GODSEND. Period.

  • @atifayaz3495
    @atifayaz3495 4 years ago +5

    When I search for something and find it on StatQuest channel. Super BAM!!

  • @lucarauchenberger628
    @lucarauchenberger628 3 years ago +3

    I can't believe how good you are at explaining this. wow!!!

  • @eltajbabazade1189
    @eltajbabazade1189 4 years ago +3

    Man, you are a legend. If I pass the exam on Monday (about which I am pretty hopeless), I will buy one of your shirts next month

    • @statquest
      @statquest  4 years ago +2

      Hooray! Good luck with your exam! :)

    • @eltajbabazade1189
      @eltajbabazade1189 3 years ago +1

      @@statquest Hey, I failed :D but still, I learnt a lot, thanks!

    • @statquest
      @statquest  3 years ago

      @@eltajbabazade1189 Better luck next time! :)

    • @MillerMoore-gq2pe
      @MillerMoore-gq2pe 6 months ago +1

      @@eltajbabazade1189 I hope you graduated successfully 🙂.

  • @THEMATT222
    @THEMATT222 2 years ago +7

    Your videos are K-nearest perfection :)

    • @statquest
      @statquest  2 years ago +2

      Ha! Very funny.

    • @THEMATT222
      @THEMATT222 2 years ago +2

      @@statquest Noice 👍 Thanks 👍

  • @ShafniSide
    @ShafniSide 1 year ago +1

    Thank you so much for saving our time sir❤ love from Sri Lanka 🇱🇰

  • @jovinMendes
    @jovinMendes 3 years ago +4

    Thank you so much. So useful, honestly. I didn't get this from a two-hour lecture

    • @statquest
      @statquest  3 years ago +1

      Glad it was helpful!

  • @suparnaroy2829
    @suparnaroy2829 1 year ago +1

    Hey Josh! This is just a thank you note saying if I pass the upcoming exam, then it would be all because of you! ❤

    • @statquest
      @statquest  1 year ago +1

      Good luck!!! Let me know how it goes!

    • @suparnaroy2829
      @suparnaroy2829 1 year ago +1

      @@statquest It went well, thank you! Hopefully I get good grades. I was thinking of suggesting that it would be great if you could cover Markov Chain Monte Carlo and related topics. Thank you again! Your channel has been incredibly helpful!

    • @statquest
      @statquest  1 year ago

      @@suparnaroy2829 I'm glad it went well! And I'll keep those topics in mind.

  • @chandananarayanan5498
    @chandananarayanan5498 3 years ago +1

    I am so glad I found this channel.

  • @fafamnzm
    @fafamnzm 6 years ago +9

    Your videos are sooo great, I can't stop watching 💖💖 thank you

    • @statquest
      @statquest  6 years ago

      Hooray!!!!

    • @fafamnzm
      @fafamnzm 6 years ago +1

      StatQuest with Josh Starmer can you add an ICA as well?

    • @statquest
      @statquest  6 years ago

      It's on the to-do list, but it might be a while before I get to it.

    • @fafamnzm
      @fafamnzm 6 years ago +1

      StatQuest with Josh Starmer 😔😕 That's sad, but I look forward to it. You explain beautifully sir! 💪🏼👊🏼

  • @NoMeVayasDePr0
    @NoMeVayasDePr0 2 years ago +1

    Very well explained and loved your uke intro by the way :)

  • @husamalsalek2333
    @husamalsalek2333 5 years ago +3

    One video explained it better than a whole semester did

  • @il5083
    @il5083 2 years ago +1

    Very clear, I got the idea of this concept right away.
    Well done, thanks!

  • @NiceHirwa
    @NiceHirwa 7 months ago +1

    WOWW! This was super helpful!
    Thanks Josh!

    • @statquest
      @statquest  7 months ago

      Glad it was helpful!

  • @SuperBlackHurricane
    @SuperBlackHurricane 1 year ago +1

    Another exciting episode of statquest!

  • @DRmrTG
    @DRmrTG 1 year ago +1

    Where would I be without StatQuest? Luckily, I now have the statistical tool to estimate this!

  • @arms1991
    @arms1991 3 years ago +1

    you are the master of machine learning

  • @pablo_brianese
    @pablo_brianese 4 years ago +2

    It is unfair that I can't give this video another like.

  • @sabindawadi741
    @sabindawadi741 1 year ago +1

    Simple and Clear explanation. Thank you!

  • @derpfaceonigiri4950
    @derpfaceonigiri4950 6 years ago +6

    Thank you! This helped me so much in understanding KNN faster :D

  • @aashidm
    @aashidm 9 months ago

    00:10 K-nearest neighbors is a simple algorithm for classifying data.
    00:50 Clustering data using PCA and classifying new cell type
    01:29 K-nearest neighbors classifies new data based on nearest annotated cells.
    02:12 K-nearest neighbors algorithm assigns a category based on the majority of nearest neighbors' votes.
    02:59 K-nearest neighbors algorithm classifies unknown points based on nearest neighbors
    03:40 K-nearest neighbors can avoid ties by using an odd K value.
    04:22 Choosing the best value for K is crucial for K-nearest neighbors.
    05:01 Categories with few samples are outvoted

    • @statquest
      @statquest  9 months ago

      You forgot the bam! :)
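The timeline above summarizes KNN's core loop: measure distances to the annotated points, take the K nearest, and let them vote. That loop can be sketched in a few lines of Python (an illustrative, from-scratch sketch with made-up data; the video itself shows no code):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    # Compute Euclidean distances from the new point to every annotated point.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Take the labels of the k nearest annotated points...
    nearest = np.argsort(dists)[:k]
    votes = [y_train[i] for i in nearest]
    # ...and let them vote; the majority wins.
    return Counter(votes).most_common(1)[0][0]

# Two tiny clusters of "annotated cells".
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [2.0, 2.0], [2.1, 1.9], [1.9, 2.2]])
y = ["red", "red", "red", "green", "green", "green"]

print(knn_classify(X, y, np.array([0.3, 0.2]), k=3))  # -> red
```

Using an odd K, like 3 here, sidesteps the two-category ties mentioned at 03:40.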

  • @davidk7212
    @davidk7212 2 years ago +2

    That opening banjo solo is pretty sweet.

  • @arshadshaik822
    @arshadshaik822 3 years ago +1

    Ohhh man, this is so simple
    Thank you for this type of explanation

  • @thechesslobster2768
    @thechesslobster2768 4 years ago +1

    You're a legend at explaining.

  • @hamedsharifian
    @hamedsharifian 3 years ago +1

    Easy to understand and straightforward. Thanks.

  • @mahipkhandelwal3072
    @mahipkhandelwal3072 4 years ago +1

    Summarised in a very short video....just perfect

  • @Oliver-nt8pw
    @Oliver-nt8pw 5 years ago +6

    Thank you, very clear and to the point explanation !

  • @grishmareddy7882
    @grishmareddy7882 6 months ago +1

    These videos are just amazing and are clearly extremely successful in simplifying topics that are usually thought of as difficult. Can you please also make videos on its code in Python/R? And on Naive Bayes too, maybe. That would be super useful. Thank you very much for this level of awesome content.

    • @statquest
      @statquest  6 months ago

      I'll keep that in mind.

  • @VH-yg8rx
    @VH-yg8rx 2 years ago +1

    Dang. Simple and to the point! Thank you!

  • @prekshyabasnet6854
    @prekshyabasnet6854 5 years ago +6

    Clear and concise explanation. Thank you :)

  • @stalindavid8208
    @stalindavid8208 6 years ago +6

    Thank you for your Clear explanation.

  • @silvenlau7436
    @silvenlau7436 2 years ago +1

    I love you sir! Your video save my life!

  • @daniekpo
    @daniekpo 4 years ago +1

    Great explanation! BAM! Great illustrations! Double BAM!!

    • @statquest
      @statquest  4 years ago +1

      Thank you very much! :)

  • @matthewerwine8333
    @matthewerwine8333 3 months ago +1

    Great video man

  • @igorristovski1309
    @igorristovski1309 5 years ago +1

    Thank you. Very good explanation in such a short time.

  • @bealynor
    @bealynor 1 year ago +1

    awesome explanation ! thank you so much!

  • @ritwicverma
    @ritwicverma 4 years ago +2

    BAM! Amazing explanation!

  • @henriquenonenmacher8701
    @henriquenonenmacher8701 4 years ago +1

    You are amazing! Thank u so much.
    Cheers from BRAZIL

  • @hemantdas9546
    @hemantdas9546 4 years ago +1

    Just wow, thanks Josh. You are just great. One doubt, however: if K values are large, will outliers not affect my algorithm? What is the effect of outliers in KNN? Please answer.

    • @statquest
      @statquest  4 years ago +1

      I believe that large values for K will provide some protection from outliers.
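Josh's point that large values of K offer some protection from outliers can be illustrated with a toy sketch (hypothetical data, not from the video; a single mislabeled "green" point sits inside the red cluster):

```python
import numpy as np
from collections import Counter

def knn_classify(X, y, x_new, k):
    # Majority vote among the k points nearest to x_new.
    nearest = np.argsort(np.linalg.norm(X - x_new, axis=1))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [0.05, 0.15],
              [0.12, 0.08],          # the outlier lives here
              [3.0, 3.0], [3.1, 2.9], [2.9, 3.1]])
y = ["red", "red", "red", "red", "green", "green", "green", "green"]

query = np.array([0.11, 0.09])      # right next to the outlier
print(knn_classify(X, y, query, k=1))  # -> green (the outlier wins)
print(knn_classify(X, y, query, k=5))  # -> red   (the outlier is outvoted)
```

With K=1 the lone outlier decides the classification; with K=5 it is outvoted by the surrounding red points.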

  • @prashantbisht2219
    @prashantbisht2219 4 years ago +1

    BAM!!! That was great as usual.

    • @statquest
      @statquest  4 years ago

      Hooray! Thank you! :)

  • @amine1995athlete
    @amine1995athlete 3 years ago +1

    You're a legend ! Thank you :)

  • @m0tivati0n71
    @m0tivati0n71 1 year ago +1

    Wow! such a great explainer

  • @grovvy_essence.1070
    @grovvy_essence.1070 3 years ago +1

    Loved it.... Thank you 😊

    • @statquest
      @statquest  3 years ago

      Glad you enjoyed it!

  • @oscargalovich1323
    @oscargalovich1323 3 years ago +1

    Best explanation ever, thank you!!!

  • @abenadonkor2796
    @abenadonkor2796 2 years ago +1

    Thank you so much. This was well explained.

  • @adityanjsg99
    @adityanjsg99 26 days ago +1

    Josh was in Bengaluru, I saw him there!!

  • @hianjana
    @hianjana 5 years ago +5

    Your video is amazing as always... It would be great if you could include how to choose the value for 'k' and evaluation metrics for kNN. Also, if I understand it right, there is no actual "training" happening in kNN. It is about arranging the points on the Cartesian plane, and when a new data point comes, it will again be placed on the same plane and, depending on the value of "k", it will be classified. Correct me if I'm wrong.

    • @startrek3779
      @startrek3779 3 years ago

      Hi. Yes, you are right. KNN is easy to implement and understand and has been widely used in academia and industry for decades. You may utilise the cross-validation technique and the validation datasets to select the value for k.
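The cross-validation approach mentioned above can be sketched with scikit-learn (a hedged illustration; the library and the iris dataset are my assumptions, not something the video prescribes):

```python
# Pick k by 5-fold cross-validation over a range of odd candidate values.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 16, 2):               # odd values of k avoid ties
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)    # k with the best mean CV accuracy
print(best_k, round(scores[best_k], 3))
```

Each candidate k is scored on held-out folds, so the chosen value reflects performance on data the model never "saw" while classifying.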

  • @kuangliew
    @kuangliew 7 years ago +2

    Amazing explanation! Thank you!

  • @user20517
    @user20517 4 years ago +1

    Many thanks for the clear explanation

  • @shwetanknaveen
    @shwetanknaveen 3 years ago +1

    You are awesome man!!

  • @MegaCliff1234
    @MegaCliff1234 4 years ago +1

    THANK YOU JOSH!

  • @maryamzirak6291
    @maryamzirak6291 1 year ago

    It is awesome how you explain these topics. One suggestion: you could show how the 7 nearest are red, the 3 nearest are orange, and the 1 nearest is green for the point in the middle. To my eyes, the single nearest neighbour is still red! It makes me confused about what "nearest" actually means :)

    • @statquest
      @statquest  1 year ago

      What time point, minutes and seconds, are you referring to?

    • @lowqualitydude8460
      @lowqualitydude8460 1 month ago +1

      @@statquest 02:36, it confuses me too

    • @statquest
      @statquest  1 month ago

      @@lowqualitydude8460 Thanks! Unfortunately, since the original comment, YouTube has discontinued the feature that let me make small changes to a video. However, if I ever update this one with something new, I'll be sure to make this more obvious.

  • @mastermike890
    @mastermike890 7 years ago +3

    awesome! You should do a quadratic discriminant analysis to go with your awesome one on LDA

  • @sanaali3069
    @sanaali3069 1 year ago +1

    My 10-year-old humming the StatQuest song made me realise my new obsession with this

  • @omarmarie7802
    @omarmarie7802 2 years ago +1

    Thanks sir, great explanation!

    • @statquest
      @statquest  2 years ago +1

      Glad you liked it!

  • @csbanki
    @csbanki 2 years ago +1

    Well explained, thank you good sir!

    • @statquest
      @statquest  2 years ago

      Glad it was helpful!

  • @pouce902
    @pouce902 7 years ago +1

    Bam! Smart and clear as usual.

  • @hamzamhadhbi3060
    @hamzamhadhbi3060 3 years ago +1

    Does considering this my favourite channel make me a nerd?

    • @statquest
      @statquest  3 years ago +1

      It makes you awesome! :)

  • @clementchidozie4009
    @clementchidozie4009 4 years ago +1

    BAM!!! You nailed it.

  • @shamanthrajreddy1230
    @shamanthrajreddy1230 2 years ago +1

    Great tutorial!

  • @taetaereporter
    @taetaereporter 1 year ago +1

    lifesaver! thank you!

  • @ratnakaramsravanti2042
    @ratnakaramsravanti2042 1 year ago +1

    Good job ! I loved the videooo :)

  • @JakeBSHere
    @JakeBSHere 3 years ago +1

    So much clearer than my lecturer fam

  • @debatradas1597
    @debatradas1597 2 years ago +1

    Thank you so much

  • @yashmehta4481
    @yashmehta4481 3 years ago +1

    When we have a categorical variable like Yes/No, or type of job (which can take four values: business, healthcare, engineering, or education), how can we calculate distances? Is kNN useful at all?

    • @statquest
      @statquest  3 years ago +1

      If there is a distance metric, then KNN will work, and there are distance metrics for categorical variables. See stackoverflow.com/questions/2981743/ways-to-calculate-similarity/2983763#2983763
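One simple distance metric for categorical variables is Hamming distance, which counts mismatched attributes. A small illustrative sketch (made-up data; just one of the options discussed in the linked StackOverflow answer):

```python
from collections import Counter

def hamming(a, b):
    # Count positions where the categorical values differ.
    return sum(x != y for x, y in zip(a, b))

def knn_categorical(rows, labels, new_row, k=3):
    # Sort row indices by Hamming distance to the new row and keep the k nearest.
    nearest = sorted(range(len(rows)), key=lambda i: hamming(rows[i], new_row))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

rows = [("yes", "business"), ("yes", "healthcare"),
        ("no", "engineering"), ("no", "education"), ("no", "engineering")]
labels = ["A", "A", "B", "B", "B"]

print(knn_categorical(rows, labels, ("yes", "engineering"), k=3))  # -> A
```

With a metric defined, the rest of KNN (find neighbors, take a vote) is unchanged.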

  • @TomYoungblood-cx7wm
    @TomYoungblood-cx7wm 2 months ago +1

    Thanks, you're great

  • @TheTessatje123
    @TheTessatje123 1 year ago

    Really great video(s) :-) One question about the heatmap at 4:00: on the x-axis you have data points (width?), and on the y-axis are elements (squares) with values (colors). What is the meaning of a color? And how do you plot a sample? Is a sample something like x_i = [v1, v2, ..., vn], where n is the number of squares in the diagonal?

    • @statquest
      @statquest  1 year ago +1

      To learn more about heatmaps, checkout: ruclips.net/video/oMtDyOn2TCc/видео.html and ruclips.net/video/7xHsRkOdVwo/видео.html

    • @TheTessatje123
      @TheTessatje123 1 year ago +1

      @@statquest Ah, I see: because the columns are similar, you can tell which samples are similar to each other.

  • @lilmoesk899
    @lilmoesk899 7 years ago +2

    Good stuff, thanks! Do you have any videos about survival analysis?

  • @devolopeagae5411
    @devolopeagae5411 11 months ago

    What exactly does the model store after being trained? Does it store all the features and labels in our training data, so that while predicting it can compare all the distances?

    • @statquest
      @statquest  11 months ago

      I believe so.
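That matches KNN being a "lazy" learner: fitting amounts to memorizing the features and labels, and all the distance work is deferred to prediction time. A minimal sketch (illustrative only, not how any particular library implements it):

```python
import numpy as np
from collections import Counter

class LazyKNN:
    """'Training' just memorizes the data; the stored data IS the model."""
    def fit(self, X, y):
        self.X, self.y = np.asarray(X, dtype=float), list(y)
        return self

    def predict_one(self, x, k=3):
        # Distances are only computed now, against every stored point.
        nearest = np.argsort(np.linalg.norm(self.X - x, axis=1))[:k]
        return Counter(self.y[i] for i in nearest).most_common(1)[0][0]

model = LazyKNN().fit([[0, 0], [0, 1], [5, 5], [5, 6]], ["a", "a", "b", "b"])
print(model.predict_one(np.array([5.1, 5.2]), k=3))  # -> b
```

This is why KNN is cheap to "train" but can be slow to predict with on large datasets.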

  • @danny89620
    @danny89620 1 year ago +1

    Nice video well done

  • @sarvesh_7736
    @sarvesh_7736 4 years ago +1

    Hello Josh, can you please explain why 3 nearest neighbours are Orange and 1 nearest neighbour is green (the red one looks closest to the black spot to me)?
    I might have misunderstood the meaning of k nearest neighbours, though
    PS: loved your explanation, thank you!

    • @statquest
      @statquest  4 years ago +1

      What time point, minutes and seconds, are you asking about?

    • @sarvesh_7736
      @sarvesh_7736 4 years ago

      @@statquest I'm sorry I forgot to mention it, it's at 2:29

    • @statquest
      @statquest  4 years ago

      @@sarvesh_7736 Of the 11 colored dots that are closest to the big black dot, 7 of them are red, 3 of them are orange and 1 one of them is green.

  • @sciencetrainee3583
    @sciencetrainee3583 5 years ago +2

    THANK YOU!
    YOU HAVE SAVED ME :D

  • @TheSuperninja10
    @TheSuperninja10 6 years ago +2

    I like your bandcamp!

    • @statquest
      @statquest  6 years ago +1

      Hooray! Thank you! :)

  • @omarbenazza
    @omarbenazza 4 years ago +1

    Thank you!

  • @WordofSpirit
    @WordofSpirit 3 years ago

    Low values of K (K=1 or K=2) can be noisy. But in your example the cells are evenly spaced, so K=1 seems to be perfect, as there are no outliers. Or do you mean that in real cases there are clusters and the data is not evenly spread like yours?

    • @statquest
      @statquest  3 years ago

      This is just an example of the principles behind k-nearest neighbors.

  • @ItsAllRelative
    @ItsAllRelative 1 year ago

    Thanks for the very informative info! Though I have a question: if my dataset is filled with just categorical string data (so no numerical data), is there a way I can still use KNN to predict? I heard about encoding the strings to numerical values, but that seems very complex with a big dataset.

    • @statquest
      @statquest  1 year ago

      If you use R, then you can use a Random Forest to cluster anything and then apply KNN to that clustering: ruclips.net/video/sQ870aTKqiM/видео.html If you don't use R, you can use target encoding: ruclips.net/video/589nCGeWG1w/видео.html

  • @shawteex3
    @shawteex3 10 months ago +1

    Thank you so much for this video! I have my midterm tomorrow and I'm so scared :(

    • @statquest
      @statquest  10 months ago +1

      Good luck!!

  • @MB-vd6hc
    @MB-vd6hc 4 years ago +1

    Thanks a lot for this video.

  • @Steve-3P0
    @Steve-3P0 4 years ago +6

    I love this guy's shtick. Corny, slightly annoying music, although I'm sure he is a great musician. Slightly condescending voice when he goes over the material... like "I'm making this so fucking easy for you... you can't possibly not understand this". It's actually quite calming. He speaks slowly too. You don't have to constantly pause his videos. I understand every one of his videos. If I don't, it's because I haven't yet watched the prerequisite videos that he tells you at the beginning to watch.
    He never takes for granted that you understand some detail. This is the BIGGEST freakin' mistake of educators. Some damn variable in a formula that they forget to explain. Also, he will use the simplest example possible so that you understand.
    I am returning to school, grad school in the ML track for computer science. I don't remember much of the math that I took 20 years ago. This guy is a lifesaver. Wish I watched these when I started. I will be watching all of his videos.
    After I graduate and make some money, I'm sending him some bucks thru Patreon.
    Thanks man!

    • @statquest
      @statquest  4 years ago

      BAM! Thank you very much! I think I must have "resting condescending voice" - because several people have made the comment that I sound a little condescending - but trust me this is not intentional! :)

    • @Steve-3P0
      @Steve-3P0 4 years ago +1

      @@statquest It's actually reassuring. You know, when you are talking to someone who is freaking out? And you make it sound like "Dood, this not that hard."

    • @statquest
      @statquest  4 years ago

      @@Steve-3P0 Nice! :)

  • @unnatinandrekar99
    @unnatinandrekar99 3 years ago

    Your videos are really great! Clear and detailed explanation. Can you please make a similar detailed playlist for neural networks?

    • @statquest
      @statquest  3 years ago +1

      I'm working on it. I have 5 videos so far, and 5 more to go before I have the whole playlist. Here's the link to the first one: ruclips.net/video/CqOfi41LfDw/видео.html and the other links are here: statquest.org/video-index/

    • @unnatinandrekar99
      @unnatinandrekar99 3 years ago +1

      @@statquest Yes I have seen those videos, just wanted to know whether there are more videos to come. Eagerly waiting!

    • @statquest
      @statquest  3 years ago +1

      @@unnatinandrekar99 The next one comes out on Monday, and then the rest will come out, one or two per week, for the next month.

    • @unnatinandrekar99
      @unnatinandrekar99 3 years ago +1

      @@statquest BAM!!!! That's prefect!!!!!!!!

  • @PradeepSharma-fl2uy
    @PradeepSharma-fl2uy 4 years ago

    This video is good as usual, but I think some more concepts should be explained, like distance metrics, the lazy-algorithm property of KNN, and the elbow method.

    • @statquest
      @statquest  4 years ago

      Thanks for the feedback.

  • @ja1211
    @ja1211 7 months ago +1

    Omg thank you so much

  • @MrRynRules
    @MrRynRules 4 years ago +1

    Omg, thank you so much!!!!!

  • @parisbeauty5750
    @parisbeauty5750 4 years ago +1

    Hi, thanks for the video. About the concept of KNN: how does the location of the unknown cell change in the scatter plot? Must you change its location? And a second question: should we change the value of k to reach the best k?

    • @statquest
      @statquest  4 years ago +1

      The location of the "unknown cell" is fixed - it does not change. Just the classification changes. I offer a few thoughts about how to pick 'k' at 4:12, but, other than that, you can use cross validation: ruclips.net/video/fSytzGwwBVw/видео.html

    • @parisbeauty5750
      @parisbeauty5750 4 years ago +1

      StatQuest with Josh Starmer I got it . Thank you 🙏 😀

  • @coredump7827
    @coredump7827 5 years ago +26

    BAM!

  • @Rushield3981cc
    @Rushield3981cc 2 years ago

    Does pretending that part of the training data is "unknown" mean we separate the data into training and test sets, before using the test data to try our algorithm and find the optimal K value?

    • @statquest
      @statquest  2 years ago

      What time point, minutes and seconds, are you asking about?

  • @rockfordlines3547
    @rockfordlines3547 2 years ago +1

    watch for the stats, stay for the intro songs

  • @ah2522
    @ah2522 4 years ago +1

    Is it actually based on just a vote, or are the votes weighted based on the distance to the data point?

    • @statquest
      @statquest  4 years ago

      You can do it either way. If you add weights, then it is "weighted K-NN"
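The distance-weighted variant mentioned above can be sketched like this: each neighbor's vote is scaled by 1/distance, so closer neighbors count for more (toy data; in scikit-learn, which is my assumption here and not something used in the video, this roughly corresponds to KNeighborsClassifier(weights='distance')):

```python
import numpy as np
from collections import defaultdict

def weighted_knn(X, y, x_new, k=3):
    dists = np.linalg.norm(X - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for i in nearest:
        # Closer neighbors get larger votes (inverse-distance weighting).
        votes[y[i]] += 1.0 / (dists[i] + 1e-9)   # epsilon avoids divide-by-zero
    return max(votes, key=votes.get)

X = np.array([[0.0, 0.0], [1.0, 0.0], [1.1, 0.0]])
y = ["red", "green", "green"]

# The single red point is much closer, so it outweighs two distant greens.
print(weighted_knn(X, y, np.array([0.1, 0.0]), k=3))  # -> red
```

With a plain majority vote and k=3, the two green points would win here; weighting by distance flips the result toward the much closer red point.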