StatQuest: K-nearest neighbors, Clearly Explained

  • Published: Sep 29, 2024

Comments • 442

  • @statquest
    @statquest  2 years ago +11

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @raytang1867
    @raytang1867 5 years ago +499

    Five minutes explains it better than some teachers do in an hour. :)

    • @statquest
      @statquest  5 years ago +8

      Thank you! :)

    • @free_thinker4958
      @free_thinker4958 4 years ago +22

      Better than teacher spending semester for me

    • @TeacherMarcus1010
      @TeacherMarcus1010 3 years ago

      hahahahaha

    • @spano1723
      @spano1723 2 years ago +4

      @@free_thinker4958 wtf really? also my teacher took 5 minutes that's why I understood nothing

    • @NerdByFate
      @NerdByFate 2 years ago +3

      For real, this channel is a godsend.

  • @vyvu443
    @vyvu443 2 years ago

    Could you create the video with Edited Nearest Neighbors and Condensed Nearest Neighbors? Thank you.

    • @statquest
      @statquest  2 years ago

      I'll keep that in mind.

  • @benitshetty8492
    @benitshetty8492 4 years ago +1

    any video on Naive Bayes algorithm?

    • @statquest
      @statquest  4 years ago +1

      Not yet, but it is on the to-do list.

    • @benitshetty8492
      @benitshetty8492 4 years ago +1

      Well I appeared for the stat test today. Ur videos on LDA and KNN helped a lot. But ya I wish even naive Bayes was available!

  • @lucaslai6782
    @lucaslai6782 4 years ago

    Hello Josh, Could you tell me when and where I should use KNN, KMC, Hierarchical, or other unsupervised machine learnings? By this I mean, are there any metrics to judge which one is better? Or in which situation, this one is more suitable than another one?

    • @statquest
      @statquest  4 years ago

      It depends on the field and your goal. Often heatmaps are clustered with hierarchical clustering. PCA is often combined with KNN.

  • @bachphantat
    @bachphantat 1 year ago

    I have read somewhere that you take the k nearest points then take their average? So which one is correct? or in which situation?

    • @statquest
      @statquest  1 year ago +1

      In this example, we're using k-nearest neighbors for classification. For example, we might ask if some new person likes the movie Troll 2 or not. In that case, taking the average doesn't make any sense. However, if we were trying to solve some sort of regression problem, like predicting how tall someone might be, it might make sense to take the average of the k-nearest neighbors.
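To make the distinction concrete, here is a minimal Python sketch (not from the video; the data points and function name are invented for illustration) showing the same k-nearest-neighbors machinery used both ways: a majority vote for classification and an average for regression.

```python
from collections import Counter
import math

def knn_predict(train, query, k, mode="classify"):
    """Find the k training points nearest to `query` and combine their labels:
    a majority vote for classification, an average for regression."""
    # Sort training points by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    labels = [label for _, label in nearest]
    if mode == "classify":
        return Counter(labels).most_common(1)[0][0]  # majority vote
    return sum(labels) / len(labels)                 # average of the neighbors

# Classification: does a new person like the movie Troll 2?
likes = [((1.0, 1.0), "loves it"), ((1.2, 0.8), "loves it"), ((5.0, 5.0), "hates it")]
print(knn_predict(likes, (1.1, 1.0), k=3))  # "loves it" wins the vote 2 to 1

# Regression: predict someone's height from the k nearest people's heights.
heights = [((1.0, 1.0), 170.0), ((1.2, 0.8), 180.0), ((5.0, 5.0), 160.0)]
print(knn_predict(heights, (1.1, 1.0), k=2, mode="regress"))  # (170 + 180) / 2 = 175.0
```

The only difference between the two tasks is the last step: counting votes versus averaging the neighbors' values.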

  • @ilya9481
    @ilya9481 3 years ago

    Is it a good strategy to choose K as the size of the smallest category? so it doesn't out-vote a category with a small amount of samples?

    • @statquest
      @statquest  3 years ago

      It depends, sometimes k should be smaller.

  • @chenzeping9603
    @chenzeping9603 2 years ago

    What if the labels are not binary and you do 3nn and the three neighbours are all diff?

    • @statquest
      @statquest  2 years ago

      Then you would call it a tie and fail to classify the new data.
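A tiny sketch of that tie rule (hypothetical helper, not from the video): with three neighbors carrying three different labels, no label has a majority, so the voter reports a tie instead of picking arbitrarily.

```python
from collections import Counter

def vote(labels):
    """Majority vote over neighbor labels that reports a tie instead of guessing."""
    counts = Counter(labels).most_common()
    # A tie: the top label shares its count with the runner-up.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # fail to classify
    return counts[0][0]

print(vote(["red", "red", "green"]))   # red: a clear 2-to-1 majority
print(vote(["red", "green", "blue"]))  # None: 3-NN with three different labels
```

Using an odd k only prevents ties between two categories; with three or more categories, a tie check like this is still needed.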

  • @alexanderpalm6407
    @alexanderpalm6407 4 years ago +157

    Whenever I search for a video tutorial, and you pop up in the search results, my heart fills with joy!!! ^^
    Thank you once again!

    • @statquest
      @statquest  4 years ago +4

      Hooray!!!!! :)

    • @siddheshbalsaraf1776
      @siddheshbalsaraf1776 3 years ago

      Same here... haven't started the video yet, but there's only 1 video on KNN... don't know if I'll understand it as well as linear regression

  • @thinkalinkle
    @thinkalinkle 5 years ago +164

    I'm taking a machine learning course at university, and I've been blessed with having found your channel. Keep up the great content!

    • @statquest
      @statquest  5 years ago +10

      Hooray! I'm glad the videos are helpful. :)

  • @mandarkulkarni4741
    @mandarkulkarni4741 4 years ago +63

    INTRO IS LEGENDARY BRO : )

    • @statquest
      @statquest  4 years ago +2

      Yup, that's a good one. :)

  • @rahulsadanandan5076
    @rahulsadanandan5076 4 years ago +26

    Every time I see your videos I'm simply amazed at how you manage to make things simple; it's like 1+1=2. Respect!

  • @spacemeter3001
    @spacemeter3001 3 years ago +15

    When a random RUclips channel explains it better than your University Professor....
    Keep it up!

  • @coredump7827
    @coredump7827 4 years ago +26

    BAM!

  • @pablo_brianese
    @pablo_brianese 3 years ago +2

    It is unfair that I can't give this video another like.

  • @ycao6
    @ycao6 5 years ago +11

    It is good to listen to your music on your website after watching this clearly explained video. Thanks a lot.

  • @Steve-3P0
    @Steve-3P0 4 years ago +6

    I love this guy's shtick. Corny, slightly annoying music, although I'm sure he is a great musician. Slightly condescending voice when he goes over the material... like "I'm making this so fucking easy for you... you can't possibly not understand this". It's actually quite calming. He speaks slowly too. You don't have to constantly pause his videos. I understand every one of his videos. If I don't, it's because I didn't yet watch any prerequisite videos that he tells you at the beginning to watch.
    He never takes for granted that you understand some detail. This is the BIGGEST freakin' mistake of educators. Some damn variable in a formula that they forget to explain. Also, he will use the simplest example possible so that you understand.
    I am returning to school, grad school in the ML track for computer science. I don't remember much of the math that I took 20 years ago. This guy is a lifesaver. Wish I watched these when I started. I will be watching all of his videos.
    After I graduate and make some money, I'm sending him some bucks thru Patreon.
    Thanks man!

    • @statquest
      @statquest  4 years ago

      BAM! Thank you very much! I think I must have "resting condescending voice" - because several people have made the comment that I sound a little condescending - but trust me this is not intentional! :)

    • @Steve-3P0
      @Steve-3P0 4 years ago +1

      @@statquest It's actually reassuring. You know, when you are talking to someone who is freaking out? And you make it sound like "Dood, this not that hard."

    • @statquest
      @statquest  4 years ago

      @@Steve-3P0 Nice! :)

  • @laiscarraro9960
    @laiscarraro9960 3 years ago +4

    Thank you josh and the FFGDUNCCH (the friendly folks from the genetics department at the university of north carolina at chapel hill)

  • @hemantdas9546
    @hemantdas9546 4 years ago +1

    Just wow, thanks Josh. You are just great. One doubt however: if k values are large, will outliers not affect my algorithm? What is the effect of outliers in kNN? Please answer.

    • @statquest
      @statquest  4 years ago +1

      I believe that large values for K will provide some protection from outliers.

  • @atifayaz3495
    @atifayaz3495 4 years ago +5

    When I search for something and find it on StatQuest channel. Super BAM!!

  • @THEMATT222
    @THEMATT222 2 years ago +5

    Your videos are K-nearest perfection :)

    • @statquest
      @statquest  2 years ago +2

      Ha! Very funny.

    • @THEMATT222
      @THEMATT222 2 years ago +2

      @@statquest Noice 👍 Thanks 👍

  • @rahulparmar2814
    @rahulparmar2814 2 years ago +2

    Bam!! Now I know things better than before... Double Bam!!

  • @fafamnzm3126
    @fafamnzm3126 6 years ago +9

    Your videos are sooo great, I can't stop watching 💖💖 thank you

    • @statquest
      @statquest  6 years ago

      Hooray!!!!

    • @fafamnzm3126
      @fafamnzm3126 6 years ago +1

      StatQuest with Josh Starmer can you add an ICA as well?

    • @statquest
      @statquest  6 years ago

      It's on the to-do list, but it might be a while before I get to it.

    • @fafamnzm3126
      @fafamnzm3126 6 years ago +1

      StatQuest with Josh Starmer 😔😕 that's sad, but i look forward to it. You explain beautifully sir! 💪🏼👊🏼

  • @칸츄리혜올희
    @칸츄리혜올희 6 months ago +3

    This channel is salt of the Earth

  • @ayush612
    @ayush612 6 years ago +6

    This is by far the best video on KNN algo ! Thanks Josh

    • @ayush612
      @ayush612 6 years ago

      You are doing awesome work Sir..have watched your other videos as well..very intuitive and logically explained

  • @derpfaceonigiri4950
    @derpfaceonigiri4950 6 years ago +6

    Thank you! This helped me so much in understanding KNN faster :D

  • @grishmareddy7882
    @grishmareddy7882 3 months ago +1

    These videos are just amazing and clearly are extremely successful in simplifying topics that are usually thought of as difficult. Can you please also make videos on its code in python/R..? and of naive bayes too maybe. That would be super useful. Thank you very much for this level of awesome content.

    • @statquest
      @statquest  3 months ago

      I'll keep that in mind.

  • @prekshyabasnet6854
    @prekshyabasnet6854 5 years ago +6

    Clear and concise explanation. Thank you :)

  • @stalindavid8208
    @stalindavid8208 6 years ago +6

    Thank you for your Clear explanation.

  • @HerambLonkar95
    @HerambLonkar95 2 years ago +2

    Came here to learn kNN, ended up learning guitar!

  • @hianjana
    @hianjana 5 years ago +5

    Your video is amazing as always... It would be great if you can include how to choose the value for 'k' and evaluation metrics for kNN. Also, if I understand it right, there is no actual "training" happening in kNN. It is about arranging the points on the cartesian plane and when a new data point comes, it will again be placed on the same plane and depending on the value of "k", it will be classified. Correct me if I'm wrong.

    • @startrek3779
      @startrek3779 2 years ago

      Hi. Yes, you are right. KNN is easy to implement and understand and has been widely used in academia and industry for decades. You may utilise the cross-validation technique and the validation datasets to select the value for k.
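As a rough illustration of using cross-validation to pick k, here is a plain-Python leave-one-out sketch (toy data and function names invented for this example; in practice you would likely use a library such as scikit-learn).

```python
import math
from collections import Counter

def loo_accuracy(data, k):
    """Leave-one-out accuracy: classify each point with k-NN on all the others."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        nearest = sorted(rest, key=lambda p: math.dist(p[0], x))[:k]
        guess = Counter(label for _, label in nearest).most_common(1)[0][0]
        hits += guess == y
    return hits / len(data)

# Two small, well-separated clusters of three points each.
data = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
        ((3.0, 3.0), "b"), ((3.1, 2.9), "b"), ((2.9, 3.1), "b")]

# k=5 forces each held-out point to be outvoted by the other cluster, so it scores 0.
best_k = max([1, 3, 5], key=lambda k: loo_accuracy(data, k))
print(best_k)  # k=1 and k=3 both score 1.0; max keeps the first
```

This also shows why "training" in k-NN is just storing the data: all the work happens at prediction time.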

  • @jovinMendes
    @jovinMendes 3 years ago +4

    Thank you so much. So useful honestly - I didn't get this from a 2-hour lecture

    • @statquest
      @statquest  3 years ago +1

      Glad it was helpful!

  • @qicai3682
    @qicai3682 1 year ago +2

    This channel is GOD SENT. Period.

  • @mastermike890
    @mastermike890 7 years ago +3

    awesome! You should do a quadratic discriminant analysis to go with your awesome one on LDA

  • @sarvesh_7736
    @sarvesh_7736 4 years ago +1

    Hello Josh, can you please explain why 3 nearest neighbours are Orange and 1 nearest neighbour is green (the red one looks closest to the black spot to me)?
    I might have misunderstood the meaning of k nearest neighbours, though
    PS: loved your explanation, thank you!

    • @statquest
      @statquest  4 years ago +1

      What time point, minutes and seconds, are you asking about?

    • @sarvesh_7736
      @sarvesh_7736 4 years ago

      @@statquest I'm sorry I forgot to mention it, it's at 2:29

    • @statquest
      @statquest  4 years ago

      @@sarvesh_7736 Of the 11 colored dots that are closest to the big black dot, 7 of them are red, 3 of them are orange and 1 one of them is green.

  • @krishnadaskp21
    @krishnadaskp21 4 years ago +5

    Intro "Na na na na na na.. StatQuest"

  • @ShafniSide
    @ShafniSide 1 year ago +1

    Thank you so much for saving our time, sir ❤ love from Sri Lanka 🇱🇰

  • @learntry9074
    @learntry9074 2 months ago +1

    Sad that I'm only finding this now. Happy that I found it at all, rather than never :)

    • @statquest
      @statquest  2 months ago +1

      better late than never! :)

  • @BeSharpInCSharp
    @BeSharpInCSharp 4 years ago +3

    I stopped watching the videos given by my college once I stumbled upon these

  • @lucarauchenberger628
    @lucarauchenberger628 2 years ago +3

    I can't believe how good you are at explaining this. wow!!!

  • @shawteex3
    @shawteex3 7 months ago +1

    thank you so much for this video! i have my midterm tomorrow and im so scared :(

  • @sanaali3069
    @sanaali3069 1 year ago +1

    My 10-year-old humming the StatQuest song made me realise my new obsession with this

  • @rrrprogram8667
    @rrrprogram8667 6 years ago +2

    Ohhh.. I got confused with K-means clustering and K-nearest neighbors......

  • @ah2522
    @ah2522 4 years ago +1

    Is it actually based on just a vote? Or are the votes weighted based on the distance to the data point?

    • @statquest
      @statquest  4 years ago

      You can do it either way. If you add weights, then it is "weighted k-NN".
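A minimal sketch of that idea (hypothetical code, not from the video): each neighbor's vote is weighted by the inverse of its distance, so one very close neighbor can outvote two distant ones.

```python
import math
from collections import defaultdict

def weighted_knn(train, query, k):
    """k-NN where each neighbor's vote counts 1/distance instead of 1."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    scores = defaultdict(float)
    for point, label in nearest:
        # A small epsilon avoids dividing by zero when a neighbor matches exactly.
        scores[label] += 1.0 / (math.dist(point, query) + 1e-9)
    return max(scores, key=scores.get)

# Plain 3-NN would say "green" (2 votes to 1); weighting by closeness says "red".
train = [((0.0, 0.1), "red"), ((0.0, 3.0), "green"), ((3.0, 0.0), "green")]
print(weighted_knn(train, (0.0, 0.0), k=3))  # red
```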

  • @eltajbabazade1189
    @eltajbabazade1189 3 years ago +3

    Man, you are a legend. If I pass the exam on Monday (about which I am pretty hopeless), I will buy one of your shirts next month

    • @statquest
      @statquest  3 years ago +2

      Hooray! Good luck with your exam! :)

    • @eltajbabazade1189
      @eltajbabazade1189 3 years ago +1

      @@statquest Hey, I failed :D but still, I learnt a lot, thanks!

    • @statquest
      @statquest  3 years ago

      @@eltajbabazade1189 Better luck next time! :)

    • @MillerMoore-gq2pe
      @MillerMoore-gq2pe 3 months ago +1

      @@eltajbabazade1189 I hope you graduated successfully 🙂.

  • @saramansour3124
    @saramansour3124 5 years ago +1

    Hello Josh how are you. I was wondering if you may kindly explain the Naive Bayes, to be clearly explained :)

  • @Oliver-nt8pw
    @Oliver-nt8pw 5 years ago +6

    Thank you, very clear and to the point explanation !

  • @lilmoesk899
    @lilmoesk899 7 years ago +2

    Good stuff, thanks! Do you have any videos about survival analysis?

  • @msb3559
    @msb3559 4 years ago +2

    Hail Joshua!!

  • @NiceHirwa
    @NiceHirwa 4 months ago +1

    WOWW! This was super helpful!
    Thanks Josh!

    • @statquest
      @statquest  4 months ago

      Glad it was helpful!

  • @hamzamhadhbi3060
    @hamzamhadhbi3060 3 years ago +1

    Does considering this my favourite channel make me a nerd?

    • @statquest
      @statquest  3 years ago +1

      It makes you awesome! :)

  • @arshadshaik822
    @arshadshaik822 3 years ago +1

    Ohhh man, this is so simple
    Thank you for this type of explanation

  • @zaneford9851
    @zaneford9851 3 years ago +1

    I liked the video immediately after hearing the guitar intro

  • @yashmehta4481
    @yashmehta4481 3 years ago +1

    When we have categorical variable like Yes/No or type of job (which can take four values: business, healthcare, engineering, or education), how can we calculate distances? Is knn useful at all?

    • @statquest
      @statquest  3 years ago +1

      If there is a distance metric, then KNN will work, and there are distance metrics for categorical variables. See stackoverflow.com/questions/2981743/ways-to-calculate-similarity/2983763#2983763
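One of the simplest such metrics is Hamming distance: count the fields where two records disagree. A quick sketch (the example records are invented for illustration):

```python
def hamming(a, b):
    """Count the fields where two records of categorical values disagree."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical records: (likes Troll 2?, job sector, area)
person_a = ("yes", "healthcare", "urban")
person_b = ("no", "healthcare", "urban")
person_c = ("no", "business", "rural")
print(hamming(person_a, person_b))  # 1: they differ only in the first field
print(hamming(person_a, person_c))  # 3: they differ in every field
```

Plugging a metric like this in place of Euclidean distance lets k-NN run on purely categorical data.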

  • @bealynor
    @bealynor 1 year ago +1

    awesome explanation ! thank you so much!

  • @Guinhulol
    @Guinhulol 8 months ago +1

    I am brushing up on my ML terminology and StatQuest always comes to the rescue!! BAM!

  • @DRmrTG
    @DRmrTG 8 months ago +1

    Where would I be without StatQuest? Luckily, I now have the statistical tool to estimate this!

  • @m0tivati0n71
    @m0tivati0n71 1 year ago +1

    Wow! such a great explainer

  • @rockfordlines3547
    @rockfordlines3547 1 year ago +1

    watch for the stats, stay for the intro songs

  • @evancooper7336
    @evancooper7336 3 years ago +1

    Quite disappointed at the lack of bams

  • @nmana9759
    @nmana9759 4 years ago +1

    How should i cite this information?

    • @statquest
      @statquest  4 years ago +1

      You can cite StatQuest the same way you would cite any other website or RUclips video. Here is an example:
      StatQuest. “R-squared explained” RUclips, Joshua Starmer, 3 Feb. 2015, ruclips.net/video/2AQKmw14mHM/видео.html.

  • @davidk7212
    @davidk7212 2 years ago +2

    That opening banjo solo is pretty sweet.

  • @taetaereporter
    @taetaereporter 1 year ago +1

    lifesaver! thank you!

  • @silvenlau7436
    @silvenlau7436 2 years ago +1

    I love you sir! Your video save my life!

  • @amine1995athlete
    @amine1995athlete 3 years ago +1

    You're a legend ! Thank you :)

  • @raptorrv1828
    @raptorrv1828 4 years ago +2

    BAM! Amazing explanation!

  • @husamalsalek2333
    @husamalsalek2333 4 years ago +2

    one video explained better than a whole semester

  • @ItsAllRelative
    @ItsAllRelative 1 year ago

    Thanks for the very informative video! Though I have a question: if my dataset is filled with just categorical string data (no numerical data), is there a way I can still use kNN to predict? I heard about encoding the strings to numerical values, but that seems very complex with a big dataset.

    • @statquest
      @statquest  1 year ago

      If you use R, then you can use a Random Forest to cluster anything and then apply KNN to that clustering: ruclips.net/video/sQ870aTKqiM/видео.html If you don't use R, you can use target encoding: ruclips.net/video/589nCGeWG1w/видео.html

  • @jaewoochoi1187
    @jaewoochoi1187 2 years ago +1

    Thanks for your youtube :)

  • @henriquenonenmacher8701
    @henriquenonenmacher8701 4 years ago +1

    You are amazing! Thank u so much.
    Cheers from BRAZIL

  • @Karthikeyan__j__
    @Karthikeyan__j__ 3 years ago +1

    finally...! I can pass my exams!

  • @kuangliew
    @kuangliew 7 years ago +2

    Amazing explanation! Thank you!

  • @ratnakaramsravanti2042
    @ratnakaramsravanti2042 1 year ago +1

    Good job ! I loved the videooo :)

  • @parisbeauty5750
    @parisbeauty5750 4 years ago +1

    Hi. Thanks for the video. About the concept of KNN: how does the location of the unknown cell change in the scatter plot? Do you have to change its location? And a second question: should we keep changing the value of k to reach the best k?

    • @statquest
      @statquest  4 years ago +1

      The location of the "unknown cell" is fixed - it does not change. Just the classification changes. I offer a few thoughts about how to pick 'k' at 4:12, but, other than that, you can use cross validation: ruclips.net/video/fSytzGwwBVw/видео.html

    • @parisbeauty5750
      @parisbeauty5750 4 years ago +1

      StatQuest with Josh Starmer I got it . Thank you 🙏 😀

  • @TheSuperninja10
    @TheSuperninja10 6 years ago +2

    I like your bandcamp!

    • @statquest
      @statquest  6 years ago +1

      Hooray! Thank you! :)

  • @grovvy_essence.1070
    @grovvy_essence.1070 2 years ago +1

    Loved it.... Thank you 😊

    • @statquest
      @statquest  2 years ago

      Glad you enjoyed it!

  • @durgaprasadup5236
    @durgaprasadup5236 5 years ago +1

    Explanation was very, very good. Easily understandable by almost anyone.
    Can you please do a video on KFold and StratifiedKFold with an example using Python? Also, can you explain cross_val_score in detail?

    • @statquest
      @statquest  5 years ago +1

      I have a video on cross validation that covers the concepts in K-Fold cross validation. That might not be exactly what you are looking for, but just in case, here is the link: ruclips.net/video/fSytzGwwBVw/видео.html

  • @snowdenfu
    @snowdenfu 3 years ago +1

    very machine learning voice 😂

  • @pawel040408
    @pawel040408 2 years ago +1

    It was super simple indeed!

  • @NoMeVayasDePr0
    @NoMeVayasDePr0 1 year ago +1

    Very well explained and loved your uke intro by the way :)

  • @il5083
    @il5083 2 years ago +1

    Very clear, I got the idea of this concept right away.
    Well done, thanks!

  • @daniekpo
    @daniekpo 3 years ago +1

    Great explanation! BAM! Great illustrations! Double BAM!!

    • @statquest
      @statquest  3 years ago +1

      Thank you very much! :)

  • @matthewerwine8333
    @matthewerwine8333 17 days ago +1

    Great video man

  • @ja1211
    @ja1211 4 months ago +1

    Omg thank you so much

  • @sabindawadi741
    @sabindawadi741 1 year ago +1

    Simple and Clear explanation. Thank you!

  • @sillycat-s6m
    @sillycat-s6m 7 months ago

    I don't understand the purpose of using PCA. Since we have a dataset of known categories, why can't we directly calculate the Euclidean distance between samples of unknown category and samples of known categories and determine the category to which they belong, like K-means? This sounds stupid but I'm actually a little confused.

    • @statquest
      @statquest  7 months ago

      PCA can help remove noise from the data and it can also make it easier to see what the data look like. That said, it's totally optional.

  • @aminah363
    @aminah363 1 year ago +1

    Please do a video on K-Medoid

  • @sciencetrainee3583
    @sciencetrainee3583 4 years ago +2

    THANK YOU!
    YOU HAVE SAVED ME :D

  • @abenadonkor2796
    @abenadonkor2796 2 years ago +1

    Thank you so much. This was well explained.

  • @suparnaroy2829
    @suparnaroy2829 1 year ago +1

    Hey Josh! This is just a thank you note saying if I pass the upcoming exam, then it would be all because of you! ❤

    • @statquest
      @statquest  1 year ago +1

      Good luck!!! Let me know how it goes!

    • @suparnaroy2829
      @suparnaroy2829 1 year ago +1

      @@statquest It went well, thank you! Hopefully I get good grades. I was thinking of suggesting that it would be great if you could cover Markov Chain Monte Carlo and related topics. Thank you again! Your channel has been incredibly helpful!

    • @statquest
      @statquest  1 year ago

      @@suparnaroy2829 I'm glad it went well! And I'll keep those topics in mind.

  • @aashidm
    @aashidm 6 months ago

    00:10 K-nearest neighbors is a simple algorithm for classifying data.
    00:50 Clustering data using PCA and classifying new cell type
    01:29 K-nearest neighbors classifies new data based on nearest annotated cells.
    02:12 K-nearest neighbors algorithm assigns a category based on the majority of nearest neighbors' votes.
    02:59 K-nearest neighbors algorithm classifies unknown points based on nearest neighbors
    03:40 K-nearest neighbors can avoid ties by using an odd K value.
    04:22 Choosing the best value for K is crucial for K-nearest neighbors.
    05:01 Categories with few samples are outvoted

    • @statquest
      @statquest  6 months ago

      You forgot the bam! :)

  • @mahipkhandelwal3072
    @mahipkhandelwal3072 3 years ago +1

    Summarised in a very short video....just perfect

  • @fairchild9able
    @fairchild9able 6 years ago +3

    Elegant! BAM

  • @VH-yg8rx
    @VH-yg8rx 1 year ago +1

    Dang. Simple and to the point! Thank you!

  • @dvkmrishab4850
    @dvkmrishab4850 5 years ago +1

    Is it possible for you to add naive bayes classifier to the statquests?

    • @statquest
      @statquest  5 years ago

      It's on the to-do list, and I just bumped it a little closer to the top with your vote.

    • @dvkmrishab4850
      @dvkmrishab4850 5 years ago

      Thank you!!!

  • @WordofSpirit
    @WordofSpirit 3 years ago

    Low values of K (k=1 or k=2) can be noisy. But in your example, the cells are evenly spaced, so K=1 seems perfect and there are no outliers. Or do you mean that in real cases the data are clustered and not evenly spread like yours?

    • @statquest
      @statquest  3 years ago

      This is just an example of the principles behind k-nearest neighbors.

  • @mukeshkumar-kh2fh
    @mukeshkumar-kh2fh 2 years ago

    Sir, can we replace a NaN value in a column with a mean computed only over rows whose other parameter falls in a particular range?
    Example: if the BMI column has a NaN value and that person's age is 45, we first find the mean BMI of people aged 40 to 50 and replace the NaN with that. Similarly, for another person with a NaN BMI, we first check that person's age, set an age interval, find the mean, and replace.

    • @statquest
      @statquest  2 years ago +1

      There are a million ways to fill in missing data. Maybe one day I'll do a whole video on the topic.

  • @heatherhutchinson3625
    @heatherhutchinson3625 4 years ago +1

    watching at 2x speed is recommended. :)

  • @ArunKumar-yb2jn
    @ArunKumar-yb2jn 2 years ago

    Sorry, but you lost me at 0:54. What is PCA? Do you think expanding that abbreviation would help newcomers like me?

    • @statquest
      @statquest  2 years ago

      Here's a video that explains PCA: ruclips.net/video/FgakZw6K1QQ/видео.html

  • @maryamzirak6291
    @maryamzirak6291 10 months ago

    It's awesome how you explain these topics. One suggestion: you could show how the 7 nearest are red, the 3 nearest are orange and the 1 nearest is green for the point in the middle. To my eyes, the single nearest neighbour is still red! It makes me confused about what "nearest" actually means :)

    • @statquest
      @statquest  10 months ago

      What time point, minutes and seconds, are you referring to?