K Nearest Neighbor classification with Intuition and practical solution

  • Published: 14 Aug 2024
  • In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification.
    Github link: github.com/kri...
    You can buy my book, where I have provided a detailed explanation of how we can use Machine Learning and Deep Learning in Finance using Python.
    Packt url : prod.packtpub....
    Amazon url: www.amazon.com...
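
    A minimal sketch of the k-NN workflow described above, using scikit-learn; the synthetic dataset and parameter values here are illustrative assumptions, not the video's exact code:

    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Synthetic two-class data standing in for the video's csv file
    X, y = make_classification(n_samples=500, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # "Lazy learning": fit() essentially stores the training set;
    # the real work (distances + voting) happens at predict() time
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train, y_train)
    print(accuracy_score(y_test, knn.predict(X_test)))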

Comments • 93

  • @programmingpurpose1329
    @programmingpurpose1329 2 years ago +2

    This explanation is one of the most precise explanations I have seen on the Internet.

  • @nelsondelarosa5490
    @nelsondelarosa5490 9 months ago +1

    This is in fact well explained, defining every term, and assuming no previous knowledge. Thanks so much!

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago +7

    Cool. Also finished my practice in a Jupyter notebook. Thanks

  • @kiruthigan2014
    @kiruthigan2014 3 years ago +7

    Loved your videos and your taste in music.. Kadhal Vanthale in the bookmarks 😂❤️🔥

  • @vaibhavchaudhary1569
    @vaibhavchaudhary1569 3 years ago +9

    Feature scaling (StandardScaler) should be applied after the train-test split, as that will not lead to information leakage.
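
    A minimal sketch of the leak-free ordering described above: fit the scaler on the training split only, then apply the same transform to the test split (the dataset and names are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=200, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # statistics learned from train only
    X_test_scaled = scaler.transform(X_test)        # test set never influences the fit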

  • @Tapsthequant
    @Tapsthequant 5 years ago +5

    Thank you, you asked a question I had in my head about imbalanced datasets; looking forward to applying the suggested solution...

  • @sivareddynagireddy56
    @sivareddynagireddy56 2 years ago

    No words for your explanation, sir; a simple, lucid explanation!!!!!

  • @padduchennamsetti6516
    @padduchennamsetti6516 10 days ago

    Congratulations Krish on 1 million subscribers 🥳

  • @ijeffking
    @ijeffking 5 years ago +5

    Very well explained again. Thank you so much.

  • @TechnoSparkBigData
    @TechnoSparkBigData 4 years ago +1

    Sir, you are a great inspiration to me. Thanks a lot for making every complex problem simpler.

  • @shyam15287
    @shyam15287 4 years ago +1

    All the best. Superb explanation; you are a superb resource. You will reach great heights, continue your good work.

  • @903vishnu
    @903vishnu 3 years ago +1

    Really, it's good... but you mentioned K=150; as per my knowledge, we are not supposed to take an even number. There might be a chance that an equal number of each class gets selected among the nearest neighbors, and the algorithm may not be able to estimate the class for the new record...

  • @shubhamsongire6712
    @shubhamsongire6712 2 years ago

    Thank you so much, Krish, for this great playlist. You are a gem.

  • @MaanVihaan
    @MaanVihaan 4 years ago

    Very nice, sir; your explanation and coding technique are very nice....
    I am a new learner of data science. Please keep uploading such videos and new techniques for different kinds of algorithms, which make it easy for us to understand how to deal with different kinds of datasets.

  • @codyphillippi8831
    @codyphillippi8831 3 years ago +4

    This is awesome! Thank you so much. I am working on a project at work for lead segmentation, to help us find our "ideal lead" for our sales reps, with a lot of very messy data. This is a great starting point. Quick question (might be a loaded question, ha): after we find these clusters, how do we go about seeing the "cluster profiles"? Or what data points make up these clusters (in categorical form)?

    • @joeljoseph26
      @joeljoseph26 8 months ago

      Use any visualization library to see the clustering.

  • @DeepakSharma-od5ym
    @DeepakSharma-od5ym 4 years ago +1

    error_rate = []
    for i in (1,40):
        knn = KNeighborsClassifier(n_neighbors=i)
        knn.fit(X_train, y_train)
        pred_i = knn.predict(X_test)
        error_rate.append(np.mean(pred_i != y_test))
    plt.figure(figsize=(10,6))
    plt.plot(range(1,40), error_rate, color='blue', linstyle='dashed', marker='o')
    plt.xlabel('k')
    plt.ylabel('error rate')
    My above code is giving the error "x and y must have same first dimension, but have shapes (39,) and (2,)".
    Please suggest.
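
    For anyone hitting the same error: for i in (1,40): iterates over just the two values 1 and 40, so error_rate ends up with 2 entries, while range(1,40) on the x-axis supplies 39 values, which is exactly the shapes (39,) and (2,) in the message. A corrected sketch (also fixing the linstyle typo), assuming X_train, X_test, y_train, y_test already exist:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.neighbors import KNeighborsClassifier

    error_rate = []
    for i in range(1, 40):  # range(1, 40), not the two-element tuple (1, 40)
        knn = KNeighborsClassifier(n_neighbors=i)
        knn.fit(X_train, y_train)
        pred_i = knn.predict(X_test)
        error_rate.append(np.mean(pred_i != y_test))

    plt.figure(figsize=(10, 6))
    plt.plot(range(1, 40), error_rate, color='blue', linestyle='dashed', marker='o')
    plt.xlabel('k')
    plt.ylabel('error rate')
    plt.show()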

  • @deshduniya360scan7
    @deshduniya360scan7 3 years ago

    You explain like a pro, thank you.

  • @sazeebulbashar5686
    @sazeebulbashar5686 2 years ago

    Thank you, Naik......
    This is a very helpful video.

  • @mahikhan5716
    @mahikhan5716 3 years ago +1

    How can we choose the optimal value of k in KNN?

  • @shreeyajoshi9771
    @shreeyajoshi9771 2 years ago

    Thank you very much for this video. Helped a lot!

  • @joeljoseph26
    @joeljoseph26 8 months ago

    Minkowski distance generalizes both Manhattan (p=1) and Euclidean (p=2) distance.
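
    To unpack that: the Minkowski distance (sum_i |x_i - y_i|^p)^(1/p) reduces to Manhattan distance at p=1 and Euclidean distance at p=2. A quick numpy sketch with illustrative values:

    import numpy as np

    def minkowski(x, y, p):
        # (sum |x_i - y_i|^p)^(1/p)
        return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

    a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
    print(minkowski(a, b, 1))  # 7.0 -> Manhattan
    print(minkowski(a, b, 2))  # 5.0 -> Euclidean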

  • @abdelrhmandameen2215
    @abdelrhmandameen2215 3 years ago +1

    Great work, thank you

  • @appiahdennis2725
    @appiahdennis2725 3 years ago

    Respect Krish Naik

  • @Kishor-ai
    @Kishor-ai 1 year ago

    Thanks for this video, Krish Naik sir 🤩

  • @scifimoviesinparts3837
    @scifimoviesinparts3837 3 years ago

    At 18:52, you said a larger value of K will lead to overfitting, which is not true; a smaller value of K leads to overfitting. I think if there are two K-values giving the same error, we choose the bigger one because it is less impacted by outliers.

  • @vignesh7687
    @vignesh7687 3 years ago +2

    Super explanation, Krish. I have a doubt here.. when do we need to use MinMaxScaler() and when do we use StandardScaler()? Is there any difference, or do we have to try both and see which gives better results? Please clarify.

    • @ankurkaiser
      @ankurkaiser 7 months ago

      Hope this answer finds you well. MinMaxScaler() and StandardScaler() are both rescaling steps; the difference is that normalization (MinMaxScaler) does not assume the data follows a Gaussian distribution, while standardization (StandardScaler) does. Normalization is used with models like KNN and neural networks; it rescales the data to between 0 and 1. So if your data doesn't follow a Gaussian distribution, or if your business requirement is to normalize the data, go with MinMaxScaler(); otherwise we generally use StandardScaler(), which is fast and easy to implement.
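
      A quick sketch contrasting the two scikit-learn scalers on the same toy column (numbers are for illustration only):

      import numpy as np
      from sklearn.preprocessing import MinMaxScaler, StandardScaler

      X = np.array([[1.0], [2.0], [3.0], [10.0]])  # one feature with an outlying value

      print(MinMaxScaler().fit_transform(X).ravel())    # squeezed into [0, 1]
      print(StandardScaler().fit_transform(X).ravel())  # zero mean, unit variance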

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago

    Great explanation, Krish.

  • @indrajitbanerjee5131
    @indrajitbanerjee5131 3 years ago

    Nicely explained.

  • @rambaldotra2221
    @rambaldotra2221 3 years ago

    Grateful, sir ✨✨ Thanks a lot.

  • @chaithanyar9961
    @chaithanyar9961 5 years ago +1

    Hello sir, will this code work in TensorFlow?? Any changes to be made if I want to execute it in TF?

  • @krish3486
    @krish3486 1 year ago

    Sir, why do we check only 1 to 40 neighbours in the for loop?

  • @aditisrivastava7079
    @aditisrivastava7079 5 years ago

    Very well explained

  • @ManashreeKorgaonkar
    @ManashreeKorgaonkar 1 year ago

    Thank you so much for sharing this information. I just have one doubt, sir: if we scale before train_test_split, won't it lead to data leakage? During the scaler's fit, when it computes the average over all the data points, it also takes in the values of the test dataset, so my model will already have some hint about it??

  • @yathishl9973
    @yathishl9973 4 years ago +3

    Hi Krish, you are really amazing. I learn many things from you.
    I have a doubt: what measures should I take if the error rate increases with the K value? Please advise.

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 3 years ago +2

      Then you should decrease that k value. A too-small k value leads to overfitting and a too-large k value leads to underfitting; you have to wisely choose some middle value ☺️ so that both bias and variance become as low as possible.

    • @adipurnomo5683
      @adipurnomo5683 3 years ago

      If K is too small, it will be sensitive to outliers; if K is too large, other classes will be included.
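
      One common way to find that middle value is cross-validation. A minimal sketch with scikit-learn's cross_val_score, assuming a synthetic dataset in place of the video's data and the same 1..39 candidate range as the video's loop:

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsClassifier

      # Synthetic stand-in for the video's dataset
      X, y = make_classification(n_samples=500, random_state=0)

      # Mean 5-fold cross-validated accuracy for each candidate k
      scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
                for k in range(1, 40)}
      best_k = max(scores, key=scores.get)
      print(best_k, scores[best_k])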

  • @asawanted
    @asawanted 3 years ago

    What if we choose a K value and hit a local optimum? How would we know whether to stop at that K value or proceed to a higher value in search of the global optimum?

  • @louerleseigneur4532
    @louerleseigneur4532 3 years ago

    Thanks Krish

  • @ramu7762
    @ramu7762 2 years ago

    Spot on. Thank you.

  • @laxmiagarwal3285
    @laxmiagarwal3285 3 years ago

    This is a very nice video.. but I have one doubt: what values are you taking when calculating the mean error rate, as the predicted values are in terms of 0 and 1?

  • @adhvaithstudio6412
    @adhvaithstudio6412 3 years ago

    Can you explain how hyperparameters help, and in what scenarios?

  • @shaz-z506
    @shaz-z506 4 years ago

    Hi Krish,
    Just want to verify: you said at 5:10 that the model is ready, but KNN is instance-based learning with no parameter assumptions, so I don't think it creates any model out of the algorithm. Please let me know if I'm wrong, as I need some clarity.

  • @mdmynuddin1888
    @mdmynuddin1888 3 years ago

    If both categories have the same number of neighboring points, then which category does the new data point belong to?

  • @makrandrastogi5588
    @makrandrastogi5588 3 years ago

    Can anybody tell me why, in most cases, we use Euclidean distance and not Manhattan distance?

  • @vishalaaa1
    @vishalaaa1 3 years ago

    Thank you

  • @princessazula99
    @princessazula99 2 years ago

    For my assignment I am not allowed to import packages for KNN; I have to write it myself. Do you have code without the imported KNN method?
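
    A from-scratch sketch of the algorithm in plain numpy, in case it helps; the function and variable names are my own, not from the video:

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, X_new, k=5):
        # Classify each row of X_new by majority vote of its k nearest training points
        preds = []
        for x in X_new:
            dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Euclidean distances
            nearest = np.argsort(dists)[:k]                    # indices of the k closest
            votes = Counter(y_train[nearest])
            preds.append(votes.most_common(1)[0][0])           # majority class wins
        return np.array(preds)

    # Tiny usage example
    X_train = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([[1, 0], [5, 6]]), k=3))  # -> [0 1]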

  • @devinpython5555
    @devinpython5555 4 years ago +1

    Could you please explain why fit and transform are done on the X values (in the above example, leaving out the target column, the rest of the data is the X values)?

  • @colabwork1910
    @colabwork1910 2 years ago

    Awesome

  • @birendrasingh7133
    @birendrasingh7133 3 years ago

    Awesome 😁

  • @shaz-z506
    @shaz-z506 5 years ago

    Hi Krish,
    In what scenario would we use Manhattan over Euclidean?

  • @parammehta3559
    @parammehta3559 3 years ago

    Sir, is it normal that sometimes, as the value of n_neighbors increases, the error rate also increases?

  • @madeye1258
    @madeye1258 3 years ago

    At 5:03: if we are classifying a point based on the number of points next to it, then why do we need to calculate the distance in step 2?

    • @adipurnomo5683
      @adipurnomo5683 3 years ago +1

      Because the purpose of calculating distances is to sort the training data points before voting based on the K value.

  • @manusharma8527
    @manusharma8527 4 years ago

    I am not getting any "Classified Data" csv file on Kaggle. Please can you tell me the real name of that csv file?

  • @vibhutigoyal769
    @vibhutigoyal769 3 years ago

    Is KNN a non-linear algorithm???

  • @sunilkumarkatta9062
    @sunilkumarkatta9062 3 years ago

    How will we get the error value to calculate the accurate k value? 😅

  • @Anubhav_Rajput07007
    @Anubhav_Rajput07007 3 years ago

    Hi Krish, hope you are doing well. I am trying to find the best value for K, but the code won't finish executing.. it has been running for the last 20 minutes.

    • @techtalki
      @techtalki 3 years ago

      It will check all the cases of 'K'. If you want to speed it up, choose a smaller range of K values or a smaller dataset.

  • @uchuynguyen9927
    @uchuynguyen9927 4 years ago

    Where you are taking np.mean(pred_i != y_test), I think it should be pred_i = knn.predict(y_test), so then we would compare the predicted y_test to the actual y_test and find the errors. If I'm wrong, can somebody explain? Thank you!

    • @manikaransingh3234
      @manikaransingh3234 4 years ago +4

      No, I'm sorry, but you're not right!
      Actually, pred_i already holds the values predicted by the knn model (what you say it should be is already done in the line above).
      There is nothing like finding an error term, because it is a classification problem, not a regression one.
      Suppose:
      pred_i = [1,1,1,1,0,1,0,0,0,1]
      test_i = [1,1,1,1,1,1,0,1,0,1]
      pred_i != test_i will result in [f,f,f,f,t,f,f,t,f,f] (f = False, t = True).
      Then np.mean treats True as 1 and False as 0, so it gives the fraction of mismatches, which in this case is 0.2 (the error).
      I hope you get it.

    • @shivangiexclusive123
      @shivangiexclusive123 3 years ago

      @@manikaransingh3234 What does "mean of the True values" mean??

    • @shivangiexclusive123
      @shivangiexclusive123 3 years ago

      How is it 0.2??

    • @manikaransingh3234
      @manikaransingh3234 3 years ago

      @@shivangiexclusive123 The mean of the boolean values is the number of True values divided by the sample size.
      The result has 2 True values out of 10.
      2/10 = 0.2

    • @shivangiexclusive123
      @shivangiexclusive123 3 years ago

      Ok got it..thanks
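
      The arithmetic above, checked in numpy with the same illustrative arrays:

      import numpy as np

      pred_i = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 1])
      test_i = np.array([1, 1, 1, 1, 1, 1, 0, 1, 0, 1])

      # True counts as 1 and False as 0, so the mean is the fraction of mismatches
      print(np.mean(pred_i != test_i))  # 0.2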

  • @sriramswar
    @sriramswar 5 years ago

    Hi Krish, unable to open the ipynb file in Jupyter Notebook. Getting the below error:
    Error loading notebook
    Unreadable Notebook: C:\Users\Srira\01_K_Nearest_Neighbors.ipynb NotJSONError('Notebook does not appear to be JSON: \'\
    \

    • @krishnaik06
      @krishnaik06  5 years ago

      Dear Sriram, I am able to open the ipynb file. Please use Jupyter Notebook to open the file.

    • @sriramswar
      @sriramswar 5 years ago

      @@krishnaik06 Hi Krish, I used Jupyter Notebook only. Not sure if there is a problem at my end. Also, a suggestion! It would be better if the random_state parameter were used in the code/tutorial so that everyone gets consistent results. I got different results when I tried the same code and was confused for a moment, and then understood the reason. Others may get confused, so just giving a suggestion!

    • @krishnaik06
      @krishnaik06  5 years ago

      Then there is probably a problem with the Jupyter notebook file.

  • @shayanshafiqahmad
    @shayanshafiqahmad 4 years ago

    What is the reason for taking pred_i != y_test?

    • @shivaprakashranga8688
      @shivaprakashranga8688 4 years ago

      pred_i contains all the predicted values (like 1,0,1,0,0,1...) against y_test (1,0,0,1,1,...) when K=1. pred_i != y_test picks out the values that are not predicted correctly (the errors); there is no need to take the correctly predicted values. E.g., if out of 100 data points 60 are not predicted correctly w.r.t. y_test, we calculate the mean over those mismatches. This continues for K=2,3.. up to 40, and whichever K has the lowest mean value is the one we take (elbow method).

    • @im_joshila_aj
      @im_joshila_aj 4 years ago

      So this pred_i != y_test will return True/False values, right? In the form of 0 and 1, and then the mean will be calculated?

  • @weekendresearcher
    @weekendresearcher 4 years ago

    So, should K always be an odd number?

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 3 years ago

      If you choose an odd k value, then there is a higher probability that a tie will not occur, but even so there are tie-breakers available, so we have flexibility in choosing the k value ☺️ For example, sometimes we use weighted nearest neighbours, sometimes the class of the nearest neighbor among the tied groups, and sometimes a random tiebreaker among the tied groups, etc. ☺️✌️
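
      One of those tie-breakers is built into scikit-learn: weights='distance' makes closer neighbours count more, so exact vote ties become unlikely even for an even k. A sketch with an assumed synthetic dataset:

      from sklearn.datasets import make_classification
      from sklearn.neighbors import KNeighborsClassifier

      X, y = make_classification(n_samples=200, random_state=0)

      # Even k, but votes are weighted by inverse distance instead of counted equally
      knn = KNeighborsClassifier(n_neighbors=4, weights='distance').fit(X, y)
      print(knn.predict(X[:5]))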

  • @tagheuer001
    @tagheuer001 1 year ago

    Why do you repeat phrases so many times?

  • @sensei-guide
    @sensei-guide 4 years ago

    As my k value increases, my error rate also increases, bro.

    • @ahmedjyad
      @ahmedjyad 4 years ago

      It's a normal outcome and a common example of underfitting. Basically, if your k value is too high, you risk ending up with an algorithm that just outputs the class with the highest occurrence.

    • @yathishl9973
      @yathishl9973 4 years ago

      @@ahmedjyad Hi, thanks for your input. Please advise how to correct it.

    • @ahmedjyad
      @ahmedjyad 4 years ago

      @@yathishl9973 Changing your k value is an example of hyperparameter tuning, which is the process of finding the parameter values that produce the best classification model. You shouldn't have a very high k value, because that would result in under-fitting; so getting a higher error as you increase the k value is itself expected behaviour. I hope you understand what I am trying to say; if not, feel free to reach out to me.
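
      A sketch of that tuning step with scikit-learn's GridSearchCV; the dataset and the 1..39 range are assumptions mirroring the video's loop:

      from sklearn.datasets import make_classification
      from sklearn.model_selection import GridSearchCV
      from sklearn.neighbors import KNeighborsClassifier

      X, y = make_classification(n_samples=500, random_state=0)

      # Cross-validated search over candidate k values
      grid = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': list(range(1, 40))}, cv=5)
      grid.fit(X, y)
      print(grid.best_params_, grid.best_score_)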

  • @ArunKumar-yb2jn
    @ArunKumar-yb2jn 2 years ago

    Krish - This seems to be a repeat of over a thousand similar videos on the internet, barring a few. What new insight have you brought here? You didn't define what that Y and X were and simply jumped into drawing X marks on the chart. Why do we need intuition of KNN? Why can't we really understand what it IS? This sort of explanation 'appears' to be clear, but in fact it really doesn't add to a student's understanding. Please take some actual data points and run the algorithm.

  • @dragolov
    @dragolov 3 years ago

    These are 2 musical (jazz) solos generated using a K Nearest Neighbor classifier:
    ruclips.net/video/zt3oZ1U5ADo/видео.html
    ruclips.net/video/Shetz_3KWks/видео.html