K-Nearest Neighbors (KNN) with R | Classification and Regression Examples

  • Published: Aug 21, 2024
  • Provides concepts and steps for applying the KNN algorithm to classification and regression problems.
    R code: github.com/bkr...
    Data file: github.com/bkr...
    More ML videos: goo.gl/WHHqWP
    R is a free software environment for statistical computing and graphics and is widely used in both academia and industry. R runs on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.
    #knn #ml #machinelearning #datascience

Comments • 174

  • @zeenpc5645
    @zeenpc5645 5 years ago +14

    I have watched a lot of tutorials on YouTube, but nothing can compare to this.
    Simple and straight to the point, great editing, and a teacher who was born with a natural talent for teaching.

  • @neerajraut6473
    @neerajraut6473 5 years ago +7

    U have this way of simplifying things. I personally have learnt a lot from all your videos and i am always eager and waiting for your new videos. I know you must be having a very busy schedule sir, but if you could upload videos more frequently, it would mean so much to all of us :)

    • @bkrai
      @bkrai 5 years ago

      Thanks for your comments and feedback!

  • @mapa5000
    @mapa5000 1 year ago +1

    outstanding explanation !! ... In my opinion, there's not even 1 second of waste in this class. Thank you Prof ! Greetings from Houston !!!

    • @bkrai
      @bkrai 1 year ago

      You are welcome!

  • @DanTaninecz
    @DanTaninecz 5 years ago +2

    You are a great teacher. The explanations for sub equations are very very appreciated.

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @crossray974
    @crossray974 3 years ago +1

    Now I am back again with my next task, when i search in youtube i do like: k-means clustering Bharatendra :) to find the best learning stuff. Thanks again Dr. Bharatendra!

    • @bkrai
      @bkrai 3 years ago

      You are very welcome!

  • @crossray974
    @crossray974 3 years ago +2

    Dear Dr. Bharatendra, this is amazing stuff again. Thank you so much for sharing your great knowledge!

    • @bkrai
      @bkrai 3 years ago

      Most welcome!

  • @joychakroborty7541
    @joychakroborty7541 4 years ago +1

    easy....descriptive ..... all in 1 package ... waiting to hear more from you SIR

    • @bkrai
      @bkrai 4 years ago

      Thanks for your comments!

  • @rengarrick2020
    @rengarrick2020 5 years ago +3

    this channel is very underrated thanks bro

    • @bkrai
      @bkrai 5 years ago

      Thanks!

  • @hellomacha4388
    @hellomacha4388 4 years ago +1

    You are explaining in a nice manner

    • @bkrai
      @bkrai 4 years ago

      Thanks for comments!

  • @eluwila
    @eluwila 3 years ago +1

    No words about the explanation. Perfect
    Thank you very much

    • @bkrai
      @bkrai 3 years ago

      You are most welcome!

  • @danniely270
    @danniely270 4 years ago +1

    this channel is better than any of the lecture in my uni. I would pay to learn from you if there was any chance.

    • @bkrai
      @bkrai 4 years ago +1

      Thanks for your feedback and comments!

  • @meshackamimo1945
    @meshackamimo1945 5 years ago +4

    as usual,awesome communication skills at work here...Keep up!

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @Asdfasdffff
    @Asdfasdffff 5 years ago +3

    Big thanks! From Kazakhstan;)

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @khananwar4305
    @khananwar4305 5 years ago +3

    Thankyou.. Thankyou so much sir... You are bringing new ways to teach us and making it easy to grasp the concept...thankyou please make some end to end case studies approach to make us understand the whole pipeline... 🙏...thankyou for the videos..

    • @bkrai
      @bkrai 5 years ago

      Thanks for the feedback and suggestion! I'm adding end-to-end case studies to my list.

  • @kenjitakatuzi2824
    @kenjitakatuzi2824 5 years ago +1

    Thank you very much from Brazil. Your videos are helping me a lot!

    • @bkrai
      @bkrai 5 years ago +1

      Thanks for comments!

  • @amnindersekhon779
    @amnindersekhon779 3 years ago +1

    Thank you very much Dr. Rai

    • @bkrai
      @bkrai 3 years ago

      You're most welcome!

  • @monserratdelcano5902
    @monserratdelcano5902 4 years ago +2

    PERFECT !!!!!! amazing teacher !!

    • @bkrai
      @bkrai 4 years ago

      Thank you! 😃

  • @nazlimehrazar8234
    @nazlimehrazar8234 4 years ago +1

    Millions of thanks, it is a perfect training. Going to all the details. You saved my life :D

    • @bkrai
      @bkrai 4 years ago +1

      Thanks for comments!

  • @flamboyantperson5936
    @flamboyantperson5936 5 years ago +1

    Wow Sir what a great teaching. I have been waiting for you videos and you have come with a new topic and I am so happy to see in my notification that you have uploaded a video. Thank you so much for the great content and immense knowledge. KNN was a much needed topic from your side and you have done it. Thank you so much sir :)

    • @bkrai
      @bkrai 5 years ago

      Thanks for your feedback and comments!

  • @ravindarmadishetty736
    @ravindarmadishetty736 5 years ago +3

    Important points are covered sir....Thank you

    • @bkrai
      @bkrai 5 years ago

      Thanks for feedback!

    • @ravindarmadishetty736
      @ravindarmadishetty736 5 years ago

      Sir, can you make one video on upsampling, downsampling, both and Smote with example

  • @azzeddinereghais8943
    @azzeddinereghais8943 2 years ago

    Thank you sir for all your valuable lectures
    I hope that in the near future, you will discuss compositional analysis (CoDa) and explain how to perform Robust PCA and Clustering

  • @RahulDas-ki7tg
    @RahulDas-ki7tg 5 years ago +1

    Sir, I am working as a data analyst in an NBFC, and I am building a predictive model of a customer's probability of default. I have used logistic regression and random forest. Sir, which algorithm would be best for getting optimal results? Another request, sir: could you make tutorial videos for all these ML algorithms in Python? I and people like me would be glad and would benefit. Thank you so much, sir, for sharing your valuable knowledge.

    • @bkrai
      @bkrai 5 years ago

      Have you tried xgb? It is usually known for getting good results. Here is the link:
      ruclips.net/video/woVTNwRrFHE/видео.html
      In addition, I'll also do some Python videos in the next few months.

  • @debasishmishra1493
    @debasishmishra1493 4 years ago +2

    Sir, if cross-validation is carried out on the training data, then why is it necessary to divide the data into training and test sets? I mean, what if we use the entire dataset for cross-validation; will that suffice? And one request: could you make a video on ensembling different classifiers using the caret package?

    • @bkrai
      @bkrai 4 years ago +1

      It is still better to use test data. During cross-validation, every training data point is used at one point or another. Test data is something the model has not seen. Also, thanks for the suggestion!

    • @debasishmishra1493
      @debasishmishra1493 4 years ago +1

      @@bkrai Thank you sir. Your videos help me lot in my research work.

    • @bkrai
      @bkrai 4 years ago

      You are very welcome!

  • @lovebhartiya
    @lovebhartiya 5 years ago +1

    nice explanation

    • @bkrai
      @bkrai 5 years ago

      thanks for comments!

  • @kavyashree228
    @kavyashree228 5 years ago +2

    Thanks a lot for all your video.

    • @bkrai
      @bkrai 5 years ago +1

      Thanks for feedback!

  • @dimplekashyap1
    @dimplekashyap1 5 years ago +2

    Keep up the good work.

    • @bkrai
      @bkrai 5 years ago

      Thx for comments!

  • @lmoraferia
    @lmoraferia 3 years ago +1

    Awesome video Sir! Thanks for sharing.
    I used to do knn analysis by using functions from FNN library; however I noticed you are using caret library. How could I identify the best library for this ML method and others?
    Also, What would you recommend for implementing knn method for product recommendation in an online store? , Should I use R?
    Thank you and best regards from Mexico

    • @bkrai
      @bkrai 2 years ago

      Seeing this today. Caret is more versatile. For product recommendation, you can certainly use R.

  • @galk32
    @galk32 5 years ago +2

    Thank you very much sir!

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @muhammadsulaiman6
    @muhammadsulaiman6 3 years ago +1

    Thank you sir very informative for finding k value.
    Why does k=1 in knn give the best accuracy?

    • @bkrai
      @bkrai 3 years ago +1

      It depends on the data.
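
One common reason k = 1 looks best is that accuracy was measured on the training data itself: with k = 1 every training row is its own nearest neighbour, so the score is close to perfect regardless of how well the model generalizes. A minimal sketch, assuming the built-in iris data and the knn() function from the class package (both are stand-ins, not the admission data or the caret workflow from the video):

```r
library(class)  # recommended package bundled with R; provides knn()

set.seed(1234)
idx   <- sample(nrow(iris), 100)          # 100 rows for training, 50 for testing
train <- scale(iris[idx, 1:4])            # standardize the predictors
test  <- scale(iris[-idx, 1:4],           # reuse the training means and sds
               center = attr(train, "scaled:center"),
               scale  = attr(train, "scaled:scale"))
cl    <- iris$Species[idx]

# Accuracy when "predicting" the training rows themselves with k = 1
acc_train_k1 <- mean(knn(train, train, cl, k = 1) == cl)
# Accuracy on rows the model has not seen
acc_test_k1  <- mean(knn(train, test, cl, k = 1) == iris$Species[-idx])

acc_train_k1
acc_test_k1
```

If the two numbers differ noticeably, the training figure is the optimistic one; comparing k values on held-out or cross-validated accuracy is the safer choice.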

  • @dileep3549
    @dileep3549 5 years ago +2

    Thanks a lot sir . It is very helpful

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @saisaranv
    @saisaranv 5 years ago +1

    Simply superb explanation; eagerly waiting for the videos on ML and AI. If you don't mind, can you help with how to predict more of the positive cases in the above classification problem? In my job I'm facing this issue with a classification problem. I have even tried correcting the class imbalance with the help of your video, using ROSE and undersampling methods as well, but I am still getting poor results. Please help us tune for high sensitivity in classification problems.

    • @bkrai
      @bkrai 5 years ago +1

      Have you tried xgb? Usually it is known for getting good results. Here is the link:
      ruclips.net/video/woVTNwRrFHE/видео.html

    • @saisaranv
      @saisaranv 5 years ago

      @@bkrai thanks u so much for the reply sir... I will try today

    • @saisaranv
      @saisaranv 5 years ago

      Small doubt: while working with xgboost, do I need to correct the class imbalance, or can I work directly with my original data (my class proportions are 0.98938295 and 0.01061705)? Please advise.

  • @Didanihaaaa
    @Didanihaaaa 5 years ago +2

    Good job! Thanks

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +1

    Could you please make a separate video on model training that covers all the methods?

    • @bkrai
      @bkrai 4 years ago

      The video will become too long. I've this playlist that has 10 must know machine learning methods:
      ruclips.net/p/PL34t5iLfZddsQ0NzMFszGduj3jE8UFm4O

  • @LUCA-xv2sq
    @LUCA-xv2sq 2 years ago +1

    Dear Dr.Bharatendra, thank you a lot for your videos, I'm new to ML and I developed the following workflow using a for loop:
    1-Split data into 80/20
    2-Train on the 80
    3-Test on the 20
    4-Get confusion matrix-->get metrics like accuracy
    5-repeat the same thing n times
    6-average the accuracy from the confusion matrix you got n times.
    Is this correct? When I look at your tutorial or the caret package, to evaluate a model they take the accuracy from the training phase before any testing, and then they do the testing/prediction once. Is my method correct? I feel I'm testing n times, and thus my model has seen all my data in the process, since in every iteration it gets the 80, then tests on the 20, and repeats.
    Should I use the accuracy I got from the confusion matrix to compare between algorithm or not?
    Thank you a lot again!

    • @bkrai
      @bkrai 2 years ago

      Refer to this playlist for detailed coverage:
      ruclips.net/video/s23CMIjfwHk/видео.html
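
The workflow described in this thread (repeated random 80/20 splits with the accuracy averaged over rounds) is often called repeated hold-out or Monte Carlo cross-validation. A minimal sketch, assuming the built-in iris data, knn() from the class package, k = 5 and 20 repetitions, all of which are placeholders rather than values from the video:

```r
library(class)  # provides knn()

set.seed(1234)
n_rep <- 20
acc   <- numeric(n_rep)

for (i in seq_len(n_rep)) {
  idx   <- sample(nrow(iris), size = round(0.8 * nrow(iris)))   # 80/20 split
  train <- scale(iris[idx, 1:4])
  test  <- scale(iris[-idx, 1:4],
                 center = attr(train, "scaled:center"),
                 scale  = attr(train, "scaled:scale"))
  pred  <- knn(train, test, cl = iris$Species[idx], k = 5)
  cm    <- table(Predicted = pred, Actual = iris$Species[-idx])  # confusion matrix
  acc[i] <- sum(diag(cm)) / sum(cm)                              # accuracy this round
}

mean(acc)  # average hold-out accuracy over the repetitions
sd(acc)    # how much it varies from split to split
```

Each round refits on its own 80% and scores only the rows left out of that round, so the averaged accuracy is a reasonable basis for comparing algorithms. If k or other settings are tuned, that tuning would need to happen inside the 80% portion only.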

  • @tabarakhossain4830
    @tabarakhossain4830 4 years ago +2

    Sir, your tutorial is outstanding compared to the others. But I face a problem here. When I prepare the knn model, an error occurs, and that is
    fit

    • @bkrai
      @bkrai 4 years ago

      Check your data and see why variable 'admit' was not found. Probably you are using different data that doesn't have 'admit' variable.

  • @SandeepKumar-me6qr
    @SandeepKumar-me6qr 5 years ago +2

    Thank you sir for the lesson. I have one doubt, how to decide the value of k for our model?

    • @bkrai
      @bkrai 5 years ago

      The algorithm, as set up here, tries several values of k and gives you the best one.

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +1

    And for the same at 9:22 it is being used as set.seed(222)

    • @bkrai
      @bkrai 4 years ago

      It ensures repeatability of results. With a different number your result can be different compared to what I got.
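
A small sketch of what the seed does, using only base R. The seeds 222 and 1234 are the ones mentioned in these comments; the sample() call is just an illustrative random operation:

```r
set.seed(222)
a <- sample(1:100, 5)

set.seed(222)            # same seed, so the same "random" draw is repeated
b <- sample(1:100, 5)

set.seed(1234)           # a different seed starts a different random sequence
c2 <- sample(1:100, 5)

identical(a, b)    # TRUE
identical(a, c2)   # almost certainly FALSE
```

The number itself has no special meaning; it only fixes the starting point of the random number generator so that data partitions and cross-validation folds can be reproduced.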

  • @drkim2
    @drkim2 5 years ago +1

    Thanks !!

    • @bkrai
      @bkrai 5 years ago

      Welcome!

  • @esar1499
    @esar1499 4 years ago +1

    Hello Mr. Rai! I would like to ask you a question: do we need to standardize in LDA as well if the independent variables have totally different magnitudes? (I am asking because the discriminant functions are already scaled, as you mentioned in the LDA video.)

    • @bkrai
      @bkrai 4 years ago +1

      If in doubt, always do it. Standardization has no negative impact.

  • @deepthibhadran4181
    @deepthibhadran4181 4 years ago +1

    Sir can you please make one video on TOPSIS and compromise programming method

    • @bkrai
      @bkrai 4 years ago

      Thanks, I've added it to my list.

  • @nikinlee3032
    @nikinlee3032 4 years ago +1

    Hello sir. Thank you so much for your video; I learned a lot of R methods. I am confused about one thing: apart from KNN, do other machine learning methods need the data to be standardized? If so, which methods?

    • @bkrai
      @bkrai 4 years ago

      If in doubt, it is better to do it, because if you do not do it when it is needed, there could be a problem.
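
Distance-based methods such as KNN are the clearest case where standardization matters, because a variable measured in large units can dominate the distance. A minimal sketch with made-up values (the income and gpa columns are hypothetical, not from the video's data):

```r
# Two predictors on very different scales (hypothetical values)
df <- data.frame(income = c(50000, 52000, 53000),
                 gpa    = c(2.0,   4.0,   2.1))

raw    <- as.matrix(dist(df))          # Euclidean distances on the raw values
scaled <- as.matrix(dist(scale(df)))   # distances after centering and scaling

raw[1, ]      # row 1 looks closest to row 2: income dominates
scaled[1, ]   # after scaling, row 1 is closest to row 3 (similar gpa)
```

Here the nearest neighbour of the first row changes once both variables are put on a comparable scale, which is why the choice can change KNN predictions.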

  • @kenn756
    @kenn756 5 years ago +2

    hi, I really like your videos. can you please do a video on stacking?

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments and suggestion! I've added it to my list.

  • @vigneshappanraj6381
    @vigneshappanraj6381 4 years ago +1

    If categorical variables are in independent variables,don't we have to create dummy variables, or the package will deal with it ?

    • @bkrai
      @bkrai 4 years ago

      It's always better to convert categorical independent variables to dummy variables.
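
One way to create dummy variables in base R is model.matrix(). A minimal sketch with a made-up data frame (the column names gre and rank and their values are illustrative assumptions):

```r
# Hypothetical data: one numeric and one categorical predictor
df <- data.frame(gre  = c(380, 660, 800, 640),
                 rank = factor(c("low", "high", "mid", "high")))

# model.matrix() expands the factor into 0/1 indicator columns;
# "- 1" drops the intercept so that every level of this single factor gets a column
X <- model.matrix(~ . - 1, data = df)
X
```

The resulting matrix is fully numeric, so distances can be computed on it. The caret package also has a dummyVars() helper for the same purpose.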

  • @kebag1
    @kebag1 5 years ago +1

    Thanks for sharing! If we wanted to use the other method of normalizing the data, what would be that R code?

    • @bkrai
      @bkrai 5 years ago

      Let's say your data is called "data" and the variable you want to normalize is "V1". Then the code will be:
      (data$V1 - min(data$V1)) / (max(data$V1) - min(data$V1))

    • @kebag1
      @kebag1 5 years ago

      @@bkrai Thank you so much! I tried that, but fit() doesn't work. I had replaced the preProcess with normalize.
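
The min-max formula from this thread can be wrapped in a small helper and applied to the data before model fitting. A minimal sketch, assuming the built-in iris data; the helper name normalize is made up for illustration and is not a caret option:

```r
# Min-max normalization: rescales a numeric vector to the range [0, 1]
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

num_cols <- sapply(iris, is.numeric)              # which columns are numeric
iris_n   <- iris
iris_n[num_cols] <- lapply(iris[num_cols], normalize)

summary(iris_n$Sepal.Length)   # minimum is 0 and maximum is 1
```

Within caret, "normalize" is not a recognized preProcess method; the built-in equivalent of this rescaling is preProcess = "range".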

  • @ianjiang2949
    @ianjiang2949 3 years ago +1

    OMG this is so helpful

    • @bkrai
      @bkrai 3 years ago

      Thanks for comments!

  • @harish00784
    @harish00784 3 years ago +1

    After finding the optimal k value (i.e., k=33 in the above example), if we want to use that k value and find the accuracy once again, where must we include it? I mean, using library(class), getting the labels, and putting in the found k value. Does this work?

    • @bkrai
      @bkrai 3 years ago

      The model will automatically use the optimal k value for predictions.
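
If the chosen k is to be reused outside caret, as asked above, it can be passed directly to knn() from the class package. A minimal sketch, assuming the built-in iris data as a stand-in for the video's data; k = 33 is the value quoted in the question, and with a caret model object named fit it would be available as fit$bestTune$k:

```r
library(class)  # provides knn()

set.seed(1234)
idx   <- sample(nrow(iris), 100)
train <- scale(iris[idx, 1:4])
test  <- scale(iris[-idx, 1:4],
               center = attr(train, "scaled:center"),
               scale  = attr(train, "scaled:scale"))

best_k <- 33   # value reported by the tuning step in the question
pred   <- knn(train, test, cl = iris$Species[idx], k = best_k)

mean(pred == iris$Species[-idx])   # test-set accuracy with the chosen k
```

The accuracy will not necessarily match caret's figure exactly, since preprocessing and tie handling may differ between the two packages.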

  • @flamboyantperson5936
    @flamboyantperson5936 5 years ago +3

    Hello Sir, when we can expect your new videos? Thank you

    • @bkrai
      @bkrai 5 years ago +1

      This month I'll work on a few more.

    • @flamboyantperson5936
      @flamboyantperson5936 5 years ago +1

      That would be great Sir. Eagerly waiting for next video. @@bkrai

    • @bkrai
      @bkrai 5 years ago

      Thanks!

  • @azmatraja5767
    @azmatraja5767 3 years ago +1

    FIRST OF ALL THANKS FOR THE VIDEOS, SIR WHAT WE NEED TO DO FOR THREE LEVELS CLASS

    • @bkrai
      @bkrai 3 years ago +1

      Three levels should work fine.

    • @azmatraja5767
      @azmatraja5767 3 years ago

      @@bkrai thank you
      yes i tried for 3 class got the results
      is it okay with efficiency 70.6%?

  • @yogifirman5081
    @yogifirman5081 3 years ago +1

    Why, when I run the confusion matrix, do I get the error "`data` and `reference` should be factors with the same levels"?

    • @bkrai
      @bkrai 3 years ago

      We need to compare apples to apples: the predictions and the actual values should both be factors with the same levels.
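
A minimal sketch of lining up the factor levels before building a confusion matrix, using made-up predictions and actual values (the vectors below are illustrative assumptions):

```r
actual <- factor(c("0", "1", "1", "0", "1"), levels = c("0", "1"))
pred   <- c(0, 1, 0, 0, 1)              # numeric predictions, not yet a factor

# Convert the predictions to a factor that uses exactly the levels of `actual`
pred_f <- factor(pred, levels = levels(actual))

identical(levels(pred_f), levels(actual))    # TRUE
table(Predicted = pred_f, Actual = actual)   # base-R confusion matrix
```

Once both vectors are factors with identical levels, caret's confusionMatrix(pred_f, actual) should accept them as well.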

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +1

    17:59 Model Performance

  • @akhlaqdanish374
    @akhlaqdanish374 1 year ago +1

    Sir, I am facing the following error. Please help me.
    Error in model.frame.default(form = data.f$Consanguinity ~ ., data = training, :
    variable lengths differ (found for 'Age')

    • @bkrai
      @bkrai 1 year ago

      Check your data and make sure the variables have the same length.
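
One likely cause of this error, judging from the formula in the message, is that the response is written as data.f$Consanguinity (taken from the full data frame) while the predictors come from the smaller training set. A minimal sketch reproducing the pattern with the built-in iris data as a stand-in:

```r
training <- iris[1:100, ]   # a subset standing in for a training partition

# Problematic pattern: the response comes from the full data frame (150 rows)
# while the predictors come from `training` (100 rows)
bad <- tryCatch(model.frame(iris$Species ~ ., data = training),
                error = function(e) conditionMessage(e))
bad   # the error message about differing variable lengths

# Referring to the column by its bare name keeps everything inside `training`
ok <- model.frame(Species ~ ., data = training)
nrow(ok)   # 100
```

Under that assumption, writing the formula as Consanguinity ~ . with data = training would be the corresponding fix.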

  • @000Requiem
    @000Requiem 2 years ago +1

    Why did you not scale the data before running the KNN?

    • @bkrai
      @bkrai 2 years ago

      Check at 10:30 point. That's where it was addressed.

  • @babajee4644
    @babajee4644 5 years ago +2

    It's fine, informative, and helpful, but how do we get the table for practice?

    • @bkrai
      @bkrai 5 years ago

      Here is the link that is now also added below the video.
      Data file: goo.gl/D2Asm7

  • @surajitnandy9222
    @surajitnandy9222 5 years ago +2

    Thanks for sharing this video. I'm a new learner. I'm trying to replicate the same code for practice, and I'm not sure why I'm getting "Error in terms.formula(formula, data = data) : '.' in formula and no 'data' argument".

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +1

    11:02 Model Performance

  • @faroqueahmed3606
    @faroqueahmed3606 2 years ago

    Dear sir, Thanks for your great content. Can you help, how can we get the first 5 nearest neighbors of a row (data point) based on the euclidean distance by KNN in R?
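
A minimal base-R sketch for listing the 5 nearest neighbours of one row by Euclidean distance, assuming the built-in iris data and an arbitrarily chosen row (i = 1):

```r
X <- scale(iris[, 1:4])     # standardized numeric predictors
i <- 1                      # row whose neighbours we want (arbitrary choice)

# Euclidean distance from row i to every row
d <- sqrt(rowSums(sweep(X, 2, X[i, ])^2))
d[i] <- Inf                 # exclude the point itself

nn <- order(d)[1:5]         # indices of the 5 closest rows
data.frame(row = nn, distance = d[nn], Species = iris$Species[nn])
```

For large data sets a dedicated nearest-neighbour package would be faster, but the logic is the same: compute distances, sort, and keep the first k.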

  • @oowe679
    @oowe679 5 years ago

    Sir, 2 questions..chas is factor variable in the original data. Does it require to convert to numeric for knn? Secondly there is high correlation between independent variables (indus,nox,tax,dis) How to handle in the model?

    • @dhavalpatel1843
      @dhavalpatel1843 4 years ago

      k-NN involves calculating distances between data points, so the variables must be numeric.

  • @nokiamadyaningrum9116
    @nokiamadyaningrum9116 5 years ago

    Hello Mr. Bharatendra. I am a student in my thesis using K-Nearest Neighbor on Rstudio, I want to ask you, how do you calculate variable importance manually, sir? And how do I display the error level with a graph from the results of cross validation, sir?

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +1

    Hello sir, can you explain the syntax set.seed(1234)? What does the 1234 in the brackets mean?

    • @bkrai
      @bkrai 4 years ago

      It ensures repeatability of results. With a different number your result can be different compared to what I got.

  • @mohamedabusheha4574
    @mohamedabusheha4574 5 years ago

    Hi, I have a question about the K-Nearest Neighbor method for regression: can we get estimated coefficients for the predictor variables, as in regular regression? Also, if you don't mind, would you cover a new topic that explains how to use support vector machines for regression?
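
KNN regression is non-parametric, so it has no coefficients to report: a prediction is simply the average response of the k closest training rows. A minimal from-scratch sketch, assuming the built-in mtcars data, two predictors and k = 3 (all illustrative choices; knn_reg is a made-up helper name):

```r
# k-NN regression: predict by averaging the response of the k closest training rows
knn_reg <- function(train_x, train_y, new_x, k = 5) {
  apply(new_x, 1, function(row) {
    d <- sqrt(colSums((t(train_x) - row)^2))   # Euclidean distance to each training row
    mean(train_y[order(d)[1:k]])               # average response of the k nearest
  })
}

set.seed(1234)
idx  <- sample(nrow(mtcars), 24)               # 24 training rows, 8 test rows
x    <- scale(mtcars[, c("wt", "hp")])         # scaled on all rows for simplicity
pred <- knn_reg(x[idx, ], mtcars$mpg[idx], x[-idx, , drop = FALSE], k = 3)

sqrt(mean((pred - mtcars$mpg[-idx])^2))        # test RMSE
```

Because the prediction is a local average, the usual way to judge predictors is through error measures or variable importance rather than through coefficients.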

  • @alessandrorosati969
    @alessandrorosati969 1 year ago +1

    When I have non-numeric variables in my dataset, can I use K-Nearest Neighbors?

    • @bkrai
      @bkrai 1 year ago

      You need to find a way to convert them into numbers first.

    • @alessandrorosati969
      @alessandrorosati969 1 year ago

      @@bkrai Thanks

  • @ksaghir10
    @ksaghir10 3 years ago

    Hi,
    how can i get the confusion matrix for the model with the highest accuracy (testing range of values of k), if I perform LOOCV using whole data instead of splitting into test and training set
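
One possible approach is knn.cv() from the class package, which performs leave-one-out cross-validation on the whole data set. A minimal sketch, assuming the built-in iris data and odd k values from 1 to 25 (both are illustrative assumptions):

```r
library(class)  # provides knn.cv()

set.seed(1234)              # ties among neighbours are broken at random
X  <- scale(iris[, 1:4])
cl <- iris$Species
ks <- seq(1, 25, by = 2)    # candidate k values

# Leave-one-out accuracy for each candidate k
acc    <- sapply(ks, function(k) mean(knn.cv(X, cl, k = k) == cl))
best_k <- ks[which.max(acc)]

# Confusion matrix of the leave-one-out predictions for the best k
table(Predicted = knn.cv(X, cl, k = best_k), Actual = cl)
```

Since k is selected on the same leave-one-out predictions that fill the confusion matrix, the resulting accuracy is likely to be somewhat optimistic.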

  • @arsalanriaz7784
    @arsalanriaz7784 4 years ago +1

    can we use the model for more than two classes?
    like red, blue, purple and black

    • @bkrai
      @bkrai 4 years ago

      Should work fine.

  • @purinjikongaboss4900
    @purinjikongaboss4900 4 years ago +1

    Sir, Boston housing dataset is missing can add the link please

    • @bkrai
      @bkrai 4 years ago

      The data used here is admission data. Link is available in the description area.

  • @ryleehall4295
    @ryleehall4295 5 years ago +1

    I am trying to use KNN for my survey data, but almost all of the variables are factors/categorical. Can I still use the knn method?

    • @bkrai
      @bkrai 5 years ago

      yes that should work fine.

    • @ryleehall4295
      @ryleehall4295 5 years ago

      @@bkrai I was able to get K=29, but the issue I am having is that all of the values for the confusion matrix are 0 and statistics are all NaN (since it is categorical data). Do you know of a way to get the accuracy in this case?
      Also, I have another question. If I have my training data as 80% and testing data as 20% how can I do prediction without getting a length error?
      Sorry for all of the questions, I just want to understand how to do this correctly

  • @amnindersekhon779
    @amnindersekhon779 3 years ago +1

    Dr. Rai, why p=0.75 for caret?

    • @amnindersekhon779
      @amnindersekhon779 3 years ago +1

      I tried ??caret in R studio, where the package stated p=0.75, and I am little confused with this. Thanks in advance. Amninder

    • @bkrai
      @bkrai 3 years ago

      At what point in the video are you referring to?

  • @thejuhulikal6290
    @thejuhulikal6290 3 years ago

    Hello sir, thanks again for this algorithm video , at the step of variable importance I am getting this error
    "Error in auc3_(actual, predicted, ranks) :
    Not compatible with requested type: [type=character; target=double]."
    What I should do to solve this!!
    Thanks again

  • @RevenueRocketeers
    @RevenueRocketeers 3 years ago

    I am getting an error varImp(fit) - as error auc3 _

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +1

    17:23 Boston housing Pricing Data Partition

  • @anuragraju1159
    @anuragraju1159 4 years ago

    Sir, i am not able to understand trainControl and train functions in R, though i have gone through the documentation. Is there any video explaining these functionalities?

    • @asthamalhotra2345
      @asthamalhotra2345 4 years ago

      Hi, trainControl is from a package named caret. It is a faster way to train models with additional parameters such as repetitions and scaling (all done simultaneously instead of in separate steps). topepo.github.io/caret/ is the site of the caret package developer, Max Kuhn. Hope this helps.

  • @ZeeNoorTrip
    @ZeeNoorTrip 2 years ago +1

    y do we need to use variable as factor?

    • @bkrai
      @bkrai 2 years ago

      Probably this link can help answer:
      ruclips.net/video/ftjNuPkPQB4/видео.html

    • @ZeeNoorTrip
      @ZeeNoorTrip 2 years ago +1

      @@bkrai thank you
      Do you have any video which contain 200 variables and 1000s of observations

    • @bkrai
      @bkrai 2 years ago

      not yet.

  • @sharmiochannel
    @sharmiochannel 4 years ago +1

    how to implement from scratch KNN?

    • @bkrai
      @bkrai 4 years ago

      Implementation depends on business background. Where are you planning to implement it?
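
If "from scratch" here means coding the algorithm without a KNN package, a minimal base-R sketch could look like the following. It assumes the built-in iris data, unscaled predictors and k = 5; knn_classify is a made-up helper name:

```r
knn_classify <- function(train_x, train_y, new_x, k = 3) {
  train_x <- as.matrix(train_x)
  new_x   <- as.matrix(new_x)
  pred <- apply(new_x, 1, function(row) {
    d     <- sqrt(colSums((t(train_x) - row)^2))   # distance to every training row
    votes <- table(train_y[order(d)[1:k]])          # class counts among the k nearest
    names(votes)[which.max(votes)]                  # majority class (first level on ties)
  })
  factor(pred, levels = levels(train_y))
}

set.seed(1234)
idx  <- sample(nrow(iris), 100)
pred <- knn_classify(iris[idx, 1:4], iris$Species[idx], iris[-idx, 1:4], k = 5)

mean(pred == iris$Species[-idx])   # test-set accuracy
```

The whole method is three steps per new row: compute distances to the training rows, take the k smallest, and return the majority class. In practice the predictors would usually be standardized first.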

  • @surajitnandy9222
    @surajitnandy9222 5 years ago +1

    Sorry here's the code which I'm using . I'm trying to replicate the same code for practice not sure why I'm getting Error in terms.formula(formula, data = data) : '.' in formula and no 'data' argument file

  • @a.latifpatwary3826
    @a.latifpatwary3826 5 years ago

    How can I run some specific k values like k= 1, 15, 40, 80, 120 in knn regression tuneGrid?

    • @dhavalpatel1843
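      @dhavalpatel1843 4 years ago

      # Python (scikit-learn); X and y are assumed to be already defined
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn import metrics
      knn = KNeighborsClassifier(n_neighbors=1)  # k = 1
      knn.fit(X, y)
      y_pred = knn.predict(X)  # predictions on the same data used for fitting
      print(metrics.accuracy_score(y, y_pred))
      (For 1)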
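
For the caret tuneGrid question in this thread, specific k values can be supplied through expand.grid(). A minimal sketch in R, assuming the caret package is installed and using the Boston data from the MASS package as a stand-in for the video's housing data; the 5-fold cross-validation setting is also an assumption:

```r
library(caret)   # assumed to be installed; provides train() and trainControl()
library(MASS)    # provides the Boston housing data

set.seed(1234)
fit <- train(medv ~ ., data = Boston,
             method     = "knn",
             preProcess = c("center", "scale"),
             trControl  = trainControl(method = "cv", number = 5),
             tuneGrid   = expand.grid(k = c(1, 15, 40, 80, 120)))

fit$results[, c("k", "RMSE")]   # one row per requested k
fit$bestTune                    # k with the lowest cross-validated RMSE
```

Only the k values listed in the grid are evaluated, so the reported best k is the best among those five rather than over all possible values.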

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +1

    "Data Partition" 7:28

  • @tarkatirtha
    @tarkatirtha 4 years ago +1

    Your content is VERY GOOD but sound quality is bad. Also pls do NOT experiment with visual effects in such videos.

    • @bkrai
      @bkrai 4 years ago

      Thanks for suggestions!

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +1

    "Classification" 5:54

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +1

    "K nearest neighbour method" 8:31

    • @bkrai
      @bkrai 4 years ago +1

      Thx

    • @bkrai
      @bkrai 4 years ago

      Thx

  • @mutindafestus5619
    @mutindafestus5619 5 years ago

    hello
    can i connect with you ?