Applying and Understanding K-Nearest Neighbors (KNN) in R

Поделиться
HTML-код
  • Опубликовано: 4 ноя 2024

Комментарии • 52

  • @josejdiazcaballero1646
    @josejdiazcaballero1646 2 года назад +8

    The video does not show the predictions and if the predictions were good or bad, otherwise a very clear explanation of everything else.

    • @josejdiazcaballero1646
      @josejdiazcaballero1646 2 года назад

      Maybe something like this
      ##create confusion matrix
      tab

    • @SpencerPaoHere
      @SpencerPaoHere  2 года назад +2

      @@josejdiazcaballero1646 Thanks for the contribution! I somehow missed the prediction portion. (Which is incredibly important). But, the code you provided should do the trick.

    • @patriklindstrom6424
      @patriklindstrom6424 Год назад

      Hello! The predictions and test_labels are of different lengths. How do solve this? @@josejdiazcaballero1646

    • @patriklindstrom6424
      @patriklindstrom6424 Год назад

      This should do the trick:
      tab

  • @N1loon
    @N1loon 3 года назад +6

    Nice video! But as far as I understand, isn't the statement at 01:45 wrong? The smaller the K value, the greater the overfit and not the greater the underfit...?

    • @SpencerPaoHere
      @SpencerPaoHere  3 года назад +3

      Great catch! You are absolutely correct. I mixed up k value meanings. The smaller k value leads to greater OVERFIT. This is true because you are finding groupings for each pair. And, this will tremendously skew your predictions.

  • @sudiptomitra
    @sudiptomitra 3 года назад +2

    Good visualization of graphical plots. I'd have liked more if you'd have had shown selecting the optimum k-value, plotting of k-values with respective accuracy rates & confusionMatrix.

    • @SpencerPaoHere
      @SpencerPaoHere  3 года назад +2

      I'm glad you liked it! There are definitely quite a few ways on choosing k-values. I just stuck with the rule of thumb. Visualizing the results of K-values would have been a great addition to the video. And, why didn't I think of the confusion matrix! haha.
      I'll keep your suggestions in mind for future content! :)

  • @chiningnick
    @chiningnick Месяц назад

    KNN with k=1 is generally considered overfit.

  • @maarten4132
    @maarten4132 3 года назад +2

    Hello, but how do you make a prediction that has new value's for the variables. Many thanks

    • @SpencerPaoHere
      @SpencerPaoHere  3 года назад +1

      If you mean by new variables as in new features (columns altogether), you can't. :(

    • @maarten4132
      @maarten4132 3 года назад +1

      @@SpencerPaoHere hello Spencer, I mean I need to predict in which category my species falls. For example I have information that sepal length is 6, sepal width is 4, petal length is 1.5 and petal width is 0.4. I just want to know with these variables where my new data point will be categorised with the knn model. For example logistic regression can do this, but my data is non-linear… many thanks, also for the video,
      Maarten

    • @SpencerPaoHere
      @SpencerPaoHere  3 года назад +3

      Oooh. Yes. You can use the predict() function.
      I.e predict(model_name, new_observations)
      Where new_observations have the same number of features as input.

    • @maarten4132
      @maarten4132 3 года назад

      @@SpencerPaoHere thank you, I will try it out! 🙂

  • @baruchschwartz819
    @baruchschwartz819 3 года назад

    unfortunately, last plot is obstructed by the preview of your next videos. Otherwise, nice job! I would be curious to how to deal with non numeric data as well!

    • @SpencerPaoHere
      @SpencerPaoHere  3 года назад

      WOOPS! Sorry about that. I tweaked the video recommendations so that you can see at least a snippet of the plot :)

  • @eastbrunswickcricketleague4139
    @eastbrunswickcricketleague4139 2 года назад +1

    how do i check the accuracy % of knn predictor?

    • @SpencerPaoHere
      @SpencerPaoHere  2 года назад

      Jose commented on this video stating that you can do the following:
      Maybe something like this
      ##create confusion matrix
      tab

  • @anisrozi4903
    @anisrozi4903 Год назад

    hi, i would to know if its about to impute missing values using knn.. can we use the same code?

    • @SpencerPaoHere
      @SpencerPaoHere  Год назад

      Hello! There are certain features that can help with imputations and one of them can be knn! I did do a video on imputations which might be helpful -> ruclips.net/video/MpnxwNXGV-E/видео.html

  • @أسيدمحمد-ه2ز
    @أسيدمحمد-ه2ز 3 года назад

    Realy good job.
    I wont to how you get the
    Test_lables

    • @SpencerPaoHere
      @SpencerPaoHere  3 года назад +1

      Thanks for the support!
      Can you provide a timestamp or row number so that I can check out what you are trying to retrieve?

    • @أسيدمحمد-ه2ز
      @أسيدمحمد-ه2ز 3 года назад

      @@SpencerPaoHere row # 26

    • @SpencerPaoHere
      @SpencerPaoHere  3 года назад +1

      @@أسيدمحمد-ه2ز In my script, row labels are data[,5] where data is the iris dataset. data = iris. So, first you would have to create the row_labels = iris[,5]. Then, you can call row_labels[-train_ind]. The reason why you were getting that error was because your "row_labels" was not initialized

    • @أسيدمحمد-ه2ز
      @أسيدمحمد-ه2ز 3 года назад

      @@SpencerPaoHere 👍 thanks alot

  • @115liamsi
    @115liamsi 3 года назад

    Hello Spencer, do you know how I can select the data which were easiest to classify and hardest to classify (highest and lowest probabilities of the correct class).

    • @SpencerPaoHere
      @SpencerPaoHere  3 года назад +1

      Hi!
      You can check out knn.probability() in the knnflex package if that is what you are looking for.

    • @115liamsi
      @115liamsi 3 года назад

      @@SpencerPaoHere thank you so much !!!!!

  • @tansutazegul8297
    @tansutazegul8297 Год назад

    how do we estimate the classification of a completely new data?

    • @SpencerPaoHere
      @SpencerPaoHere  Год назад

      Plug in the completley new dataset into your trained algorithm to obtain the predicted classifications. (Your new dataset must have the same features and feature types)

  • @euphoria1725
    @euphoria1725 2 года назад

    may i know what if the predictor is categorical? how should I do?

    • @euphoria1725
      @euphoria1725 2 года назад

      should i include categorical predictor in the knn model?

    • @SpencerPaoHere
      @SpencerPaoHere  2 года назад +1

      @@euphoria1725 Yes! You can use categorical variables in knn models, but you will probably have to one hot encode the categories such that the data types are numerical. Then, you'd have to find an appropriate distance function that can handle binary outcomes.

    • @euphoria1725
      @euphoria1725 2 года назад

      @@SpencerPaoHere thanks for the reply! what if the categorical variable have 3 or more values and is not ordinal?
      like "english", "Chinese", "Japanese", can I set them as 1, 2, 3?

    • @SpencerPaoHere
      @SpencerPaoHere  2 года назад +1

      @@euphoria1725 One hot encoding will take care of multi-categorial variables in 1 column. It just creates n columns with True/False values where n is the number of categories.

    • @channingcrawford3886
      @channingcrawford3886 11 месяцев назад

      I had the same question, thanks for this explanation@@SpencerPaoHere

  • @TJAllard
    @TJAllard 3 года назад

    Good video, although the plotting doesn't work. I did like the walk-thru though.

    • @SpencerPaoHere
      @SpencerPaoHere  3 года назад

      Glad you liked it! How curious. Can you give me a timestamp where the plotting does not work?

    • @gabrielhimelfarb1757
      @gabrielhimelfarb1757 3 года назад

      in the plot, at geom_text(labes=predicted) instead of test_labels

    • @aaronbeyerlein3076
      @aaronbeyerlein3076 3 года назад +1

      @@gabrielhimelfarb1757 I had same (i think) problem. error was a mismatch between length(data) and length of the geom_text(aes(label = test_labels). I had assigned the wrong value to test_labels. I had row_labels[train_ind] but should be row_labels[-train_ind]. Plotting works then.

  • @ilhembenhenda3416
    @ilhembenhenda3416 2 года назад

    Can you help me i need knn with Mapreduce in R

    • @SpencerPaoHere
      @SpencerPaoHere  2 года назад

      ooh. Fancy. What's the question?

    • @ilhembenhenda3416
      @ilhembenhenda3416 2 года назад

      I need algorithm knn with Mapreduce classification with language r

    • @SpencerPaoHere
      @SpencerPaoHere  2 года назад

      @@ilhembenhenda3416 Hmm. If I am following correctly, you are having issues streaming data using Mapreduce to a knn algorithm? i.e distributed compute from some database?

    • @ilhembenhenda3416
      @ilhembenhenda3416 2 года назад

      @@SpencerPaoHere yes exactly that

    • @SpencerPaoHere
      @SpencerPaoHere  2 года назад

      @@ilhembenhenda3416 Now, that is a loaded question probably worth a video on its own. Unless there is a specific question, I won't be able to help.

  • @firstlast-wz9jv
    @firstlast-wz9jv Год назад

    I appreciate sharing the knowledge... but I don't think that it's a good video to learn. Thank you though.

  • @zqadri08
    @zqadri08 2 года назад

    how do i check the accuracy % of knn predictor?

    • @SpencerPaoHere
      @SpencerPaoHere  2 года назад +1

      Jose commented on this: You can try the following
      ##create confusion matrix
      tab