The video does not show the predictions and if the predictions were good or bad, otherwise a very clear explanation of everything else.
Maybe something like this
##create confusion matrix
tab <- table(predictions, test_labels)
accuracy <- function(x){ sum(diag(x)) / sum(x) * 100 }
accuracy(tab)
@@josejdiazcaballero1646 Thanks for the contribution! I somehow missed the prediction portion. (Which is incredibly important). But, the code you provided should do the trick.
Hello! The predictions and test_labels are of different lengths. How do I solve this? @@josejdiazcaballero1646
This should do the trick (test_labels has to index the held-out rows):
test_labels <- row_labels[-train_ind]
tab <- table(predictions, test_labels)
Nice video! But as far as I understand, isn't the statement at 01:45 wrong? The smaller the K value, the greater the overfit and not the greater the underfit...?
Great catch! You are absolutely correct. I mixed up the meanings of the k values: the smaller the k value, the greater the OVERFIT. With a very small k, each prediction depends on only a handful of nearest neighbors, so noise in the training data can tremendously skew your predictions.
Good visualization of the graphical plots. I'd have liked it even more if you'd shown how to select the optimum k-value, plus plots of k-values against their accuracy rates and a confusionMatrix.
I'm glad you liked it! There are definitely quite a few ways of choosing k-values; I just stuck with the rule of thumb. Visualizing the results for a range of k-values would have been a great addition to the video. And why didn't I think of the confusion matrix! haha.
I'll keep your suggestions in mind for future content! :)
KNN with k=1 is generally considered overfit.
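To make the k discussion above concrete, here is a minimal sketch of scanning candidate k values and comparing held-out accuracy. It assumes the video's iris setup; the variable names (train_ind, test_labels, etc.) are illustrative, not necessarily those used in the video.

```r
# Sketch: compare held-out accuracy across candidate k values on iris.
library(class)

set.seed(42)
data <- iris
train_ind <- sample(seq_len(nrow(data)), size = 0.8 * nrow(data))

train <- data[train_ind, 1:4]
test  <- data[-train_ind, 1:4]
train_labels <- data[train_ind, 5]
test_labels  <- data[-train_ind, 5]

ks  <- 1:15
acc <- sapply(ks, function(k) {
  pred <- knn(train, test, cl = train_labels, k = k)
  mean(pred == test_labels)   # held-out accuracy for this k
})

best_k <- ks[which.max(acc)]
plot(ks, acc, type = "b", xlab = "k", ylab = "Held-out accuracy")
```

Small k values will tend to look worse here when the data are noisy, which is the overfitting effect discussed above.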
Hello, but how do you make a prediction that has new value's for the variables. Many thanks
If by new variables you mean new features (new columns altogether), you can't. :(
@@SpencerPaoHere hello Spencer, I mean I need to predict in which category my species falls. For example I have information that sepal length is 6, sepal width is 4, petal length is 1.5 and petal width is 0.4. I just want to know with these variables where my new data point will be categorised with the knn model. For example logistic regression can do this, but my data is non-linear… many thanks, also for the video,
Maarten
Oooh. Yes. You can use the predict() function, i.e. predict(model_name, new_observations), where new_observations has the same number of features as the training input.
@@SpencerPaoHere thank you, I will try it out! 🙂
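One note for class::knn specifically: there is no stored model object to call predict() on, so you pass the new observation straight into knn() alongside the training data. A minimal sketch using the measurements from the question above:

```r
# Sketch: classify one brand-new observation with class::knn.
# There is no fitted model object; training data is supplied per call.
library(class)

train        <- iris[, 1:4]
train_labels <- iris[, 5]

# Measurements from the question: sepal 6 x 4, petal 1.5 x 0.4
new_obs <- data.frame(Sepal.Length = 6, Sepal.Width = 4,
                      Petal.Length = 1.5, Petal.Width = 0.4)

pred <- knn(train, new_obs, cl = train_labels, k = 5)
pred   # the predicted species for the new data point
```

Since these petal measurements sit well inside the setosa cluster, the nearest neighbors are all setosa here.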
Unfortunately, the last plot is obstructed by the preview of your next videos. Otherwise, nice job! I would be curious how to deal with non-numeric data as well!
WOOPS! Sorry about that. I tweaked the video recommendations so that you can see at least a snippet of the plot :)
how do i check the accuracy % of knn predictor?
Jose commented on this video stating that you can do the following:
##create confusion matrix
tab <- table(predictions, test_labels)
accuracy <- function(x){ sum(diag(x)) / sum(x) * 100 }
accuracy(tab)
hi, i would like to know if it's possible to impute missing values using knn.. can we use the same code?
Hello! There are certain features that can help with imputations and one of them can be knn! I did do a video on imputations which might be helpful -> ruclips.net/video/MpnxwNXGV-E/видео.html
Really good job. I want to know how you get the Test_labels.
Thanks for the support!
Can you provide a timestamp or row number so that I can check out what you are trying to retrieve?
@@SpencerPaoHere row # 26
@@أسيدمحمد-ه2ز In my script, the row labels are data[,5], where data is the iris dataset (data = iris). So first you would have to create row_labels = iris[,5]. Then you can call row_labels[-train_ind]. The reason you were getting that error is that your "row_labels" was not initialized.
@@SpencerPaoHere 👍 thanks alot
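Spelling out the steps described above as code (a sketch; train_ind stands for whatever row indices were sampled for training):

```r
# Sketch: building row_labels and test_labels from iris, as described.
data <- iris
set.seed(1)
train_ind <- sample(seq_len(nrow(data)), size = 0.8 * nrow(data))

row_labels  <- data[, 5]              # initialize before indexing into it
test_labels <- row_labels[-train_ind] # labels for the held-out rows

length(test_labels)   # nrow(data) - length(train_ind) = 30
```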
Hello Spencer, do you know how I can select the data which were easiest to classify and hardest to classify (highest and lowest probabilities of the correct class).
Hi!
You can check out knn.probability() in the knnflex package if that is what you are looking for.
@@SpencerPaoHere thank you so much !!!!!
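If knnflex is unavailable, a rough alternative (my suggestion, not from the video) is class::knn with prob = TRUE, which attaches the winning class's share of neighbor votes to each prediction. It is a vote share rather than a true class probability, but low shares flag the hardest-to-classify rows:

```r
# Sketch: use the winning-class vote share from class::knn as a rough
# confidence score for each test row.
library(class)

set.seed(7)
train_ind <- sample(seq_len(nrow(iris)), size = 0.8 * nrow(iris))
train <- iris[train_ind, 1:4]
test  <- iris[-train_ind, 1:4]
train_labels <- iris[train_ind, 5]

pred <- knn(train, test, cl = train_labels, k = 5, prob = TRUE)
vote_share <- attr(pred, "prob")   # fraction of the k votes won

hardest <- order(vote_share)[1:3]                      # lowest confidence
easiest <- order(vote_share, decreasing = TRUE)[1:3]   # highest confidence
```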
how do we estimate the classification of completely new data?
Plug in the completely new dataset into your trained algorithm to obtain the predicted classifications. (Your new dataset must have the same features and feature types.)
may i know what to do if the predictor is categorical? should i include a categorical predictor in the knn model?
@@euphoria1725 Yes! You can use categorical variables in knn models, but you will probably have to one hot encode the categories such that the data types are numerical. Then, you'd have to find an appropriate distance function that can handle binary outcomes.
@@SpencerPaoHere thanks for the reply! what if the categorical variable have 3 or more values and is not ordinal?
like "english", "Chinese", "Japanese", can I set them as 1, 2, 3?
@@euphoria1725 One-hot encoding will take care of a multi-category variable in one column. It just creates n columns with True/False values, where n is the number of categories.
I had the same question, thanks for this explanation@@SpencerPaoHere
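A quick sketch of that encoding in base R (the language column is a made-up example): model.matrix() expands a nominal factor into 0/1 indicator columns, so there is no fake ordering like 1, 2, 3.

```r
# Sketch: one-hot encode a nominal column with model.matrix() before knn.
df <- data.frame(language = c("english", "chinese", "japanese", "english"),
                 score    = c(1.2, 3.4, 2.2, 0.5))

# "~ language - 1" drops the intercept so every category gets its own column
onehot  <- model.matrix(~ language - 1, data = df)
encoded <- cbind(as.data.frame(onehot), score = df$score)
encoded   # three 0/1 indicator columns plus the numeric score
```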
Good video, although the plotting doesn't work. I did like the walk-thru though.
Glad you liked it! How curious. Can you give me a timestamp where the plotting does not work?
in the plot, at geom_text(labels = predicted) instead of test_labels
@@gabrielhimelfarb1757 I had the same (I think) problem. The error was a length mismatch between the data and the labels in geom_text(aes(label = test_labels)). I had assigned the wrong value to test_labels: I had row_labels[train_ind], but it should be row_labels[-train_ind]. Plotting works then.
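A base-R sketch of that fix (the video uses ggplot2; base graphics are used here so the snippet stands alone). The key point is that test_labels must index the held-out rows so its length matches the predictions:

```r
# Sketch: labels and predictions must both come from the held-out rows.
library(class)

set.seed(3)
data <- iris
train_ind  <- sample(seq_len(nrow(data)), size = 0.8 * nrow(data))
row_labels <- data[, 5]

train <- data[train_ind, 1:4]
test  <- data[-train_ind, 1:4]
test_labels <- row_labels[-train_ind]   # NOT row_labels[train_ind]

predictions <- knn(train, test, cl = row_labels[train_ind], k = 5)

plot(test$Sepal.Length, test$Sepal.Width, col = as.integer(predictions),
     xlab = "Sepal.Length", ylab = "Sepal.Width")
text(test$Sepal.Length, test$Sepal.Width, labels = test_labels,
     pos = 3, cex = 0.6)
```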
Can you help me? I need knn with MapReduce in R.
ooh. Fancy. What's the question?
I need the knn algorithm with MapReduce classification in the R language.
@@ilhembenhenda3416 Hmm. If I am following correctly, you are having issues streaming data using MapReduce to a knn algorithm? i.e. distributed compute from some database?
@@SpencerPaoHere yes exactly that
@@ilhembenhenda3416 Now, that is a loaded question probably worth a video on its own. Unless there is a specific question, I won't be able to help.
I appreciate sharing the knowledge... but I don't think that it's a good video to learn. Thank you though.