K-Nearest Neighbor Classification with Intuition and Practical Solution
- Published: 14 Aug 2024
- In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification.
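The description above can be made concrete with a minimal sketch. Assuming scikit-learn is available (the video uses it), the "lazy learning" point shows up directly in the API: fit() just stores the training data, and all distance computation is deferred to predict()/score(). The synthetic dataset here is purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy dataset: 200 points, 2 classes (synthetic, for illustration only).
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# "Lazy" learner: fit() only stores the training set; neighbours are
# searched and votes counted at prediction time.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```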
Github link: github.com/kri...
You can buy my book, where I have provided a detailed explanation of how we can use Machine Learning and Deep Learning in finance using Python.
Packt url : prod.packtpub....
Amazon url: www.amazon.com...
This explanation is one of the most precise explanations I have seen on the internet.
This is in fact well explained, defining every term, and assuming no previous knowledge. Thanks so much!
Cool. Also finished my practice in Jupyter notebook. Thanks
Loved your videos and your taste in music.. Kadhal Vanthale in the bookmark 😂❤️🔥
Are you Tamil?
Feature scaling (StandardScaler) should be applied after the train-test split, so that it does not lead to information leakage.
TRUE
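As the comment notes, fitting the scaler on the full dataset before splitting leaks test-set statistics into training. A minimal sketch of the leak-free order, assuming scikit-learn and synthetic stand-in data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (the video uses a CSV instead).
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=3.0, size=(100, 2))
y = (X[:, 0] > 10).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

scaler = StandardScaler()
# fit() learns the mean/std from the TRAINING set only...
X_train_scaled = scaler.fit_transform(X_train)
# ...and the test set is transformed with those same statistics,
# so nothing about the test distribution leaks into training.
X_test_scaled = scaler.transform(X_test)
```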
Thank you, you asked a question I had in my head, looking forward to applying the suggested solution, about imbalanced dataset...
No words for your explanation, sir. A simple, lucid explanation!
Congratulations, Krish, on 1 million subscribers 🥳
Very well explained again. Thank you so much.
Sir you are great inspiration to me. Thanks a lot for making every complex problem simpler.
All the best! Superb explanation. You are a superb resource; you will reach great heights. Continue your good work.
Really, it's good... but you mentioned K=150. As far as I know, we are not supposed to take an even number: there is a chance that an equal number of each class gets selected among the nearest neighbors, and the algorithm may not be able to decide the class for the new record...
Thank you so much, Krish, for this great playlist. You are a gem.
Very nice, sir. Your explanation and coding technique are very nice...
I am a new learner of data science. Please keep uploading such videos and new techniques for different kinds of algorithms, which make it easy to understand how to deal with different kinds of datasets.
This is awesome! Thank you so much. I am working on a project at work for lead segmentation to help us find our "ideal lead" for our sales reps with a lot of very messy data. This is a great starting point. Quick question (might be a loaded question ha) - after we find these clusters, how do we go about seeing the "cluster profiles"? Or what all data points make up these clusters (in categorical form)
use any visualization library to see the clustering.
error_rate = []
for i in (1,40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))
plt.figure(figsize=(10,6))
plt.plot(range(1,40), error_rate, color='blue', linstyle='dashed', marker='o')
plt.xlabel('k')
plt.ylabel('error rate')
My above code is giving the error "x and y must have same first dimension, but have shapes (39,) and (2,)".
Please suggest a fix.
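The shape mismatch most likely comes from `for i in (1,40)`: that iterates over just the two tuple elements 1 and 40, so error_rate ends up with 2 entries while `range(1,40)` on the x-axis supplies 39 values. `linstyle` should also be `linestyle`. A corrected, self-contained sketch, using synthetic stand-in data since the original CSV is not shown here:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (the original code used a CSV loaded earlier).
X, y = make_classification(n_samples=300, n_features=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

error_rate = []
for i in range(1, 40):            # range(1, 40), NOT the tuple (1, 40)
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))

plt.figure(figsize=(10, 6))
plt.plot(range(1, 40), error_rate, color='blue',
         linestyle='dashed', marker='o')   # 'linestyle', not 'linstyle'
plt.xlabel('k')
plt.ylabel('error rate')
```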
You explain like a pro, thank you.
Thank You Naik......
This is a very helpful video
How can we choose the optimal value of k in KNN?
Thank you very much for this video. Helped a lot!
Minkowski distance generalizes both Manhattan and Euclidean distances.
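Concretely, the Minkowski distance reduces to Manhattan at p=1 and Euclidean at p=2. A small self-contained check:

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance: (sum_i |a_i - b_i|^p)^(1/p)."""
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)) ** p) ** (1.0 / p)

a, b = [0, 0], [3, 4]
print(minkowski(a, b, 1))  # p=1 -> Manhattan: |3| + |4| = 7
print(minkowski(a, b, 2))  # p=2 -> Euclidean: sqrt(9 + 16) = 5
```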
Great work, thank you
Respect Krish Naik
Thanks for this video krish naik sir🤩
At 18:52, you said a larger value of K will lead to overfitting, which is not true. A smaller value of K leads to overfitting. I think if there are two K-values giving the same error, we choose the bigger one because it is less impacted by outliers.
Sooper Explanation Krish. I have a doubt here.. When do we need to use MinMaxScaler() and when do we use StandardScaler()? Is there any difference? or we have to try using both and see which gives better results? Please clarify
Hope this answer finds you well. MinMaxScaler() and StandardScaler() are both scaling processes, except that normalization does not assume the data follows a Gaussian distribution, while standardization does. Normalization is used with models like KNN and neural networks; it rescales the data between 0 and 1. So if your data doesn't follow a Gaussian distribution and your data points are mostly close to 0/1, or if your business requirement is to normalize the data, you go with MinMaxScaler(); otherwise, in general, we use StandardScaler(), which is fast and easy to implement.
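A quick illustration of the difference on a toy column (scikit-learn assumed): MinMaxScaler maps the feature into [0, 1], while StandardScaler centres it to mean 0 and unit variance.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# MinMaxScaler rescales each feature to the [0, 1] range.
x_minmax = MinMaxScaler().fit_transform(X)

# StandardScaler centres to mean 0 and scales to unit variance.
x_std = StandardScaler().fit_transform(X)

print(x_minmax.ravel())  # [0.   0.25 0.5  0.75 1.  ]
print(x_std.ravel())
```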
great explanation Krish.
Nicely explained.
Grateful Sir ✨✨Thanks A lot.
Hello sir, will this code work in TensorFlow? Are there any changes to be made if I want to execute it in TF?
Sir, why do we check only 1 to 40 neighbours in the for loop?
Very well explained
Thank you so much for sharing this information. I just have one doubt, sir: if we scale before train_test_split, won't it lead to data leakage? During scaling, when fit() computes the average of all the data points, it also uses the test-set values, so my model will already have some hint about them?
Hi Krish, you are really amazing. I learn many things from you.
I have a doubt: what measures should I take if the error rate increases with the K value? Please advise.
Then you should decrease that k value. A too-small k value leads to overfitting and a too-large k value leads to underfitting; you have to choose some middle value wisely ☺️ so that both bias and variance are as low as possible.
If K is too small, it will be sensitive to outliers; if K is too large, other classes will be included.
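One common, concrete way to pick k along these lines is cross-validated search, sketched below with scikit-learn's GridSearchCV on synthetic data. Restricting the search to odd values of k also sidesteps many voting ties:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data, for illustration only.
X, y = make_classification(n_samples=300, random_state=0)

# Search odd k values with 5-fold cross-validation; the best k is
# the one with the highest mean validation accuracy.
grid = GridSearchCV(KNeighborsClassifier(),
                    {"n_neighbors": list(range(1, 40, 2))},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_["n_neighbors"])
```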
What if we choose a K value and hit a local optimum? How would we know whether to stop at that K value or proceed to a higher value in search of the global optimum?
Thanks Krish
spot on. thank you.
This is a very nice video, but I have one doubt: what value are you taking for calculating the mean of the error rate, given that the predicted values are in terms of 0 and 1?
Can you explain how hyperparameters help, and in what scenarios?
Hi Krish,
Just wanted to verify: you said at 5:10 that the model is ready, but KNN is instance-based learning with no parametric assumption, so I don't think it creates any model out of the algorithm. Please let me know if I'm wrong, as I need some clarity.
If both categories have the same number of neighbouring points, then which category does the new data point belong to?
Can anybody tell me why, in most cases, we use Euclidean distance and not Manhattan distance?
Thank you
For my assignment I am not allowed to import packages for KNN; I have to write it myself. Do you have a version of the code without the imported KNN method?
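For reference, a from-scratch sketch of the algorithm described in the video, using only NumPy (the tiny dataset here is made up for illustration): compute distances to all training points, take the k nearest, and return the majority vote of their labels.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test point by majority vote among its k nearest
    training points (Euclidean distance), without any ML library."""
    preds = []
    for x in X_test:
        # Euclidean distance from x to every training point.
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k closest training points.
        nearest = np.argsort(dists)[:k]
        # Majority vote among their labels.
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Tiny illustration: two well-separated clusters.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train,
                  np.array([[0.5, 0.5], [5.5, 5.5]]), k=3))  # -> [0 1]
```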
Could you please explain to me why fit and transform are done for the X values (in the above example, leaving out the target column, the rest of the data is the X values)?
yes i also want to know
Because transform is used to scale the input data.
Awesome
Awesome 😁
Hi Krish,
In what scenario would we use Manhattan over Euclidean?
Sir is it normal that sometimes as the value of n_neighbors is increasing the error rate is also increasing?
5:03 — if we are classifying the points based on the number of points next to them, then why do we need to calculate the distance in step 2?
Because the purpose of calculating the distances is to sort the training data points before voting based on the K value.
I am not getting any "Classified Data" CSV file on Kaggle. Please, can you tell me the real name of that CSV file?
Is KNN a non-linear algorithm?
How will we get the error value to calculate the accurate k value? 😅
Hi Krish, hope you are doing well. I am trying to find the best value for K, but the code does not finish executing; it has been running for the last 20 minutes.
It will check all the cases of K. If you want to speed it up, choose a smaller range of K values or a smaller dataset.
Where you take np.mean(pred_i != y_test), I think it should be pred_i = knn.predict(y_test), so that we compare the predicted y_test to the actual y_test and then find the errors. If I'm wrong, can somebody explain? Thank you!
No, I'm sorry, but you're not right!
Actually, pred_i already holds the values predicted by the KNN model (what you say it should be is already done in the line above, using X_test as the input).
And there is no error value to compute in the regression sense, because this is a classification problem; the "error" here is just the fraction of misclassified points.
Suppose
pred_i = [1,1,1,1,0,1,0,0,0,1]
test_i = [1,1,1,1,1,1,0,1,0,1]
pred_i != test_i will result in [f,f,f,f,t,f,f,t,f,f], where f = False and t = True.
Then np.mean treats True as 1 and False as 0, so it gives the fraction of True values, which in this case is 0.2 (the error).
I hope you get it.
@@manikaransingh3234 What does "mean of true values" mean?
How is it 0.2?
@@shivangiexclusive123 The mean of the true values is the number of True values divided by the sample size.
The result has 2 True values out of 10.
2/10 = 0.2
Ok got it..thanks
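The worked example above can be checked directly with NumPy:

```python
import numpy as np

pred_i = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 1])
test_i = np.array([1, 1, 1, 1, 1, 1, 0, 1, 0, 1])

# Boolean mismatch mask: True where the prediction is wrong.
errors = pred_i != test_i
# np.mean treats True as 1 and False as 0, so this is the error rate.
print(np.mean(errors))  # 2 wrong out of 10 -> 0.2
```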
Hi Krish, unable to open the ipynb file in Jupyter Notebook. Getting the below error:
Error loading notebook
Unreadable Notebook: C:\Users\Srira\01_K_Nearest_Neighbors.ipynb NotJSONError('Notebook does not appear to be JSON: ...')
Dear Sriram, I am able to open the ipynb file. Please use Jupyter Notebook to open the file.
@@krishnaik06 Hi Krish, I used Jupyter Notebook only. Not sure, if there is a problem at my end. Also, a suggestion! It would be better if random_state parameter is used in the code/tutorial so that everyone gets consistent results. I got different results when I tried the same code and I was confused for a moment and then understood the reason. Others may get confused, so just giving a suggestion!
Then there may be a problem with the Jupyter notebook file.
What is the reason for taking pred_i != y_test?
pred_i contains all the prediction values (like 1,0,1,0,0,1...) compared against y_test (1,0,0,1,1,...) when K=1. pred_i != y_test picks out the values that were not predicted correctly (the errors); there is no need to take the correctly predicted values. For example, out of 100 data points, if 60 are not predicted correctly with respect to y_test, we calculate the mean over those mismatches. This continues for K=2, 3, ... up to 40, and whichever K has the lowest mean error is the one we choose (elbow method).
So this pred_i != y_test will return true/false values, right? In the form of 1 and 0, and then the mean will be calculated?
So, should K always be an odd number?
If you choose an odd k value, there is a higher probability that a tie will not occur, but there are still tie-breakers available, so we have flexibility in choosing the k value ☺️. For example, sometimes we use weighted nearest neighbours, or take the class of the nearest neighbour among the tied groups, or use a random tiebreaker among the tied groups, etc. ☺️✌️
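In scikit-learn, the weighted-nearest-neighbours tie-breaker mentioned above corresponds to the weights='distance' option on KNeighborsClassifier, sketched here on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data, for illustration only.
X, y = make_classification(n_samples=200, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# An even k=4 could tie 2-2 under plain majority voting; distance
# weighting lets closer neighbours outvote farther ones.
knn = KNeighborsClassifier(n_neighbors=4, weights="distance")
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```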
Why do you repeat phrases so many times?
As my k value increases, my error rate also increases, bro.
It's a normal outcome and a common example of underfitting. Basically, if your k value is too high, you risk ending up with an algorithm that just outputs the class with the highest occurrence.
@@ahmedjyad Hi, thanks for your input; please advise how to correct it.
@@yathishl9973 Changing your k value is an example of hyperparameter tuning, which is the process of finding the parameter that produces the best classification model. You shouldn't have a very high k value, because that would result in underfitting. So getting a higher error as you increase the k value is itself expected. I hope you understand what I am trying to say; if not, feel free to reach out to me.
Krish - This seems to be a repeat of over a thousand similar videos on the internet, barring a few. What new insight have you brought here? You didn't define what that Y and X were and simply jumped into drawing X marks on the chart. Why do we need intuition of KNN? Why can't we really understand what it IS? This sort of explanation 'appears' to be clear, but in fact it really doesn't add to a student's understanding. Please take some actual data points and run the algorithm.
These are two musical (jazz) solos generated using a K Nearest Neighbor classifier:
ruclips.net/video/zt3oZ1U5ADo/видео.html
ruclips.net/video/Shetz_3KWks/видео.html