People like you are gems: after working hard all day at the office you still take out time to post quality content, and selflessly at that. I can truly see what good values you have.
Please upload Part 2. We have so much time now, so keep uploading new project videos.
At 10:48, CountVectorizer() should not be applied before train_test_split(). If you do, it leads to data leakage and is not correct. The correct way is fit_transform() on the train data and transform() on the test data.
Can you please share code showing how to do that?
such an underrated comment....you are on point man
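A minimal sketch of the leak-free order the comment above describes (the toy corpus and variable names are illustrative, not from the video):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Toy data standing in for the news titles; split BEFORE vectorizing.
corpus = ["breaking fake headline", "official real report",
          "another fake claim", "verified real story"]
y = [1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    corpus, y, test_size=0.5, random_state=0)

cv = CountVectorizer()
X_train_vec = cv.fit_transform(X_train)  # vocabulary learned on train only
X_test_vec = cv.transform(X_test)        # test reuses that vocabulary
```

Because the vocabulary is learned only on the training fold, the test fold never influences the features, which is the whole point of the fix.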
Hello all, this video was for members, but many of you had requested it, so I have uploaded it for everyone. It has also been added to the NLP playlist. Happy learning!!
Bomb video 💥💥💥, 💞💞💞
Thank you sir
Thank you so much sir 👍
Sir, why have you only used the title column of the dataset for predictions? There is one more column, text. What about it?
Could you use sklearn for the feature extraction portion and Keras for the model portion?
Great walkthrough but look into your audio!
It's super useful🤩.... thanks for teaching ❤
Hello Krish, this is a great video. I started my learning into NLP with this. A small question;
I followed the entire procedure and implemented Logistic Regression at the end. It gave me a higher accuracy of 94%, with fewer false positives and negatives as well.
I'm just keen to know why the PassiveAggressive classifier is said to be better for NLP applications and why a simple logistic regression cannot be used.
Thank you :)
I am a machine learning engineer. I like that you post such videos, but the problem with real datasets is that there is no training data.
You have to collect and create your own training data. People watching this video don't know what is about to hit them once they enter this field.
It's not plug and play.
I spend 80% of my time creating and processing data and only 20% actually doing ML.
In one anomaly detection project we had to use DBSCAN to find noise in the data; we marked the noise data points as anomalous and the cluster data points as non-anomalous, then used that data to train our ANN.
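A rough sketch of the labeling workflow this commenter describes, where DBSCAN's noise label (-1) becomes the anomaly class; the data and parameters below are made up for illustration:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# 50 tightly clustered points plus 5 scattered ones.
X = np.vstack([rng.normal(0.0, 0.2, size=(50, 2)),
               rng.uniform(-5.0, 5.0, size=(5, 2))])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
y = (labels == -1).astype(int)  # 1 = anomalous (noise), 0 = non-anomalous
# y can now serve as training targets for a downstream ANN/classifier.
```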
Just got 97% accuracy by combining both title and text... using the passive aggressive classifier.
Sir, after this please make one video on how pandemics impact the financial markets, or anything regarding COVID-19 dataset analysis.
great idea
Hi Krish, @ 8:28, why did you take messages['title'] ?? I think we should take messages['text']
Hello Krish - this is an amazing video. I have been watching your videos and have learned many things. A wonderful contribution towards aspiring machine learning engineers. I have one question, which I request you to clarify: after this bag-of-words/TF-IDF model is built, how do we construct the input features for new sentences (to be passed to the model's predict function)? If such an explanation exists in any other video, please point me to it; otherwise I would request you to make a short video on this, which would be immensely helpful. Thank you again. - Samir Paul
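For what it's worth, a minimal sketch of the step being asked about: new sentences must pass through the same fitted vectorizer (never a fresh fit), so the feature columns line up with what the model was trained on. The names and tiny corpus here are illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer

train_corpus = ["the economy is growing", "the election was rigged"]
cv = CountVectorizer()
cv.fit(train_corpus)  # vocabulary is fixed at training time

# New, unseen sentence: reuse the SAME vectorizer's transform().
X_new = cv.transform(["the economy is rigged"])
# X_new has one column per training-vocabulary word and can be
# passed straight to model.predict(X_new).
```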
Thanks Krish, please upload GBM indepth intuition.
Yes GBM is planned
Hello, suppose we need to add more features to our X which are not text. I.e., suppose we get a sparse matrix after CountVectorizer, and now we have one more feature, length, and we want both features. How do we combine both?
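One possible way to do what's asked (a sketch with made-up data) is scipy.sparse.hstack, which appends the extra column while keeping the result sparse:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer

texts = ["short one", "a somewhat longer sentence here"]
cv = CountVectorizer()
X_text = cv.fit_transform(texts)  # sparse bag-of-words matrix

# Extra non-text feature: character length of each document.
lengths = csr_matrix(np.array([[len(t)] for t in texts], dtype=float))

# Bag-of-words columns plus one length column, still a sparse matrix.
X = hstack([X_text, lengths]).tocsr()
```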
Dear Krish, to become a data scientist do we need to learn SQL or something like it? If it's needed, why is it absent from your data science playlist? Or do you have plans for it in the future? I'm very confused, please let me know. Thanks.
Yes, you need to know only the basic queries, like GROUP BY, JOIN, ORDER BY, SELECT, WHERE, etc.
Hi @Krish, shouldn't you create the bag of words on X_train instead of the full dataset? Otherwise the accuracy will not be the same when providing a new sentence.
Sir, kindly make some basic videos on Pandas.
Sir you've not uploaded the video on passive aggressive classifier....Please upload it Sir!!
Hi sir, I guess there is a data leakage problem. First we need to split into train and test, and only later apply CountVectorizer, right? In the video the CountVectorizer is applied first and the train/test split is done afterwards. Please clarify this.
You are right.
Hey Krish. Great video. Can you let me know how we can make a predictive system once we have tried out different models and selected the one that is most accurate/efficient?
Hi Krish,
It was a nice video, but I have one question: the condition "if score = previous_score" will be satisfied every time, right? Since you have set the value of previous_score to zero. So what is the use of it? Don't we have to assign the score value to previous_score, like "previous_score = score", after the if condition?
Hey, did you notice that the test and train data have a lot of overlap in this problem? And removing the overlap will lead to poor predictions by the model.
@Krish Naik Sir, do you provide paid personal consultation on an hourly basis? Is there any way I can connect with you?
Hello krish,
Regarding the for loop which generates the corpus:
yours finished in a few minutes, but on my laptop (i5 7th gen, 1050tx 4 GB graphics) it took more than half an hour.
Is there anything I need to configure?
You have to reduce the number of rows in the dataset (the csv file). Use a CSV editor to do that, not MS Excel (it'll give an error).
Nice sir
Thank you Kris
I got an error, please help:
in the passive aggressive classifier algorithm,
unexpected keyword argument n_iter(50).
Hi Krish
First of all thanks for such good videos! I am very new to data science so my question might sound very basic.
In the current video and also in some of the other videos in the current playlist, you have mentioned that algorithms like Naive Bayes/ MultinomialNB work very well with text data.
In all the samples we convert sentences to words and then to features (having values 0 and 1 in the case of BoW). So after this conversion, aren't we just dealing with numeric data rather than textual data? All the text has been converted to independent features with numeric values.
Can't we just use any classification algorithm? If yes, then why do we say Naive Bayes works well with text data?
Sir @ 8:28, why did you take messages['title'] ?? I think we should take messages['text'].
Krish, can you make videos regarding BERT? I can't find any good explanations on it.
Yes, videos are planned
Sir, after this please make one video on how to detect hate speech...
Sir, how can this model predict fake or real news when some other external news is given, preprocessed in the same way as X['title']?
Does it work in real life? Or just on the test data?
Waited 2 hours, but it did not execute on my laptop, so I took the first 1000 rows and changed it; that worked.
Hello, can you share that csv file? Because when I take the first 100 or 200 rows it includes other columns as well, called "unnamed 5, unnamed 6 ... unnamed 685". Furthermore, when I drop NaN values, all the rows are dropped and I am left with 0 rows.
Or please tell me how you took the first 1000 rows; anything may help.
Never mind I got it.
Tip for others: I was using MS Excel to remove data. Use any CSV editor to remove data, not Excel.
Sir, when I try to import and read the data in Colab, it does not work. There was some error, so I had to make these changes to read the data:
df=pd.read_csv('train.csv',engine='python', encoding='utf-8',error_bad_lines=False)
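A side note on this workaround, hedged on the pandas version: error_bad_lines was deprecated in pandas 1.3 and removed later, and the rough modern equivalent is on_bad_lines='skip'. A self-contained sketch with an inline CSV (the data is made up):

```python
import io
import pandas as pd

csv_text = ("id,title,label\n"
            "1,first headline,0\n"
            "2,bad,row,with,extra,fields,1\n"  # malformed line: too many fields
            "3,second headline,1\n")

# In newer pandas (>= 1.3), error_bad_lines=False becomes on_bad_lines='skip',
# which silently drops malformed rows instead of raising.
df = pd.read_csv(io.StringIO(csv_text), engine='python', on_bad_lines='skip')
```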
@@K.S_5723, I have to search for it, but you can download it from the Krish folder and make the same change I made; it will work.
If I find it, I will send it to you.
Wow!
Hello Krish, I think there is a mistake while explaining True Positive & True negative. Correct me if I'm wrong.
Great video, but I have one query: why did you use only the title feature in modelling? You should use all the features.
Hi sir,
Sorry, but I have a question: in the video you said you were going to use the text column for CountVectorizer, but you are taking the title column. Why?
While running the ipynb file you gave, I get an error: Unable to allocate 698. MiB for an array with shape (18285, 5000) and data type int64.
Can you please explain this?
Increase your RAM or use Google Colab.
I have followed all the code. I'm getting an error in re.sub(): "NameError: name 're' is not defined".
Do I have to install any library to run the re.sub() function?
Just import re is enough.
Why do we use Naive Bayes for NLP problems ?
I am trying this project in Kaggle with the GPU enabled, but the GPU is not working; it's showing 0% usage there. Can you tell me why?
Hi guys, could anyone explain to me what the coefficients of the models are? Why do we have those numbers?
Superb video. But while practicing the code I am stuck at the corpus step, and from there onwards it is all stuck. I have tried a number of times, but to no avail. Thanks.
You have to reduce the number of rows in the dataset (the csv file). Use a CSV editor to do that, not MS Excel (it'll give an error).
Sir, I need help, it's very urgent: I want the code for speaker age and gender classification. Please help sir, I really need it.
I have exported this model as a ".sav" file using pickle. Now, how can I test this model?
I want to write a news statement and predict whether it is true or not.
Please help anyone!
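A sketch of one way to do this, assuming (as in the question) the model is pickled. The key point is to pickle the fitted vectorizer alongside the model, since any new statement must be vectorized with the same vocabulary. The data and names below are made up, and an in-memory buffer stands in for the .sav file (with a real file you would use open('model.sav', 'wb')/'rb'):

```python
import io
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

train_texts = ["aliens secretly run the government",
               "senate passes the annual budget bill"]
y = [1, 0]  # 1 = fake, 0 = real

cv = CountVectorizer()
model = PassiveAggressiveClassifier(max_iter=50).fit(
    cv.fit_transform(train_texts), y)

# Save vectorizer + model together, then reload both.
buf = io.BytesIO()
pickle.dump((cv, model), buf)
buf.seek(0)
cv_loaded, model_loaded = pickle.load(buf)

# Score a brand-new statement with the reloaded pair.
pred = model_loaded.predict(cv_loaded.transform(["aliens run the senate"]))
```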
At 15:40: you are not able to see it because you did reset_index, so the original index numbers have been lost.
Dear sir ,
I want to ask: can we work on either title or text for the process of removing the stopwords???
Can you explain???
It's urgent sir
@@011_mohdanwar2 are you working on the same project buddy
Not able to implement the Passive Aggressive Classifier. The argument 'n_iter' is unexpected.
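If I had to guess, this is a scikit-learn version issue: the n_iter argument was renamed, and in recent releases the accepted parameter is max_iter (with tol controlling early stopping). A minimal check:

```python
from sklearn.linear_model import PassiveAggressiveClassifier

# n_iter=50 raises TypeError on recent scikit-learn; use max_iter instead.
clf = PassiveAggressiveClassifier(max_iter=50, tol=1e-3)
```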
Sir, how do we predict the label of test data instances, whether fake or real?
Can you explain fake online review detection using the passive aggressive classifier?
Hello Krish sir, your voice is getting low; please record at a higher volume.
Sir, just from a single word how can we say it's fake?? Please answer.
Sir, I have a doubt: why haven't you taken parameters other than title? And we are still getting an accuracy of 94 percent. I am just shocked!!!!! Please reply.
Please also tell us how to implement this in a web application.
CountVectorizer can remove stop words.
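As this reply says, the vectorizer can drop stop words itself; a tiny sketch:

```python
from sklearn.feature_extraction.text import CountVectorizer

# stop_words='english' removes common words like "the" and "is" for you.
cv = CountVectorizer(stop_words='english')
X = cv.fit_transform(["the news is not always the truth"])
```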
I am getting a parser error while uploading the dataset. Please solve it.
Also, what is "re" in line 127? It gives the error "re is not defined" when I try it on a different dataset.
messages.reset_index(inplace=True)
I'm having an error like:
AttributeError Traceback (most recent call last)
in
----> 1 messages.reset_index(inplace=True)
AttributeError: 'function' object has no attribute 'reset_index'
Can someone help me?
review = re.sub('[^a-zA-Z]', " ", messages['title'][i]) - I get an error on this line:
"expected string or bytes-like object". Please solve this.
just apply messages['title'] = messages['title'].apply(str)
Even I am getting the same issue; if you find the solution, please let me know.
@@manishwadhwani5860 not working
Just import re there
Just import re before running that line
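For what it's worth, this particular TypeError usually comes from NaN values in the title column (re.sub needs a string, not a float NaN), so casting or filling before the loop is the more direct fix. A sketch with made-up rows (the column name is taken from the thread above):

```python
import re

import pandas as pd

messages = pd.DataFrame({'title': ['Real headline', None, 'Fake headline!!']})
messages['title'] = messages['title'].fillna('')  # NaN would break re.sub

# The same cleaning step as in the thread, now safe on every row.
corpus = [re.sub('[^a-zA-Z]', ' ', t).lower() for t in messages['title']]
```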
It was really helpful. Can you make videos on grammar correction using rule-based methods, language models & classifiers?
It's really hard to understand otherwise.
Hi Krish, can you help us make an API for this?
Hello sir, please share the document for this project.
Krish, the voice in this video is not very clear.
Can you make it into a simulator??
How does your classifier detect fake news?
Does it detect any news, or only the news in the dataset?
It's for that dataset
Can you please add the F1, recall and precision scores in this tutorial for the algorithms used?
from sklearn.metrics import classification_report
target_names = ['FAKE', 'REAL']
print(classification_report(y_test, pred, target_names=target_names))
try this
Sir! I am very new to NLP. Thank you so much for this playlist... I am learning so many things from you... but please tell me how I can fix this error:
12 from unicodedata import normalize
14 if normalize:
---> 15 cm=cm.astype('float')/cm.sum(axis=1)[:,np.newaxis]
16 print("Normalized Confusion Matrix")
17 else:
AttributeError: module 'matplotlib.cm' has no attribute 'astype'
I am not able to download the dataset; without practice it is a waste.
Where is Part 2?
ruclips.net/video/E9gVleivB6M/видео.html
Would have enjoyed it, but the poor audio made a mess of it.
Kaggle's fake news icon is Trump... :)
Dear sir, I am facing this error here:
TypeError Traceback (most recent call last)
in ()
4 corpus = []
5 for i in range (0, len(messages)):
----> 6 review = re.sub('[^a-zA-Z]', ' ', messages['title'][i])
7 review = review.lower()
8 review = review.split()
/usr/lib/python3.10/re.py in sub(pattern, repl, string, count, flags)
207 a callable, it's passed the Match object and must return
208 a replacement string to be used."""
--> 209 return _compile(pattern, flags).sub(repl, string, count)
210
211 def subn(pattern, repl, string, count=0, flags=0):
TypeError: expected string or bytes-like object