I find yours and Krish Naik's channels to be the ones that actually help us build industry-grade ML projects. The rest just call model.fit without any explanation.
also its insane how similar they both look lol
are these projects good for putting in resume plzz reply
Well explained, so much in a single 90-minute video.
It's one of the best channel to learn data science free of cost with very good content
bundle of thanks for this amazing video... apparently it's one video but it covers all the steps involved in machine learning development, data cleaning, EDA, pre-processing, model training, optimizing and good practices everything has been packed in just one video. ❤
One Day this channel is going to have 1M Subscribers ❤🔥
within 1 year i guess
1B
We need recommended system from scratch
Ofc can't wait for it
!One day
Cool, this is the most complete tutorial I have seen, from the beginning through deployment.
Keep up the great work sir!
Amazing content!! I don't understand why this channel has such a small reach. This content is better than any paid course.
Are you an NLP expert?
100% easy to understand, really enjoyed it 😅 One small thing: in the middle of the video you cut away during the analysis, but it was totally fine. Loved it, now I have a project to build.
Hi, can you help? df.corr() isn't working for me; it raises ValueError: could not convert string to float.
Brother, could you share your GitHub repo for this project? I am facing this issue: NotFittedError: This TfidfTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator
I am so glad that I found this channel. I recommend it to many of my friends. Amazing content.
Thanks to this channel I'm getting exposure to projects. I hope to build one myself by the end of this playlist.
Awesome method of teaching. Learned a lot from you. Very insightful content. Keep up this good work.
Thumbs up for you, sir. All the other videos stop at model training. You have shown the full workflow. Thanks!
You are the best person I have ever come across. Along with great feature engineering, I got to learn many things about model deployment too.
Thanks for the walkthrough! Just a small remark: while defining the word-cloud object you can set the collocations parameter to False to avoid duplicates in the final plot.
Thanks, will use it the next time
@@campusx-official but how do I extract data from my mail into Jupyter?
Sir, in which language are you making this project?
@@campusx-official
@@Human12358 download the file to your system, then load it into a DataFrame with read_csv or read_excel.
I am facing an error in reading csv file can you help? UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 606-607: invalid continuation byte
I haven't even started to watch your videos but i can feel it in my heart that I landed on the right place
Thank you very much, Nitish sir. 😊
Very helpful playlist. Thank you for helping us learn ML, DL and NLP with such interest and enthusiasm.
Wow! Finally I made one project, sir. Thank you so, so much. It really paid off for me.
What a tutorial... clean and refreshing... I appreciate your work... respect 💜
Brother, could you share your GitHub project link? I am facing this issue: NotFittedError: This TfidfTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator
Awesome teaching !!,not using just libraries for text conversion, helping to understand process under the hood, and building the functions and conditions straight from the ground logic. Very nice video
I'm loving it!! Just watching it got me motivated :)
Thank you for uploading this thorough end to end project... this really helped me..
The best part is you also cover app development and deployment.
Awesome method of teaching. Learned a lot from you. Very insightful content.
Thanks, sir, for such amazing tutorials on machine learning projects. The concepts you taught were crystal clear.
Good, nice, sir. Your teaching method is the best for beginners.
Why didn't you show us that you dropped the 'text' column in order to make corr() work in the EDA stage? It cost me around 30 minutes to figure that out.
12:28 Build an email spam classifier website with data cleaning, EDA, feature engineering, model building, evaluation, and deployment.
24:56 EDA is performed to understand the distribution of spam and ham messages and analyze the number of characters, words, and sentences used in the SMS.
37:24 Data preprocessing steps for text data
49:52 The main problem was the error caused by a silly mistake in the code.
1:02:20 Performing Exploratory Data Analysis (EDA) on spam and ham messages
1:14:48 Naive Bayes with TF-IDF vectorization and max features of 3000 performs best
1:27:16 The best performing model for spam classification is Multinomial Naive Bayes with a precision of 99.1% and accuracy of 98.1%.
1:39:39 Built and deployed an email spam classifier using NLTK and Streamlit
Crafted by Merlin AI.
I am facing an error in reading csv file can you help? UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 606-607: invalid continuation byte
It's difficult to match your depth of knowledge and teaching style. So happy to have found this channel. Keep up the great work!
Wow !! You're doing amazing job sir!! It helps us a lot. Keep uploading such end to end videos in days to come too!!
Finally completed this project with a lot of learning...... thank you boss 🙌
I could not load dataset . can you help?
@@rinalzankar2812 hey Rinal,
Did you try to download the dataset from kaggle?
@@neeraj.kumar.1 did you deploy your website on Heroku????
@@akuljoshi7943
Yeah
@@neeraj.kumar.1 can you please send the link to your website?
Thank you for the awesome content. Your method of teaching is really incredible.
Craziest EDA ever seen❤❤
Great job sir👏👌👌👌 amazing explanation
Is he running every cell to get to the next cell?
How is he moving to the next cell?
NotFittedError: This MultinomialNB instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator. I am getting this error. How do I resolve it?
@@meenalthakur1440 hey, I am getting the same error but for TF-IDF! Please help!
Same I am also facing this issue
Were you able to solve this error?
You first have to create an object of the MultinomialNB class, like
base_model = MultinomialNB()
Then you can pass this model to another multiclass classifier, like
model = OneVsOneClassifier(base_model)
You deserve 100 billion subscribers
You are the best on YouTube ❤️❤️❤️
Sir, at 35:18 why did you copy y to text instead of simply using y itself?
Fabulous work Guru ji🙏
Brother, I cannot count the number of characters in each message. It raises an error that says "object of type 'int' has no len()". Do you have any solution to this?
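One possible cause, assuming some rows in the text column hold bare numbers rather than strings (the failing cell is not shown, so this is a guess): len() fails on an int. A minimal sketch with a made-up DataFrame and a hypothetical num_characters column:

```python
import pandas as pd

# Toy data (an assumption): one "message" is a bare number, stored as an int.
df = pd.DataFrame({"text": ["Free entry now", 12345, "ok lar"]})

# len(12345) raises TypeError, so cast every value to str first.
df["text"] = df["text"].astype(str)
df["num_characters"] = df["text"].apply(len)

print(df)
```

If the column also has missing values, they would become the string "nan" here, so dropping or filling them before the cast may be worth considering.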
Hi Nitish,
why are we fitting CountVectorizer on the data even before splitting it? Doesn't this lead to a data leakage problem, thereby affecting our metrics?
Please answer this.
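A small sketch of the leakage-safe order, for anyone with the same doubt: split first, fit the vectorizer on the training messages only, then only transform the test messages. The toy messages, labels and variable names are assumptions, not the notebook's data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Toy corpus (an assumption): 1 = spam, 0 = ham.
texts = ["win cash now", "call me later", "free prize win", "see you at lunch",
         "claim free cash", "are you home", "win a free phone", "lunch tomorrow"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Split the raw text BEFORE any fitting happens.
X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=2, stratify=labels)

cv = CountVectorizer()
X_train = cv.fit_transform(X_train_txt)  # vocabulary learned from train only
X_test = cv.transform(X_test_txt)        # unseen test words are simply ignored
```

With plain counts the leak is mostly the vocabulary; with TF-IDF the IDF weights would also be computed from test messages, so the same order applies there.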
At 53:30 it's showing the same bar graphs for both ham and spam corpus
Make sure the seaborn plot for the ham corpus filters on target == 0.
Thank you Nitish sir.
18:25 Can I use "df['num_words']=df['text'].apply(lambda x: len(x.split()))" instead? I mean, what is the difference between split() and word_tokenize?
split() uses whitespace as its default delimiter. There may be words separated by a "," (comma) or some other character. In those scenarios, word_tokenize can be used.
For further details, you can check how NLTK works underneath in its source code.
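To make the difference concrete, here is a rough illustration. The regex below is only a stand-in for a punctuation-aware tokenizer, not NLTK's actual algorithm, and the message is made up.

```python
import re

msg = "Free entry,call now!Win $100"

# str.split() only breaks on whitespace, so punctuation stays glued to words.
by_space = msg.split()

# Stand-in tokenizer (an assumption, NOT what nltk does internally):
# runs of word characters, or single non-space symbols.
by_regex = re.findall(r"\w+|[^\w\s]", msg)

print(len(by_space), by_space)
print(len(by_regex), by_regex)
```

So a word count based on split() can come out lower than a tokenizer-based count whenever words are joined by punctuation.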
Amazing Lecture. Learned a lot from your project explanation!
Please suggest a few basics to work through before doing this project, a roadmap or something similar.
The project was awesome and I thoroughly enjoyed the explanation, but one doubt: if I don't want to create a website and need the output in the notebook itself, what should I do? Appreciate your help.
You can directly call the prediction function in your notebook.
Brother, could you share the GitHub link for this project? It will help me. I am facing this issue: NotFittedError: This TfidfTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator
Hi, I followed the video and used TF-IDF with the default settings (not max_features=3000). While predicting, I get "Not Spam" for all sorts of inputs. What might be the reason?
@@gokulakannanb4103 Have you dealt with the data imbalance?
Were you able to solve it? Because I am getting the same issue.
Were you able to solve it?
very good explanation thank u so much
Sir, after running the .ipynb file from your GitHub, the model file changes and the app no longer runs. After clicking the Predict button it shows "NotFittedError: This MultinomialNB instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator." Help me fix this.
When cloning your code from GitHub and running the Streamlit app, it shows "NotFittedError: This TfidfTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator."
Hey, did it get resolved?
Brother, I don't get it, but I'm facing the issue you mentioned.
Can you tell me how to resolve this?
yeah facing same error
Thank you so much. Very good content sir. I ll definitely try this project.
It worked?
@@gokulakannanb4103 Not for me. The model file being created is only 1 KB in size, and that is why it gives the error. Something is missing from the original code as well.
@@vijaykiran3404 omg
As usual , your explanation is magical.Keep up your great work Sir !Thank you so much
Hi please create more videos like this but also keep in mind a more industry relevant projects. Please🙏
loved the way you orate :)
I face this issue when I click the Predict button, please help: NotFittedError: This TfidfTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator
Same. Did you fix it? Please help me.
I didn't get the heatmap; I get this error: could not convert string to float: 'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...'
Same here, I am facing that error.
Could you resolve this error in the code?
Please share the fixed code with me.
Please reply as soon as possible 😢😢
Because sir is applying it with num_char, num_word, num_sent and we are using it with all 4 columns @@Kalyan1143
Same, I did not understand the heatmap.
Surprisingly this really worked out 💯
Thank you so much sir, finally i completed it...
Hi CampusX, import seaborn as sns runs fine, and histplot draws the graph but shows a FutureWarning about the use_inf_as_na option. Please tell me how to remove it. Also, the heatmap gives the error "could not convert string to float: 'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...' "
Same problem
Exactly bro, no one in the comments is talking about it. In fact, even the code he uploaded to GitHub himself has the same issue. It really is magic how he avoided it.
@@ritilranjan7369 can you give the timestamp?
numeric_df = df.select_dtypes(include=['number'])
sns.heatmap(numeric_df.corr(), annot=True)
Just add these lines after generating the pairplot.
It gives the error because there is text in the df; the correlation map can only be plotted from numeric columns.
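A runnable version of the same idea without the plotting, on a made-up frame (the column names are assumptions). A likely reason the video did not hit this: older pandas versions silently dropped non-numeric columns in corr(), while pandas 2.0 changed the numeric_only default to False, so the text column now triggers the ValueError.

```python
import pandas as pd

# Toy frame (an assumption) with one text column and three numeric ones.
df = pd.DataFrame({
    "target": [0, 1, 0, 1],
    "text": ["ok lar", "free entry win", "see you", "claim prize now"],
    "num_characters": [6, 14, 7, 15],
    "num_words": [2, 3, 2, 3],
})

# Keep only numeric columns so corr() never sees the raw message strings.
corr = df.select_dtypes(include="number").corr()
# On pandas 1.5+ the equivalent shortcut is df.corr(numeric_only=True).

print(corr)
```

The resulting corr frame is what would be passed to sns.heatmap(corr, annot=True).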
Awesome teaching👏👏
Instead of Heroku we can use Render, but what should the Procfile, setup.sh, .gitignore and requirements.txt contain?
When we try to find the correlation it shows "could not convert string to float". How can I overcome that?
Sir, how do I create a directory? 4:09
Thank you Sir for your great efforts
Hello, thank you for the videos. But I have a question: do we really do ML like this in a professional job? I mean, rather than just picking a model like Naive Bayes, or simply using CountVectorizer to get the bag of words (it can remove the stop words as well)? I'm not sure about stemming; maybe we can do that on the go like we are doing now. So do we skip all of that and take this do-it-all-manually approach?
I am not doubting your approach, but I thought that was how we do it in professional projects. I just want to know, out of curiosity, what's correct so that I can lean towards that approach more. Thanks
Top notch ..🔥🔥Quality of Project.. Tysm Professor
Hi, sir. The precision score we calculate is TP / (TP + FP). But if we calculate it from the confusion matrix, the precision we get is different from what the precision_score function returns directly. Why is that, sir? This happens for all three NB cases (gnb, mnb and bnb).
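A quick check that may help here: the two numbers agree as long as the matrix cells are read in scikit-learn's layout, where rows are true labels and columns are predictions, i.e. [[TN, FP], [FN, TP]] for labels 0 and 1. Reading the matrix transposed, or treating class 0 as positive, is a common source of the mismatch (a guess, since the actual calculation is not shown). The labels below are made up.

```python
from sklearn.metrics import confusion_matrix, precision_score

# Toy labels (an assumption): 1 = spam, 0 = ham.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0]

# For binary labels [0, 1] the matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual = tp / (tp + fp)

from_sklearn = precision_score(y_true, y_pred)  # pos_label defaults to 1

print(manual, from_sklearn)
```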
Hi, can you help? df.corr() isn't working for me; it raises ValueError: could not convert string to float.
thank you so much , it really helped a lot.
For everyone getting this error:
NotFittedError: This MultinomialNB instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
You need to add the line below after voting.fit(X_train,y_train):
mnb.fit(X_train,y_train)
Then re-run to regenerate model.pkl and vectorizer.pkl.
Or
you can also re-run gnb, mnb, bnb.
Thank you.
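A hedged sketch of why this fix works: VotingClassifier fits clones of the estimators it is given, so the original mnb object stays unfitted unless it is fitted directly, and pickling it then saves an unfitted model. The toy data and variable names below are assumptions, not the video's notebook.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.utils.validation import check_is_fitted

# Toy corpus (an assumption): 1 = spam, 0 = ham.
texts = ["win free cash now", "see you at lunch",
         "free prize claim now", "call me later"]
labels = [1, 0, 1, 0]

tfidf = TfidfVectorizer(max_features=3000)
X = tfidf.fit_transform(texts)        # fit the vectorizer ...
mnb = MultinomialNB().fit(X, labels)  # ... and the model BEFORE saving them

# Both calls raise NotFittedError if fit() was never called on the object.
check_is_fitted(tfidf)
check_is_fitted(mnb)

# Round-trip through pickle bytes, standing in for vectorizer.pkl / model.pkl.
tfidf_loaded = pickle.loads(pickle.dumps(tfidf))
mnb_loaded = pickle.loads(pickle.dumps(mnb))
pred = mnb_loaded.predict(tfidf_loaded.transform(["free cash prize"]))
```

Running the two check_is_fitted calls right before pickle.dump in the notebook is a cheap way to catch the problem before the Streamlit app does.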
I am getting another error.
Can you please advise me on how to solve it?
NotFittedError: The TF-IDF vectorizer is not fitted
Thank you. I spent 2-3 hours trying to solve this error, but your code magically helped me run my model.
@@swaragupta7932 cheers bro
thanks bro for solving the error
@@atharvatirkhunde4517 ty
Thanks a lottt...
Explained very clearly.
"NotFittedError: This MultinomialNB instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator."
This is the error I get when I click the Predict button. Has anyone resolved it?
Did you get an answer?
I got the same error. Did you resolve it? Please help....🤒
Same problem... any solution?
@@abhishekbhardwaj6795 naa bro
It shows the error ‘list’ object has no attribute ‘transform’ while predicting spam or not. Can anyone help?
I followed the same process, but the prediction only ever shows "not spam".
Thank you sir it was really helpful
Thank You Sir,
It was a great video; it helped me a lot.
Many, many thanks ❤❤
Sir, which algorithm did you use?
Great 👍.. amazing video
Thanks for the video. As per my understanding, using the scaler before train_test_split will introduce data leakage; it is better to apply scaling after train_test_split.
hi, thanks for the detailed video but I don't know why I am getting an application error whenever I try to open the Heroku link, even your working demo link is giving the same error.
Same issue...
Hi, I have the same issue while it seems it has deployed successfully. Could you solve the issue?
While vectorizing the text to define X, it gives an error saying 'list' object has no attribute 'lower'.
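One likely cause, assuming the preprocessing function returns a list of tokens per message instead of a string: the vectorizer calls .lower() on each document, which a list does not have. A small sketch with made-up tokens:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical output of a preprocessing step that returns token lists.
tokenized = [["free", "entri", "win"], ["ok", "lar"]]

# Each document must be a str, so join the tokens back together first.
docs = [" ".join(tokens) for tokens in tokenized]
X = CountVectorizer().fit_transform(docs)

print(X.shape)
```

Returning " ".join(...) at the end of the preprocessing function itself would have the same effect.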
I did not get the heatmap; I get this error:
ValueError: could not convert string to float: 'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...'
Could anyone please reply with how to resolve this error? 😊😊
Have you found a solution for this issue?
@@CodewithAbhi03 no
You are best SIr !! respect ++
Sir, I'm getting an error at the heatmap step because the string data type in the target column is not supported. When I restrict it to int64 columns it works, but then the target column is missing because strings are not included.
This works so well ! Thankyouuuuu
One doubt: for the voting classifier and the stacking classifier, are the X_train, y_train, X_test, y_test you use the ones built with modified parameters like max_features=3000, or the plain ones without any modification?
great! teaching helps to understand
Sir, you are the best. Please keep it up, and can you make a video about internships for ML?
Hi Sir,
Firstly, Thanks for this amazing content!
Secondly, I had a question: why did we use LabelEncoder here? Shouldn't we use a one-hot encoder?
As the name suggests, this is for labelling the data (here spam and ham are two labels, not sentences), whereas one-hot encoding is used to convert the entire sentence.
LabelEncoder is used to encode the output labels, while one-hot encoding is for encoding the input features. That's the reason.
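A tiny illustration of the point about output labels: for a two-class target, LabelEncoder yields a single 0/1 column, which is all a classifier needs. The sample labels are made up.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
target = encoder.fit_transform(["ham", "spam", "ham", "ham", "spam"])

# classes_ is sorted alphabetically, so "ham" -> 0 and "spam" -> 1.
print(list(encoder.classes_), list(target))
```

One-hot encoding would instead produce one column per category and is typically applied to categorical input features, not to a binary target.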
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 606-607: invalid continuation byte
Sir, I am getting the above error while reading the CSV file. Please help with this. I downloaded the dataset from the Kaggle link you gave, but I am still getting this error.
Kindly provide the solution.
Change the column names 'v1' and 'v2' in the CSV file....
df = pd.read_csv('spam.csv', encoding="ISO-8859-1")
pd.read_csv("spam.csv", encoding="ISO-8859-1", low_memory=False)
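A self-contained sketch of why the encoding argument helps, using in-memory bytes instead of spam.csv (the sample rows are made up; "latin-1" is an alias for ISO-8859-1). The file contains bytes that are not valid UTF-8, which is pandas' default.

```python
import io

import pandas as pd

# Toy CSV bytes (an assumption) containing 0xA3, the pound sign in Latin-1.
raw = "v1,v2\nspam,Win \u00a3100 now\nham,ok lar\n".encode("latin-1")

# The same bytes are not valid UTF-8, which is what read_csv assumes by default.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Telling pandas the real encoding lets the file load cleanly.
df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")

print(utf8_failed)
print(df)
```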
appreciated (y) very well done
I did the same as you did and I have double checked also but still I am getting this error:
"y should be a 1d array, got an array of shape (1034, 6708) instead."
Hi, I have a question: why is Heroku the option for deployment? Kindly discuss more deployment options, and also explain how to deliver the project and in what form. In the end, great content sir, much appreciated.
Thank you so much brother 😊🙏
Is the code from the cut scenes compulsory or not? I am confused.
Man, I built this whole project, but when I tried inputting spam messages from my own email and SMS, it shows them as Not Spam. I don't know if it lacks accuracy or what the issue is, but I tried a lot of messages and for almost all of them it shows Not Spam.
Machine learning is a vast topic. Which machine learning technique is actually used here for classifying?
instead of entering the emails manually, if I connect to my email and have it read my own emails to compare with the dataset, how can I do this? I urgently need help.
Thank you for sharing this👍👍
Please bring a crash course on machine learning, sir. It is really needed.
I've already defined df as shown in this video, but it gives me NameError: df is not defined. Sometimes it runs, but mostly it gives me the error.
In Streamlit, when you take the user's message, how does it get converted to vectors? It is not a document of messages. When I tried it on a single string, I got an error saying "iterable over raw documents expected, string object received".
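A minimal sketch of the usual fix: transform() expects an iterable of documents, so a single message is wrapped in a one-element list. The fitted vectorizer and the message below are stand-ins, not the app's actual objects.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in for the vectorizer loaded from vectorizer.pkl (an assumption).
tfidf = TfidfVectorizer().fit(["win free cash", "see you at lunch"])

message = "free cash today"

# Wrap the single string in a list: one document in, one row out.
vector = tfidf.transform([message])

print(vector.shape)
```

The resulting one-row matrix can then be passed to model.predict(vector), and the first element of the result is the prediction for that message.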
Sir, I have followed all the steps but I got an error:
RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader
Please guide me... I have tried both Flask and Streamlit, and the same error occurs.
Were you able to solve the error? I am getting the same one while deploying on Heroku... and the app shows a FileNotFoundError for the vectorizer and model files.
@@areeshaanjum3396 No, I was unable to debug it.
If you guys find the solution, please share it with me...