At 40:45, the code raises an error. Remove the quotation marks from lowercase='True'.
Correct code:
feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
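For anyone who wants to check the fix in isolation, here is a minimal sketch. It assumes scikit-learn is installed; the two sample messages are made up for illustration and are not from the tutorial's dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy messages standing in for the tutorial's mail data (hypothetical examples).
sample_mails = [
    "Free entry in a weekly competition, text WIN now",
    "Are we still meeting for lunch today?",
]

# lowercase must be the boolean True, not the string 'True'.
feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
sample_features = feature_extraction.fit_transform(sample_mails)

# One row per message, one column per word kept in the vocabulary.
print(sample_features.shape)
```

Recent scikit-learn versions validate parameter types, which is why the quoted 'True' is rejected there even though older versions accepted it.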
Thanks buddy😊😊
Thanks bro.
thankyou !!!
thanks
Love u bro
00:03 Building a Spam Mail Prediction System using Machine Learning with Python
01:55 Classify spam and non-spam mails using machine learning
05:38 Importing the dependencies for machine learning project
07:38 Splitting data for training and testing
12:08 Replacing null values with null strings in the mail data
14:18 Printing the first five rows and checking the size of the data set
18:12 Labelling data to predict outcomes
20:16 Separating text and labels for spam mail prediction
24:05 Splitting dataset into training and test data
25:46 Splitting data for training and testing in machine learning
29:15 Feature extraction converts text data to numerical values
31:31 TF-IDF vectorizer assigns scores to words based on frequency in data set
34:53 Preprocessing steps in spam mail prediction using machine learning
36:36 Converting messages into numerical values using feature extraction
40:22 Converting text data into numerical values using TF-IDF vectorizer
42:10 Training logistic regression model with X_train_features and Y_train
46:03 Comparing predicted values with true values to determine accuracy on training data.
48:06 Model achieved 96.5% accuracy on test data
51:31 Building a predictive system for spam mail prediction
53:14 Data preparation and feeding into machine learning model
57:19 Implementing if-else condition based on prediction values
59:08 Predict if mail is spam or not using machine learning
1:02:24 Building a spam mail prediction system using logistic regression
Crafted by Merlin AI.
Excellent explanation and 100% beginner friendly
Thank you, Sid. I have implemented this professionally as a ticket classification system using this project idea. You are really a hero.
Glad it helped😇
This is the best tutorial video I have ever watched. Thank you. Could you please do a project on life expectancy?
These videos are really great and the explanation is very clear. You go over everything from the basics. I have a question: similar to the heart disease prediction project (using different parameters), can we look at ECG data and predict abnormalities in the heartbeat, or arrhythmia?
Hi! Thanks 😇. We can use ECG data to predict the heart condition. We just need suitable data; it can be image data or numerical data.
@@Siddhardhan Thanks for your response. Can you post a video for that project or help me with the project?
Thank you so much for this video even as a beginner I understood this very well
found this very helpful. Thank You💯
Hope you get the best head bro!! Your videos are super helpful
Can we do the feature vectorization before splitting the data into training and test sets?
When I tried that, I got an error while building the model.
I am getting an error at X_train_features = feature_extraction.fit_transform(X_train)
Thanks, your content is well explained, but please also consider mentioning the strengths and weaknesses of the models you use and how to overcome them with alternative ML models. For example, for this project, Support Vector Machines could perform better than logistic regression for spam email detection, especially when there is nonlinearity in the features. Keep up the great work. 👍
Exactly, KNN would work perfectly.
Thanks a lot brother, learning a lot from your videos.
You explain so well.... Thanks a lot
The code never outputs spam; for every mail we get ham mail as the output.
Yes , same here.
Does that mean I have to give a ham mail to this prediction system?
@@nasiriqbal4258editor Even if we give it spam, it predicts ham only.
I am facing the same problem !
Sir, why did you not do this project with the KNN algorithm? KNN is also used for classification.
Which algorithm is used in this one, then?
Bro, KNN can be used on nonlinear data, which is not the case here; the label is either 0 or 1, which is linear.
Bro, an error occurs in this program when we pass another mail from the dataset as input. The error comes from the TfidfVectorizer. What should we do?
clearly understood.. thank you Bro..
Which machine learning algorithm is used here?
At block 15, I am getting this error
ValueError Traceback (most recent call last)
in
3 X_test_features = feature_extraction.transform(X_test)
4
----> 5 Y_train = Y_train.astype('int')
6 Y_test = Y_test.astype('int')
How do I fix this?
Please help, anyone.
Same error... did you find a solution?
Just do this
feature_extraction = TfidfVectorizer(min_df = 1, stop_words='english', lowercase=True)
@@RiyaSaini-zk6gg thanks was a big help
@@RiyaSaini-zk6gg Thank you
@@RiyaSaini-zk6gg thank you so much
Getting an error in this line. Y_train = Y_train.astype('int')
Already changed 'True' -> True. But still showing error
Same... did you find a fix?
New subscriber. Thanks
Hi sir, I have a small doubt: are the curriculum topics in the description enough to start a career in machine learning?
Hi! I would suggest you learn deep learning as well.
@@Siddhardhan OK, thank you sir.
Thank you very much sir this project is very useful to me
did you mention this project in your resume???
Yes
@@TARAK_4018 Thank you so much for replying. I have doubts regarding internships and resume building. If you could provide your mail ID, it would be really helpful in shaping my career.
Thank you so much sir, this project explanation really helped me a lot. Can I get the documentation for this project?
The dataset here is imbalanced: ham is nearly 80-90% of the total and spam is only about 10%. Why don't we use any sampling method here? Can you please give me the answer?
Nice learned a lot
Sir, I got an error when building the predictive system.
ValueError: X has 6 features, but LogisticRegression is expecting 7431 features as input.
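A feature-count mismatch like this often means the new mail was vectorized with fit_transform (or with a fresh vectorizer) instead of the vectorizer already fitted on the training data. That is an assumption, since the failing code is not shown. A minimal sketch of the predictive step, using made-up training mails and assuming the labels are spam = 0 and ham = 1:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data (hypothetical); labels assume spam = 0 and ham = 1.
X_train = [
    "Win a free prize now, claim your cash",
    "Free entry, text WIN to claim",
    "Are we still meeting for lunch today?",
    "Call me when you get home",
]
Y_train = [0, 0, 1, 1]

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
X_train_features = feature_extraction.fit_transform(X_train)

model = LogisticRegression()
model.fit(X_train_features, Y_train)

# Wrap the new mail in a list and call transform (not fit_transform),
# so it is mapped onto the vocabulary learned from the training data.
input_mail = ["Claim your free prize today"]
input_features = feature_extraction.transform(input_mail)

prediction = model.predict(input_features)
print("Ham mail" if prediction[0] == 1 else "Spam mail")
```

The new mail ends up with exactly as many columns as the training matrix, which is what the model expects.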
Hi all, can you please help me understand why I am getting the error "'int' object has no attribute 'lower'" at the line X_train_features = feature_extraction.fit_transform(X_train), although the code executes perfectly fine up to feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase='True')?
Hello sir. Your explanation is very clear. Sir, can you do a project on the life expectancy (WHO) dataset, or can you please help me out in doing that? Thank you.
Also add data visualization and hyperparameter tuning
Sir, please tell me where I can find the documentation for this project.
Sir, can you tell us which hardware and software are used in this project?
Sir, is this a model-based application of email spam detection? Then what about a real-time application of it?
did you get the answer bro??
Sir, in Google Colaboratory it shows an error from feature extraction onwards. Please correct it and re-upload it.
Did you clear the error
Thank you so much 🍁
Welcome 😊
Hello sir,i got an error in feature extraction
replace true with 1
When I pass any spam mail, it shows ham mail. Why?
the dataset is imbalanced. you'll have to resample it
@@ashupochh8310 How do we resample it?
@@kiranwaskle5487 you need a balanced dataset. Increase the number of spam mails or decrease the number of ham mails.
There are some things to note:
1. The dataset is imbalanced.
2. TF-IDF vectorization doesn't capture semantic meaning. It only vectorizes based on the frequency of words.
Fix:
1. Use sampling techniques or add class weights.
2. Go with other vectorizations and models, like transformer-based ones.
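The class-weights option from the first fix can be sketched as follows. This is only an illustration, not the tutorial's code: the mails are made up, and the labels assume spam = 0 and ham = 1. It relies on the class_weight parameter of scikit-learn's LogisticRegression.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy imbalanced data (hypothetical): 6 ham mails and 2 spam mails.
# Labels assume spam = 0 and ham = 1.
mails = [
    "Are we still meeting for lunch today?",
    "Call me when you get home",
    "Please send the report by Friday",
    "Happy birthday, see you tonight",
    "The train is late again",
    "Thanks for the notes from class",
    "Win a free prize now, claim your cash",
    "Free entry, text WIN to claim",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0]

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
features = feature_extraction.fit_transform(mails)

# class_weight='balanced' reweights each class inversely to its frequency,
# so mistakes on the rare spam class count more during training.
model = LogisticRegression(class_weight='balanced')
model.fit(features, labels)

print(model.predict(feature_extraction.transform(["Claim your free cash prize"])))
```

Whether this helps on the real dataset would need to be checked with per-class metrics such as precision and recall, since plain accuracy can look good on imbalanced data even when spam is missed.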
Thanks a lot !
Which algorithm was this project built with? Is it naive Bayes?
Sorry if this is a stupid question, but how does the model determine the difference between a ham and a spam mail? Is it purely based on training on different spam mails, with the machine learning taking into account the number of words, the frequency of certain words, repeated words, etc., using the TF-IDF calculation at the TfidfVectorizer(min_df=1 ...) part?
Your question is more valid than my life. I have had the same question for a long time, and now how can I give an explanation to my project mentor without realistic logic?
it's based on the frequency of all the words
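To make the frequency idea concrete, here is a rough pure-Python sketch of a textbook TF-IDF score (term frequency times log of inverse document frequency). It is a simplification: scikit-learn's TfidfVectorizer uses a smoothed variant with normalization, and the four messages are made up.

```python
import math
from collections import Counter

# Toy messages (hypothetical), two spam-like and two ham-like.
docs = [
    "win a free prize now",
    "free entry win cash",
    "are we meeting for lunch",
    "see you at lunch",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many messages each word appears.
df = Counter(word for tokens in tokenized for word in set(tokens))

def tfidf(tokens):
    """Score each word: frequent in this message, rare across all messages."""
    counts = Counter(tokens)
    return {
        word: (count / len(tokens)) * math.log(n_docs / df[word])
        for word, count in counts.items()
    }

scores = tfidf(tokenized[0])
# "prize" appears in one message, "free" in two, so "prize" scores higher.
print(scores["prize"] > scores["free"])
```

The classifier (logistic regression in this video, per the chapter list) then learns a weight for each word's score from the labelled training mails, so words that mostly show up in spam push the prediction towards spam.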
Thanks. Can you make an e-commerce recommendation system using a method that merges SVM and query expansion?
Sir, I'm getting an error: couldn't find a version that satisfies accuracy_score.
Can you please tell me what to do, sir?
Sir, where do you get all these Excel sheets? Please provide the link.
kaggle
When you paste text categorized as spam, the model also recognizes it as ham. Tested it a few times :(
How can I enter a new spam email to test the code?
One question: is NLP a library or a piece of software?
What algorithms are used in this program?
It was a very helpful tutorial, thank you. Your datasets are also always so good. Can you provide a link for them, please?
hi! you're welcome. link given in description
@Siddhardhan The dataset here is imbalanced: ham is nearly 80-90% of the total and spam is only about 10%. Why don't we use any sampling method here? Can you please give me the answer?
I have the same doubt
how can I correct this?
Change Labelling 17:36
Sir, can I make this project in a Jupyter notebook? Please reply, sir, I have to complete my project.
i'm getting an error in feature extraction - invalid literal for int() with base 10: 'ham'
Change 'True' --> True
@@ms_007_ still getting error..any changes?
Same
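The message "invalid literal for int() with base 10: 'ham'" means the label column still holds the text 'ham'/'spam' when astype('int') is called, so the lowercase fix alone will not help. A minimal sketch of converting the labels first, assuming pandas is installed, that the columns are named Category and Message, and that spam = 0 and ham = 1 (the rows are made up):

```python
import pandas as pd

# Toy frame (hypothetical rows); column names are assumed.
mail_data = pd.DataFrame({
    "Category": ["ham", "spam", "ham"],
    "Message": ["See you at lunch", "Win a free prize now", "Call me later"],
})

# Map the text labels to numbers before casting; spam -> 0 and ham -> 1 is assumed.
mail_data["Category"] = mail_data["Category"].map({"spam": 0, "ham": 1})

# The cast now succeeds because no text labels remain.
Y = mail_data["Category"].astype("int")
print(Y.tolist())
```

If the labelling cell was skipped, or the label strings differ in case or spacing from what the code expects, the text labels remain in place and the cast fails in the same way.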
Which machine learning algorithm is used here? Please reply.
Sir, I need the PPT you presented. Can you please send it 😀 or a Google Slides link 🔗?
Sir, please tell us how to deploy this model on Streamlit?
Is there a website made with this project?
Is this project using TensorFlow?
Sir, please tell us how we can delete spam mails after detection.
Which algorithm is used here?
Is it necessary to run this model in Flask, or can we just finish it here? Anyone, please tell me.
We can use it in Jupyter.
What is the difference between the existing system and your project?
same question
When I give input from the dataset it works perfectly, but not for input of my own.
Why does it give the wrong output if I write the input on my own?
Same problem. Did you find anything to get the right result for your own input?
What example input did you use?
There are some things to note:
1. The dataset is imbalanced.
2. TF-IDF vectorization doesn't capture semantic meaning. It only vectorizes based on the frequency of words.
Fix:
1. Use sampling techniques or add class weights.
2. Go with other vectorizations and models, like transformer-based ones.
When I give a spam mail as input, it still shows 1 as output. Can anyone help?
Hi bro, I am facing an error while using Y_test.astype. The error is "invalid literal for int() with base 10". Can anybody please fix the error?
How do I fix NotFittedError in the prediction system? Someone please tell me.
This is great work. I face this problem when using test data (ValueError: X has 3302 features, but LogisticRegression is expecting 7526 features as input.). Need your help.
I fixed the problem. I had made a mistake when creating the variable 'X_test_feature': it should be feature_extraction.transform(X_test), without 'fit' (not .fit_transform).
@@RodrigoGarcia-ni1zz please write the code
@@sayantanmanna614 yes please
Thank you, saviour! I had been so troubled for the past 2 days.
@@RodrigoGarcia-ni1zz please write the code
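For those asking for the code, a minimal sketch of the fix described above, with made-up mails in place of the real train/test split: fit the vectorizer on the training text only, then reuse it for the test text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy split (hypothetical messages).
X_train = [
    "Win a free prize now",
    "Are we still meeting for lunch today?",
    "Free entry, text WIN to claim",
]
X_test = ["Claim your free lunch prize"]

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)

# fit_transform learns the vocabulary from the training text only.
X_train_features = feature_extraction.fit_transform(X_train)
# transform reuses that vocabulary, so the column count matches.
X_test_features = feature_extraction.transform(X_test)

print(X_train_features.shape[1] == X_test_features.shape[1])
```

Calling fit_transform on the test text would build a second, smaller vocabulary, which is what produces the "X has 3302 features, but LogisticRegression is expecting 7526" kind of error.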
Sir, can you please make a video on YouTube spam comment detection?
Hello sir, I would like to attend personal classes with you and am ready to pay for them. Can you please reply to this?
It only gives ham mail as the answer; it never gives a spam mail prediction.
X_train_features is showing a parameter error.
Sir, how can we deploy this model?
did you deploy it?
how to deploy it in a webpage?
Data set???!
Why does this code show an error?
Thankyou!
Whatever the input, the prediction shows ham, even when giving it a spam mail.
you didn't check whether the data is balanced or imbalanced which is very important.
5:25
21:39
Hello my friend! Thanks for your video, very brilliant!
I used the decision tree model and I get accuracy on training data: 1.0 and accuracy on test data: 0.9668161434977578. Does that make sense? Do you think that in this case the decision tree model is more powerful and effective than the logistic regression model?
finished watching
The 'lowercase' parameter of TfidfVectorizer must be an instance of 'bool', an instance of 'numpy.bool_' or an instance of 'int'. Got 'True' instead.
getting this error on running X_train_features
Remove the quotes around True, then run it.
Please do a project on bird classification based on audio using CNN
or Online Expense tracker
Bro, if anyone has done this project, please message here. I need the documentation.
Can we use it in our resume?
finished coding
How do we take that mail as input?
What algorithms are used in this program?