Project 17. Spam Mail Prediction using Machine Learning with Python | Machine Learning Projects

Siddhardhan

144K views

Published: 2 Jan 2025

Comments • 151

@sam340 1 year ago ⁺⁷⁴
At 40:45 the code raises an error. Remove the quotes from lowercase='True'.
Correct code:
feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
@sameerkumar_3441 1 year ago ⁺²
Thanks buddy😊😊
@amujurihemasai6564 1 year ago ⁺¹
Thanks bro.
@jagjeetkaursandhu2928 1 year ago
Thank you!!!
@nada3565 1 year ago
thanks
@mukesh5652 1 year ago
Love u bro
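A minimal sketch of the fix from @sam340's comment, using two made-up messages in place of the real mail data. Recent scikit-learn versions validate parameter types, so the string 'True' is rejected where the boolean True is accepted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy messages standing in for the real mail data (hypothetical examples).
messages = [
    "WINNER! Claim your free prize now",
    "Are we still meeting for lunch today?",
]

# lowercase must be the boolean True, not the string 'True'.
feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
features = feature_extraction.fit_transform(messages)

# One row per message, one column per word kept in the vocabulary.
print(features.shape)
```

Because lowercase=True is also the default, leaving the argument out should behave the same way.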
@ShubhamPrajapati-ix8ij 7 months ago ⁺¹²
00:03 Building a Spam Mail Prediction System using Machine Learning with Python
01:55 Classify spam and non-spam mails using machine learning
05:38 Importing the dependencies for machine learning project
07:38 Splitting data for training and testing
12:08 Replacing null values with empty strings in the mail data
14:18 Printing the first five rows and checking the size of the data set
18:12 Labelling data to predict outcomes
20:16 Separating text and labels for spam mail prediction
24:05 Splitting dataset into training and test data
25:46 Splitting data for training and testing in machine learning
29:15 Feature extraction converts text data to numerical values
31:31 TF-IDF vectorizer assigns scores to words based on frequency in data set
34:53 Preprocessing steps in spam mail prediction using machine learning
36:36 Converting messages into numerical values using feature extraction
40:22 Converting text data into numerical values using TF-IDF vectorizer
42:10 Training logistic regression model with X_train features and Y_train
46:03 Comparing predicted values with true values to determine accuracy on training data.
48:06 Model achieved 96.5% accuracy on test data
51:31 Building a predictive system for spam mail prediction
53:14 Data preparation and feeding into machine learning model
57:19 Implementing if-else condition based on prediction values
59:08 Predict if mail is spam or not using machine learning
1:02:24 Building a spam mail prediction system using logistic regression
Crafted by Merlin AI.
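The steps in the outline above can be sketched end to end as follows. This is an illustration under assumptions, not the video's exact notebook: the ten inline messages, the column names Category and Message, the 0 = spam / 1 = ham encoding, test_size=0.2, random_state=3 and the stratified split are all choices made for this sketch.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Tiny stand-in dataset (hypothetical); the real project loads a CSV of mails.
raw_mail_data = pd.DataFrame({
    'Category': ['spam', 'ham', 'spam', 'ham', 'spam',
                 'ham', 'spam', 'ham', 'spam', 'ham'],
    'Message': [
        "Free entry to win a cash prize, text WIN now",
        "Are you coming home for dinner tonight?",
        "Urgent! Your account has won a reward, click here",
        "Let's catch up over coffee tomorrow",
        "Claim your free prize before midnight",
        "Can you send me the meeting notes?",
        "You have been selected for a cash reward",
        "I will call you after the lecture",
        "Win a free holiday, reply now",
        "Happy birthday, see you at the party",
    ],
})

# Replace missing values with empty strings.
mail_data = raw_mail_data.fillna('')

# Encode the labels as integers (assumed encoding: spam -> 0, ham -> 1).
mail_data['Category'] = mail_data['Category'].map({'spam': 0, 'ham': 1})

# Separate texts and labels, then split into training and test data.
X = mail_data['Message']
Y = mail_data['Category']
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=3
)

# Fit TF-IDF on the training texts only, then reuse it for the test texts.
feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
X_train_features = feature_extraction.fit_transform(X_train)
X_test_features = feature_extraction.transform(X_test)

# Train logistic regression and compare predictions with the true labels.
model = LogisticRegression()
model.fit(X_train_features, Y_train)
train_accuracy = accuracy_score(Y_train, model.predict(X_train_features))
test_accuracy = accuracy_score(Y_test, model.predict(X_test_features))
print('Accuracy on training data:', train_accuracy)
print('Accuracy on test data:', test_accuracy)
```

With only ten rows the accuracy figures mean little; the point is the order of the steps.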
@vidhanshuborade5977 1 year ago ⁺⁹
Excellent explanation and 100% beginner friendly
@vinaynaik953 3 years ago ⁺⁵
Thank you Sid, I have implemented this professionally as a ticket classification system using this project idea. You are really a hero.
@Siddhardhan 3 years ago ⁺²
Glad it helped😇
@gertrudevine9678 1 year ago ⁺³
This is the best tutorial video I have ever watched. Thank you. Could you please do a project on life expectancy?
@tanujam 3 years ago ⁺¹⁷
These videos are really great and the explanation is very clear. You are going over everything from the basics. I have a question. Similar to the heart disease prediction project (using different parameters), can we look at ECG data and predict abnormalities in the heartbeat or arrhythmia?
@Siddhardhan 3 years ago ⁺³
hi! thanks 😇. we can use ecg data to predict the heart condition. we just need to have the suitable data. it can be image data or numerical data.
@tanujam 3 years ago
@Siddhardhan Thanks for your response. Can you post a video for that project or help me with the project?
@lakshmik1087 3 years ago ⁺³
Thank you so much for this video even as a beginner I understood this very well
@tejasbajwa_23Student 1 year ago ⁺²
found this very helpful. Thank You💯
@tejgumaste5011 1 year ago
Hope you get the best head bro!! Your videos are super helpful
@subithbabu1931 1 year ago ⁺³
Can we do the feature vectorization before splitting into training and test data?
When I tried that, I got an error while building the model.
@prashantkumartiwari6797 11 months ago ⁺²
I am getting an error at X_train_features=feature_extraction.fit_transform(X_train)
@stephane-wamba 1 year ago
Thanks, your content is well explained, but also consider mentioning the strengths and weaknesses of the models you use and how to overcome them with other alternative ML models. For example, for this project, Support Vector Machines could perform better than logistic regression for spam email detection, especially when there is nonlinearity in the features. Keep up the great work.👍
@malla7134 1 year ago
Exactly, KNN would work perfectly.
@malikumarhassan1 3 years ago
Thanks a lot brother, learning a lot from your videos.
@elfincredible9002 7 months ago
You explain so well.... Thanks a lot
@mohammedshafeeq4930 2 years ago ⁺⁹
The code never outputs spam; for every mail we get ham mail as the output.
@sharanganbala 1 year ago
Yes, same here.
@nasiriqbal4258editor 1 year ago
That means I have to give ham mail in this prediction system
@Megnamv 1 year ago
@nasiriqbal4258editor if we give spam also, it's predicting ham only
@155_meenusingh6 1 year ago
I am facing the same problem!
@Hkkivines30 3 years ago ⁺⁹
Sir, why have you not done this project with the KNN algorithm? KNN is also used for classification.
@vs_shortz_0001 2 years ago ⁺²
Which algorithm is used in this one?
@MemeFYPage 1 year ago ⁺⁴
Bro, KNN can be used on non-linear data, which this is not; here the label is either 0 or 1, which is linear
@vickyvignesh2166 8 months ago
Bro, an error occurs in this program when we insert another mail from the dataset as input; the error occurs in TfidfVectorizer. What should we do?
@saipavanakki8116 2 years ago
clearly understood.. thank you Bro..
@NEF7354 11 months ago ⁺¹
Which machine learning algorithm is used here?
@36_kapilojha2 2 years ago ⁺⁶
At block 15, I am getting this error
ValueError Traceback (most recent call last)
in
3 X_test_features = feature_extraction.transform(X_test)
4
----> 5 Y_train = Y_train.astype('int')
6 Y_test = Y_test.astype('int')
How do I fix this?
Please help, anyone
@mayureshpatil383 1 year ago ⁺¹
Same error... did you find a solution?
@RiyaSaini-zk6gg 1 year ago ⁺³
Just do this
feature_extraction = TfidfVectorizer(min_df = 1, stop_words='english', lowercase=True)
@abhilashmadhav.m536 1 year ago ⁺¹
@RiyaSaini-zk6gg thanks was a big help
@sayantanmanna614 1 year ago ⁺¹
@RiyaSaini-zk6gg Thank you
@sabyasachikarmakar8598 1 year ago ⁺¹
@RiyaSaini-zk6gg thank you so much
@ayondas8005 1 year ago ⁺²
Getting an error in this line: Y_train = Y_train.astype('int')
Already changed 'True' -> True, but it still shows the error
@bububarks1273 10 months ago
Same... found any fix?
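One likely cause of the astype('int') failure in this thread, going by the "invalid literal for int() with base 10: 'ham'" message other commenters quote, is that the label column still holds the strings 'ham' and 'spam' when the cast runs. A minimal sketch, assuming the labels are exactly those two strings and that spam maps to 0 and ham to 1:

```python
import pandas as pd

# Hypothetical label column that was never converted to numbers.
labels = pd.Series(['ham', 'spam', 'ham'])

# labels.astype('int') would fail here, because 'ham' cannot be parsed as an integer.

# Map each string label to an integer first (assumed encoding: spam -> 0, ham -> 1).
encoded = labels.map({'spam': 0, 'ham': 1})

# Any label missing from the mapping becomes NaN, so check before casting.
assert encoded.notna().all(), "found labels other than 'spam' and 'ham'"

encoded = encoded.astype('int')
print(encoded.tolist())  # [1, 0, 1]
```

If the assertion fails on real data, stray whitespace or different capitalisation in the label column would be the first thing to check.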
@viv1622 1 year ago
New subscriber. Thanks
@sandhyareddy7957 3 years ago ⁺²
Hi sir, I have a small doubt: are the curriculum topics in the description enough to start a career in machine learning?
@Siddhardhan 3 years ago
hi! I would suggest u to learn Deep Learning as well.
@sandhyareddy7957 3 years ago
@Siddhardhan ok tq sir
@TARAK_4018 1 year ago
Thank you very much sir this project is very useful to me
@mjanish9836 8 months ago
did you mention this project in your resume???
@TARAK_4018 8 months ago
Yes
@mjanish9836 8 months ago
@TARAK_4018 thank you so much for replying. I have doubts regarding internships and resume building. If you could provide your mail ID, it would be really helpful in shaping my career.
@AcharyaGagan 3 years ago ⁺³
Thank you so much sir, this project explanation really helped me a lot. Can I get documentation for this project?
@rajkumarray3224 2 years ago
Here the dataset is imbalanced: ham is nearly 80-90% of the total and spam is only about 10%. So why don't we use any sampling method here? Can you please give me the answer?
@shalithapraveen2391 2 years ago
Nice learned a lot
@abhishekwaghmare2480 2 years ago ⁺⁴
Sir, I got an error when building the predictive system.
ValueError: X has 6 features, but LogisticRegression is expecting 7431 features as input.
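A feature-count mismatch like the one above usually means the new mail was vectorized with a freshly fitted vectorizer instead of the one fitted on the training data. A minimal sketch of a predictive system that avoids this, using four made-up training messages and an assumed encoding of 0 = spam, 1 = ham:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training data (assumed encoding: 0 = spam, 1 = ham).
train_messages = [
    "Free entry to win a cash prize, text WIN now",
    "Are you coming home for dinner tonight?",
    "Claim your free reward, click here",
    "Let's catch up over coffee tomorrow",
]
train_labels = [0, 1, 0, 1]

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
X_train_features = feature_extraction.fit_transform(train_messages)

model = LogisticRegression()
model.fit(X_train_features, train_labels)

# The new mail goes in as a list of strings and is only transformed,
# so it gets the same columns the model was trained on.
input_mail = ["Congratulations, you won a free prize, call now"]
input_features = feature_extraction.transform(input_mail)

prediction = model.predict(input_features)
if prediction[0] == 1:
    print('Ham mail')
else:
    print('Spam mail')
```

Calling fit_transform on input_mail instead would rebuild the vocabulary from that single message, which is how a 6-column input can end up in front of a model expecting thousands of columns.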
@PrinceKumar-kr5ig 4 months ago
Hi all, can you please help me understand why I am getting the error "'int' object has no attribute 'lower'" at the line X_train_features=feature_extraction.fit_transform(X_train), although my code executes perfectly fine up to feature_extraction=TfidfVectorizer(min_df = 1, stop_words='english', lowercase='True')
@kahkashanfatima-0016 2 years ago ⁺²
Hello sir. Your explanation is very clear. Sir, can you do a project on the life expectancy (WHO) dataset, or can you please help me out in doing that? Thank you.
@shivamkeshri4380 6 months ago
Also add data visualization and hyperparameter tuning
@sivaprasadmaharana845 2 years ago ⁺¹
Sir
Please tell me where I can find documentation for this project
@omkargujar6507 6 months ago
Sir, can you tell which hardware and software are used in this project?
@sade.meghana410 2 years ago ⁺¹
Sir, is this a model-based application of email spam detection? Then what about a real-time application of it?
@madhankumar4468 9 months ago
did you get the answer bro??
@BCMKISHOREKUMARS 1 year ago ⁺¹
Sir, in Google Colaboratory it shows an error from feature extraction onwards. Please correct it and re-upload it, sir.
@sethukarasisethu6491 1 year ago
Did you clear the error?
@arohirathore9197 3 years ago
Thank you so much 🍁
@Siddhardhan 3 years ago ⁺¹
Welcome 😊
@-VASANTHKUMARSS 1 year ago ⁺¹
Hello sir, I got an error in feature extraction
@ashupochh8310 1 year ago ⁺¹
replace true with 1
@nandiniverma6045 2 years ago ⁺¹
When I pass any spam mail it shows ham mail. Why?
@ashupochh8310 1 year ago
the dataset is imbalanced. you'll have to resample it
@kiranwaskle5487 1 year ago
@ashupochh8310 how do we resample?
@ashupochh8310 1 year ago
@kiranwaskle5487 you need a balanced dataset. Increase the number of spam mails or decrease the number of ham mails.
@ruznyma 4 months ago
There are some things to note:
1. The dataset is imbalanced.
2. TF-IDF vectorization doesn't capture semantic meaning. It only vectorizes based on the frequency of words.
Fix:
1. Use sampling techniques or add class weights
2. Go with other vectorizations and models, like transformer-based ones
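The class-weight suggestion above can be sketched as follows. The eight messages and the 6:2 ham-to-spam split are made up for illustration, and class_weight='balanced' is only one of the options the commenters mention; it may not, on its own, fix every "always ham" result.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical imbalanced data: six ham mails, two spam mails
# (assumed encoding: 0 = spam, 1 = ham).
messages = [
    "Are you coming home for dinner tonight?",
    "Let's catch up over coffee tomorrow",
    "Can you send me the meeting notes?",
    "I will call you after the lecture",
    "Happy birthday, hope the party goes well",
    "The report is attached, please review",
    "Free entry to win a cash prize, text WIN now",
    "Claim your free reward, click here",
]
labels = pd.Series([1, 1, 1, 1, 1, 1, 0, 0])

# Check the class balance before training.
print(labels.value_counts())

vectorizer = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
features = vectorizer.fit_transform(messages)

# 'balanced' reweights each class inversely to its frequency, so errors on
# the rarer spam class cost more during training.
model = LogisticRegression(class_weight='balanced')
model.fit(features, labels)

new_mail = vectorizer.transform(["Win a free cash prize now"])
print(model.predict(new_mail))
```

Resampling, as suggested earlier in the thread, is the alternative: add spam examples or drop ham examples until the value_counts output is roughly even.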
@growingfire 6 months ago
Thanks a lot !
@vishakantaiah1214 4 months ago
Which algorithm is this project built with? Is it Naive Bayes?
@CybrZone 2 years ago ⁺⁴
Sorry if this is a stupid question, but how does the model determine the difference between a HAM and a SPAM mail? Is it purely based on training with different spam mails, with the machine learning taking into consideration the number of words, the frequency of certain words, repeated words etc. through the TF-IDF calculation at the TfidfVectorizer(min_df=1 ...) part?
@callofdutysuperheroes 1 year ago ⁺¹
Your question is more valid than my life. I have had the same question for a long time, and now how can I give an explanation to my project mentor without realistic logic?
@guitar300k 1 year ago ⁺¹
it's based on the frequency of all the words
@elsamaylinaputriroesnadi4571 3 years ago
Thanks, can you make an e-commerce application recommendation system, using a method that merges SVM and query expansion?
@madhavaswamyamidepuram5751 3 years ago
Sir, I'm getting the error "could not find a version that satisfies" for accuracy_score.
Can you please tell me what to do, sir?
@Error-Solver. 3 years ago ⁺¹
Sir, where do you get all these Excel sheets? Please provide the link, sir.
@Amanansari003 1 year ago
kaggle
@AndrewSamStories 1 year ago
When you paste text categorized as spam, the model still recognizes it as ham. Tested it a few times :(
@محمدحسننعيمرحيمه 11 months ago
How can I enter a new spam email to test the code?
@RaniEnterprises05 1 year ago
One question: is NLP a library or a software?
@SanjaySanjay-cw4ru 2 years ago
Which algorithms are used in this program?
@masumecetin4336 3 years ago ⁺¹
It was a very helpful tutorial, thank you. Also, your datasets are always so good; can you provide a link for them please?
@Siddhardhan 3 years ago ⁺²
hi! you're welcome. link given in description
@rajkumarray3224 2 years ago
@Siddhardhan Here the dataset is imbalanced: ham is nearly 80-90% of the total and spam is only about 10%. So why don't we use any sampling method here? Can you please give me the answer?
@155_meenusingh6 1 year ago
I have the same doubt
@ashupochh8310 1 year ago
how can I correct this?
@danielnwoke3694 5 months ago
Change Labelling 17:36
@sk65233 1 year ago
Sir, can I make this project in Jupyter Notebook? Please reply sir, I have to complete my project.
@suryatapasarkar3902 1 year ago
I'm getting an error in feature extraction - invalid literal for int() with base 10: 'ham'
@ms_007_ 1 year ago ⁺³
Change 'True' --> True
@bububarks1273 10 months ago ⁺¹
@ms_007_ still getting the error.. any fix?
@Ayushpan123 7 months ago
Same
@vs_shortz_0001 2 years ago
Which machine learning algorithm is used here? Please reply.
@thinkingmad1685 2 years ago
Sir, I need the PPT which you presented. Can you please send it 😀 or a Google Slides link 🔗
@asynchronousani 1 year ago
Sir, please tell us how to deploy this model on Streamlit?
@shu03bh 4 months ago
Is there a website made with this project?
@SanjaySanjay-vs3cn 11 months ago
Is this project using TensorFlow?
@saisuraj97 2 years ago
Sir, please tell us how we can delete spam mails after detection
@adityarenapure7397 1 year ago
Which algorithm is used here?
@Nshupriya 1 year ago
Is it necessary to run this model in Flask, or can we just finish it here? Anyone please tell
@shajidhabegammohamedrafeek3455 5 months ago
We can use it in Jupyter
@machilikanthyadav460 2 years ago
What is the difference between the existing systems and your project?
@155_meenusingh6 1 year ago
same question
@AnujeetKunturkar 1 year ago
When I give input from the dataset it works perfectly, but not for input of my own.
Why does it give the wrong output if I write the input on my own?
@HimanshuSingh-pp5hw 8 months ago
Same problem. Did you find any way to get correct results for your own input?
@ruznyma 4 months ago
What example input did you use?
There are some things to note:
1. The dataset is imbalanced.
2. TF-IDF vectorization doesn't capture semantic meaning. It only vectorizes based on the frequency of words.
Fix:
1. Use sampling techniques or add class weights
2. Go with other vectorizations and models, like transformer-based ones
@shorts2429 1 year ago
When I give a spam mail as input, it still shows 1 as the output. Can anyone help?
@akashk9755 1 year ago
Hi bro, I am facing an error while using y_test.astype; the error is "invalid literal for int() with base 10". Please, can anybody rectify the error?
@nasiriqbal4258editor 1 year ago
How do I fix NotFittedError in the prediction system? Someone tell me please
@manjurahammad1110 3 years ago ⁺²
This is great work. I face this problem when using the test data (ValueError: X has 3302 features, but LogisticRegression is expecting 7526 features as input.). Need your help.
@RodrigoGarcia-ni1zz 2 years ago ⁺¹
I corrected the problem. I had made a mistake when creating the variable 'X_test_features'. It should be feature_extraction.transform(X_test), without 'fit' (not .fit_transform)
@sayantanmanna614 1 year ago
@RodrigoGarcia-ni1zz please write the code
@sagars6188 1 year ago
@sayantanmanna614 yes please
@akankshathakur1032 1 year ago
Thank you, saviour! I had been stuck on this for the past 2 days.
@InduKanaka 5 months ago
@RodrigoGarcia-ni1zz please write the code
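The code requested in this thread, as a minimal sketch on made-up messages: the vectorizer is fitted on the training texts once, and the test texts are only transformed. The variable wrong_features is included purely to show how the mismatch arises.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training and test texts.
X_train = [
    "Free entry to win a cash prize, text WIN now",
    "Are you coming home for dinner tonight?",
    "Urgent: your account has won a reward, click here",
    "Let's catch up over coffee tomorrow",
]
X_test = [
    "Win a free prize today",
    "See you at dinner",
]

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)

# Learn the vocabulary from the training texts only.
X_train_features = feature_extraction.fit_transform(X_train)

# Wrong: fitting again on the test texts builds a different, smaller vocabulary.
wrong_features = TfidfVectorizer(
    min_df=1, stop_words='english', lowercase=True
).fit_transform(X_test)

# Right: reuse the fitted vectorizer and call transform only.
X_test_features = feature_extraction.transform(X_test)

print('train columns:', X_train_features.shape[1])
print('refitted test columns:', wrong_features.shape[1])
print('transformed test columns:', X_test_features.shape[1])
```

The model is trained on X_train_features, so only X_test_features, which has the same number of columns, can be passed to its predict method.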
@webproject5144 2 years ago
Sir, can you please make a video on YouTube spam comment detection?
@JunaidAhmedMohammed-cw7um 2 months ago
Hello sir, I would like to attend personal classes with you and am ready to pay for them. Can you please reply to this?
@nandinisrivastava3183 11 months ago
It's just giving ham mail as the answer; it is not giving a spam mail prediction
@debudas5230 1 year ago
X_train_features is showing a parameter error
@tejaspradhan2423 1 year ago
Sir, how can we deploy this model?
@haardhikkunder9436 1 year ago
did you deploy it?
@baivabpaul8918 2 years ago
How do I deploy it in a webpage?
@MAbhishiktha 7 months ago
Data set???!
@nandiniverma6045 1 year ago
Why does this code show an error?
@sunflowers6957 2 years ago
Thank you!
@YashAgarwal-xu5fd 1 year ago
Whatever the input, the prediction shows ham, even when giving a spam mail
@DataDorz 2 years ago
You didn't check whether the data is balanced or imbalanced, which is very important.
@Wilson2746 1 month ago
5:25
@clumsyslime3369 1 year ago
21:39
@alimibrahem8120 1 year ago
Hello my friend! Thanks for your video, very brilliant!
I used the Decision Tree model and got accuracy on training data: 1.0 and accuracy on test data: 0.9668161434977578. Does that make sense? Do you think that in this case the Decision Tree model is more powerful and effective than the logistic regression model?
@sandipansarkar9211 2 years ago
finished watching
@003_abhishekgupta9 1 year ago
The 'lowercase' parameter of TfidfVectorizer must be an instance of 'bool', an instance of 'numpy.bool_' or an instance of 'int'. Got 'True' instead.
Getting this error on running X_train_features
@amujurihemasai6564 1 year ago
Remove the quotes around True, then run
@prerna_0008 9 months ago
Please do a project on bird classification based on audio using CNN
or Online Expense tracker
@manideepmani5062 1 month ago
Bro, if anyone has done this project, please message here; I need the documentation
@devanshprakash8354 2 years ago
Can we use it in our resume?
@sandipansarkar9211 2 years ago
finished coding
@rafiulfaisal6240 2 years ago ⁺²
How do we take that mail as input?
@rajeshj5071 2 years ago
Which algorithms are used in this program?
