Project 17. Spam Mail Prediction using Machine Learning with Python | Machine Learning Projects

  • Published: 2 Jan 2025

Comments • 151

  • @sam340
    @sam340 1 year ago +74

    At 40:45 the code raises an error. Remove the quotes from lowercase='True'.
    Correct code:
    feature_extraction = TfidfVectorizer(min_df=1,stop_words='english',lowercase=True)
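
The corrected call can be tried on its own. This is a minimal sketch on a few made-up messages (the `messages` list is hypothetical, not the tutorial's dataset); it only shows that `lowercase` takes a boolean and that the vectorizer then fits without complaint.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy messages standing in for the tutorial's mail data (hypothetical examples).
messages = [
    "Win a FREE prize now",
    "Are we still meeting for lunch today",
    "Free entry to win cash, reply now",
]

# lowercase expects a boolean; recent scikit-learn versions reject the string 'True'.
feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
features = feature_extraction.fit_transform(messages)

# Rows are messages, columns are the distinct terms that survived stop-word removal.
print(features.shape)
print(sorted(feature_extraction.vocabulary_))
```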

  • @ShubhamPrajapati-ix8ij
    @ShubhamPrajapati-ix8ij 7 months ago +12

    00:03 Building a Spam Mail Prediction System using Machine Learning with Python
    01:55 Classify spam and non-spam mails using machine learning
    05:38 Importing the dependencies for machine learning project
    07:38 Splitting data for training and testing
    12:08 Replacing null values with empty strings in the mail data
    14:18 Printing the first five rows and checking the size of the data set
    18:12 Labelling data to predict outcomes
    20:16 Separating text and labels for spam mail prediction
    24:05 Splitting dataset into training and test data
    25:46 Splitting data for training and testing in machine learning
    29:15 Feature extraction converts text data to numerical values
    31:31 TF-IDF Vectorizer assigns scores to words based on frequency in data set
    34:53 Preprocessing steps in spam mail prediction using machine learning
    36:36 Converting messages into numerical values using feature extraction
    40:22 Converting text data into numerical values using TF-IDF vectorizer
    42:10 Training logistic regression model with X_train features and Y_train
    46:03 Comparing predicted values with true values to determine accuracy on training data.
    48:06 Model achieved 96.5% accuracy on test data
    51:31 Building a predictive system for spam mail prediction
    53:14 Data preparation and feeding into machine learning model
    57:19 Implementing if-else condition based on prediction values
    59:08 Predict if mail is spam or not using machine learning
    1:02:24 Building a spam mail prediction system using logistic regression
    Crafted by Merlin AI.

  • @vidhanshuborade5977
    @vidhanshuborade5977 1 year ago +9

    Excellent explanation and 100% beginner friendly

  • @vinaynaik953
    @vinaynaik953 3 years ago +5

    Thanks, Sid. I have implemented this professionally as a ticket classifier using this project idea. You are really a hero.

  • @gertrudevine9678
    @gertrudevine9678 1 year ago +3

    This is the best tutorial video I have ever watched. Thank you. Could you please do a project on life expectancy?

  • @tanujam
    @tanujam 3 years ago +17

    These videos are really great and the explanation is very clear. You go over everything from the basics. I have a question: similar to the heart disease prediction project (using different parameters), can we look at ECG data and predict abnormalities in the heartbeat, such as arrhythmia?

    • @Siddhardhan
      @Siddhardhan 3 years ago +3

      Hi! Thanks 😇. We can use ECG data to predict the heart condition. We just need suitable data; it can be image data or numerical data.

    • @tanujam
      @tanujam 3 years ago

      @@Siddhardhan Thanks for your response. Can you post a video for that project or help me with the project?

  • @lakshmik1087
    @lakshmik1087 3 years ago +3

    Thank you so much for this video. Even as a beginner I understood it very well.

  • @tejasbajwa_23Student
    @tejasbajwa_23Student 1 year ago +2

    found this very helpful. Thank You💯

  • @tejgumaste5011
    @tejgumaste5011 1 year ago

    Hope you get the best head bro!! Your videos are super helpful

  • @subithbabu1931
    @subithbabu1931 1 year ago +3

    Can we do the feature vectorization before splitting the data into training and test sets?
    When I tried that, I got an error while building the model.
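
One common way to sidestep the ordering question is to split the raw text first and let a scikit-learn `Pipeline` fit the vectorizer on the training split only. The sketch below is an illustration under assumptions, not the video's code: the toy messages are made up, and the encoding 0 = spam, 1 = ham is assumed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; assumed encoding: 0 = spam, 1 = ham.
X = [
    "win a free prize now", "free cash offer inside",
    "claim your reward today", "urgent offer just for you",
    "lunch at noon tomorrow", "see you at the meeting",
    "can you call me later", "notes from class attached",
]
Y = [0, 0, 0, 0, 1, 1, 1, 1]

# Split the raw text first, so the test messages stay unseen by the vectorizer.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=3, stratify=Y
)

# The pipeline calls fit_transform on the training text and only transform on new text.
model = make_pipeline(
    TfidfVectorizer(min_df=1, stop_words='english', lowercase=True),
    LogisticRegression(),
)
model.fit(X_train, Y_train)

predictions = model.predict(X_test)
print(list(predictions))
```

Vectorizing the full dataset before the split also runs, but the vocabulary and IDF weights are then learned partly from the test messages, which tends to make the test accuracy look better than it should.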

  • @prashantkumartiwari6797
    @prashantkumartiwari6797 11 months ago +2

    I am getting an error at X_train_features = feature_extraction.fit_transform(X_train)

  • @stephane-wamba
    @stephane-wamba 1 year ago

    Thanks, your content is well explained, but please also consider mentioning the strengths and weaknesses of the models you use and how to overcome them with alternative ML models. For example, in this project Support Vector Machines could perform better than logistic regression for spam email detection, especially when there is nonlinearity in the features. Keep up the great work. 👍

    • @malla7134
      @malla7134 1 year ago

      Exactly, KNN would work perfectly.

  • @malikumarhassan1
    @malikumarhassan1 3 years ago

    Thanks a lot brother, learning a lot from your videos.

  • @elfincredible9002
    @elfincredible9002 7 months ago

    You explain so well.... Thanks a lot

  • @mohammedshafeeq4930
    @mohammedshafeeq4930 2 years ago +9

    The code never outputs spam; for every mail we get ham mail as the output.

    • @sharanganbala
      @sharanganbala 1 year ago

      Yes , same here.

    • @nasiriqbal4258editor
      @nasiriqbal4258editor 1 year ago

      That means I have to give a ham mail to this prediction system.

    • @Megnamv
      @Megnamv 1 year ago

      @@nasiriqbal4258editor If we give spam as well, it still predicts ham.

    • @155_meenusingh6
      @155_meenusingh6 1 year ago

      I am facing the same problem !

  • @Hkkivines30
    @Hkkivines30 3 years ago +9

    Sir, why did you not do this project with the KNN algorithm? KNN is also used for classification.

    • @vs_shortz_0001
      @vs_shortz_0001 2 years ago +2

      Which algorithm is used in this one?

    • @MemeFYPage
      @MemeFYPage 1 year ago +4

      Bro, KNN can be used on non-linear data, which is not the case here; the label is either 0 or 1, which is linear.

    • @vickyvignesh2166
      @vickyvignesh2166 8 months ago

      Bro, an error occurs in this program when we give another mail from the dataset as input; the error is raised in the TfidfVectorizer. What should we do?

  • @saipavanakki8116
    @saipavanakki8116 2 years ago

    clearly understood.. thank you Bro..

  • @NEF7354
    @NEF7354 11 months ago +1

    Which machine learning algorithm is used here?

  • @36_kapilojha2
    @36_kapilojha2 2 years ago +6

    At block 15, I am getting this error:
    ValueError Traceback (most recent call last)
    in
    3 X_test_features = feature_extraction.transform(X_test)
    4
    ----> 5 Y_train = Y_train.astype('int')
    6 Y_test = Y_test.astype('int')
    How do I fix this?
    Please help, anyone.
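
A likely cause, judging by the "invalid literal for int() with base 10: 'ham'" message reported elsewhere in this thread, is that the label column still holds the strings 'spam' and 'ham' when `astype('int')` runs. The sketch below maps the labels explicitly before converting. The column names `Category` and `Message` and the encoding spam = 0, ham = 1 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature of the mail data; column names are assumed.
mail_data = pd.DataFrame({
    "Category": ["ham", "spam", "ham"],
    "Message": ["See you at 5", "Win cash now", "Call me later"],
})

# Map the text labels to integers explicitly (assumed encoding: spam -> 0, ham -> 1).
mail_data["Category"] = mail_data["Category"].map({"spam": 0, "ham": 1})

# Any label outside the mapping becomes NaN, so fail early with a clear message.
if mail_data["Category"].isna().any():
    raise ValueError("unexpected label found in Category column")

Y = mail_data["Category"].astype("int")
print(Y.tolist())  # [1, 0, 1]
```

If the labels in a given copy of the dataset carry extra whitespace or different capitalisation, normalising them with `.str.strip().str.lower()` before the mapping is one option.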

  • @ayondas8005
    @ayondas8005 1 year ago +2

    Getting an error in this line: Y_train = Y_train.astype('int')
    I already changed 'True' -> True, but it still shows the error.

    • @bububarks1273
      @bububarks1273 10 months ago

      same...found any changes??

  • @viv1622
    @viv1622 1 year ago

    New subscriber. Thanks

  • @sandhyareddy7957
    @sandhyareddy7957 3 years ago +2

    Hi sir, I have a small doubt: are the curriculum topics in the description enough to start a career in machine learning?

    • @Siddhardhan
      @Siddhardhan 3 years ago

      Hi! I would suggest you learn Deep Learning as well.

    • @sandhyareddy7957
      @sandhyareddy7957 3 years ago

      @@Siddhardhan ok tq sir

  • @TARAK_4018
    @TARAK_4018 1 year ago

    Thank you very much sir this project is very useful to me

    • @mjanish9836
      @mjanish9836 8 months ago

      did you mention this project in your resume???

    • @TARAK_4018
      @TARAK_4018 8 months ago

      Yes

    • @mjanish9836
      @mjanish9836 8 months ago

      @@TARAK_4018 Thank you so much for replying. I have doubts regarding internships and resume building. If you could provide your mail ID, it would be really helpful in shaping my career.

  • @AcharyaGagan
    @AcharyaGagan 3 years ago +3

    Thank you so much, sir. This project explanation really helped me a lot. Can I get documentation for this project?

    • @rajkumarray3224
      @rajkumarray3224 2 years ago

      Here the dataset is imbalanced: ham is nearly 80-90% of the total and spam only about 10%. Why don't we use a sampling method here? Can you please give me the answer?

  • @shalithapraveen2391
    @shalithapraveen2391 2 years ago

    Nice learned a lot

  • @abhishekwaghmare2480
    @abhishekwaghmare2480 2 years ago +4

    Sir, I got an error while building the predictive system:
    ValueError: X has 6 features, but LogisticRegression is expecting 7431 features as input.
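
An error of this shape usually means the new mail was not turned into features by the same fitted vectorizer the model was trained with. A minimal sketch of a predictive step follows, assuming made-up training messages and the encoding 0 = spam, 1 = ham. The points to note are that the input is a list holding one string, and that it goes through `transform`, not `fit_transform`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training data; assumed encoding: 0 = spam, 1 = ham.
train_messages = [
    "win a free prize now", "lunch at noon tomorrow",
    "free cash offer inside", "see you at the meeting",
]
train_labels = [0, 1, 0, 1]

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
X_train_features = feature_extraction.fit_transform(train_messages)

model = LogisticRegression()
model.fit(X_train_features, train_labels)

# A list containing one string, not a bare string.
input_mail = ["Claim your free prize now"]

# Reuse the fitted vectorizer so the column count matches what the model expects.
input_features = feature_extraction.transform(input_mail)

prediction = model.predict(input_features)
print("Ham mail" if prediction[0] == 1 else "Spam mail")
```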

  • @PrinceKumar-kr5ig
    @PrinceKumar-kr5ig 4 months ago

    Hi all, can you please help me understand why I am getting the error "'int' object has no attribute 'lower'" at the line X_train_features = feature_extraction.fit_transform(X_train), although the
    code executes perfectly fine up to feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase='True')?
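
That message typically appears when some entries handed to the vectorizer are numbers instead of strings, for example a message that consists only of digits, or the label column passed in place of the text column. A small sketch of the first case, with a made-up `X_train`, shows one possible fix: cast the text to `str` before fitting.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical text column in which one entry was parsed as an integer.
X_train = pd.Series(["Win cash now", 12345, "Call me later"])

# Cast every entry to a string so the vectorizer can lowercase and tokenize it.
X_train = X_train.astype(str)

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
X_train_features = feature_extraction.fit_transform(X_train)

print(X_train_features.shape)
```

If the cast changes nothing, it is worth checking that `X` really holds the message text and `Y` the labels, not the other way round.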

  • @kahkashanfatima-0016
    @kahkashanfatima-0016 2 years ago +2

    Hello sir. Your explanation is very clear. Could you do a project on the life expectancy (WHO) dataset, or could you please help me with it? Thank you.

    • @rajkumarray3224
      @rajkumarray3224 2 years ago

      Here the dataset is imbalanced: ham is nearly 80-90% of the total and spam only about 10%. Why don't we use a sampling method here? Can you please give me the answer?

  • @shivamkeshri4380
    @shivamkeshri4380 6 months ago

    Also add data visualization and hyperparameter tuning

  • @sivaprasadmaharana845
    @sivaprasadmaharana845 2 years ago +1

    Sir,
    please tell me where I can find documentation for this project.

  • @omkargujar6507
    @omkargujar6507 6 months ago

    Sir, can you tell us which hardware and software are used in this project?

  • @sade.meghana410
    @sade.meghana410 2 years ago +1

    Sir, is this a model-based application of email spam detection? Then what about a real-time application of it?

  • @BCMKISHOREKUMARS
    @BCMKISHOREKUMARS 1 year ago +1

    Sir, in Google Colaboratory it shows an error from feature extraction onwards. Please correct it and re-upload it, sir.

  • @arohirathore9197
    @arohirathore9197 3 years ago

    Thank you so much 🍁

  • @-VASANTHKUMARSS
    @-VASANTHKUMARSS 1 year ago +1

    Hello sir, I got an error in feature extraction.

  • @nandiniverma6045
    @nandiniverma6045 2 years ago +1

    When I pass any spam mail it shows ham mail. Why?

    • @ashupochh8310
      @ashupochh8310 1 year ago

      The dataset is imbalanced. You'll have to resample it.

    • @kiranwaskle5487
      @kiranwaskle5487 1 year ago

      @@ashupochh8310 How do we resample?

    • @ashupochh8310
      @ashupochh8310 1 year ago

      @@kiranwaskle5487 you need a balanced dataset. Increase the number of spam mails or decrease the number of ham mails.

    • @ruznyma
      @ruznyma 4 months ago

      There are some things to note:
      1. The dataset is imbalanced.
      2. TF-IDF vectorization doesn't capture semantic meaning. It only vectorizes based on the frequency of words.
      Fix:
      1. Use sampling techniques or add class weights.
      2. Try other vectorization methods and models, such as transformer-based ones.
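
The class-weight suggestion can be sketched as follows. This is an illustration on made-up messages with an assumed encoding of 0 = spam, 1 = ham, not a tuned fix; `class_weight='balanced'` is a standard `LogisticRegression` option that weights each class inversely to its frequency in the training labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical imbalanced toy data: 8 ham (1) and 2 spam (0).
messages = [
    "lunch at noon tomorrow", "see you at the meeting",
    "can you call me later", "notes from class attached",
    "happy birthday to you", "the train is running late",
    "dinner at my place tonight", "thanks for the update",
    "win a free prize now", "free cash offer inside",
]
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

vectorizer = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
features = vectorizer.fit_transform(messages)

# 'balanced' makes mistakes on the rare spam class count for more during training.
model = LogisticRegression(class_weight='balanced')
model.fit(features, labels)

new_mail = vectorizer.transform(["free prize offer"])
print(model.predict(new_mail))
```

Resampling (duplicating spam rows or dropping ham rows in the training split) is the other route mentioned above; either way, the test split should keep its original class proportions so the reported accuracy stays honest.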

  • @growingfire
    @growingfire 6 months ago

    Thanks a lot !

  • @vishakantaiah1214
    @vishakantaiah1214 4 months ago

    Which algorithm is used in this project? Is it naive Bayes?

  • @CybrZone
    @CybrZone 2 years ago +4

    Sorry if this is a stupid question, but how does the model determine the difference between a ham and a spam mail? Is it purely based on training on different spam mails, with the model taking into consideration the number of words, the frequency of certain words, repeated words and so on, using the TF-IDF calculation at the TfidfVectorizer(min_df=1 ...) part?

    • @callofdutysuperheroes
      @callofdutysuperheroes 1 year ago +1

      Your question is more valid than my life. I have had the same question for a long time. How can I give an explanation to my project mentor without the actual logic?

    • @guitar300k
      @guitar300k 1 year ago +1

      It's based on the frequency of all the words.

  • @elsamaylinaputriroesnadi4571
    @elsamaylinaputriroesnadi4571 3 years ago

    Thanks. Can you make a recommendation system for an e-commerce application, using a method that merges SVM and query expansion?

  • @madhavaswamyamidepuram5751
    @madhavaswamyamidepuram5751 3 years ago

    Sir, I'm getting an error: could not find a version that satisfies accuracy_score.
    Can you please tell me what to do, sir?

  • @Error-Solver.
    @Error-Solver. 3 years ago +1

    Sir, where do you get all these Excel sheets? Please provide the link, sir.

  • @AndrewSamStories
    @AndrewSamStories 1 year ago

    When you paste text categorized as spam, the model still recognizes it as ham. Tested it a few times :(

  • @محمدحسننعيمرحيمه
    @محمدحسننعيمرحيمه 11 months ago

    How can I enter a new spam email to test the code?

  • @RaniEnterprises05
    @RaniEnterprises05 1 year ago

    One question: is NLP a library or a piece of software?

  • @SanjaySanjay-cw4ru
    @SanjaySanjay-cw4ru 2 years ago

    Which algorithms are used in this program?

  • @masumecetin4336
    @masumecetin4336 3 years ago +1

    It was a very helpful tutorial, thank you. Your datasets are always so good; can you provide a link for them, please?

    • @Siddhardhan
      @Siddhardhan 3 years ago +2

      Hi! You're welcome. The link is given in the description.

  • @rajkumarray3224
    @rajkumarray3224 2 years ago

    @Siddhardhan Here the dataset is imbalanced: ham is nearly 80-90% of the total and spam only about 10%. Why don't we use a sampling method here? Can you please give me the answer?

  • @danielnwoke3694
    @danielnwoke3694 5 months ago

    Change Labelling 17:36

  • @sk65233
    @sk65233 1 year ago

    Sir, can I make this project in Jupyter Notebook? Please reply, sir; I have to complete my project.

  • @suryatapasarkar3902
    @suryatapasarkar3902 1 year ago

    I'm getting an error in feature extraction: invalid literal for int() with base 10: 'ham'

    • @ms_007_
      @ms_007_ 1 year ago +3

      Change 'True' --> True

    • @bububarks1273
      @bububarks1273 10 months ago +1

      @@ms_007_ still getting error..any changes?

    • @Ayushpan123
      @Ayushpan123 7 months ago

      Same

  • @vs_shortz_0001
    @vs_shortz_0001 2 years ago

    Which machine learning algorithm is used here? Please reply.

  • @thinkingmad1685
    @thinkingmad1685 2 years ago

    Sir, I need the PPT you presented. Can you please send it 😀 or a Google Slides link 🔗

  • @asynchronousani
    @asynchronousani 1 year ago

    Sir, please tell us how to deploy this model on Streamlit?

  • @shu03bh
    @shu03bh 4 months ago

    Is there a website made with this project?

  • @SanjaySanjay-vs3cn
    @SanjaySanjay-vs3cn 11 months ago

    Is this project using TensorFlow?

  • @saisuraj97
    @saisuraj97 2 years ago

    Sir, please tell us how we can delete spam mails after detection.

  • @adityarenapure7397
    @adityarenapure7397 1 year ago

    Which algorithm is used here?

  • @Nshupriya
    @Nshupriya 1 year ago

    Is it necessary to run this model in Flask, or can we just finish it here? Anyone, please tell.

  • @shajidhabegammohamedrafeek3455
    @shajidhabegammohamedrafeek3455 5 months ago

    We can use it in Jupyter.

  • @machilikanthyadav460
    @machilikanthyadav460 2 years ago

    What is the difference between the existing systems and your project?

  • @AnujeetKunturkar
    @AnujeetKunturkar 1 year ago

    When I give input from the dataset it works perfectly, but not for input of my own.
    Why does it give the wrong output if I write the input on my own?

    • @HimanshuSingh-pp5hw
      @HimanshuSingh-pp5hw 8 months ago

      Same problem. Did you find anything to get a correct result for your own input?

    • @ruznyma
      @ruznyma 4 months ago

      What example input did you use?
      There are some things to note:
      1. The dataset is imbalanced.
      2. TF-IDF vectorization doesn't capture semantic meaning. It only vectorizes based on the frequency of words.
      Fix:
      1. Use sampling techniques or add class weights.
      2. Try other vectorization methods and models, such as transformer-based ones.

  • @shorts2429
    @shorts2429 1 year ago

    When giving a spam mail as input it still shows 1 as output. Can anyone help?

  • @akashk9755
    @akashk9755 1 year ago

    Hi bro, I am facing an error while using y_test.astype; the error is "invalid literal for int() with base 10". Please, can anybody rectify the error?

  • @nasiriqbal4258editor
    @nasiriqbal4258editor 1 year ago

    How do I fix NotFittedError in the prediction system? Someone tell me, please.
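
`NotFittedError` is what scikit-learn raises when `predict` or `transform` is called on an object whose `fit` has not run yet; in a notebook this often means the training cells were skipped or the kernel was restarted. A minimal sketch on made-up data (assumed encoding 0 = spam, 1 = ham) reproduces the error and then shows the order that works.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()

# Calling predict before fit raises NotFittedError.
raised = False
try:
    model.predict([[0.0, 1.0]])
except NotFittedError:
    raised = True
print("predict before fit raised NotFittedError:", raised)

# Hypothetical training data; assumed encoding: 0 = spam, 1 = ham.
messages = ["win a free prize now", "lunch at noon tomorrow",
            "free cash offer inside", "see you at the meeting"]
labels = [0, 1, 0, 1]

# Fit the vectorizer and the model first, then transform and predict.
vectorizer = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
features = vectorizer.fit_transform(messages)
model.fit(features, labels)

prediction = model.predict(vectorizer.transform(["free prize waiting"]))
print(prediction)
```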

  • @manjurahammad1110
    @manjurahammad1110 3 years ago +2

    This is great work. I face this problem when using the test data: ValueError: X has 3302 features, but LogisticRegression is expecting 7526 features as input. I need your help.

    • @RodrigoGarcia-ni1zz
      @RodrigoGarcia-ni1zz 2 years ago +1

      I corrected the problem. I made a mistake when I created the variable 'X_test_feature'. It should be feature_extraction.transform(X_test), without 'fit' (not .fit_transform).

    • @sayantanmanna614
      @sayantanmanna614 1 year ago

      @@RodrigoGarcia-ni1zz please write the code

    • @sagars6188
      @sagars6188 1 year ago

      ​@@sayantanmanna614 yes please

    • @akankshathakur1032
      @akankshathakur1032 1 year ago

      Thank you, saviour! I was so disturbed for the past 2 days.

    • @InduKanaka
      @InduKanaka 5 months ago

      ​@@RodrigoGarcia-ni1zz please write the code

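The fix described in this thread can be written out as a short sketch. The messages are made up for illustration; the variable names follow the ones quoted in the comments. The second vectorizer exists only to show why refitting on the test data produces a feature count the model does not expect.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for the tutorial's training and test messages.
X_train = ["win a free prize now", "lunch at noon tomorrow",
           "free cash offer inside", "see you at the meeting"]
X_test = ["free prize inside", "meeting moved to noon"]

feature_extraction = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)

# Learn the vocabulary from the training messages only.
X_train_features = feature_extraction.fit_transform(X_train)

# Correct: transform reuses that vocabulary, so the column counts match.
X_test_features = feature_extraction.transform(X_test)

# For contrast only: fitting again on the test data learns a different vocabulary.
refit = TfidfVectorizer(min_df=1, stop_words='english', lowercase=True)
X_test_refit = refit.fit_transform(X_test)

print(X_train_features.shape[1], X_test_features.shape[1], X_test_refit.shape[1])
```

The first two printed numbers agree, while the third differs, which is the mismatch the `ValueError` above reports.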
  • @webproject5144
    @webproject5144 2 years ago

    Sir, can you please make a video on YouTube spam comment detection?

  • @JunaidAhmedMohammed-cw7um
    @JunaidAhmedMohammed-cw7um 2 months ago

    Hello sir, I would like to attend personal classes from you and am ready to pay for the classes. Can you please reply to this?

  • @nandinisrivastava3183
    @nandinisrivastava3183 11 months ago

    It's just giving ham mail as the answer; it is not giving a spam mail prediction.

  • @debudas5230
    @debudas5230 1 year ago

    X_train_features is showing a parameter error.

  • @tejaspradhan2423
    @tejaspradhan2423 1 year ago

    Sir, how can we deploy this model?

  • @baivabpaul8918
    @baivabpaul8918 2 years ago

    How do I deploy it in a webpage?

  • @MAbhishiktha
    @MAbhishiktha 7 months ago

    Data set???!

  • @nandiniverma6045
    @nandiniverma6045 1 year ago

    Why does this code show an error?

  • @sunflowers6957
    @sunflowers6957 2 years ago

    Thankyou!

  • @YashAgarwal-xu5fd
    @YashAgarwal-xu5fd 1 year ago

    Whatever the input, the prediction shows ham, even when giving a spam mail.

  • @DataDorz
    @DataDorz 2 years ago

    You didn't check whether the data is balanced or imbalanced, which is very important.
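
The balance check is a one-liner in pandas. The sketch below uses a made-up frame with an assumed `Category` column; on the real data the same two calls report how many ham and spam rows there are and what share each class has.

```python
import pandas as pd

# Hypothetical miniature of the mail data; the column name is assumed.
mail_data = pd.DataFrame({"Category": ["ham"] * 8 + ["spam"] * 2})

# Absolute counts per class.
counts = mail_data["Category"].value_counts()

# Share of each class, between 0 and 1.
shares = mail_data["Category"].value_counts(normalize=True)

print(counts.to_dict())  # {'ham': 8, 'spam': 2}
print(shares.round(2).to_dict())
```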

  • @Wilson2746
    @Wilson2746 1 month ago

    5:25

  • @clumsyslime3369
    @clumsyslime3369 1 year ago

    21:39

  • @alimibrahem8120
    @alimibrahem8120 1 year ago

    Hello my friend! Thanks for your video, very brilliant!
    I used the decision tree model and got accuracy on training data: 1.0 and accuracy on test data: 0.9668161434977578. Does that make sense? Do you think that in this case the decision tree model is more powerful and effective than the logistic regression model?
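
A training accuracy of 1.0 is common for an unpruned decision tree, since it can memorise the training set, so the comparison is usually made on held-out data. One way to do that is cross-validation with both models inside the same pipeline. The sketch below uses made-up messages (assumed encoding 0 = spam, 1 = ham) and is meant to show the procedure, not to settle which model is better on the real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; assumed encoding: 0 = spam, 1 = ham.
messages = [
    "win a free prize now", "free cash offer inside", "claim your free reward",
    "urgent offer just for you", "free entry win cash", "prize waiting claim now",
    "lunch at noon tomorrow", "see you at the meeting", "meeting notes attached",
    "can you call me later", "dinner at my place tonight", "thanks for the update",
]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

candidates = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, classifier in candidates.items():
    # The vectorizer sits inside the pipeline, so it is refit on each training fold.
    pipeline = make_pipeline(TfidfVectorizer(stop_words='english'), classifier)
    scores[name] = cross_val_score(pipeline, messages, labels, cv=3)
    print(name, scores[name].mean())
```

On an imbalanced dataset, accuracy alone can hide poor spam detection, so looking at per-class precision and recall alongside these scores is a reasonable next step.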

  • @sandipansarkar9211
    @sandipansarkar9211 2 years ago

    finished watching

  • @003_abhishekgupta9
    @003_abhishekgupta9 1 year ago

    The 'lowercase' parameter of TfidfVectorizer must be an instance of 'bool', an instance of 'numpy.bool_' or an instance of 'int'. Got 'True' instead.
    I get this error when running X_train_features.

  • @prerna_0008
    @prerna_0008 9 months ago

    Please do a project on bird classification based on audio using CNN,
    or an online expense tracker.

  • @manideepmani5062
    @manideepmani5062 1 month ago

    Bro, if anyone has done their project on this, please message here. I need the documentation.

  • @devanshprakash8354
    @devanshprakash8354 2 years ago

    Can we use it in our resume?

  • @sandipansarkar9211
    @sandipansarkar9211 2 years ago

    finished coding

  • @rafiulfaisal6240
    @rafiulfaisal6240 2 years ago +2

    How do we take that mail as input?

  • @rajeshj5071
    @rajeshj5071 2 years ago

    Which algorithms are used in this program?