FAKE NEWS CLASSIFIER WITH MACHINE LEARNING ALGORITHMS USING Natural Language Processing - PART 1

  • Published: Dec 1, 2024

Comments • 111

  • @parthraghuwanshi2980
    @parthraghuwanshi2980 1 year ago +3

    People like you are gems: after working hard all day in the office, you take out time to post quality content, and selflessly too. I can truly see what good values you have.

  • @suresherriboyina
    @suresherriboyina 4 years ago +7

    Please upload Part 2. Since we now have so much time, please keep uploading new project videos.

  • @tejashshah5202
    @tejashshah5202 4 years ago +4

    At 10:48, CountVectorizer() should not be applied before train_test_split(). Doing so leaks information from the test data into the features, which is not correct. The correct way is to fit_transform() on the training data and transform() on the test data.
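A minimal sketch of the order this comment describes, assuming scikit-learn; the toy titles and labels below are made up and stand in for the preprocessed `messages['title']` corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Toy stand-in for the preprocessed corpus and its labels (hypothetical data)
corpus = [
    "shocking secret cure doctors hate",
    "senate passes budget bill",
    "aliens built the pyramids says insider",
    "central bank holds interest rates",
    "celebrity clone spotted at mall",
    "city council approves new park",
]
y = [1, 0, 1, 0, 1, 0]

# Split the raw text first, so the test rows play no part in building the vocabulary
X_train_text, X_test_text, y_train, y_test = train_test_split(
    corpus, y, test_size=0.33, random_state=0
)

cv = CountVectorizer(ngram_range=(1, 3))
X_train = cv.fit_transform(X_train_text)  # learn the vocabulary from training data only
X_test = cv.transform(X_test_text)        # reuse it; words unseen in training are ignored

print(X_train.shape, X_test.shape)
```

Both matrices end up with the same columns, and any word that only occurs in the test split simply gets no column, which is exactly what happens to a brand-new sentence at prediction time.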

  • @krishnaik06
    @krishnaik06 4 years ago +55

    Hello all, this video was for members, but many of you requested it, so I have uploaded it for everyone. It is also added to the NLP playlist. Happy learning!!

    • @keshavbansal5148
      @keshavbansal5148 4 years ago +4

      Bomb video 💥💥💥, 💞💞💞

    • @iNSane4224
      @iNSane4224 4 years ago +3

      Thank you, sir

    • @anujvyas9493
      @anujvyas9493 4 years ago +3

      Thank you so much, sir 👍

    • @siddhantranjan1607
      @siddhantranjan1607 4 years ago +1

      Sir, why have you used only the title column of the dataset for predictions? There is one more column, text. What about it?

    • @stackexchange7353
      @stackexchange7353 4 years ago

      Could you use sklearn for the feature extraction portion and Keras for the model portion?

  • @alexanderbalasky6174
    @alexanderbalasky6174 4 years ago +6

    Great walkthrough but look into your audio!

  • @vasanthrohith4564
    @vasanthrohith4564 1 year ago

    It's super useful 🤩... thanks for teaching ❤

  • @thechaoticneuron
    @thechaoticneuron 4 years ago +10

    Hello Krish, this is a great video. I started learning NLP with it. A small question:
    I followed the entire procedure and implemented Logistic Regression at the end. It gave me a higher accuracy of 94%, with fewer false positives and false negatives as well.
    I'm keen to know why the PassiveAggressive Classifier is said to be better for NLP applications, and why a simple logistic regression can't be used.
    Thank you :)
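Both are linear classifiers over the same sparse word-count features, so they are easy to compare side by side. A sketch assuming scikit-learn, on a tiny made-up corpus (not the video's dataset), so the accuracies it prints say nothing about which model wins on the real data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score

# Hypothetical toy data: 1 = fake, 0 = real
train_text = ["miracle pill melts fat overnight", "parliament debates tax reform",
              "secret moon base revealed", "storm closes schools in region",
              "celebrity replaced by robot double", "university opens research lab"]
train_y = [1, 0, 1, 0, 1, 0]
test_text = ["miracle robot revealed", "parliament opens schools"]
test_y = [1, 0]

cv = CountVectorizer()
X_train = cv.fit_transform(train_text)
X_test = cv.transform(test_text)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "passive_aggressive": PassiveAggressiveClassifier(max_iter=50, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, train_y)  # same features for both models
    scores[name] = accuracy_score(test_y, model.predict(X_test))
print(scores)
```

One practical difference: `PassiveAggressiveClassifier` supports `partial_fit`, so it can be updated incrementally as new articles arrive, while `LogisticRegression` is refit on the whole training set. Which one is more accurate is an empirical question for each dataset, as the comment's result suggests.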

  • @2500204
    @2500204 4 years ago +3

    I am a machine learning engineer. I like that you post such videos, but the problem with real-world datasets is that there is no training data.
    You have to collect and create your own training data. People watching this video don't know what is about to hit them once they enter this field.
    It's not plug and play.
    I spend 80% of my time creating and processing data and only 20% actually doing ML.
    In one anomaly detection project, we used DBSCAN to find noise in the data, then marked the noise data points as anomalous and the clustered data points as non-anomalous. Then we used that data to train our ANN.
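The labelling trick described above can be sketched with scikit-learn's `DBSCAN`, which assigns the cluster label `-1` to points it considers noise. The 2-D points and the `eps`/`min_samples` values are made up for illustration; on real data they need tuning:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical unlabeled data: two dense groups plus two isolated points
X = np.array([
    [1.0, 1.0], [1.1, 1.0], [0.9, 1.1], [1.0, 0.9],
    [5.0, 5.0], [5.1, 5.1], [4.9, 5.0], [5.0, 4.9],
    [9.0, 0.0], [0.0, 9.0],
])

clusters = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

# DBSCAN marks noise with -1: treat noise as anomalous (1) and clustered points as normal (0)
y_anomaly = (clusters == -1).astype(int)
print(y_anomaly)
# These pseudo-labels can then train a supervised model (an ANN in the comment)
```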

  • @abhishekpurohit3442
    @abhishekpurohit3442 4 years ago +1

    Just got 97% accuracy by combining both title and text, using the Passive Aggressive Classifier.
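One way to feed both columns to the vectorizer, as this comment describes, is to concatenate them into a single string per row. A pandas sketch, assuming the dataset's `title` and `text` column names; the rows are made up, and filling missing values with an empty string is a choice, not the only option:

```python
import pandas as pd

# Hypothetical rows with the same column names as the dataset
messages = pd.DataFrame({
    "title": ["Senate passes bill", None, "Moon made of cheese"],
    "text": ["The vote was 60 to 40.", "Body only, no headline.", None],
    "label": [0, 0, 1],
})

# Fill gaps first so a missing title or text does not turn the whole row into NaN
messages["content"] = messages["title"].fillna("") + " " + messages["text"].fillna("")
messages["content"] = messages["content"].str.strip()
print(messages["content"].tolist())
```

The `content` column can then go through the same cleaning loop and `CountVectorizer` in place of `title`.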

  • @ronaksengupta6174
    @ronaksengupta6174 4 years ago +8

    Sir, after this please make a video on how pandemics impact the financial markets, or anything on COVID-19 dataset analysis.

  • @johnyjose3941
    @johnyjose3941 4 years ago +3

    Hi Krish, @ 8:28, why did you take messages['title']? I think we should take messages['text'].

  • @samirpaul5499
    @samirpaul5499 2 years ago

    Hello Krish, this is an amazing video. I have been watching your videos and have learned many things. Wonderful contribution to aspiring machine learning engineers. I have one question; please clarify. After this Bag-of-Words/TF-IDF model is built, how do we construct the input features for new sentences (to be passed to the model's predict function)? If such an explanation exists in another video, please point me to it; otherwise, I would request a short video on this, which would be immensely helpful. Thank you again. - Samir Paul
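The key is to run the new sentence through exactly the same preprocessing, then call `transform` (not `fit_transform`) on the already-fitted vectorizer. A sketch assuming scikit-learn; `preprocess` is a hypothetical, simplified stand-in for the video's cleaning loop (the real one also stems and uses a full stopword list), and the training data is made up:

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


def preprocess(sentence):
    # Hypothetical simplified cleaning: letters only, lowercase, drop a few stopwords
    stop = {"the", "a", "an", "of", "in", "is"}
    words = re.sub("[^a-zA-Z]", " ", sentence).lower().split()
    return " ".join(w for w in words if w not in stop)


train_titles = ["Miracle cure found in the jungle", "Senate approves the annual budget",
                "Aliens secretly run the bank", "City opens a new library"]
train_labels = [1, 0, 1, 0]  # 1 = fake, 0 = real (toy labels)

cv = CountVectorizer()
X_train = cv.fit_transform([preprocess(t) for t in train_titles])
model = MultinomialNB().fit(X_train, train_labels)

# New, unseen headline: same preprocessing, then transform with the FITTED vectorizer
new_title = "Aliens found running a secret jungle bank!"
X_new = cv.transform([preprocess(new_title)])
print(model.predict(X_new))
```

Words that were not in the training vocabulary (here "running" and "secret") are silently dropped, so the new input always has the same columns as the training matrix.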

  • @sachinborgave8094
    @sachinborgave8094 4 years ago +2

    Thanks Krish, please upload an in-depth intuition video on GBM.

  • @tanishbothra5044
    @tanishbothra5044 4 years ago +1

    Hello, suppose we need to add more features to our X that are not text. For example, we get a sparse matrix from CountVectorizer, and we also have one more feature, length, and we want both. How do we combine them?
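`scipy.sparse.hstack` can append extra numeric columns to the CountVectorizer output without converting it to a dense array. A sketch, using title length in characters as the hypothetical extra feature:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer

titles = ["breaking news today", "markets rally", "you will not believe this"]

cv = CountVectorizer()
X_text = cv.fit_transform(titles)  # sparse matrix, shape (3, vocab_size)

# Extra non-text feature: number of characters per title, as a column vector
lengths = np.array([len(t) for t in titles]).reshape(-1, 1)

# Glue the columns side by side; the result stays sparse
X = hstack([X_text, csr_matrix(lengths)]).tocsr()
print(X.shape)
```

Because a length is on a very different scale from word counts, scaling that column (for example with `MaxAbsScaler`, which works on sparse input) may help scale-sensitive models.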

  • @mdenamulhaque7589
    @mdenamulhaque7589 4 years ago +2

    Dear Krish, to be a data scientist, do we need to learn SQL or something like it? If so, why is it absent from your data science playlist, or do you plan to cover it in the future? I'm very confused. Please advise. Thanks.

    • @rayyanamir8560
      @rayyanamir8560 2 years ago

      Yes, but you only need to know the basic queries, like GROUP BY, JOIN, ORDER BY, SELECT, WHERE, etc.

  • @pec8377
    @pec8377 1 year ago

    Hi @Krish, shouldn't you create the bag of words on X_train instead of the full dataset? Otherwise, the accuracy will not be the same when providing a new sentence.

  • @subhamacharya7084
    @subhamacharya7084 4 years ago +2

    Sir, kindly make some basic videos on Pandas.

  • @abhishekpurohit3442
    @abhishekpurohit3442 4 years ago +1

    Sir, you haven't uploaded the video on the Passive Aggressive Classifier... Please upload it, sir!!

  • @surajthallapalli4227
    @surajthallapalli4227 4 years ago +1

    Hi sir, I think there is a data leakage problem. We first need to split the data into train and test sets and only then apply CountVectorizer, right? In the video, CountVectorizer is applied first and the train/test split is done later. Please clarify this.

  • @karanshethia3560
    @karanshethia3560 2 years ago

    Hey Krish, great video. Can you tell me how we can build a predictive system once we have tried different models and selected the most accurate/efficient one?
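Wrapping the vectorizer and the chosen model in a scikit-learn `Pipeline` gives one object that takes raw text in and returns labels, which is a convenient basis for a prediction service. A sketch on made-up data; the two candidate models and choosing the winner by test accuracy are illustrative assumptions (cross-validation would be more robust):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 1 = fake, 0 = real
train_text = ["miracle cure doctors hate", "senate approves budget",
              "aliens control the weather", "council opens new school",
              "secret cure hidden by government", "bank reports quarterly profit"]
train_y = [1, 0, 1, 0, 1, 0]
test_text = ["aliens hidden cure", "council reports budget"]
test_y = [1, 0]

candidates = {
    "multinomial_nb": MultinomialNB(),
    "passive_aggressive": PassiveAggressiveClassifier(max_iter=50, random_state=0),
}

best_name, best_pipe, best_score = None, None, -1.0
for name, clf in candidates.items():
    pipe = make_pipeline(CountVectorizer(), clf)  # vectorizer + model as one unit
    pipe.fit(train_text, train_y)
    score = accuracy_score(test_y, pipe.predict(test_text))
    if score > best_score:
        best_name, best_pipe, best_score = name, pipe, score

# The winning pipeline accepts raw strings directly
print(best_name, best_pipe.predict(["aliens cure"]))
```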

  • @mohammedzia1015
    @mohammedzia1015 2 years ago

    Hi Krish,
    It was a nice video, but I have one question: the condition "if score > previous_score" will be satisfied every time, right? You set previous_score to ZERO, so what is the use of it? Don't we have to assign the score to previous_score, like "previous_score = score", after the if condition?
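The commenter's point holds for any "keep the best model" loop: the running best score must be updated whenever a better model is found, otherwise every candidate beats 0 and the last one wins. A corrected sketch, assuming the loop tunes `MultinomialNB`'s `alpha`; the alpha values and data are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: 1 = fake, 0 = real
train_text = ["miracle cure secret", "budget vote senate", "secret alien cure", "senate budget report"]
train_y = [1, 0, 1, 0]
test_text = ["secret miracle", "senate report"]
test_y = [1, 0]

cv = CountVectorizer()
X_train = cv.fit_transform(train_text)
X_test = cv.transform(test_text)

previous_score = 0.0
classifier = None
scores = {}
for alpha in [0.1, 0.5, 1.0]:
    sub_classifier = MultinomialNB(alpha=alpha).fit(X_train, train_y)
    score = accuracy_score(test_y, sub_classifier.predict(X_test))
    scores[alpha] = score
    if score > previous_score:
        classifier = sub_classifier
        previous_score = score  # the missing update: remember the best score so far
    print(f"Alpha: {alpha}, Score: {score}")
```

With the strict `>` comparison, the first alpha that reaches the best score is kept when several tie.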

    • @neeleshnayak4375
      @neeleshnayak4375 2 years ago

      Hey, did you notice that the test and train data have a lot of overlap in this problem? Removing the overlap leads to poor predictions by the model.
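A quick way to check the overlap this reply mentions is to count test titles that also appear verbatim in the training split. A pandas sketch with made-up rows; exact string matching is an assumption, and near-duplicates would need a fuzzier comparison:

```python
import pandas as pd


def title_overlap(train_titles, test_titles):
    """Return the test titles that also occur exactly in the training titles."""
    seen = set(train_titles)
    return [t for t in test_titles if t in seen]


train = pd.Series(["Senate passes bill", "Moon base found", "Storm hits coast"])
test = pd.Series(["Moon base found", "Bank cuts rates", "Senate passes bill"])

dupes = title_overlap(train, test)
print(f"{len(dupes)} of {len(test)} test titles also appear in training")

# One remedy: drop duplicate titles BEFORE splitting, so a title lands in only one split
all_titles = pd.concat([train, test]).drop_duplicates()
```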

  • @1pmcoffee
    @1pmcoffee 4 years ago

    @Krish Naik Sir, do you provide paid personal consultation on an hourly basis? Is there any way I can connect with you?

  • @omkarpatil2854
    @omkarpatil2854 4 years ago +1

    Hello Krish,
    The for loop that generates the corpus finished in a few minutes for you,
    but on my laptop (i5 7th gen, 1050tx 4 GB graphics) it took more than half an hour.
    Is there anything I need to configure?
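The GPU does not help here, since this loop is plain single-threaded Python. If the loop follows the common pattern `[ps.stem(word) for word in review if not word in stopwords.words('english')]`, a likely cause of the slowness (an assumption, not a diagnosis of this machine) is that `stopwords.words('english')` rebuilds a list for every single word and each membership test scans it. Building a `set` once, and compiling the regex once, avoids that. A dependency-free sketch; the tiny stopword set stands in for NLTK's list, and stemming is omitted:

```python
import re

# Stand-in for set(stopwords.words('english')); build it ONCE, outside the loop
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "on"}
NON_LETTERS = re.compile("[^a-zA-Z]")  # compile the pattern once as well


def clean(title):
    words = NON_LETTERS.sub(" ", title).lower().split()
    # Set membership is O(1) on average, unlike scanning a freshly built list per word
    return " ".join(w for w in words if w not in STOP_WORDS)


titles = ["The Rise of the Machines", "A Guide to Fake News!", "Markets in 2020"]
corpus = [clean(t) for t in titles]
print(corpus)
```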

    • @adarshgupta9952
      @adarshgupta9952 2 years ago

      You have to reduce the number of rows in the CSV file. Use a CSV editor to do that, not MS Excel (it'll cause errors).

  • @ganeshhegde8972
    @ganeshhegde8972 4 years ago +1

    Nice, sir

  • @mbmathematicsacademic7038
    @mbmathematicsacademic7038 4 months ago

    Thank you, Krish

  • @jainilpatel1173
    @jainilpatel1173 3 years ago +1

    I got an error. Please answer.
    In the Passive Aggressive Classifier algorithm:
    Unexpected keyword argument n_iter(50)
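`n_iter` was deprecated and later removed from scikit-learn's SGD-based linear models, including `PassiveAggressiveClassifier`; the number of passes over the data is now set with `max_iter` (together with `tol` for early stopping). A minimal sketch on made-up data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical toy data
text = ["free money now", "meeting at noon", "win a free prize", "lunch at noon"]
y = [1, 0, 1, 0]

X = CountVectorizer().fit_transform(text)

# n_iter=50 raises a TypeError on current scikit-learn; max_iter replaces it
linear_clf = PassiveAggressiveClassifier(max_iter=50, random_state=0)
linear_clf.fit(X, y)
print(linear_clf.n_iter_)  # passes actually run; may be fewer than 50 because of tol
```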

  • @lokbharatendu7063
    @lokbharatendu7063 4 years ago

    Hi Krish,
    First of all, thanks for such good videos! I am very new to data science, so my question might sound very basic.
    In this video, and in some of the other videos in this playlist, you have mentioned that algorithms like Naive Bayes/MultinomialNB work very well with text data.
    In all the examples, we convert sentences to words and then to features (with values 0 and 1 in the case of BOW). So after this conversion, aren't we just dealing with numeric data rather than textual data, since all the text has been converted to independent numeric features?
    Can't we just use any classification algorithm? If yes, why do we say Naive Bayes works well with text data?
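You are right that after vectorization any classifier can be applied. The usual argument for multinomial Naive Bayes is that its model is literally a per-class distribution over word counts, which matches bag-of-words features, and training is a single counting pass even with thousands of sparse columns. A small sketch, assuming scikit-learn and made-up data, showing that the fitted model is just smoothed per-class word frequencies:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["cheap pills cheap", "meeting notes", "cheap offer", "project meeting"]
y = [1, 0, 1, 0]

cv = CountVectorizer()
X = cv.fit_transform(docs)
nb = MultinomialNB(alpha=1.0).fit(X, y)

# Reproduce P(word | class) by hand:
# (count of word in class + alpha) / (total words in class + alpha * vocabulary size)
y_arr = np.array(y)
counts = np.vstack([np.asarray(X[y_arr == c].sum(axis=0)).ravel() for c in nb.classes_])
manual = (counts + 1.0) / (counts.sum(axis=1, keepdims=True) + 1.0 * X.shape[1])
print(np.allclose(np.exp(nb.feature_log_prob_), manual))
```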

  • @themightylion5147
    @themightylion5147 4 years ago +1

    Sir, @ 8:28, why did you take messages['title']? I think we should take messages['text'].

  • @kalppanwala6439
    @kalppanwala6439 4 years ago +2

    Krish, can you make videos on BERT? I can't find any good explanations of it.

  • @junaidyousaf4602
    @junaidyousaf4602 3 years ago

    Sir, after this please make a video on how to detect hate speech...

  • @tirumalaparise9474
    @tirumalaparise9474 3 years ago

    Sir, how can this model predict whether news is fake or real when some other external news item is given, preprocessed in the same way as X['title']?
    Does it work in real life, or just on the test data?

  • @thepresistence5935
    @thepresistence5935 2 years ago

    I waited 2 hours, but it didn't finish executing on my laptop, so I took the first 1000 rows, and after that change it worked.

    • @adarshgupta9952
      @adarshgupta9952 2 years ago

      Hello, can you share that CSV file? When I took the first 100 or 200 rows, it included other columns as well, named "Unnamed: 5", "Unnamed: 6", ..., "Unnamed: 685". Furthermore, when I drop NaN values, all the rows are dropped and I am left with 0 rows.
      Or please tell me how you took the first 1000 rows. Anything may help.

    • @adarshgupta9952
      @adarshgupta9952 2 years ago

      Never mind, I got it.
      Tip for others: I was using MS Excel to remove data. Use any CSV editor to remove data, not Excel.
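Instead of editing the CSV by hand, pandas can read just the first rows with `nrows`, and leftover "Unnamed: N" columns (typically produced by trailing commas or a re-saved file) can be dropped afterwards. A sketch with an in-memory stand-in for `train.csv` (hypothetical content) so it runs on its own:

```python
import io

import pandas as pd

# Stand-in for train.csv: the trailing commas create an empty "Unnamed: 3" column
csv_text = """id,title,label,
0,Senate passes bill,0,
1,Moon made of cheese,1,
2,,0,
3,Aliens run the bank,1,
4,Markets rally,0,
"""

# nrows reads only the first rows; with the real file: pd.read_csv("train.csv", nrows=1000)
df = pd.read_csv(io.StringIO(csv_text), nrows=4)

# Drop the all-empty "Unnamed" columns BEFORE dropna(); otherwise every row contains a NaN
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
df = df.dropna()
print(df.shape)
```

This ordering also explains the "0 rows" symptom above: if an all-empty column is still present, `dropna()` removes every row.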

  • @ushirranjan6713
    @ushirranjan6713 3 years ago

    Sir, when I try to import and read the data in Colab, it does not work. There was an error, so I had to make these changes to read the data:
    df=pd.read_csv('train.csv',engine='python', encoding='utf-8',error_bad_lines=False)
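`error_bad_lines` was deprecated in pandas 1.3 and removed in pandas 2.0; the replacement is `on_bad_lines`, which accepts `'error'`, `'warn'` or `'skip'`. A sketch with an in-memory CSV containing one malformed row (made-up data):

```python
import io

import pandas as pd

# Hypothetical CSV: the row with id 2 has one field too many
csv_text = "id,title,label\n0,Senate passes bill,0\n1,Moon base found,1\n2,Bad,row,1\n3,Markets rally,0\n"

# on_bad_lines="skip" replaces error_bad_lines=False (pandas >= 1.3)
# With a real file: pd.read_csv("train.csv", engine="python", encoding="utf-8", on_bad_lines="skip")
df = pd.read_csv(io.StringIO(csv_text), engine="python", on_bad_lines="skip")
print(len(df))
```

Skipping silently discards rows, so it is worth comparing the row count with the expected size of the dataset.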

    • @ushirranjan6713
      @ushirranjan6713 3 years ago

      @K.S_5723, I'll have to search for it, but you can download it from Krish's folder and make the same change I made; it will work.
      If I find it, I will send it to you.

  • @thunder440v3
    @thunder440v3 4 years ago +1

    Wow!

  • @vishaldas6346
    @vishaldas6346 4 years ago

    Hello Krish, I think there is a mistake in the explanation of true positives and true negatives. Correct me if I'm wrong.
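A common source of this confusion is scikit-learn's layout: in `confusion_matrix(y_true, y_pred)`, rows are the actual labels and columns the predicted labels, both in sorted order. For 0/1 labels with 1 as the positive class, the matrix is therefore `[[TN, FP], [FN, TP]]`. A quick sketch with made-up predictions:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)  # rows = actual, columns = predicted
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```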

  • @vishalvanpariya1466
    @vishalvanpariya1466 4 years ago

    Great video, but I have one query: why do you use only the title feature in modelling? You should use all the features.

  • @avanishsingh8518
    @avanishsingh8518 4 years ago

    Hi sir,
    Sorry, but I have a question: in the video, you said you were going to use the text column for CountVectorizer, but you are taking the title column. Why?

  • @sahityakandru6134
    @sahityakandru6134 4 years ago

    While running the ipynb file you gave, I get an error: Unable to allocate 698. MiB for an array with shape (18285, 5000) and data type int64
    Can you please explain this?

    • @amansingh3347
      @amansingh3347 3 years ago

      Increase your RAM or use Google Colab.
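The number in that error is exactly the size of a dense int64 matrix of that shape: 18285 × 5000 × 8 bytes is about 697.5 MiB. The output of `CountVectorizer` is a sparse matrix and far smaller; the big allocation presumably happens when the notebook converts it with `.toarray()` (an assumption, since the notebook isn't shown here). Keeping it sparse (MultinomialNB and PassiveAggressiveClassifier both accept sparse input) or choosing a smaller `dtype` avoids the problem. A sketch:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Size of the dense matrix from the error message
rows, cols = 18285, 5000
dense_mib = rows * cols * np.dtype(np.int64).itemsize / 2**20
print(f"dense int64 matrix: {dense_mib:.1f} MiB")

corpus = ["fake news about aliens", "real news about markets", "aliens and markets"]

# Smaller dtype than the int64 default; int8 could overflow on counts above 127
cv = CountVectorizer(dtype=np.int32)
X = cv.fit_transform(corpus)  # keep it sparse: no .toarray()
print(X.dtype, X.nnz)         # nnz = number of stored non-zero entries
```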

  • @ishwarjagdishashar9096
    @ishwarjagdishashar9096 4 years ago

    I have followed all the code, but I get an error in re.sub(): "NameError: name 're' is not defined".
    Do I have to install any library to run the re.sub() function?

  • @varunpusarla
    @varunpusarla 4 years ago

    Why do we use Naive Bayes for NLP problems?

  • @subarnasamanta4945
    @subarnasamanta4945 4 years ago

    I am trying this project on Kaggle with the GPU enabled, but the GPU is not being used; it shows 0% usage. Can you tell me why?

  • @RadomName3457
    @RadomName3457 3 years ago

    Hi guys, could anyone explain what the coefficients of the models are? Why do we have those numbers?

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    Superb video. But while practicing the code, I get stuck at the corpus step, and everything is stuck from there onwards. I have tried a number of times, but to no avail. Thanks

    • @adarshgupta9952
      @adarshgupta9952 2 years ago

      You have to reduce the number of rows in the CSV file. Use a CSV editor to do that, not MS Excel (it'll cause errors).

  • @urvashisingh3329
    @urvashisingh3329 4 years ago +1

    Sir, I need help, it's very, very urgent. I want the code for speaker age and gender classification. Please help, sir, I really need it.

  • @adarshgupta9952
    @adarshgupta9952 2 years ago

    I have exported this model as a ".sav" file using pickle. Now, how can I test this model?
    I want to write a news statement and predict whether it is true or not.
    Please, anyone, help!
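If only the classifier was pickled, the fitted vectorizer (and the same text cleaning) is also needed at prediction time, because a new statement must be turned into the same feature columns. Pickling a scikit-learn `Pipeline` that contains both keeps them together. A sketch on made-up data; the file name `model.sav` and the temporary directory are illustrative, and pickle files should only be loaded from trusted sources:

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 1 = fake, 0 = real
train_text = ["secret alien cure found", "senate passes budget",
              "miracle pill cures everything", "council opens library"]
train_y = [1, 0, 1, 0]

pipe = make_pipeline(CountVectorizer(), PassiveAggressiveClassifier(max_iter=50, random_state=0))
pipe.fit(train_text, train_y)

path = os.path.join(tempfile.mkdtemp(), "model.sav")
with open(path, "wb") as f:
    pickle.dump(pipe, f)  # saves the vocabulary AND the model weights

# Later, e.g. in another script: load and predict on a raw news statement
with open(path, "rb") as f:
    loaded = pickle.load(f)

statement = "secret alien cure revealed"
print(loaded.predict([statement]))
```

If the training titles were stemmed and stopword-filtered, the statement needs the same cleaning before `predict`.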

  • @rajarshidgp2003
    @rajarshidgp2003 2 years ago

    At 15:40: you can't see them because you did reset_index, so the original index numbers have been lost.

  • @011_mohdanwar2
    @011_mohdanwar2 4 years ago

    Dear sir,
    I want to ask: can we work on the title or the text? And what is the process for removing stopwords?
    Can you explain it to me?

  • @shauryananda207
    @shauryananda207 4 years ago

    Not able to implement the Passive Aggressive Classifier: the argument 'n_iter' is unexpected.

  • @saisubramanyam3243
    @saisubramanyam3243 3 years ago

    Sir, how do we predict the label of test data instances, i.e. whether each one is fake or real?

  • @preethisetty4309
    @preethisetty4309 3 years ago

    Can you explain fake online review detection using the Passive Aggressive Classifier?

  • @vipindube5439
    @vipindube5439 4 years ago

    Hello Krish sir, your voice is getting lower; please speak louder.

  • @sabafarheen4918
    @sabafarheen4918 3 years ago

    Sir, with just a single word, how can we say it's fake? Please answer.

  • @aravindnaidu1286
    @aravindnaidu1286 4 years ago

    Sir, I have a doubt: why haven't you taken any parameters other than the title? And we are still getting an accuracy of 94 percent. I am just shocked!!!!! Please reply.

  • @johannachristy7515
    @johannachristy7515 4 years ago

    Please also tell us how to implement this in a web application.

  • @ebrahimkutty1491
    @ebrahimkutty1491 4 years ago

    CountVectorizer can remove stop words.
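Indeed, `CountVectorizer(stop_words="english")` applies scikit-learn's built-in English stopword list. It does not stem, so a `PorterStemmer` step would still be separate, and the built-in list is not identical to NLTK's. A sketch:

```python
from sklearn.feature_extraction.text import CountVectorizer

titles = ["The truth about the moon landing", "Why the markets crashed on Monday"]

cv = CountVectorizer(stop_words="english")  # built-in English stopword list
cv.fit(titles)
print(sorted(cv.vocabulary_))
```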

  • @mansikumari9533
    @mansikumari9533 3 years ago

    I am getting a parser error while uploading the dataset. Please help.

    • @mansikumari9533
      @mansikumari9533 3 years ago +1

      Also, what is "re" in line 127? It gives the error "re is not defined" when I try it on a different dataset.

  • @tusharpangare2468
    @tusharpangare2468 4 years ago

    messages.reset_index(inplace=True)
    I'm getting an error like this:
    AttributeError Traceback (most recent call last)
    in
    ----> 1 messages.reset_index(inplace=True)
    AttributeError: 'function' object has no attribute 'reset_index'
    Can someone help me?
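This error means the name `messages` refers to a function object rather than a DataFrame. One way that happens (an assumption, since the earlier cell isn't shown) is a missing call, such as `messages = pd.read_csv` without parentheses, or a later `def messages(...)` that shadowed the variable. A sketch reproducing and fixing it with an in-memory CSV:

```python
import io

import pandas as pd

csv_text = "title,label\nSenate passes bill,0\n,1\nMoon base found,1\n"

messages = pd.read_csv          # bug: the function itself, never called
print(type(messages).__name__)  # a function, so messages.reset_index raises AttributeError

messages = pd.read_csv(io.StringIO(csv_text))  # fix: call it to get a DataFrame
messages = messages.dropna()
messages.reset_index(inplace=True)  # the old index is kept as a new "index" column
print(messages.columns.tolist())
```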

  • @sonalgarg5628
    @sonalgarg5628 4 years ago

    review = re.sub('[^a-zA-Z]'," ", messages['title'][i]) I get an error on this line:
    expected string or bytes-like object. Please help.

    • @manishwadhwani5860
      @manishwadhwani5860 4 years ago

      Just apply messages['title'] = messages['title'].apply(str)

    • @sainathpatil844
      @sainathpatil844 4 years ago

      I'm getting the same issue too. If you find the solution, please let me know.

    • @sainathpatil844
      @sainathpatil844 4 years ago

      @manishwadhwani5860 It's not working.

    • @sainathpatil844
      @sainathpatil844 4 years ago

      Just import re there.

    • @sahityakandru6134
      @sahityakandru6134 4 years ago

      Just import re before running that line.
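Importing `re` fixes the `NameError`, but `TypeError: expected string or bytes-like object` has a different cause: `re.sub` received a non-string. In this dataset the usual culprit is a missing title, which pandas stores as the float `NaN`. Dropping those rows, and resetting the index so the `messages['title'][i]` lookup stays aligned, is cleaner than `.apply(str)`, which turns them into the literal word "nan". A sketch with made-up rows:

```python
import re

import numpy as np
import pandas as pd

messages = pd.DataFrame({"title": ["Senate passes bill", np.nan, "Moon base found!"],
                         "label": [0, 1, 1]})

try:
    re.sub("[^a-zA-Z]", " ", messages["title"][1])  # NaN is a float, not a string
except TypeError as err:
    print("TypeError:", err)

# Fix: drop rows with a missing title, then renumber the index so messages['title'][i] works
messages = messages.dropna(subset=["title"]).reset_index(drop=True)

corpus = []
for i in range(len(messages)):
    review = re.sub("[^a-zA-Z]", " ", messages["title"][i]).lower()
    corpus.append(" ".join(review.split()))
print(corpus)
```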

  • @harikrishnanm5109
    @harikrishnanm5109 4 years ago

    It was really helpful. Can you make videos on grammar correction using rule-based methods, language models & classifiers?
    It's really hard to understand otherwise.

  • @avibitm
    @avibitm 3 years ago

    Hi Krish, can you help us make an API for this?

  • @sivarajasekharyannam9398
    @sivarajasekharyannam9398 3 years ago

    Hello sir, please send the document for this project.

  • @sgrsgr5663
    @sgrsgr5663 4 years ago

    Krish, the voice in this video is not very clear.

  • @CasualGamer669
    @CasualGamer669 3 years ago

    Can you make it into a simulator??

  • @rohitbaisane6712
    @rohitbaisane6712 3 years ago +1

    How does your classifier detect fake news?
    Does it detect any news, or only the news in the dataset?

  • @piyushvyas2475
    @piyushvyas2475 4 years ago +1

    Can you please add F1, recall, and precision scores for the algorithms used in this tutorial?

    • @monicainapakolla7148
      @monicainapakolla7148 4 years ago +2

      from sklearn.metrics import classification_report
      target_names = ['FAKE', 'REAL']
      print(classification_report(y_test, pred, target_names=target_names))
      Try this.
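One caution about `target_names`: the names are matched to the labels in sorted order, so `['FAKE', 'REAL']` is only right if the smaller label means fake. In the Kaggle fake-news `train.csv`, `label` is documented as 1 for unreliable and 0 for reliable articles (worth checking against your own copy), so passing `labels` explicitly avoids swapping the names. A sketch with made-up predictions:

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

y_test = [1, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 0, 1, 1]

# Assumption: 1 = unreliable/fake, 0 = reliable/real; pair each label with its name explicitly
print(classification_report(y_test, pred, labels=[0, 1], target_names=["REAL", "FAKE"]))

# The same metrics for the positive class (FAKE = 1) as plain numbers
precision, recall, f1, _ = precision_recall_fscore_support(y_test, pred, average="binary", pos_label=1)
print(precision, recall, f1)
```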

  • @faryaltahseen7197
    @faryaltahseen7197 1 year ago

    Sir! I'm very new to NLP. Thank you so much for this playlist... I am learning so many things from you. But please tell me, how can I fix this error?
    12 from unicodedata import normalize
    14 if normalize:
    ---> 15 cm=cm.astype('float')/cm.sum(axis=1)[:,np.newaxis]
    16 print("Normalized Confusion Matrix")
    17 else:
    AttributeError: module 'matplotlib.cm' has no attribute 'astype'
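The traceback suggests two name clashes: `from unicodedata import normalize` replaces the boolean `normalize` flag with a function (which is always truthy), and `cm` resolves to the `matplotlib.cm` module instead of the confusion-matrix array, probably because `matplotlib.cm` was imported or passed in under that name. Using distinct names avoids both; scikit-learn 0.22 and later can also normalize directly with `confusion_matrix(..., normalize="true")`. A sketch on made-up labels computing it both ways:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 1]

conf_mat = confusion_matrix(y_true, y_pred)  # not named `cm`, so matplotlib.cm can't shadow it
normalize_rows = True                        # not named `normalize`, so unicodedata can't shadow it

if normalize_rows:
    # Divide each row by its total so every row (actual class) sums to 1
    conf_norm = conf_mat.astype("float") / conf_mat.sum(axis=1)[:, np.newaxis]

# Built-in equivalent (scikit-learn >= 0.22)
builtin = confusion_matrix(y_true, y_pred, normalize="true")
print(np.allclose(conf_norm, builtin))
```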

  • @mainuddinali9561
    @mainuddinali9561 2 years ago

    I'm not able to download the dataset; without practice, it is a waste.

  • @jonashero5054
    @jonashero5054 3 years ago

    Where is Part 2?

    • @nagarajannethi
      @nagarajannethi 3 years ago

      ruclips.net/video/E9gVleivB6M/видео.html

  • @malik_msn
    @malik_msn 4 years ago +1

    I would have enjoyed it, but the poor audio made a mess of it.

  • @rubabvlogs1843
    @rubabvlogs1843 4 years ago

    The Kaggle fake news icon is Trump... :)

  • @MuhammadAbdullah-gx2ou
    @MuhammadAbdullah-gx2ou 1 year ago

    Dear sir, I am facing this error here:
    TypeError Traceback (most recent call last)
    in ()
    4 corpus = []
    5 for i in range (0, len(messages)):
    ----> 6 review = re.sub('[^a-zA-Z]', ' ', messages['title'][i])
    7 review = review.lower()
    8 review = review.split()
    /usr/lib/python3.10/re.py in sub(pattern, repl, string, count, flags)
    207 a callable, it's passed the Match object and must return
    208 a replacement string to be used."""
    --> 209 return _compile(pattern, flags).sub(repl, string, count)
    210
    211 def subn(pattern, repl, string, count=0, flags=0):
    TypeError: expected string or bytes-like object