Kaggle Fake News Classifier Using LSTM - Deep Learning | Natural Language Processing

  • Published: 29 Nov 2024

Comments • 182

  • @suyashdandekar4653
    @suyashdandekar4653 4 years ago +10

    Dude, you are the best ML tutor ever. You not only focus on theory but also teach practical implementations, and all within a video of no more than 30 minutes. I truly love your teaching and am always excited for your latest videos.

  • @prajg4310
    @prajg4310 3 years ago +18

    For those seeing this after TensorFlow 2.5.0: predict_classes(X_test) will not work. Instead, use:
    y_pred = (model.predict(X_test) > 0.5).astype("int32")

    • @MrHamid-ct7hy
      @MrHamid-ct7hy 2 years ago

      I used this, but after using it my accuracy is only 40 percent.

    • @zeelthumar
      @zeelthumar 1 year ago +1

      Thanks for pointing out this crucial information. Can you please give the reference link for ".predict()"?

    • @tanmayjadav213
      @tanmayjadav213 1 year ago

      Yeah, it happened to me too.
      @@MrHamid-ct7hy did you find any solution to that?

    • @manojitsaha6262
      @manojitsaha6262 1 month ago

      Thank you for the solution...😊❤🙌
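
For reference, a framework-free sketch of what the replacement line does: with a single sigmoid output, `model.predict` returns one probability per row (shape `(n, 1)`), and comparing it with 0.5 yields 0/1 labels. The helper names `probs_to_labels` and `accuracy` are illustrative, not Keras functions.

```python
def probs_to_labels(probs, threshold=0.5):
    """Turn sigmoid outputs (rows of [p], like model.predict returns) into 0/1 labels."""
    # Each row holds a single probability for the positive class.
    return [1 if row[0] > threshold else 0 for row in probs]


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


probs = [[0.91], [0.12], [0.55], [0.49]]
labels = probs_to_labels(probs)
print(labels)                           # [1, 0, 1, 0]
print(accuracy([1, 0, 0, 0], labels))   # 0.75
```

If accuracy drops sharply after switching, one thing worth checking (a guess at a cause, not a diagnosis) is that `y_test` holds 0/1 integers in the same order as `X_test`.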

  • @jaikishank
    @jaikishank 3 years ago

    That was a great presentation. I could apply it to a 23,000-observation dataset with good accuracy. Many thanks, Mr. Krish, for the knowledge shared.

  • @sukanthenss914
    @sukanthenss914 4 years ago +3

    Thank you so much. I had been waiting for this session for months.

  • @AbdulSattar-zq5yg
    @AbdulSattar-zq5yg 4 years ago +7

    Great work! Please make a video on LSTM with a class-imbalanced dataset.

  • @hinata4661
    @hinata4661 4 years ago +3

    Hey Krish, the video was great. In NLP the most important step is preprocessing. It would be very useful if you could demonstrate different preprocessing techniques for different problems, like Twitter analysis, where hashtags and URLs should be removed. If possible, make a video tutorial on regex.

  • @subarnasamanta4945
    @subarnasamanta4945 4 years ago +1

    Sir, you achieved higher accuracy with the PassiveAggressiveClassifier (in your members-only project playlist), and I think 90% is less than that.

  • @adeyinkasotunde6870
    @adeyinkasotunde6870 3 years ago

    Sir, you're doing a great job. I am one of your faithful fans. My suggestion, though: always try to include test data in your tutorials, so that we can see how effective the model is when it makes predictions on unseen test data. This is paramount to all your fans, I must say. Thank you very much for your great work, always ♥️👍🙏

  • @nidhisolanki5314
    @nidhisolanki5314 3 years ago

    Krish sir, in my opinion you are the best data science teacher on YouTube. Please keep making practical sessions like this in deep learning.

  • @peddamallamuralikrishna5557
    @peddamallamuralikrishna5557 1 year ago +3

    I followed the step-by-step process you explained, sir, but I still got the error "Classification metrics can't handle a mix of binary and continuous targets". I used an e-mail spam detection dataset for this.

    • @kedarhardikar575
      @kedarhardikar575 2 months ago

      from sklearn.metrics import confusion_matrix
      y_pred = model.predict(X_test)  # probabilities, shape (n, 1)
      y_pred_class = [1 if prob > 0.5 else 0 for prob in y_pred.ravel()]  # threshold to 0/1 labels
      confusion_matrix(y_test, y_pred_class)
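
The error quoted above is what scikit-learn raises when a metric such as `confusion_matrix` receives raw probabilities from `model.predict` instead of 0/1 labels, which is why the reply thresholds first. A pure-Python sketch of the same computation, using scikit-learn's layout `[[TN, FP], [FN, TP]]`, which also derives precision, recall and F1 for the positive class (the function names are illustrative):

```python
def binary_confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]], the layout sklearn uses for labels 0/1."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1  # row = true class, column = predicted class
    return matrix


def precision_recall_f1(matrix):
    """Precision, recall and F1 for the positive class (label 1)."""
    (tn, fp), (fn, tp) = matrix
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_test = [1, 0, 1, 1, 0, 0]
probs = [0.8, 0.3, 0.4, 0.9, 0.6, 0.1]
y_pred = [1 if p > 0.5 else 0 for p in probs]  # [1, 0, 0, 1, 1, 0]
cm = binary_confusion_matrix(y_test, y_pred)
print(cm)  # [[2, 1], [1, 2]]
print(precision_recall_f1(cm))
```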

  • @naumanmansuri4441
    @naumanmansuri4441 4 years ago +1

    At last it's released, as requested during the live session. Thanks!

    • @krishnaik06
      @krishnaik06  4 years ago +2

      Trying my best to upload as soon as possible.

  • @jayshreesharma6879
    @jayshreesharma6879 4 years ago +6

    Sir, please upload a multiclass classification confusion matrix with F1 score, precision and other measures, with full code and explanation.
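
A minimal multiclass sketch, assuming integer labels `0..n_classes-1`: rows of the confusion matrix are true classes, columns are predicted classes, and per-class precision, recall and F1 come from the diagonal and the column and row sums. In scikit-learn, `classification_report(y_test, y_pred)` prints these per-class numbers and `f1_score(y_test, y_pred, average="macro")` gives the macro average; the functions below are illustrative re-implementations.

```python
def multiclass_confusion_matrix(y_true, y_pred, n_classes):
    """n_classes x n_classes matrix: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_scores(matrix):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(matrix)
    return sum(f1 for _, _, f1 in scores) / len(scores)


m = multiclass_confusion_matrix([0, 1, 2, 2, 1, 0], [0, 2, 2, 2, 1, 1], n_classes=3)
print(m)  # [[1, 1, 0], [0, 1, 1], [0, 0, 2]]
print(per_class_scores(m))
print(macro_f1(m))
```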

  • @kalppanwala6439
    @kalppanwala6439 4 years ago +3

    Hey Krish, I got a test accuracy of 99.8% on this dataset using BERT :)

    • @krishnaik06
      @krishnaik06  4 years ago +6

      BERT will be covered in upcoming videos.

    • @mohdzuhaib4138
      @mohdzuhaib4138 4 years ago

      What's BERT?

    • @kalppanwala6439
      @kalppanwala6439 4 years ago

      Krish, if possible, also explain RoBERTa, XLNet and many such variants.
      Thanks for your content!!

    • @mizgaanmasani8456
      @mizgaanmasani8456 4 years ago

      @@krishnaik06 We will be eagerly waiting :)

  • @dheerajkumark2268
    @dheerajkumark2268 4 years ago +3

    Thank you so much sir 🙏, it is really helpful.

  • @MuhammadAbdullah-gx2ou
    @MuhammadAbdullah-gx2ou 1 year ago

    Dear sir, it's great to learn from you. One request: can you please make a video on fake news detection on an unlabeled dataset, with methods such as transformer-based learning? Thanks

  • @mak_kry
    @mak_kry 10 months ago

    It's better to use stratify when you split your dataset into train and test, to keep the same class balance in both sets.
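
In scikit-learn this is the `stratify` argument, e.g. `train_test_split(X, y, test_size=0.33, stratify=y)`. A sketch of the idea in plain Python, splitting each class separately (the function name and seed are illustrative):

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.33, seed=42):
    """Return (train_idx, test_idx) so each class keeps about the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # shuffle within the class only
        n_test = round(len(indices) * test_size)   # same fraction taken from every class
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


y = [0] * 60 + [1] * 40
train_idx, test_idx = stratified_split(y, test_size=0.25)
print(sum(y[i] for i in test_idx) / len(test_idx))  # 0.4, same as in the full data
```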

  • @rukshanthdevakumar9421
    @rukshanthdevakumar9421 4 years ago +4

    Thanks Krish, that was a good implementation. I am trying to understand whether we can use TfidfVectorizer with the LSTM model, but I can see that sequences with one hot give great results. I am just learning NLP with ML and am not into the deep learning part yet.

  • @Abhishek-fw7oo
    @Abhishek-fw7oo 4 years ago

    I'm a beginner but still watched the full video.

  • @shubhamsankpal270
    @shubhamsankpal270 4 years ago +3

    Hey Krish, just a quick question. X.shape in the video shows the output as (18285, 20), but the data has only 5 columns, and we are not using the "label" column for X, so shouldn't X.shape be (18285, 4)?

  • @aekanshgupta6642
    @aekanshgupta6642 2 years ago +1

    Why are we using one hot encoding here instead of TF-IDF or bag of words to vectorize this data?

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago

    Great work. Practiced the code in Colab.

  • @rog0079
    @rog0079 4 years ago

    Hey, great video.
    Next time, can you also show the trained model on a real example, like writing a negative or positive title and giving it to the model to predict the output? That would add the cherry on top :D

    • @krishnaik06
      @krishnaik06  4 years ago +1

      We can do the deployment of this model, and yes, it will look good.

    • @navinseab3620
      @navinseab3620 3 years ago

      Do you know how we can make a prediction with this model, since we cannot save one hot for later use?
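
On reuse: as far as I know, Keras's `one_hot` maps words through `hashing_trick` with Python's built-in `hash()` by default, and Python randomizes string hashes per process unless `PYTHONHASHSEED` is fixed, so the same word can get a different index in a new session. The Keras docs list `"md5"` as a stable hash option for `hashing_trick`. The sketch below is an MD5-based encoder in that spirit. `stable_one_hot` is a hypothetical helper, not the Keras function, and its tokenizer is simplified. Another route is to fit a `Tokenizer` and save its word index together with the model.

```python
import hashlib
import re


def stable_one_hot(text, n):
    """Hash each word to an index in [1, n) with MD5, so indices match across runs.

    Hypothetical helper, not the Keras function; index 0 stays free for padding.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())  # simplified tokenizer (assumption)
    return [int(hashlib.md5(w.encode("utf-8")).hexdigest(), 16) % (n - 1) + 1 for w in words]


voc_size = 5000
print(stable_one_hot("Fake news spreads fast", voc_size))  # same 4 integers in every session
```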

  • @laharipenmatsa1276
    @laharipenmatsa1276 4 years ago

    Great work, sir. Please implement it with soft attention and a BiLSTM.

  • @subarnasamanta4945
    @subarnasamanta4945 4 years ago

    Sir, I think CountVectorizer should only be fitted at train time, i.e. fit on the train data and then transform both sets with it, whereas you applied fit_transform at the beginning, which may be the wrong approach.
    This way our model also learns from the test data, which should not be our aim.
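
Here is a sketch of fitting on the training split only. A hypothetical word-index vocabulary stands in for `CountVectorizer`; with the real class, the equivalent is `vectorizer.fit(X_train)` followed by `vectorizer.transform(X_test)`. Words that appear only in the test set map to an "unknown" index instead of shaping the vocabulary. As far as I know, a pure hashing encoder such as Keras `one_hot` has no fitted state, so this concern applies to fitted vectorizers rather than to hashing.

```python
from collections import Counter


def fit_vocab(train_texts, max_words=None):
    """Build word -> index from the TRAINING texts only (0 = padding, 1 = unknown)."""
    counts = Counter(w for text in train_texts for w in text.lower().split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(max_words), start=2)}


def transform(texts, vocab):
    """Encode texts with a fitted vocab; words unseen during fitting map to 1 (unknown)."""
    return [[vocab.get(w, 1) for w in text.lower().split()] for text in texts]


train = ["aliens land in ohio", "senate passes budget"]
test = ["aliens pass budget"]
vocab = fit_vocab(train)       # the test texts never touch the vocabulary
print(transform(test, vocab))  # [[2, 1, 8]]: "pass" was never seen in training
```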

  • @DP-od4yr
    @DP-od4yr 4 years ago +1

    With dropout = 0.3, accuracy drops to 67% as false positives increase.

  • @mohdzuhaib4138
    @mohdzuhaib4138 4 years ago +2

    Amazing, thank you.

  • @shreyjain6447
    @shreyjain6447 2 years ago +2

    You are splitting the data after applying the embedding, which might lead to data leakage. Is there a way to avoid this?

  • @louerleseigneur4532
    @louerleseigneur4532 3 years ago

    Thanks Krish

  • @fatemehebrahimi7575
    @fatemehebrahimi7575 3 years ago

    Thanks a lot, it is really helpful.

  • @krishnakumarprathipati7186
    @krishnakumarprathipati7186 3 years ago

    Sir, make a video on word embeddings using BERT.

  • @pranavpratyush4075
    @pranavpratyush4075 1 year ago

    You are using a dataset that has 20 columns (X.shape is (18285, 20)), but the dataset at the link in the description has only 4 columns, excluding label, which you use as the output. Kindly provide the link to the dataset you are using.

  • @tarunshukla1521
    @tarunshukla1521 4 years ago

    Hey Krish, I've been following all the videos very closely. Did you miss uploading the Adam optimizer and binary cross-entropy video? The optimizers you discussed in your videos were Adagrad, Adadelta, RMSprop, etc., but in almost all your code you use Adam. It would be great if you could explain that too and insert it at the appropriate place in the playlists. Thank you for all the hard work.

  • @bhavinmoriya9216
    @bhavinmoriya9216 3 years ago

    Thanks for the video. Do I need to do min-max scaling?

  • @nek_insan
    @nek_insan 2 years ago

    Hello sir, thanks for the great explanation. I have one doubt; could you please clear it up? While testing this model, if a sentence is longer than sent_length, how do we handle the padding?

  • @nickrathee8891
    @nickrathee8891 2 years ago +1

    If we don't have any labeled data, how do we do it?

  • @Lucifer-wd7gh
    @Lucifer-wd7gh 3 years ago +1

    OK guys, I have one request: use lemmatization instead of stemming.

  • @ravilourembam9412
    @ravilourembam9412 3 years ago

    @Krish Naik can you do a tutorial on sentiment analysis at different levels, i.e. classification at the word level, sentence level and document level?

  • @munishrajora2303
    @munishrajora2303 4 years ago

    Should we use padding="pre" or "post"? Can you help by providing the mathematical insight?

  • @dineshkumarprabhakar5525
    @dineshkumarprabhakar5525 4 years ago

    It is OK for binary classification. What changes are required for multiclass classification?

  • @datastory5244
    @datastory5244 3 years ago

    Nice video!! But I didn't understand why you said that we can't achieve an accuracy above 83% without LSTM😕😕😕

  • @mdraihanulislamtomal6064
    @mdraihanulislamtomal6064 2 years ago

    It's not working with a Bangla dataset. What should I do? When I print corpus, it seems empty, e.g. [" ", " ", " ", " "].

  • @RitikSingh-ub7kc
    @RitikSingh-ub7kc 4 years ago +1

    Krish, can you explain some applications of NLP using LSTM, like next-word prediction, translation and image captioning?

  • @joeljoseph26
    @joeljoseph26 3 years ago +2

    Is sent_length=20 chosen based on the maximum number of words in a sentence, is it the top 20 words, or is it random? If a sentence has more than 20 words, what should we do (adjust sent_length, or is there a method to handle this scenario)?

    • @novagamings4505
      @novagamings4505 6 months ago

      Choose the value of sent_length based on the sentence that contains the most words.
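
On choosing `sent_length` and handling longer sentences at test time: Keras `pad_sequences` defaults to `padding="pre"` and `truncating="pre"`, so a sequence longer than `maxlen` loses its first words. The pure-Python version below illustrates that behavior and both options; it is not the Keras source. Using the longest training sentence as `maxlen` avoids truncation during training, at the cost of more padding.

```python
def pad_sequences(sequences, maxlen, padding="pre", truncating="pre", value=0):
    """Pure-Python sketch of Keras-style padding/truncation to a fixed length (maxlen > 0)."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # "pre" drops tokens from the start, "post" drops them from the end
            seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
        pad = [value] * (maxlen - len(seq))
        padded.append(pad + seq if padding == "pre" else seq + pad)
    return padded


seqs = [[5, 8], [3, 9, 4, 7, 1]]
print(pad_sequences(seqs, maxlen=4))  # [[0, 0, 5, 8], [9, 4, 7, 1]]
print(pad_sequences(seqs, maxlen=4, padding="post", truncating="post"))  # [[5, 8, 0, 0], [3, 9, 4, 7]]
sent_length = max(len(s) for s in seqs)  # 5: the longest sentence, so nothing is truncated
```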

  • @tttanvi
    @tttanvi 4 years ago

    Can you please create videos on GloVe and word2vec for the same problem, the Fake News Classifier?

  • @Vignesh0206
    @Vignesh0206 4 years ago +2

    Sir, is your deep learning playlist enough to learn deep learning completely,
    or do you plan to update it?

    • @krishnaik06
      @krishnaik06  4 years ago +1

      More videos will be updated...

  • @theniyal
    @theniyal 4 years ago +1

    Hey Krish, one hot does the same thing as the TensorFlow Tokenizer, right?

    • @krishnaik06
      @krishnaik06  4 years ago +1

      See my previous videos to understand one hot representation.

  • @Jai_Ram.2602
    @Jai_Ram.2602 3 years ago

    Sir, you are applying pad_sequences with maxlen = 20, but some lists in the one hot representation are longer than 20, so those sentences get cut down to 20 words and we lose some words, right sir? So can we find the maximum sentence length in the one hot representation and use it as maxlen?

  • @harissaeed5811
    @harissaeed5811 2 years ago

    @krish Naik Sir, I am following your approach on a Roman Urdu dataset, but my accuracy is nearly 48 percent. How can I implement Roman Urdu stopwords? Please, anyone, help me with this.

  • @viviennele7760
    @viviennele7760 4 years ago

    Great video! How do you plot training loss vs. validation loss over the epochs? And how do you plot training accuracy vs. validation accuracy over the epochs?

    • @novagamings4505
      @novagamings4505 6 months ago

      ```python
      import matplotlib.pyplot as plt

      # fit() returns a History object; validation_split holds out the last 33% of the training data
      model_history = classifier.fit(x_train, y_train, validation_split=0.33, batch_size=10, epochs=100)
      plt.plot(model_history.history["accuracy"])
      plt.plot(model_history.history["val_accuracy"])
      plt.title("model accuracy")
      plt.ylabel("accuracy")
      plt.xlabel("epoch")
      plt.legend(["train", "validation"], loc="upper left")
      plt.show()
      ```
      You can use this code; make sure to save the model history when training the model. For the loss curves, plot "loss" and "val_loss" the same way.

  • @ssamiit
    @ssamiit 3 years ago

    Great video! Just one comment: the validation set is the same as X_test, and its accuracy is displayed after each epoch, so there may be no need to predict on X_test again.

    • @jaikishank
      @jaikishank 3 years ago

      My suggestion is to use the following code to carve the validation data out of the training set itself:
      base_history=model.fit(X_train,y_train,validation_split=0.2,epochs=10,batch_size=100,verbose=1)

    • @prakashkafle454
      @prakashkafle454 3 years ago +1

      How can this be modified for multiclass? I am getting an error.

    • @yogendrapratapsingh7618
      @yogendrapratapsingh7618 3 years ago

      @@prakashkafle454 Hey, did you get the answer? I'm having the same problem.

    • @prakashkafle454
      @prakashkafle454 3 years ago

      @@yogendrapratapsingh7618 Yes, I did it myself.

    • @manasviemmadi8072
      @manasviemmadi8072 1 year ago

      @@prakashkafle454 How? Could you tell us what lines of code you added?
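
For multiclass, the usual changes are sketched here, not taken from the video's code. Replace the single sigmoid unit with an output layer `Dense(num_classes, activation="softmax")`. Use the loss `"sparse_categorical_crossentropy"` for integer labels, or `"categorical_crossentropy"` for one-hot labels. At prediction time, take the `argmax` instead of applying a 0.5 threshold. The label conversion and the argmax step look like this in plain Python; these are illustrative helpers, and `tf.keras.utils.to_categorical` and `np.argmax(..., axis=1)` are the library equivalents.

```python
def to_one_hot(labels, num_classes):
    """Integer labels -> one-hot rows, the format categorical_crossentropy expects."""
    return [[1 if i == label else 0 for i in range(num_classes)] for label in labels]


def argmax_classes(prob_rows):
    """Pick the most probable class in each softmax output row."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]


print(to_one_hot([0, 2, 1], 3))                               # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(argmax_classes([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]))     # [1, 0]
```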

  • @AkshayDudvadkar
    @AkshayDudvadkar 3 years ago

    #preprocessing the data
    import gensim
    title_text = df.title.apply(gensim.utils.simple_preprocess)

  • @ayushthakral6692
    @ayushthakral6692 2 years ago

    In my code, review in the for loop only shows the last row after re.sub. Please help me with this.
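
A common reason for seeing only the last row is overwriting one variable inside the loop, or creating the list inside it, instead of appending each cleaned title to a list created before the loop. The sketch below shows the usual loop. It uses a small hypothetical stopword set in place of NLTK's list and skips stemming. Note that a Latin-only pattern such as `[^a-zA-Z]` replaces every character of a non-Latin script with a space. That produces empty entries for Bangla text, for example.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "in", "of", "to", "and"}  # hypothetical stand-in for NLTK's list


def build_corpus(titles):
    corpus = []  # created once, BEFORE the loop
    for title in titles:
        review = re.sub("[^a-zA-Z]", " ", title)  # keep Latin letters only
        words = review.lower().split()
        review = " ".join(w for w in words if w not in STOPWORDS)
        corpus.append(review)  # append every row instead of overwriting one variable
    return corpus


print(build_corpus(["The moon is made of cheese!", "Stocks rise 5% in a week"]))
# ['moon made cheese', 'stocks rise week']
```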

  • @mihirjoshi8792
    @mihirjoshi8792 4 years ago

    I have one question: how will it identify negative statements? For example: "this earphone is not as good as the other one." In this statement, stopword removal will remove "not", but it is the most important word.
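
One option is to subtract negation words from the stopword list before filtering, so that "not" survives. The base set below is a small hypothetical stand-in for `set(nltk.corpus.stopwords.words("english"))`, which does include "not", "no" and "nor".

```python
# Hypothetical base list; in practice this could be set(nltk.corpus.stopwords.words("english")).
BASE_STOPWORDS = {"this", "is", "as", "the", "other", "one", "not", "no", "nor"}
NEGATIONS = {"not", "no", "nor", "never"}
STOPWORDS = BASE_STOPWORDS - NEGATIONS  # keep negations so "not good" keeps its meaning


def remove_stopwords(sentence):
    return [w for w in sentence.lower().split() if w not in STOPWORDS]


print(remove_stopwords("this earphone is not as good as the other one"))
# ['earphone', 'not', 'good']
```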

  • @subhanjanbasu6148
    @subhanjanbasu6148 4 years ago +2

    Hey Krish, do we need a good GPU to run deep learning projects?

  • @Sonu-ev9zp
    @Sonu-ev9zp 4 years ago +2

    Sir, please make a video on deploying an ML model in an Android app.

    • @krishnaik06
      @krishnaik06  4 years ago +1

      Sure, in the upcoming videos.

  • @chitneedihemanthsaikumar7511
    @chitneedihemanthsaikumar7511 4 years ago +1

    Hi sir,
    Why is the padding done 'pre'? Is there any specific reason, or can it be done as 'post'?
    Also, the GitHub URL contains an apostrophe ('), which causes a 404; please correct it if possible.

    • @krishnaik06
      @krishnaik06  4 years ago

      You can use post and check which works better. Fixed the GitHub issue.

  • @ketkinimdeo1304
    @ketkinimdeo1304 4 years ago

    Hi Krish, can you make a video on a time series example using LSTM?

  • @roshankumarsharma8725
    @roshankumarsharma8725 4 years ago

    Sir, please make a video on sentiment analysis.

  • @ajitkumar2670
    @ajitkumar2670 4 years ago

    Hello Krish,
    I hope you are doing well.
    Can you please tell me how to decide vocab_size?

  • @unclesam7853
    @unclesam7853 3 years ago

    Do you know how to predict whether a single sample is spam or not?
    For example, given one text message, we have to predict whether it is spam,
    using this model to make a single prediction and get the probability that the message is spam/fake news.
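
To predict a single title, the title has to go through the same steps as in training. That means the same cleaning, the same encoding with the same `voc_size`, padding to the same `sent_length`, and a batch dimension of one, since `model.predict` expects shape `(1, sent_length)`. The sketch below covers the input side. `VOC_SIZE = 5000`, `SENT_LENGTH = 20` and MD5 hashing are assumptions. Any stemming or stopword removal used in training would also need to be repeated. The commented `model.predict` line assumes a trained Keras model and NumPy.

```python
import hashlib
import re

VOC_SIZE = 5000   # assumed: must equal the value used in training
SENT_LENGTH = 20  # assumed: must equal the training sent_length


def encode(text):
    """Clean and hash one sentence the same way as in training (MD5 hashing is an assumption)."""
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    return [int(hashlib.md5(w.encode("utf-8")).hexdigest(), 16) % (VOC_SIZE - 1) + 1 for w in words]


def to_model_input(text):
    """Return a batch of ONE padded sequence, shape (1, SENT_LENGTH)."""
    seq = encode(text)[-SENT_LENGTH:]              # keep the last words, like truncating="pre"
    return [[0] * (SENT_LENGTH - len(seq)) + seq]  # zero pre-padding


batch = to_model_input("Scientists confirm the moon is made of cheese")
# prob = float(model.predict(np.array(batch))[0][0])  # hypothetical trained model; prob > 0.5 -> label 1
```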

  • @iamrahulkumar11
    @iamrahulkumar11 3 years ago

    Cast string to float is not supported
    [[node binary_crossentropy/Cast (defined at :2) ]] [Op:__inference_train_function_9451]
    Function call stack:
    train_function
    Hi sir, I am getting this error while fitting the model. Could you help me out?
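
That cast error usually means the labels reached `binary_crossentropy` as strings, for example `"0"`/`"1"` or `"FAKE"`/`"REAL"` read from a CSV. That is a likely cause, not a certainty. The sketch below maps string labels to integers in sorted order, as scikit-learn's `LabelEncoder` does. With pandas, `df["label"].astype(int)` covers the `"0"`/`"1"` case.

```python
def encode_labels(raw_labels):
    """Map string labels to integers 0..k-1 in sorted order; returns (encoded, mapping)."""
    mapping = {label: i for i, label in enumerate(sorted(set(raw_labels)))}
    return [mapping[label] for label in raw_labels], mapping


y, mapping = encode_labels(["REAL", "FAKE", "FAKE", "REAL"])
print(y, mapping)  # [1, 0, 0, 1] {'FAKE': 0, 'REAL': 1}
# Before fit(), convert to a numeric array, e.g. np.array(y, dtype="float32").
```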

  • @PatientInAffliction
    @PatientInAffliction 3 years ago

    Can someone please explain why a vocab size of 5000 was chosen? Shouldn't the vocab size equal the vocabulary of the corpus?

  • @md.hafizurrahman5590
    @md.hafizurrahman5590 4 years ago

    Thank you so much.

  • @sreekanthkumar4297
    @sreekanthkumar4297 3 years ago

    What will we give as input to this model?

  • @oriabnu1
    @oriabnu1 3 years ago

    How can we use LSTM for text steganography?

  • @bhismaosti7949
    @bhismaosti7949 4 years ago +1

    Hello Data Science Community, I need a little help deploying a DL model on Heroku. I want to deploy a model with Flask that uses TensorFlow 2.2.0, but TF is over 500 MB, Heroku does not support more than 500 MB, and I get an application error. Please help me. (Note: I am a beginner in the DL field.)

  • @abhicasm9237
    @abhicasm9237 1 year ago

    How is X.shape (18285, 20)? Doesn't that 20 mean 20 features? I could only see 4. Somebody please explain.

  • @tech-talks-with-shakeel0346
    @tech-talks-with-shakeel0346 4 years ago +1

    Hi Krish sir,
    I have developed my own LSTM emotion detection model in TensorFlow for our regional language.
    How do I deploy this model with an Android app to put it into production? Any help, please.

  • @somyajain5579
    @somyajain5579 4 years ago

    Hello Krish, how do we predict a single instance in this case?

  • @shindepratibha31
    @shindepratibha31 4 years ago

    What does LSTM(100) indicate? I mean, we have 3 gates in an LSTM, so where do these 100 neurons come into the picture? Can anyone please explain this?
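
In `LSTM(100)`, 100 is `units`, the size of the hidden-state and cell-state vectors. Each of the three gates, and the candidate cell update, is a 100-dimensional vector computed from the current input and the previous hidden state, so the layer holds four weight blocks. With the default `return_sequences=False`, the layer returns only the last hidden state, with shape `(batch, 100)`. That is also why no `Flatten` is needed before `Dense`. Counting parameters makes the structure concrete. The 40-dimensional embedding below is an assumed input size.

```python
def lstm_params(input_dim, units):
    """Trainable weights in one standard Keras-style LSTM layer.

    Each of the 4 blocks (input, forget, output gates + candidate cell) has an input
    kernel (input_dim x units), a recurrent kernel (units x units) and a bias (units).
    """
    return 4 * (units * (input_dim + units) + units)


embedding_dim = 40  # assumed embedding size for this example
print(lstm_params(embedding_dim, 100))  # 56400
```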

  • @mahendrakumarbhooshan9706
    @mahendrakumarbhooshan9706 4 years ago

    Sir, can you provide a tutorial on how to train on an annotated video dataset? There isn't a good article on this topic.

  • @ganeshgulati5780
    @ganeshgulati5780 4 years ago +1

    Hi Krish, can we use PyTorch?

  • @MAYANKAGARWAL1234
    @MAYANKAGARWAL1234 4 years ago

    Hi Krish, I wanted to know why we didn't use a Flatten layer, since the embedding output will be 2D.

    • @karthikvegeta5981
      @karthikvegeta5981 3 years ago

      Sir, may I know why we use 1D and 2D layers in NLP, like we do in CNNs?

  • @sagarwaghela1118
    @sagarwaghela1118 4 years ago

    Krish sir, if my data does not have a label feature (0 and 1), do we have to label it manually, or is there a coding technique for that?

  • @satyammishra3355
    @satyammishra3355 4 years ago +1

    If we remove the emojis, can we increase the accuracy on similar datasets?

  • @pratikshajd
    @pratikshajd 3 years ago

    Hello, all your videos are very informative. I need help with a dataset. How can I contact you? Please let me know. Thank you

  • @milantripathi720
    @milantripathi720 4 years ago

    Could you lower the audio volume at the start of your videos? The difference before and after is too large.

  • @aqibhuda7652
    @aqibhuda7652 4 years ago +1

    Do we need to code every single step, or is there code already provided on the TensorFlow website? Please reply.

    • @krishnaik06
      @krishnaik06  4 years ago +1

      This is not from the TensorFlow website; we have to write our own code.

  • @pallebharath3138
    @pallebharath3138 4 years ago

    Sir, while fitting the model we are getting an accuracy of 1.00. Why isn't the model considered to be overfitting? Can anyone sort this out?

  • @sachins4522
    @sachins4522 4 years ago

    Hi,
    does pre- or post-padding the sequences have any impact?

  • @shashanksingh5411
    @shashanksingh5411 4 years ago +1

    Can we see the deployment of this project?

  • @manojrangera
    @manojrangera 3 years ago

    In this model the validation loss is increasing, so the model is overfitting. To stop that, we need to use early stopping so that we don't overfit the model.
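
In Keras, early stopping is a callback, e.g. `model.fit(..., callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)])`. A simplified pure-Python version of its stopping rule, for illustration only (not the Keras source):

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based, for a val_loss history.

    Training stops once val_loss has failed to improve by more than min_delta
    for `patience` consecutive epochs.
    """
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # never triggered: ran all epochs


print(early_stopping_epoch([0.52, 0.41, 0.38, 0.40, 0.45, 0.51, 0.60], patience=3))  # (5, 2)
```

With `restore_best_weights=True`, Keras restores the weights from the best epoch (epoch 2 here) instead of keeping those from the last one.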

  • @navinseab3620
    @navinseab3620 3 years ago

    How can we predict a new sentence with this model, sir?

  • @VarunSharma-ym2ns
    @VarunSharma-ym2ns 4 years ago

    Hey Krish, X.shape in the video shows (18285, 20) while the data has 5 columns, but when I use my train.csv, X.shape is (18285, 4). Could you please tell me the reason? Where is the mistake?

    • @devendrachavan765
      @devendrachavan765 4 years ago

      (18285, 20) is after doing the embedding, which gives 20 columns.

    • @manasviemmadi8072
      @manasviemmadi8072 1 year ago

      @@devendrachavan765 No, check the very beginning: it already shows 20 columns for X.shape.

  • @penugondasaichand692
    @penugondasaichand692 3 years ago

    Are the embedding layer and word2vec the same???

  • @PraveenKumar-pd9sx
    @PraveenKumar-pd9sx 4 years ago

    Why is there no Flatten layer before Dense?

  • @mikiyasassefakassa9136
    @mikiyasassefakassa9136 2 years ago

    Sir, how do we remove our own custom stopwords using Python?

  • @manishsharma2211
    @manishsharma2211 4 years ago

    Why does X have 20 columns?? It has only 4, right?

  • @subhashachutha7413
    @subhashachutha7413 4 years ago +1

    Krish, if there are more words than voc_size, what happens???

    • @krishnaik06
      @krishnaik06  4 years ago +1

      I don't think the vocabulary will exceed 10k words.

    • @subhashachutha7413
      @subhashachutha7413 4 years ago

      What if it does? For example, I have more than 20k unique words in my tweets and the vocab size is smaller than that. What happens then, sir????

    • @ifmondayhadaface9490
      @ifmondayhadaface9490 4 years ago +1

      Subhash Achutha This comment is late, but if vocab_size is 10,000, it will only take the 10000 most common words. Any other words will be deleted.

    • @subhashachutha7413
      @subhashachutha7413 4 years ago

      @@ifmondayhadaface9490 Thank you for answering.

    • @ifmondayhadaface9490
      @ifmondayhadaface9490 4 years ago +1

      Subhash Achutha You’re welcome.
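
The reply above describes `Tokenizer(num_words=...)`-style behavior, where only the most frequent words get indices. With hashing-based `one_hot` encoding, as far as I know, no words are dropped. Instead, distinct words can share an index, and these collisions become more common as the number of unique words approaches or exceeds `voc_size`. The sketch below counts unique words and colliding words, which may also help when picking `voc_size`. MD5 hashing is used for reproducibility, and the helper names are hypothetical.

```python
import hashlib
from collections import Counter


def md5_index(word, n):
    """Stable stand-in for one_hot's hashing: an index in [1, n)."""
    return int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % (n - 1) + 1


def vocab_report(texts, voc_size):
    """Return (unique word count, number of words sharing an index with another word)."""
    words = {w for text in texts for w in text.lower().split()}
    bucket_sizes = Counter(md5_index(w, voc_size) for w in words)
    colliding = sum(size for size in bucket_sizes.values() if size > 1)
    return len(words), colliding


tweets = [f"word{i}" for i in range(20000)]  # synthetic: 20k unique "words"
print(vocab_report(tweets, voc_size=10000))   # 20k words share 9,999 indices, so collisions are unavoidable
```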

  • @sanabasingha8918
    @sanabasingha8918 2 years ago

    Frustrated...
    NotImplementedError in LSTM.
    Tried many solutions, but it's still not resolved.

  • @jitendrakumarshah3443
    @jitendrakumarshah3443 4 years ago

    How can we decide the sentence length, i.e. sent_length?

  • @karansagar7870
    @karansagar7870 3 years ago

    Error tokenizing data. C error: Expected 5 fields in line 4772, saw 7
    The same dataset gives this error on Colab but works on Kaggle.

  • @arjyabasu1311
    @arjyabasu1311 4 years ago

    Please do the deployment, sir!

  • @kumaragurumohan9751
    @kumaragurumohan9751 3 years ago

    Could you please provide the notebook link?

  • @kjayeshnaidu6012
    @kjayeshnaidu6012 4 years ago

    Sir, I have a question: when I used the text attribute instead of the title, I got a validation accuracy of 83%. Can anybody help me improve it?

  • @rohitmohite7997
    @rohitmohite7997 4 years ago

    Please share the F1 score, @krish sir.

  • @MrHamid-ct7hy
    @MrHamid-ct7hy 2 years ago

    y_pred=model.predict_classes(X_test)
    After this line I got the error below, and when I fixed it, the model's accuracy was only 56%.
    Error:
    ('Sequential' object has no attribute 'predict_classes')

    • @guptagaurav916
      @guptagaurav916 2 years ago

      Please post if you find a solution.

    • @heysoumyadeep
      @heysoumyadeep 2 years ago +1

      Try
      y_pred = (model.predict(X_test) > 0.5).astype("int64")

    • @MrHamid-ct7hy
      @MrHamid-ct7hy 2 years ago +1

      @@heysoumyadeep I got 50 percent accuracy using this too.

    • @rudranshbhardwaj2547
      @rudranshbhardwaj2547 2 years ago

      @@MrHamid-ct7hy My accuracy is 66%.

    • @rudranshbhardwaj2547
      @rudranshbhardwaj2547 2 years ago

      Bro, just rerun the last 5 cells repeatedly and you will get somewhat better accuracy each time.

  • @AvinashKumar-sl4ew
    @AvinashKumar-sl4ew 4 years ago

    What is the level of this project? Who can try it: a beginner, an intermediate or an advanced learner?

  • @anaswarak6242
    @anaswarak6242 3 years ago

    How is test_size chosen? 18:30

  • @PAVANKUMAR-vx9ty
    @PAVANKUMAR-vx9ty 4 years ago +1

    What about the Docker tutorials that you started?

    • @krishnaik06
      @krishnaik06  4 years ago +2

      Coming up: next video tomorrow.

    • @PAVANKUMAR-vx9ty
      @PAVANKUMAR-vx9ty 4 years ago +1

      @@krishnaik06 Can you please cover Django as well as Flask for building the Docker image, and then Kubernetes after that?

    • @krishnaik06
      @krishnaik06  4 years ago +1

      Yes, after this I can go ahead with Django too.