Dude you are the best ML tutor ever. You not only focus on theory but also teach practical implementations and that too with in a video not more than 30 minutes. I truly love your teachings and am always excited for latest of your videos
Those who are seeing this after TensorFlow 2.5.0: predict_classes(X_test) will not work; instead use:
y_pred = (model.predict(X_test) > 0.5).astype("int32")
I used this, but after using it the accuracy is only 40 percent
Thanks for pointing out this crucial information... can you please give a reference link for `.predict()`?
Yeah, it happened to me too.
@@MrHamid-ct7hy So did you find any solution to that?
Thank you for the solution...😊❤🙌
That was a great presentation, and it worked on a 23,000-observation dataset with good accuracy. Many thanks, Mr. Krish, for the knowledge disseminated.
Thank you so much. I was waiting for this session for months..
Finally
@@krishnaik06 Hi, could you enable the subtitles, please? Thank you very much.
Great work! Please make a video on LSTM with a class-imbalanced dataset
Hey Krish, the video was great. In NLP the most important step is preprocessing. It would be very useful if you could demonstrate different preprocessing techniques for different problems, like Twitter analysis where hashtags and URLs should be removed. If possible, make a video tutorial on regex.
Sir, you achieved higher accuracy with the help of the PassiveAggressive classifier (in your members-only project playlist), and my 90% is less than that.
Sir, you're doing a great job. I am one of your faithful fans. My suggestion, though: try to always include test data in your tutorials, so that we can see the model's effectiveness after using it to make predictions on unseen test data. This is paramount to all your fans, I must say... Thank you very much for your great work, always ♥️👍🙏
Krish sir, in my opinion you are the best data science teacher on YouTube. Please keep making practical sessions like this in deep learning.
I followed the step-by-step process you explained, sir, yet I got the error "Classification metrics can't handle a mix of binary and continuous targets". I used an e-mail spam detection dataset for this.
y_pred = model.predict(X_test)
# threshold the predicted probabilities at 0.5 to get binary class labels
y_pred_class = [1 if prob > 0.5 else 0 for prob in y_pred]
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred_class)
At last, the video requested during the live session has been released. Thanks!
Trying my best to upload as soon as possible
Sir, please upload a multiclass classification confusion matrix with F1 score, precision, and other measures... with full code and explanation...
Sure
Hey Krish, I got a test accuracy of 99.8% on this dataset using BERT :)
BERT will be covered in upcoming videos
What's BERT?
Krish, if possible also explain RoBERTa, XLNet, and other such variants
Thanks for your content!!
@@krishnaik06 We will be eagerly waiting :)
Thank you so much sir 🙏, it is really helpful.
Dear sir, it's great to learn. One thing: can you please make a video on fake news detection on an unlabeled dataset, with methods like transformer-based learning techniques? Thanks.
It's better to use stratify when you're splitting your dataset into train and test, to keep the same class balance in both.
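A minimal sketch of a stratified split, assuming the `X_final`/`y_final` arrays and the test size from the video (those names and values are assumptions):

```python
from sklearn.model_selection import train_test_split

# stratify=y_final keeps the fake/real class ratio identical in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X_final, y_final, test_size=0.33, random_state=42, stratify=y_final
)
```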
Thanks Krish, that was a good implementation. I am trying to understand whether we can use TfidfVectorizer with an LSTM model, but I can see that sequences and one-hot give great results. I am just learning NLP with ML and am not into the deep learning part yet.
Check my NLP playlist
@@krishnaik06 thank you
I'm a beginner but still watched the full video
Hey Krish, just a quick question. X.shape in the video shows (18285, 20), but the data has 5 columns, of which we are not considering the "label" column for X, so shouldn't it show X.shape as (18285, 4)?
True.
The same here
Why are we using one-hot encoding here instead of TF-IDF or Bag of Words to vectorize this data?
Great work. Practiced the code in Colab.
Hey, great video!
Next time could you also add something like applying this trained model to a real example, e.g. creating a negative or positive title and giving it to the model to predict the output? That would be the cherry on top :D
We can do the deployment of this model, and yes, it will look good.
Do you know how we can make a prediction with this model, since we cannot save the one-hot encoding for later use?
Great work sir, please also implement it with soft attention + BiLSTM.
Sir, I think CountVectorizer should only be fit at train time, i.e. fit on the train split and use that same fitted vectorizer to transform both splits, whereas you applied fit_transform at the beginning, which may be a wrong approach.
Here our model is also learning from the test data, which should not be our aim.
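A minimal sketch of the leakage-free pattern, assuming `corpus_train`/`corpus_test` are the already-split lists of cleaned texts (hypothetical names):

```python
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(max_features=5000)
X_train_vec = cv.fit_transform(corpus_train)  # learn the vocabulary from train only
X_test_vec = cv.transform(corpus_test)        # reuse that vocabulary; no refitting
```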
With dropout = 0.3, accuracy reduces to 67% as false positives increase.
Amazing, thank you
You are splitting the data after applying the embedding, which might lead to data leakage. Is there a way to avoid this?
Thanks Krish
Thanks a lot, it is really helpful.
Sir make a video on word embeddings using BERT
You are using a dataset that has 20 columns (X.shape is (18285, 20)), but the link you gave in the description has only 4 columns excluding 'label', since you use that as the output. Kindly provide the dataset link you are using.
Hey Krish, I've been following all the videos very closely. Did you miss uploading the Adam optimizer and binary cross-entropy video? Because the optimizers you discussed in your videos were Adagrad, Adadelta, RMSprop, etc., but in almost all your code you use Adam. It would be great if you could explain that too and insert it appropriately in the playlists. Thank you for all the hard work.
Thanks for the video. Do I need to do min-max scaling?
Hello sir, thanks for the great explanation. I have one doubt; could you please clear it up? While testing this model, if we have a sentence longer than sent_length, how can we handle the padding?
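For what it's worth, pad_sequences handles both cases: it pads short sequences and truncates long ones. A minimal sketch, assuming the `onehot_repr` list and `sent_length` value used in the video (the names are assumptions):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# sequences longer than maxlen are cut down; truncating='pre' (the default)
# drops words from the beginning, truncating='post' drops them from the end
embedded_docs = pad_sequences(onehot_repr, padding='pre',
                              truncating='pre', maxlen=sent_length)
```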
If we don't have any labeled data, then how do we do it?
OK guys, I have one request: use lemmatization instead of stemming.
@Krish Naik, can you do a tutorial on different levels of sentiment analysis: classification at word level, sentence level, and document level?
Should we use padding='pre' or 'post'? Can you help provide the mathematical insight?
It is OK for binary classification. What changes are required for multiclass classification?
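A sketch of the usual changes for K classes; the layer sizes and embedding settings below mirror the video's binary model but are assumptions, not the author's code:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.utils import to_categorical

voc_size, sent_length, num_classes = 5000, 20, 3   # num_classes is hypothetical

# integer labels 0..num_classes-1 become one-hot vectors
y = np.array([0, 2, 1, 0])
y_cat = to_categorical(y, num_classes)

model = Sequential([
    Embedding(voc_size, 40, input_length=sent_length),
    LSTM(100),
    Dense(num_classes, activation='softmax'),       # instead of Dense(1, 'sigmoid')
])
model.compile(loss='categorical_crossentropy',      # instead of binary_crossentropy
              optimizer='adam', metrics=['accuracy'])
```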
Nice video!! But I didn't understand why you said that we can't achieve an accuracy above 83% without LSTM 😕😕😕
It's not working on a Bangla dataset. What should I do? I am facing a problem: when I print the corpus, it seems empty, such as [" ", " ", " ", " "]
Krish, can you explain some applications of NLP using LSTM, like next-word prediction, translation, and image captioning?
Sure Ritik
Is sent_length=20 chosen based on the maximum number of words in a sentence, is it the top 20 words, or is it random? If a sentence has more than 20 words, then what do we do (adjust sent_length, or is there a method to handle this scenario)?
Set sent_length based on the sentence that contains the maximum number of words.
Can you please create a video on GloVe and word2vec as well, for the same problem (Fake News Classifier)?
Sir, is your deep learning playlist completely enough to learn deep learning, or do you have plans to update it?
More videos will be added...
Hey Krish, one_hot does the same thing as the TensorFlow Tokenizer, right?
See my previous videos to understand one-hot representation
Sir, you are applying pad_sequences with max_len=20, but the length of some lists in onehot_repr is more than 20, so it truncates those sentences to 20 words. If we do that, we lose some words, right sir? So can we find the max length of the sentences in onehot_repr and give that as max_len instead, right sir?
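Yes, that would avoid truncation. A minimal sketch, assuming `onehot_repr` is the list of integer-encoded sentences from the video (the name is an assumption):

```python
# use the longest sentence in the corpus as the padding length,
# so no sentence is truncated and shorter ones are zero-padded
sent_length = max(len(seq) for seq in onehot_repr)
```

The trade-off is that one very long outlier sentence makes every padded sequence that long, which slows training.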
@Krish Naik Sir, I am following your approach on a Roman Urdu dataset, but my accuracy is nearly 48 percent. How can I implement Roman Urdu stopwords? Please, anyone help me with this.
Great video! How do you plot the graph of training loss vs. validation loss over the number of epochs? And how do you plot training accuracy vs. validation accuracy over the number of epochs?
```
import matplotlib.pyplot as plt

model_history = classifier.fit(x_train, y_train, validation_split=0.33, batch_size=10, epochs=100)

# plot train vs. validation accuracy per epoch; plot history["loss"] and
# history["val_loss"] the same way for the loss curves
plt.plot(model_history.history["accuracy"])
plt.plot(model_history.history["val_accuracy"])
plt.title("model accuracy")
plt.ylabel("accuracy")
plt.xlabel("epoch")
plt.legend(["train", "validation"], loc="upper left")
plt.show()
```
You can use this code. Make sure to save the model history when training the model.
Great video! Just one comment: the validation set is the same as X_test, and its accuracy is displayed after each epoch. So maybe there is no need to predict on X_test (again).
A suggestion: we can use the following code to extract the validation data from the train set itself.
base_history=model.fit(X_train,y_train,validation_split=0.2,epochs=10,batch_size=100,verbose=1)
How can this be modified for multiclass? I am getting an error.
@@prakashkafle454 Hey, did you get the answer? Because I'm having the same problem.
@@yogendrapratapsingh7618 Yeah, I did it myself.
@@prakashkafle454 How? Could you tell us what lines of code you added?
#preprocessing the data
import gensim
title_text = df.title.apply(gensim.utils.simple_preprocess)
In my code, the review in the for loop only shows the last row after re.sub... Please help me with this.
I have one question: how will it identify negative statements? For example: "this earphone is not as good as the other one." In this statement, stopword removal will drop "not", but it is the most important word.
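A common workaround is to keep negation words when filtering stopwords. A minimal sketch with NLTK (which negations to keep is a judgment call):

```python
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')

# remove negation words from the stopword list so "not" survives preprocessing
negations = {'not', 'no', 'nor'}
stop_words = set(stopwords.words('english')) - negations
```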
Hey Krish, do we need a good GPU for running deep learning projects?
Start with Google Colab
Sir, please make a video on deploying an ML model in an Android app...
Sure in the upcoming videos
Hi sir,
Why is the padding done 'pre'? Is there any specific reason? I mean, can it be done as 'post' also?
Also, the Git URL has an apostrophe (') in it, which is creating a 404; please correct it if possible.
You can use 'post' and verify which works well... I've fixed the GitHub issue.
Hi Krish, can you make a video on a time series example using LSTM?
Sir, please make a video on sentiment analysis
Hello Krish,
I hope you are doing well.
Can you please tell me how to decide vocab_size?
Do you know how to predict whether a single sample is spam or not?
Like, there is a text message and we have to predict whether it is spam or not,
using this model to make a single prediction and get the probability that this message is spam/fake news.
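A rough sketch of single-sample inference, assuming the same one_hot/pad_sequences pipeline and voc_size/sent_length values used at training time (the helper function is hypothetical). Note that Keras one_hot hashes words with Python's built-in hash, which is salted per process, so this is only consistent within the same session; for persistence across sessions, fix PYTHONHASHSEED or switch to a saved Tokenizer:

```python
import re
from nltk.stem.porter import PorterStemmer
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences

def predict_single(text, model, voc_size=5000, sent_length=20):
    # repeat the training-time cleaning on the single string
    # (add the same stopword removal used during training if applicable)
    ps = PorterStemmer()
    words = re.sub('[^a-zA-Z]', ' ', text).lower().split()
    cleaned = ' '.join(ps.stem(w) for w in words)
    encoded = pad_sequences([one_hot(cleaned, voc_size)],
                            padding='pre', maxlen=sent_length)
    return float(model.predict(encoded)[0][0])  # probability of class 1 (fake/spam)
```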
Cast string to float is not supported
[[node binary_crossentropy/Cast (defined at :2) ]] [Op:__inference_train_function_9451]
Function call stack:
train_function
Hi sir, I am getting this error while fitting the model; could you help me out?
Can someone please explain why a vocab size of 5000 was chosen? Shouldn't the vocab size equal the vocabulary of the corpus?
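It doesn't have to be exact: with one_hot, voc_size is the range words are hashed into, so a value comfortably above the true vocabulary just reduces collisions. A quick check, assuming a `corpus` list of cleaned sentences (hypothetical name):

```python
# count distinct tokens in the cleaned corpus
vocab = {word for sentence in corpus for word in sentence.split()}
print(len(vocab))  # pick voc_size comfortably above this to limit hash collisions
```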
Thank you so much.
What will we be giving as input to this model?
How can we use LSTM for text steganography?
Hello Data Science community, I need a little help with deploying a DL model on Heroku. I want to deploy a model using Flask which includes TensorFlow 2.2.0, but TF is 500+ MB, Heroku cannot support more than 500 MB, and it gives an application error. Please help me. (Note: I am a beginner in the DL field.)
Deploy it on AWS; Heroku only allows deployments of up to about 500 MB.
How is X.shape (18285, 20)? Doesn't that 20 mean 20 features? I could only see 4. Somebody please explain.
Hi Krish sir,
I have developed my own emotion detection LSTM TensorFlow model for our regional language.
How do I deploy this model with an Android app to put it into production? Any help please?
Search for TensorFlow Lite
Hello Krish... how do we predict for a single instance in this case...?
What does LSTM(100) indicate? I mean, we have 3 gates in an LSTM, so how do these 100 neurons come into the picture? Can anyone please explain this?
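The 100 is the number of units, i.e. the dimensionality of the hidden and cell states, not the number of gates; every gate gets its own weights of that size. A quick parameter-count check, assuming an embedding dimension of 40 as in the video (an assumption):

```python
units, emb_dim = 100, 40  # assumed sizes
# 4 transformations (input/forget/output gates + candidate cell state),
# each with input weights, recurrent weights, and a bias
params = 4 * ((emb_dim + units + 1) * units)
print(params)  # 56400, which should match model.summary() for the LSTM layer
```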
Sir, can you provide a tutorial on how to train on an annotated video dataset? There isn't a quality article on this topic.
Hi Krish, can we use PyTorch?
Yes, of course
Hi Krish, I wanted to know why we didn't use any Flatten layer, since the embedding output will be a 2D tensor?
Sir, may I know why we use 1D and 2D layers in NLP like we do in CNNs?
Krish sir, in case my data does not have a labels feature (0 and 1), do we have to provide that manually, or is there a coding technique for it?
If we remove the emojis, can we increase the accuracy in similar datasets?
Hello, all your videos are very informative. I need some help related to a dataset... How can I contact you? Please let me know. Thank you.
Would you decrease the audio volume at the start of your videos... the difference between before and after is too much.
Do we need to code every single step, or is there starter code already provided on the TensorFlow website? Please reply.
This is not from the TensorFlow website... we have to write our own code
Sir, while fitting the model we are getting an accuracy of 1.00. Why isn't the model said to be overfitting? Can anyone sort this out?
Hi,
is there any impact of pre vs. post padding of sequences?
Can we see the deployment of this project?
In this model the validation loss is increasing, so the model is overfitting. To stop that we need to use early stopping, so that we don't overfit the model...
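A minimal sketch with Keras's EarlyStopping callback; the fit arguments mirror the video's setup but are assumptions:

```python
from tensorflow.keras.callbacks import EarlyStopping

# stop once val_loss hasn't improved for 3 epochs, keeping the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=3,
                           restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          epochs=10, batch_size=64, callbacks=[early_stop])
```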
How can we predict a new sentence with this model, sir?
Hey Krish, X.shape in the video shows (18285, 20) and the data has 5 columns, while when I use my train.csv it shows X.shape as (18285, 4). Could you please tell me the reason? Where is the mistake?
(18285, 20) == after padding the sequences to sent_length=20 you get 20 columns
@@devendrachavan765 No, check the beginning itself... it shows 20 columns for X.shape
Are the embedding layer and word2vec the same???
Why is there no Flatten layer before Dense?
Sir, how do I remove self-built (custom) stopwords using Python?
Why does the X shape have 20 columns?? It has only 4, right?
Krish, if there are more words than voc_size, then what happens???
I don't think there will be more than a 10k vocab size
If it happens that I have more than 20k unique words in the tweets and the vocab size is less than that, then what happens, sir????
Subhash Achutha This comment is late, but Keras one_hot hashes every word into an index below vocab_size, so extra words aren't deleted; different words can simply collide on the same index. (It's Tokenizer(num_words=10000) that keeps only the 10,000 most common words.)
@@ifmondayhadaface9490 Thank you for answering
Subhash Achutha You're welcome.
Frustrated.......
NotImplementedError in LSTM.
Tried many solutions, yet it's not resolved....
How can we decide the length of the sentence, i.e. sent_length?
Error tokenizing data. C error: Expected 5 fields in line 4772, saw 7.
The same dataset gives an error on Colab but works on Kaggle.
Please do the deployment, sir!
Could you please provide the notebook link?
Sir, I have a question: when I used the 'text' attribute instead of 'title', I got a validation accuracy of 83%. Can anybody help me with how I can improve it?
Share the F1 score, @Krish sir..
y_pred = model.predict_classes(X_test)
After this line I got an error, and when I solved the error the accuracy of the model was only 56%.
Error:
('Sequential' object has no attribute 'predict_classes')
Please post if you find a solution
Try
y_pred = (model.predict(X_test) > 0.5).astype("int64")
@@heysoumyadeep Got 50 percent accuracy too using this
@@MrHamid-ct7hy My accuracy is 66%
Bro, just rerun the last 5 cells continuously; you will get slightly better accuracy each time.
What is the level of this project? Who can try this: a beginner, intermediate, or advanced?
How is test_size chosen? 18:30
What about the Docker tutorials that you started?
Coming up... tomorrow, in the next video
@@krishnaik06 Can you please cover Django like Flask, to create a Docker image, and then after that Kubernetes?
Yes, after this I can go ahead with Django too