I really appreciate you putting in the effort, and not to mention I haven't used TensorFlow before, but I was still able to understand everything properly. I'll get an overview of TensorFlow first and then continue with model deployment.
Much respect, Mr. Srivatsan.
Another excellent video, Srivatsan sir...
For those having problems:
at 16:06, replace
* tfds.features.text.Tokenizer() with tfds.deprecated.text.Tokenizer()
at 17:21
* tfds.features.text.TokenTextEncoder(vocabulary_set) with tfds.deprecated.text.TokenTextEncoder(vocabulary_set)
Thanks Induraj.. I thought I had pinned the changes but forgot. The new TFDS version has changed things, so either switch to the deprecated namespace as mentioned or use the Tokenizer in tf.keras - www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer
I will try to create a video that covers TF data completely soon
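For anyone following along on a newer TFDS release, a minimal sketch of the swap described above, assuming the tfds.deprecated.text namespace is available in your installed version:

import tensorflow_datasets as tfds

vocabulary_set = {"good", "bad", "product"}   # placeholder vocab just for illustration

# Old calls from the video (removed from tfds.features in newer releases):
# tokenizer = tfds.features.text.Tokenizer()
# encoder = tfds.features.text.TokenTextEncoder(vocabulary_set)

# Equivalent calls on newer TFDS versions:
tokenizer = tfds.deprecated.text.Tokenizer()
encoder = tfds.deprecated.text.TokenTextEncoder(vocabulary_set)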
Great video 👍 Even a beginner can understand each and every concept from your videos sir. Thank you sir
Thank you for taking the time to do this.
It would be interesting to have a little more information on a couple of the aspects for a future version:
1. Why you didn’t add a dropout to the model
2. Why 4 epochs specifically
3. Maybe summarise the big caveats up front, e.g. no stopwords, no lemmatisation, no validation set
Other thoughts:
4. It’d be cool to see a version using transformers. They have
5. Investigation into word bias would be interesting. Identifying ostensibly neutral words like product names and their bias that bleeds into other reviews.
6. Adding other tabular (vs NLP) features would be interesting too. One benefit of deep learning is that you don't need to know in advance which features are relevant.
Julian, thank you for watching. I did not explicitly add dropout as I wanted to show overfitting, and I mentioned in the video adding dropout, regularization, or recurrent dropout. In fact, I have commented out that section as well
As for epochs, I realized 2 is right. I would have preferred an early stopping callback though
Will do the others you have suggested in one of my upcoming videos
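For readers who want to try the tuning mentioned in this reply, a small illustrative sketch (not the exact architecture from the video) of where dropout and an early stopping callback would go:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 64),   # placeholder vocab size
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, recurrent_dropout=0.2)),
    tf.keras.layers.Dropout(0.3),           # dropout between the recurrent and dense layers
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer='adam', metrics=['accuracy'])

# Stop when validation loss stops improving instead of hand-picking the epoch count
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2,
                                              restore_best_weights=True)
# model.fit(train_batches, validation_data=test_batches, epochs=10, callbacks=[early_stop])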
Superb presentation ... you are able to explain the concepts very clearly and this is indeed very helpful
Thanks for the good video and for the explanation, which was simple and crisp, from preprocessing to model building to visualization.
Thank you very much; this is the most detailed information on TensorFlow I have found to date, and it also helped me in the sentiment analysis project I am working on. Hope to see more videos soon.
Very informative and well-explained as usual. Thank you...
just completed hands on with you ....very interesting ....
Superb Rushikesh..
Thank you sir for such detail explanation
explained really clearly thank you so much!
Thank you so much. This is really helpful.
You are the best...
Hi Srivatsan, you have created an excellent YouTube playlist on NLP; however, I request that you please add a sequence to those 30 videos: which one to watch first, which comes next, and so on.
Satish.. will do it. The problem is YouTube does not allow names to be changed in a playlist, but I realized some need to be rearranged; will do it
Amazing content. Could you please order the videos in your playlists, especially the NLP and Computer Vision playlists
Hello Sir, very nicely explained. Because you are going through NLP topics, could you please make a video on NLP metrics like perplexity, BLEU Score & Coherence measure.
Will try to do it in one of the upcoming videos
Awesome video sir.. thanks. Please share the colab notebook.
Thanks Nanda.. For some of the videos I will be sharing the Colab link in my GitHub repo after a few weeks. This is one of those videos. You can check my GitHub link below for other code; I will upload this one after some time
github.com/srivatsan88/RUclipsLI
Sorry for the inconvenience..
@@AIEngineeringLife thank you so much sir.. it's a great pleasure to view and understand your teachings.. looking forward to more informative videos on statistics and deep learning frameworks and use cases too
Thanks for the great lecture Sir.. the content is crisp and niche. Demos with real datasets, industry-standard code, and to-the-point explanations are hard to find.
Just a small request Sir.. it would be great if you could include some videos on the mathematical side of ML and AI
Thank you Mansi.. I do have it planned, but it might be for next year. I have too much backlog to clear. I do have applied statistics and might be able to complete it this year - ruclips.net/p/PL3N9eeOlCrP6IjkyExZW9oZFwt-A1r0qB
@@AIEngineeringLife Sure Sir.. that would be great. Thank You :)
This video covered a lot of what I was looking for. Excellent presentation. I have one question: to tokenize my text, I defined a tokenizer, called fit_on_texts() on the training data, then converted to sequences and applied padding. Then I ran the model and saved it. When I load it back, the predictions are all wrong despite the good accuracy. I was told it was a tokenizer issue since I had to define another tokenizer for the prediction text. I can pickle the main tokenizer then load it back in and use it on the prediction text, but do you know a better way? Maybe have a layer in the model that does all that? Thanks.
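One common way to handle this is to persist the fitted tokenizer alongside the model so the exact same word index is reused at prediction time; a minimal sketch with pickle (a TextVectorization layer inside the model is the other idea mentioned above, but this is the simpler route):

import pickle
import tensorflow as tf

texts = ["great product", "terrible quality"]   # placeholder training texts
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=1000, oov_token="<OOV>")
tokenizer.fit_on_texts(texts)

# Save the fitted tokenizer next to the saved model
with open("tokenizer.pkl", "wb") as f:
    pickle.dump(tokenizer, f)

# Later, at prediction time, reload the same tokenizer instead of fitting a new one
with open("tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)
seqs = tokenizer.texts_to_sequences(["great value"])
padded = tf.keras.preprocessing.sequence.pad_sequences(seqs, maxlen=100)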
Thank you for the video..
Please make a video on text classification using CNN/RNN and word2vec if possible...
Sudheer.. I have plans for it. Will do it in the future once I am done with the current backlog
The way you teach is really good. Can you share the code so that we are able to re-run it with other data?
Thanks Prasanna.. I will be sharing the code after a few weeks in my Git repo
Hello SIr,
Great video and awesome explanation. I am new to ML and I am trying a problem: finding whether a text exists in a PDF file or not. Could you please guide me on how to carry this out? Thanks a lot.
Hi sir ,
Thanks for the lucid explanation.
If we have a statement or a paragraph (as in news articles), how can we come up with the sentiment of those documents, where we do not have a target label or a rating to assign it as positive or negative? I could only find the Stanford NLP Java library to assign a sentiment, but wasn't fully able to understand it.
Have you tried VADER sentiment in Python.. I have it covered past the halfway point in the video where I analyze the sentiment of Twitter users - ruclips.net/video/TOb4BPg7Uh8/видео.html
AIEngineering Yes sir
but VADER is useful only for social media text (i.e. short sentences); it is not really useful when applied to an opinion piece or a news article
Ok, now I got it. You want to feed an entire news article and get its sentiment. That will be difficult with the tools out there. You might have to do manual labeling and train one, or transfer-learn from existing models. Not many options to my knowledge
Hey, I'm trying to implement the same thing in TensorFlow 2. I can't use the tokenizer you used as it's been removed, so I used tokenizer = text.WhitespaceTokenizer() from the text module. But when I try to create a vocab set I can't use tokenizer.tokenize(review_text.get()) as it gives me the error "tensor is unhashable". Could you tell me what might be the issue? The strange thing is that it works in a for loop.
My code is in TensorFlow 2.0 only, but the TF team changes the APIs very frequently. They have deprecated the tfds tokenizer and moved it to Keras
You can either use tfds.deprecated as below in the new TF version
www.tensorflow.org/datasets/api_docs/python/tfds/deprecated/text/Tokenizer
Or use the Keras one as part of the new API
www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer
@@AIEngineeringLife Will try it out.
Really, thanks for your help again and again. I will use the deprecated method if nothing else works. How do you suggest I can use the Keras function? I defined my tokenizer as:
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=1000,
                                                  oov_token="",
                                                  filters='!"#$%&()*+.,-/:;=?@[\]^_`{|}~ ')
# look at the data
for reviews in train_data.take(10):
    reviews_data = reviews['data']
    print(reviews_data.get('review_body'))  # Review Text - use numpy to get numpy array else it will be tensor
    print(reviews_data.get('star_rating').numpy())  # Rating
    print(tf.where(reviews_data.get('star_rating') > 3,1,0).numpy())
    print(tokenizer.fit_on_texts(reviews_data.get('review_body').numpy()))
I can't use it like this; it gives an error.
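One likely issue in the snippet above is that fit_on_texts expects a list of Python strings, while the dataset yields raw byte tensors one record at a time; a sketch of the usual pattern, assuming train_data yields the same nested dict as in the video:

import tensorflow as tf

tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=1000, oov_token="<OOV>")

review_texts = []
for reviews in train_data.take(10):      # train_data as defined earlier in this thread
    reviews_data = reviews['data']
    # decode the byte tensor into a plain Python string
    review_texts.append(reviews_data.get('review_body').numpy().decode('utf-8'))

tokenizer.fit_on_texts(review_texts)      # fit once on the whole list of strings
sequences = tokenizer.texts_to_sequences(review_texts)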
Thanks Sri for the video.
I need some help. Suppose we have unlabelled text data and we need to perform sentiment analysis on it. What are my options?
I have tried VADER sentiment and TextBlob as pre-trained models, but are there any other models that can improve on the VADER and TextBlob pre-trained model results?
If there are no labels then the ones you have mentioned are options, or cloud APIs.. I would say it is better to manually label and build one, as VADER might not be perfect for industry-specific words
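For reference, a small sketch of scoring unlabelled text with both of the pre-trained options mentioned here, assuming the vaderSentiment and textblob packages are installed:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

text = "The battery life is disappointing but the screen is great."

vader = SentimentIntensityAnalyzer()
print(vader.polarity_scores(text))         # 'compound' score in [-1, 1]
print(TextBlob(text).sentiment.polarity)   # polarity in [-1, 1]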
I really appreciate quick replies from you. Thanks a ton for helping me on this.
God bless
Great explanation Sir! Just a request: can you make videos about tf.data..?
Yo, first view, and as usual the video is awesome. Waiting for deployment in Flask, and sir, please share the Colab link so that we can practice easily
Thanks Imran, and I appreciate your support and engagement with most of my videos. Will put up deployment using Flask in a few weeks. As you are aware, the code for this will be shared later, as in my previous experience many skip directly to the code. It will be in my GitHub repo in a few weeks
@@AIEngineeringLife thank u so much sir
Hi, do you have the code up on GitHub yet? If so, can you please provide us the link?
@@sandipdas6252 colab.research.google.com/drive/1KYk8mJ6rvrJ6qO7dA7l7dSCc2fezTmMh?usp=sharing
take mine I tried it ^^
Great video sir, anyone can understand it very easily.
But how can we balance the dataset which we are feeding in the batches?
Rather than balancing the classes, you can assign class weights, as in the first 2 videos in this playlist - ruclips.net/p/PL3N9eeOlCrP4uLCtas5vxq09sWz6jJXrw
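A minimal sketch of passing class weights instead of re-balancing the batches; the counts below are placeholders and should be computed from the actual dataset:

neg, pos = 2000, 8000                                          # placeholder class counts
total = neg + pos
class_weight = {0: total / (2 * neg), 1: total / (2 * pos)}    # up-weights the minority class
# model.fit(train_batches, validation_data=test_batches, epochs=2, class_weight=class_weight)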
Hi, can I please get the link for the Colab notebook used in this video, or any direction on where I could find it?
Thanks
Here you go - github.com/srivatsan88
Sir, would it work if you introduced spell check, stemming, and lemmatization? This way, the conversion to integers would be consistent.
Yes, you are right.. Stemming and spell check will reduce the vocab size as well. I just wanted to show the process quickly, but I have mentioned in my video as well that one can further tune it in this case of sentiment analysis.. If we were working on another seq task like NMT then I would leave it as it is
Great video sir. I am training the model on Google Colaboratory with a GPU and did the same steps as you showed, but training the model took me 3 hours for 2 epochs. Can you tell me why there is such a great difference between your training time and mine?
Can you run !nvidia-smi and tell me what GPU you are allocated, and during the run can you check if your GPU memory is increasing, to confirm the GPU is being used? Ideally this must not take that long, but it depends on the data and network architecture you are using
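A quick way to run those checks in Colab (the nvidia-smi line is a notebook shell command, and list_physical_devices is available in recent TF 2.x):

# !nvidia-smi          # shows the allocated GPU and its memory usage during training
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))   # an empty list means TF does not see a GPU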
Thank you for the great video. Very informative and helpful. It would be much appreciated if you could share the Git code if it's open source.
Ketul. All of my code is in the git repo below. But for some of the videos I might not have uploaded it yet and will do it later, so that others practice rather than directly jumping to the code
github.com/srivatsan88/RUclipsLI
@@AIEngineeringLife Thanks Srivatsan, I agree with you that directly jumping into code is not a good practice.
Thanks for the video.
In the code, while tokenizing, the tfds.features.text.Tokenizer() API is not working and gives an error. Instead of that I am using the tfds.deprecated.text.Tokenizer API. Can you please let me know if that API has been deactivated?
TensorFlow has moved the API from tfds to inside tf.keras; if you check the Keras docs you might find it, else deprecated is the option for now. Let me know if you are not able to find it
I'm still looking for a video that can justify the number of neurons used and the number of layers chosen. The overall explanation is good, by the way
Gautam.. that is the difficult part. In this case it was trial and error, but I would go for some architecture search like NAS or similar. This space requires compute to get to the right neural network architecture, else it is time consuming manually
Hello :)
Thank you for putting up such an elaborate tutorial. I coded along and it was a very fulfilling experience. I ended up learning quite a bit.
I had a few queries for you:
1. Despite being very comfortable with Numpy and Pandas, I found the initial data pre-processing steps to be difficult to understand (e.g. splitting data into training and test sets, tokenization step, encoding step). Can you please help us with some pre-existing tutorials or (official) TensorFlow docs on this?
2. As Julian has mentioned, wanted to understand how to incorporate lemmatization and stop words in this?
3. How can we incorporate pre-trained word embedding in this model?
If possible, do provide your email address in the 'About' section of your channel so that we can reach out to you if needed
Thanks again.
Raama.. TF is a little confusing until one gets used to it. It needs to build a scikit-learn-like pipeline structure that makes it easy to adopt. TF does not have inbuilt support for lemmatization and the others, but you can use any Python library for it and call that function using a map transformation on the data. I will try to do a video on it in the future
For pre-trained embeddings, in the embedding layer you can use the embeddings_initializer parameter and pass the matrix there. I have my LinkedIn profile on my channel, and an LI message is a good way to reach me if required
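Two small sketches of the ideas mentioned in this reply, calling a plain Python text-cleaning function through a map transformation and seeding the Embedding layer with a pre-trained matrix; names and shapes are illustrative only:

import numpy as np
import tensorflow as tf

# 1. Wrap any Python lemmatization / cleaning routine with tf.py_function inside map
def clean_py(text):
    s = text.numpy().decode('utf-8').lower()   # plug in spaCy/NLTK lemmatization here
    return s

def clean_map_fn(text, label):
    cleaned = tf.py_function(clean_py, inp=[text], Tout=tf.string)
    cleaned.set_shape([])
    return cleaned, label
# dataset = dataset.map(clean_map_fn)

# 2. Pass a pre-trained matrix (e.g. loaded from GloVe) via embeddings_initializer
vocab_size, embed_dim = 5000, 100                        # placeholder sizes
embedding_matrix = np.zeros((vocab_size, embed_dim))     # fill from the GloVe file in practice
embedding_layer = tf.keras.layers.Embedding(
    vocab_size, embed_dim,
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False)                                     # freeze or fine-tune as needed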
@@AIEngineeringLife Thanks a ton for taking out the time to reply to my message. I'll stay tuned for your future videos
Hi Sir,
I have a query: you used the TensorFlow encoder function to convert text into encoded numbers; why not use TF-IDF or word2vec?
Is it due to using the embedding layer in Keras itself?
Also, can I do text preprocessing steps like stopword removal, normalization, etc. before model development?
Yes, I used it because I am training an embedding layer inside the single neural network.. We can also train it externally via GloVe or word2vec and attach it to the NN model
Sir, could you please share the Colab link.
Sir, can you tell me what changes need to be made in the model to get neutral sentiments as well? I was able to divide sentiments according to ratings into -1, 0, 1, but the model doesn't support it when I tried to use Dense(3)
So basically you are looking at 3 as neutral, < 3 as negative, and > 3 as positive.. you can check tf.case and see if it helps you.. I have done something similar in this video on the same dataset, but I converted to numpy. I would say go with tf.case and see
www.tensorflow.org/api_docs/python/tf/case
@@AIEngineeringLife
if tf.where(review_text.get('star_rating')==3,1,0).numpy() == 1:
    print('neutral = 0')
elif tf.where(review_text.get('star_rating')>3,1,0).numpy() == 1:
    print('positive = 1')
else:
    print('negative = -1')
I tried this way and it works, but do I need to make any changes in the model layers?
An exception (shape mismatch) is thrown when I use 3 dense layers of [64, 64, 64] and a final Dense(3) layer
after the division according to the ratings
@@SoumyarajHere I don't recollect that you have to change any other part of the code.. Maybe let me try it over a weekend and get back
Ok sir, thank you, I will also check again.
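A sketch of the 3-class setup discussed in this thread; the key points are that integer labels should be 0/1/2 rather than -1/0/1 when using sparse categorical cross-entropy, and the final layer becomes Dense(3):

import tensorflow as tf

def rating_to_label(star_rating):
    # 0 = negative (<3), 1 = neutral (==3), 2 = positive (>3)
    return tf.where(star_rating < 3, 0, tf.where(star_rating == 3, 1, 2))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 64),      # placeholder vocab size
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(3)                    # one logit per class
])
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer='adam', metrics=['accuracy'])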
Need more clarification on line number 33
In line 33 I am checking the target class distribution: the number of positive vs negative examples for 10 batches, just to see if the data is balanced or imbalanced.
If it is imbalanced, maybe accuracy is not the right metric to measure
@@AIEngineeringLife Thanks. I have a few more doubts about this end to end Sentiment Analysis project. I will compile all of them and send it to you through an email.
What if I want to change the subset from Amazon; for example, I want to analyze product reviews for Apple phones. What should I put in the data argument, (amazon-us-reviews/apple phone)?
You can check what options are available here. There is a mobile phones subset but not exactly Apple phones. You can also get your own custom dataset and feed it in, or download Amazon mobile reviews and search to see if you have a large corpus for Apple phones - www.tensorflow.org/datasets/catalog/amazon_us_reviews
@@AIEngineeringLife thank you very much!!!
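For loading a specific product category from the catalog linked above, a sketch; the exact config name (e.g. Mobile_Electronics_v1_00) should be verified against the catalog page, since the available subsets vary:

import tensorflow_datasets as tfds

# Hypothetical subset name - check the amazon_us_reviews catalog entry first
train_data = tfds.load('amazon_us_reviews/Mobile_Electronics_v1_00', split='train')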
Why are we not using pandas at all here? Like for reading the CSV file
Say you have a very huge text file; then pandas will fail to load it. So this is one option, but if you want pandas-based NLP, check my Apache Spark topic modeling video
@@AIEngineeringLife Got your point. It's due to the huge size
Why do we use from_logits = True here? Can someone please explain?
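For context: from_logits=True tells the loss that the final Dense layer outputs raw scores with no sigmoid, so the sigmoid is applied inside the loss (which is more numerically stable). The two setups below are equivalent ways of wiring it:

import tensorflow as tf

# Option A: Dense(1) with no activation, sigmoid handled inside the loss
loss_a = tf.keras.losses.BinaryCrossentropy(from_logits=True)

# Option B: Dense(1, activation='sigmoid'), probabilities fed to the loss
loss_b = tf.keras.losses.BinaryCrossentropy(from_logits=False)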
Dear Srivatsan,
Thank you for the wonderful tutorial. I am trying to replicate the same in Colab, and Colab keeps crashing.
I changed the batch size and everything; could you please help me with an alternative solution?
Thanks
Mahesh
Mahesh.. did you set the session type to GPU in Colab?
Thank you, can we get the code please?
Is the code for the above set of videos available on GitHub? If yes, please provide the link. Thanks!!
Check this - github.com/srivatsan88/RUclipsLI/blob/master/Tensorflow_Sentiment_Analysis.ipynb
@@AIEngineeringLife Thanks a lot...!!
While implementing this in Jupyter I am constantly getting the error padded_batch() missing 1 required positional argument: 'padded_shapes' while splitting the data into train and test.
Will anyone help me with it?
Priyadasrhi.. Did you pass the batch size as an argument to padded_batch?.. If yes, can you get me the train data shape
@@AIEngineeringLife Hi Sir, I have passed batch size = 14000 to padded_batch; somehow in Jupyter I have to pass padded_shapes=([None], []) to run the code, and now it is working fine.
PS: when I look at the TensorFlow documentation, it says padded_shapes is a mandatory argument to pass.
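For older TF versions where padded_shapes is still required (newer releases infer it when omitted), the call looks like this sketch; [None] pads the variable-length token sequence and [] is the scalar label:

BATCH_SIZE = 32   # illustrative value
train_batches = train_data.padded_batch(BATCH_SIZE, padded_shapes=([None], []))
test_batches = test_data.padded_batch(BATCH_SIZE, padded_shapes=([None], []))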
My deep learning model is giving different predictions for the same input it saw earlier once the weights are reloaded. Please help
Can you please elaborate, Soumya.. I did not quite get the issue
@@AIEngineeringLife I actually solved the problem
It happened because every time I run the model I have to create a new encoder, due to which the encodings change... so this time I saved the encoder to a file and I am using the same encoder
The problem was that the prediction changed when I used the same input on the model the next time... and the prediction used to be wrong the second time
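For anyone hitting the same issue, the TFDS text encoder can be written to disk and reloaded so the vocabulary stays fixed between runs; a sketch assuming the deprecated TokenTextEncoder API with its save_to_file/load_from_file methods:

import tensorflow_datasets as tfds

vocabulary_set = {"good", "bad", "product"}   # placeholder vocabulary
encoder = tfds.deprecated.text.TokenTextEncoder(vocabulary_set)

encoder.save_to_file('review_encoder')        # persists the vocabulary to disk
# ... in a later session, reload the identical encoder ...
encoder = tfds.deprecated.text.TokenTextEncoder.load_from_file('review_encoder')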
Can I get the link for the dataset please?
The dataset is part of TensorFlow Datasets. I have shown how to get the dataset. I am not sure if this comes as a file
@@AIEngineeringLife Thank you! I got it.
Sir, I am getting 96.12% validation accuracy in the 4th epoch, whereas in your video it is 88% in the 2nd epoch. What is the reason for this difference?
Randomness.. That is fine, it can happen.. If we set the same seed then it might be constant in some cases
Sir, I am getting an error on review_tokens = tokenizer.tokenize(review_text.get('review_body').numpy())
The error says tokenize() missing 1 required positional argument: 's'
Could you please help me, sir
Can you paste the entire block or see if it is similar to below
tokenizer = tfds.features.text.Tokenizer()
vocabulary_set = set()
for _, reviews in train_dataset.enumerate():
    #print(reviews)
    review_text = reviews['data']
    reviews_tokens = tokenizer.tokenize(review_text.get('review_body').numpy())
    vocabulary_set.update(reviews_tokens)
vocab_size = len(vocabulary_set)
vocab_size
I'm trying to build an intent classification model with TensorFlow.. I'm facing some issues with validation accuracy and prediction accuracy.. I want some expert advice. Can you provide your LinkedIn link or any contact info to help me out, please?
In case anyone can help me, please leave a message in my linkedIn profile. www.linkedin.com/in/satadru-hazra-763a881b3
Hi sir, watched your video, really good stuff. Can you share the GitHub link for the code please?
For the TensorFlow videos I have not created a repo yet. It will take some time. You can find other notebooks in this repo - github.com/srivatsan88/RUclipsLI
Can you please share the code on GitHub if you have it?
Currently the TF code is not yet in the repo but will be there in some time. You can check my repo for the code from other videos - github.com/srivatsan88/RUclipsLI
please provide github link for the code
Check this - github.com/srivatsan88/RUclipsLI/blob/master/Tensorflow_Sentiment_Analysis.ipynb
Sir, I tried this on Colab up to model.fit, but I am getting 67% accuracy max.
I haven't changed anything in the code. What could be wrong?
Sir, regarding the previous comment I wrote.. I checked the code again and found that only the vocab_size increment was not done correctly. I changed it, ran everything again, and got 96.12% validation accuracy in the 4th epoch. I don't understand the logic behind these variations.
Please explain
Can you please elaborate again? I did not get the problem clearly. You incremented the vocab size and got higher accuracy; are you asking how that is possible?
Code link?
Should be in this or in my NLP repo - github.com/srivatsan88/RUclipsLI/
Thank you for sharing the knowledge. I decided to learn some new skills during the 'Stay Home Stay Safe' time, starting with this video.
However, I am stuck at ar_encoded_data = train_dataset.map(encode_map_fn), which gives me the error below. Could you please help me out?
AttributeError: in user code:
:10 encode_map_fn *
encoded_text.set_shape([None])
AttributeError: 'list' object has no attribute 'set_shape'
Hi Vikram.. Can you check if you have replicated these 2 functions correctly?
def encode(text_tensor, label_tensor):
    encoded_text = encoder.encode(text_tensor.numpy())
    label = tf.where(label_tensor>3,1,0)
    return encoded_text, label

def encode_map_fn(tensor):
    text = tensor['data'].get('review_body')
    label = tensor['data'].get('star_rating')
    encoded_text, label = tf.py_function(encode,
                                         inp=[text, label],
                                         Tout=(tf.int64, tf.int32))
    encoded_text.set_shape([None])
    label.set_shape([])
    return encoded_text, label
@@AIEngineeringLife It worked, although I can't seem to figure out what the difference in the code is. Anyway, thank you, and thank you again for the invaluable content.