@ 8:30 aproch - after applying word2vec model we can do multiclass classification using any classification algorithm like Xgboost in that using objective='multi:softprob'
Correct me if I am wrong and video is really verryy nice..started following✨✨😊
Yes, you can. Thanks a lot.
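For anyone implementing this, here is a minimal sketch of the feature step, with toy 4-dimensional vectors standing in for a real trained model.wv (the helper name sentence_vector is just illustrative): average each sentence's word vectors into one row, then hand the matrix to any multiclass classifier, e.g. XGBClassifier(objective='multi:softprob').

```python
import numpy as np

# Toy stand-in for a trained word2vec model: word -> 4-dim vector.
# In practice these vectors would come from gensim's model.wv.
w2v = {
    "food":  np.array([0.9, 0.1, 0.0, 0.2]),
    "tasty": np.array([0.8, 0.2, 0.1, 0.1]),
    "room":  np.array([0.1, 0.9, 0.3, 0.0]),
    "clean": np.array([0.0, 0.8, 0.4, 0.1]),
}

def sentence_vector(tokens, w2v, dim=4):
    """Average the vectors of in-vocabulary tokens; zeros if none match."""
    vecs = [w2v[t] for t in tokens if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

X = np.vstack([sentence_vector(s.split(), w2v)
               for s in ["tasty food", "clean room", "food"]])
# X can now be fed to any multiclass classifier, e.g.
# xgboost.XGBClassifier(objective="multi:softprob", num_class=3).fit(X, y)
print(X.shape)  # (3, 4)
```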
It's good to see such models on smaller datasets; it helps with understanding. Thanks for this video.
Glad it was helpful!
Really really one of the best explanations ever seen, appreciated!
Glad it was helpful!
you are really unfolding Data Science, absolutely fantastic...
Hi Aman, there is a statement
# fiting the model
mymodel.fit(padded_docs, labels, epochs=30)
But the input parameter padded_docs is not defined before this, so the code doesn't run. Can you please clarify where it is defined and how it works?
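For reference, padded_docs is typically built right before that fit call. Below is a plain-Python sketch of the encode-then-pad steps (in the video this is done with Keras' one_hot and pad_sequences; the toy docs and labels here are not the video's):

```python
# Sketch of how padded_docs is typically built before mymodel.fit(...).
docs = ["well done", "good work", "great effort", "nice work"]
labels = [1, 1, 1, 0]

# Assign each unique word an integer id (0 is reserved for padding).
vocab = {w: i + 1 for i, w in enumerate(sorted({w for d in docs for w in d.split()}))}
encoded_docs = [[vocab[w] for w in d.split()] for d in docs]

# Pad every sequence to the same length so they form a rectangular array.
max_length = 4
padded_docs = [seq + [0] * (max_length - len(seq)) for seq in encoded_docs]
print(padded_docs[0])  # [vocab['well'], vocab['done'], 0, 0]
```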
And for the question that you asked about classification of text: have you made a video for that? Thanks for explaining in such plain words.
I have made a video for that; please search for hotel review sentiment analysis on my channel.
@@UnfoldDataScience I guess it would be helpful if you edit the titles with part 1 and part 2 and add the link in the description. And you are a great teacher :)
@@UnfoldDataScience The video is about predicting the next comment. Suppose I have a list of comments and I want to cluster them based on some features: one group of comments that talks about cleaning, another cluster of comments that talks about food, where cleaning and food are two different features. It could be anything, or the model can group them based on any common feature.
Very good explanation
Very useful and easy-to-follow tutorial.
Thanks a ton sir 🙏
I have trained my own w2v model but am having trouble plugging it into my Keras model. I would really appreciate some guidance, as I am quite new and would love to learn.
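A common pattern for this, sketched below with toy stand-ins for the trained vectors (wv) and the tokenizer's word_index: build a weight matrix indexed the same way as the tokenizer, then pass it to the Embedding layer as initial weights.

```python
import numpy as np

# Toy stand-ins: `wv` plays the role of gensim's model.wv, and
# `word_index` the role of a fitted Keras Tokenizer's word_index.
dim = 3
wv = {"good": np.ones(dim), "bad": -np.ones(dim)}
word_index = {"good": 1, "bad": 2}

# Row i holds the vector for the word with index i; row 0 stays zero
# for the padding token, and words missing from wv also stay zero.
embedding_matrix = np.zeros((len(word_index) + 1, dim))
for word, i in word_index.items():
    if word in wv:
        embedding_matrix[i] = wv[word]

# In Keras this matrix is then plugged in as initial weights:
# Embedding(input_dim=len(word_index) + 1, output_dim=dim,
#           weights=[embedding_matrix], trainable=False)
print(embedding_matrix.shape)  # (3, 3)
```

Setting trainable=False keeps the pre-trained vectors frozen; leaving it True lets the task fine-tune them.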
Excuse me, is there any criterion to decide the number of elements each word vector has? In other words, why does Word2Vec represent each word as a vector of 100 elements instead of 200? Thank you so much.
Thank you so much for a clear explanation. But what do the padded_docs and labels parameters represent?
Sir please make a video on Glove... I searched the whole youtube but none of them teach as well as you. Thank you.
Sure Hardik.
Hi Sir, I am very confused by this concept, which I don't see explained anywhere. In your previous video on word2vec (e.g. CBOW, skip-gram), you talk about window size; why is it not specified here when you create the embedding layer? For embeddings, e.g. CBOW, isn't the input the neighbouring n words and the output the focus word, and then we learn the embedding for each word that way? I would really appreciate your clarification. Thank you for your videos, you're one of the best teachers out there!
word2vec is not a singular algorithm, rather, it is a family of model architectures
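To unpack that a little: the window only exists in word2vec-style training, where (focus, context) pairs are the training data; a Keras Embedding layer is just a lookup table trained by backprop on your task labels, so there is no window to specify. A plain-Python sketch of what window controls (in gensim it is simply the window argument to Word2Vec):

```python
# For each focus word, the training pairs are (focus, neighbour) for
# neighbours within `window` positions on either side.
def context_pairs(tokens, window):
    pairs = []
    for i, focus in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs += [(focus, tokens[j]) for j in range(lo, hi) if j != i]
    return pairs

print(context_pairs(["the", "cat", "sat"], window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```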
Great video, keep it up. Can you make a video on the design of experiments topic, or share any resources you have? Reaching out as you have lots of knowledge.
I will try to share something on suggested topic.
Aman is gem for all of us
Cheers Sir :)
One of the best tutorials. Could you please share a video on how to classify documents based on word embeddings?
Thanks Asheesh, please watch the video below for your question:
ruclips.net/video/cs049uQWbpg/видео.html
Thank you Aman , Please keep up the good work ..Best Wishes !
So nice of you Sourav.
Hi, can you please explain what vocabulary size is? Is it the dimensions or the number of unique words we have in our document?
Vocab size is how many unique words we want to consider: the top n occurring words.
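In code terms (a plain-Python sketch; Keras' Tokenizer(num_words=n) does the same selection internally):

```python
from collections import Counter

# vocab_size = "top n occurring words": keep only the 3 most frequent
# words in this toy corpus; everything else falls outside the vocab.
corpus = "good food good room bad food good staff".split()
vocab_size = 3
vocab = [w for w, _ in Counter(corpus).most_common(vocab_size)]
print(vocab)  # ['good', 'food', 'room']
```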
10/10 video, gensim documentation sucks so this helped a lot
Interesting,,thanks so much!
Thanks a lot.
Clear explanation and implementation, thanks Aman. Also I have few doubts regarding data science project structure. Please let me know how to connect with you.
Thanks Santhosh. Please connect with me through email or YouTube live :)
Your video is very good, sir. I need to know how we can use these models for Hindi text using Python. Please tell me, sir.
Nice video, but can you rectify the mistake from 10:28 onwards, where you said the vocab size of 30 is similar to the size of 100 in the previous video? Those 100 dimensions were semantic numbers, whereas here the vocab size is 30 and the dimension is 8. Correct me if I am wrong.
I have a question. I have my own corpus and I have built multiple word representations, such as word2vec, GloVe, TF-IDF, and BERT, for the same corpus. This is for a document similarity task. How do I evaluate these models, and how am I going to choose the best one?
finished watching
Am I understanding correctly that the entire vectorization process and the "semantic" meaning are derived from how frequently one word is in proximity to other words in the data it's been trained on?
Please make a video on using a self-trained word2vec model's weight matrix in the Keras embedding layer.
Ok Bilal
I thought memory errors are related to RAM, but you said it is related to the C drive. Can you please explain more about that?
Very nice tutorial, but I would like to know what the padded_docs and labels variables are. I could not see their definition.
Thanks Aman for clearly explaining the concept. No NLP video for a month; can you please post a few more in the series?
Welcome. More videos will be added in NLP Series going forward.
Nice video bhaiya. Can you please tell how you became so good at DS? Maybe a roadmap that a beginner should follow to be like you.
Thanks Aditya.
Watch my playlist here
ruclips.net/video/9pytkbvF8AU/видео.html
@@UnfoldDataScience Thanks, I will look into it. Sorry for the late reply.
Thanks for the very easy explanation. Can you help me understand how the vector size becomes 100 even though we have only 14 words? Not able to understand this. Thanks in advance.
100 is the default vector size: each word is represented by a vector of 100 numbers, regardless of how many unique words the corpus has.
@@UnfoldDataScience Ok...got it...Tks for clarifying.
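To make the independence of the two numbers concrete (a toy numpy sketch; the random vectors are only for illustration, real ones come from training):

```python
import numpy as np

# The number of words (14) and the vector length (100) are independent:
# the model keeps one 100-dim vector per word, so the full embedding
# table is 14 x 100. vector_size=100 is simply the library default.
n_words, vector_size = 14, 100
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(n_words, vector_size))  # one row per word
print(embeddings.shape)  # (14, 100)
```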
Good information!
Thanks a lot .
Hello, I have a question. How are the 100 features recognized? Must we identify what these features are, or does the algorithm find them by itself?
Thanks Aman
What are the limitations of using averaged word embeddings to represent documents?
12:37 Bro, what if in real time our sentence has more words than the max length? We may lose info. Is there any other way to tackle that problem? By the way, your videos are so expressive.
Good question. In that case we can increase the max length. When we do cleaning and token creation, the length will automatically decrease, hence we can take a generic call on what the length should be for a given corpus. We can also tune the max_length parameter to see which value works best for the model.
@@UnfoldDataScience thanks for the update.....
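In Keras this is pad_sequences(..., maxlen=max_length, truncating='post'); a plain-Python sketch of the same pad-or-truncate behaviour:

```python
# Sentences longer than max_length are cut off; shorter ones are padded
# with zeros so every row has the same length.
def pad_or_truncate(seq, max_length, pad=0):
    return seq[:max_length] + [pad] * (max_length - len(seq))

print(pad_or_truncate([5, 2, 9], 4))        # [5, 2, 9, 0]
print(pad_or_truncate([5, 2, 9, 7, 1], 4))  # [5, 2, 9, 7]
```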
Hello Aman, really a great video.
Just wanted to know: how does the model predict any new word which is not part of our vocab?
It can't; predicting words outside the vocabulary is not feasible for this model.
Sir thank you so much
the best explanation that I saw tnx
Glad it was helpful Sam
Very useful
Thanks a lot Sourav.
Sir, I'm a big fan of yours. Can you please make a video on chatbots using NLP?
Sure.
Is it necessary to train a neural network even if we are using a word2vec model?
It depends on what purpose you are using it for.
Sir, can you please share how to classify documents?
There is a video on hotel review sentiment analysis on my channel, you can watch the video, same concept will apply for classifying document as well.
Nice explanation. Doubt:
When we train the model using, say, 8k observations and pickle the model for further use, and then we get new data, say 2k observations, that the model needs to be trained on again: do we have to train with only the newly available 2k observations, or with the 8k+2k data as a whole, when we unpickle and use the same machine learning model we stored before?
Thanks Kanchi. If you want to retrain the model, the previously saved object is of no use; you need to retrain the model from the beginning using the 8k+2k data.
Nice video. One doubt: what is the difference between a Keras embedding and a word2vec embedding?
word2vec is not a singular algorithm, rather, it is a family of model architectures
We can choose 50, 100, 200, or 300 dimensions, but how does that work?
Nice video. Can you please tell me: if I have my data in a CSV file and load it using pandas, how can I implement word2vec and use it with an LSTM?
This makes no sense at all
Do both the custom word2vec and the Keras embedding layer use the same methodology (context, target, window size, vector representation)?
If yes, is there any performance difference between them, and which one is best to use in most cases?
Word2Vec, being an unsupervised method, seeks to place words with similar contexts close together in the embedding space. A Keras Embedding layer, being trained in a supervised way, finds custom embeddings internally during task training.
What do we do if we have a word in our dataset which is not recognized by the word2vec model? In my case, it is giving me an error. What is the solution for that?
How do we handle new words in the test data? How will my model predict new words that it was not trained on, sir?
It can't; the model has no vector for words it was not trained on.
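One common workaround, sketched here with a toy dict standing in for gensim's model.wv (which also supports the in check, via key_to_index in gensim 4.x): fall back to a zero vector for out-of-vocabulary words. Alternatively, FastText composes vectors for unseen words from character subwords.

```python
import numpy as np

# word2vec simply has no vector for unseen words, so lookup code must
# handle the out-of-vocabulary (OOV) case explicitly.
dim = 3
wv = {"hotel": np.array([1.0, 0.0, 0.0])}  # toy stand-in for model.wv

def safe_vector(word, wv, dim):
    """Return the trained vector, or a zero vector for OOV words."""
    return wv[word] if word in wv else np.zeros(dim)

print(safe_vector("hotel", wv, dim))  # trained vector
print(safe_vector("zzz", wv, dim))    # [0. 0. 0.]
```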
Hi, will our model size increase if we use the 3.4 GB pre-trained word2vec model, and if it does, how do we deploy such a big model?
Hi sir, how can I use embedding + Keras to build a model for multi-class plagiarism detection (not plagiarism, near copy, exact copy, etc.)? What are the steps? Many thanks and regards.
Let me check
How can we use word embeddings in a recommendation system?
Hi Sandesh, we can create features based on the words and run a recommender engine. Watch my recommendation system playlist in the playlists section to know more.
What (cloud services or Colab) would we use in real time if we have large datasets?
Cloud services.
We can do the same thing using LSA, right? Getting the most similar words.
Yes, we can.
How do we classify the input if we have mixed types of sentences, i.e. not only question-and-answer-type sentences?
You need to do a little more preprocessing.
Can you provide the notebook you showed in the video as well?
drive.google.com/drive/folders/1XdPbyAc9iWml0fPPNX91Yq3BRwkZAG2M
Hello Sir, I had a task where I had to cluster similar sentences. I used a doc2vec model on the sentences and did k-means clustering. I already had them labeled, so when I cross-checked, the clustering result was quite poor. Any suggestions from your side to make it better? I'm kind of a noob in NLP and machine learning, so it would really help me out. Thank you.
Hi Akshay, try doing more feature engineering; you can add TF-IDF features, etc.
How can we use this approach for logistic regression? I tried creating word vectors with dimension 300 using word2vec. Theoretically, if I can somehow take the average of the word vectors in each of our four sentences and give a label to each sentence, then I can do logistic regression on those averaged sentence vectors. But from a coding perspective I don't understand it. Can you explain that?
Hi Prasoon, better go for Sent2vec in this case.
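Sent2vec aside, the averaging plan from the question does work. Here is a minimal sketch, with toy 2-dimensional vectors standing in for real 300-dimensional word2vec output, using scikit-learn's LogisticRegression:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy word vectors (stand-in for model.wv); positive words point one
# way in the space, negative words the other.
w2v = {"good": np.array([1.0, 0.0]), "great": np.array([0.9, 0.1]),
       "bad": np.array([0.0, 1.0]), "poor": np.array([0.1, 0.9])}

sentences = ["good great", "great good", "bad poor", "poor bad"]
labels = [1, 1, 0, 0]

# Average each sentence's word vectors into one fixed-length feature row.
X = np.vstack([np.mean([w2v[w] for w in s.split()], axis=0)
               for s in sentences])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))  # [1 1 0 0]
```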
Is this the last video of the NLP series?
Yes, more will be uploaded soon.
When you say my_vocab_size = 30 but you have only 8 unique words, I did not get it. Can you please answer?
Yes; if I had more words in my corpus, the size of the vocab could go up to 30.
@@UnfoldDataScience thanks:)
Hi sir, I am doing a hotel review analysis project, so I want to cluster similar sentences and give scores to these clusters using opinion mining (negative, positive, neutral). Is it possible to use word2vec to cluster the sentences into groups? And how would that be done?
Good question. While preparing the training data you can create these three categories and then use word2vec to train the model. Were you asking something else?
I want to use word embeddings for text classification using an RNN. Please help me.
Sure, however there are many better models for the same purpose.
I am getting an error installing word2vec: "failed building wheel for word2vec". Can anyone help resolve this?
How about fasttext?
What about it?
Hello friend, thank you for this tutorial, but when I try the same code as you, I get this error message: Error when checking target: expected dense_21 to have 3 dimensions, but got array with shape (6, 1).
The parameters padded_docs and labels are not defined in your code! Please send me your complete code.
Thank you.
drive.google.com/drive/folders/1XdPbyAc9iWml0fPPNX91Yq3BRwkZAG2M
I got this error; how can I fix it?
AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
See if this helps:
stackoverflow.com/questions/66868221/gensim-3-8-0-to-gensim-4-0-0
@@UnfoldDataScience But still the word can't be fetched; only the index can be fetched.
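To clarify the follow-up: words can still be fetched in gensim 4.x, just via the two replacement attributes for the removed vocab dict. A toy sketch of the mapping (in real code these are model.wv.key_to_index and model.wv.index_to_key):

```python
# gensim 4.x replaces wv.vocab with two attributes:
#   key_to_index: word -> integer index
#   index_to_key: integer index -> word
key_to_index = {"hotel": 0, "food": 1}
index_to_key = ["hotel", "food"]

idx = key_to_index["food"]   # look up the index for a word
print(index_to_key[idx])     # 'food' -- and back to the word again
```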
I need to understand Sent2Vec.
Let me note it down.
What is wordingbedding?
Not Sure, Word Embedding is explained in this video.
multi label problem
Answered.
Thank you, very nice tutorial. can i get your email address?
unfolddatascience@gmail.com