Python Word Embedding using Word2vec and Keras | How to use word embedding in Python

  • Published: 6 Nov 2024

Comments • 130

  • @nareshjadhav4962
    @nareshjadhav4962 3 years ago +1

    @ 8:30 approach - after applying the word2vec model we can do multiclass classification using any classification algorithm, e.g. XGBoost with objective='multi:softprob'.
    Correct me if I am wrong. The video is really very nice.. started following✨✨😊

  • @dorgeswati
    @dorgeswati 3 years ago +1

    It's good to see such models on smaller datasets; it is good for understanding. Thanks for this video.

  • @telmanmaghrebi3358
    @telmanmaghrebi3358 3 years ago +5

    Really really one of the best explanations ever seen, appreciated!

  • @uwaisahamedimad556
    @uwaisahamedimad556 2 years ago

    you are really unfolding Data Science, absolutely fantastic...

  • @nagarajtrivedi610
    @nagarajtrivedi610 24 days ago

    Hi Aman, there is a statement
    # fitting the model
    mymodel.fit(padded_docs, labels, epochs=30)
    But the input parameter padded_docs is not defined before, so this doesn't run. Can you please clarify where it is defined and how it works?
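For reference, padded_docs and labels are created earlier in the notebook, roughly like this (the documents and labels below are placeholders, not the video's exact data):

```python
import numpy as np
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Placeholder documents and binary labels
docs = ["nice food", "amazing restaurant", "poor service", "terrible food"]
labels = np.array([1, 1, 0, 0])

vocab_size = 30
max_length = 4

# Hash each word to an integer in [1, vocab_size), then pad to equal length
encoded_docs = [one_hot(d, vocab_size) for d in docs]
padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding="post")
```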

  • @hicodeguru
    @hicodeguru 3 years ago +1

    And the question that you have asked, for classification of text - have you made a video for that? And thanks for explaining in such plain words.

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      I have made video for that, please search for hotel review sentiment analysis on my channel.

    • @hicodeguru
      @hicodeguru 3 years ago

      @@UnfoldDataScience I guess it would be helpful if you edited the titles with part 1 and part 2 and added the link in the description. And you are a great teacher :)

    • @hicodeguru
      @hicodeguru 3 years ago

      @@UnfoldDataScience the video is about predicting the next comment. Suppose I have a list of comments and I want to cluster them based on some features: one group of comments that talks about cleaning, another cluster of comments that talks about food, where cleaning and food are two different features. It could be anything, or the model can group them based on any common feature.

  • @debirath4916
    @debirath4916 1 year ago

    Very good explanation

  • @sowmiya_rocker
    @sowmiya_rocker 1 year ago

    Very useful and easy-to-follow tutorial.
    Thanks a ton sir 🙏

  • @yasminzamrin
    @yasminzamrin 2 years ago

    I have trained my own w2v model but am having trouble implementing it in my Keras model. I would really appreciate some guidance, as I am quite new and would love to learn.

  • @alvaroradajczyk2191
    @alvaroradajczyk2191 1 year ago +1

    Excuse me, is there any criterion for deciding the number of elements each word vector has? In other words, why does Word2Vec represent each word as a vector of 100 elements instead of 200? Thank you so much.

  • @ashagireadinew9151
    @ashagireadinew9151 2 years ago +1

    Thank you so much for a clear explanation. But what do the padded_docs and labels parameters represent?

  • @hardikvegad3508
    @hardikvegad3508 3 years ago

    Sir, please make a video on GloVe... I searched the whole of YouTube but none of them teach as well as you. Thank you.

  • @Siva-sf3on
    @Siva-sf3on 2 years ago

    Hi Sir, I am very confused by this concept, which I don't see explained anywhere. In your previous video on word2vec (e.g. CBOW, skip-gram), you talk about window size; why is it not specified here when you create the embedding layer? For embeddings, e.g. CBOW, isn't the input the neighboring n words and the output the focus word, and then we learn the embedding for each word that way? I would really appreciate your clarification. Thank you for your videos, you're one of the best teachers out there!

    • @bay-bicerdover
      @bay-bicerdover 1 year ago

      word2vec is not a singular algorithm, rather, it is a family of model architectures

  • @MrDEBONTUBE
    @MrDEBONTUBE 4 years ago +1

    Great video, keep it up. Can you make a video on the Design of Experiments topic, or share any resources you have with me? Reaching out as you have lots of knowledge.

  • @grg4098
    @grg4098 3 years ago +2

    Aman is a gem for all of us

  • @asheeshmathur
    @asheeshmathur 4 years ago

    One of the best tutorials. Could you please share a video on how to classify documents based on word embeddings.

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago

      Thanks Asheesh, please watch the video below for your question:
      ruclips.net/video/cs049uQWbpg/видео.html

  • @souravthakur6222
    @souravthakur6222 3 years ago

    Thank you Aman, please keep up the good work. Best wishes!

  • @DineshBabu-gn8cm
    @DineshBabu-gn8cm 3 years ago +1

    Hi, can you please explain what vocabulary size is? Is it the dimensions, or the number of unique words we have in our document?

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      Vocab size is how many unique words we want to consider - the top n occurring words.
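In Keras this cap is the num_words argument of the Tokenizer: every unique word is counted during fitting, but only the most frequent ones are kept when encoding (toy example below):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

tok = Tokenizer(num_words=30)  # consider only the ~30 most frequent words
tok.fit_on_texts(["the food the food was good", "the service was bad"])

# word_index records every unique word seen during fitting
print(len(tok.word_index))  # 6 unique words here

# Words never seen in fitting ("excellent") are simply dropped when encoding
seqs = tok.texts_to_sequences(["the food was excellent"])
print(seqs)
```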

  • @TheTakenKing999
    @TheTakenKing999 1 year ago

    10/10 video, gensim documentation sucks so this helped a lot

  • @diriba.hopefullife813
    @diriba.hopefullife813 3 years ago +1

    Interesting, thanks so much!

  • @santhoshgattoji7517
    @santhoshgattoji7517 4 years ago +2

    Clear explanation and implementation, thanks Aman. Also I have a few doubts regarding data science project structure. Please let me know how to connect with you.

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago

      Thanks Santhosh. Please connect with me through email or RUclips live :)

  • @emergingtechnologyforengin5932
    @emergingtechnologyforengin5932 2 years ago

    Your video is very good, sir. I need to know how to use these models for Hindi text using Python; please tell me, sir.

  • @sachinshelar8810
    @sachinshelar8810 2 years ago

    Nice video, but can you rectify the mistake from 10:28 onwards, where you said the vocab size of 30 is similar to the size of 100 in the previous video? Those 100 dimensions were semantic numbers; in this case the vocab size is 30 and the dimension is 8. Correct me if I am wrong.

  • @uwaisahamedimad556
    @uwaisahamedimad556 2 years ago

    I have a question. I have my own corpus and I have built multiple word embeddings (word2vec, GloVe, TF-IDF, BERT) for the same corpus, for a document similarity task. How do I evaluate these models, and how am I going to choose the best one?
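One common way to compare embeddings for document similarity, sketched below with toy 2-D vectors (the threshold and data are illustrative): hand-label some document pairs as similar/dissimilar, score each embedding model by how well its cosine similarities match those labels, and keep the best-scoring one.

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pair_accuracy(doc_vecs, pairs, threshold=0.5):
    """pairs: (i, j, is_similar) triples with hand-made labels."""
    hits = 0
    for i, j, is_similar in pairs:
        pred = cosine(doc_vecs[i], doc_vecs[j]) >= threshold
        hits += (pred == is_similar)
    return hits / len(pairs)

# Toy vectors standing in for one embedding model's document vectors
docs = {0: [1, 0], 1: [0.9, 0.1], 2: [0, 1]}
labeled_pairs = [(0, 1, True), (0, 2, False)]
score = pair_accuracy(docs, labeled_pairs)  # compute per embedding, keep the best
```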

  • @sandipansarkar9211
    @sandipansarkar9211 2 years ago

    finished watching

  • @sma92878
    @sma92878 1 year ago

    Am I understanding correctly that the entire vectorization process and the "semantic" meaning are derived from how frequently one word is in proximity to other words in the data it's been trained on?

  • @bilalchandiabaloch8464
    @bilalchandiabaloch8464 3 years ago

    Please make a video on using a self-trained word2vec model's weight matrix for the Keras embedding layer.

  • @hthangkhanhau4876
    @hthangkhanhau4876 11 months ago

    I thought a memory error is related to RAM, but you said it is related to the C drive. Can you please explain more about that?

  • @ZeWhiteBear
    @ZeWhiteBear 1 year ago

    Very nice tutorial, but I would like to know what the padded_docs and labels variables are?
    I could not see their definition.

  • @akd9977
    @akd9977 4 years ago

    Thanks Aman for clearly explaining the concept. No NLP video in a month - can you please post a few more in the series.

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago

      Welcome. More videos will be added to the NLP series going forward.

  • @aditya_01
    @aditya_01 3 years ago

    Nice video bhaiya. Can you please tell how you became so good at DS? Maybe a roadmap that a beginner should follow to be like you.

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago +1

      Thanks Aditya.
      Watch my playlist here:
      ruclips.net/video/9pytkbvF8AU/видео.html

    • @aditya_01
      @aditya_01 2 years ago

      @@UnfoldDataScience Thanks, I will look into it. Sorry for the late reply.

  • @yogeshbharadwaj6200
    @yogeshbharadwaj6200 3 years ago

    Thanks for the very easy explanation. Can you suggest how the vector size becomes 100 even though we have only 14 words? Not able to understand this. Thanks in advance.

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago +1

      100 is the default vector_size, i.e. each word gets a 100-dimensional vector regardless of how many unique words there are.

    • @yogeshbharadwaj6200
      @yogeshbharadwaj6200 3 years ago

      @@UnfoldDataScience OK, got it. Thanks for clarifying.

  • @preranatiwary7690
    @preranatiwary7690 4 years ago +1

    Good information!

  • @nastaran1010
    @nastaran1010 6 months ago

    Hello, I have a question: how does it arrive at 100 features? Must we identify what these features are, or does the algorithm find them itself?

  • @exuberantyouth8765
    @exuberantyouth8765 1 year ago

    Thanks Aman

  • @yuhangxu2132
    @yuhangxu2132 2 years ago

    What are the limitations of using average word embeddings to represent documents?

  • @maYYidtS
    @maYYidtS 4 years ago +1

    12:37 bro, what if in real time our sentence has more words than the max length? We may lose info. Any other way to tackle that problem? By the way, your videos are so expressive.

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago +1

      Good question. In that case we can increase the max length. When we do cleaning and token creation, the length will automatically decrease, hence we can make a generic call on what the length should be for a given corpus. Also, we can tune the max_length parameter to see what value works best for the model.

    • @maYYidtS
      @maYYidtS 4 years ago

      @@UnfoldDataScience thanks for the update.
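Concretely, pad_sequences both pads short sequences and truncates ones longer than maxlen, so an overly long sentence loses tokens from one end (a toy sketch):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Second sequence is longer than the chosen max length
seqs = [[1, 2, 3], [1, 2, 3, 4, 5, 6, 7]]

# Short sequences are zero-padded; long ones are cut at maxlen.
# truncating="post" drops tokens from the end, "pre" from the start.
out = pad_sequences(seqs, maxlen=5, padding="post", truncating="post")
print(out.tolist())  # [[1, 2, 3, 0, 0], [1, 2, 3, 4, 5]]
```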

  • @adityasharma2667
    @adityasharma2667 3 years ago

    Hello Aman, really a great video.
    Just wanted to know how the model predicts any new word which is not part of our vocab?

  • @ousmanealamakaba3135
    @ousmanealamakaba3135 2 years ago

    Sir thank you so much

  • @samdan7825
    @samdan7825 4 years ago

    the best explanation that I've seen, thanks

  • @souravbiswas6892
    @souravbiswas6892 4 years ago

    Very useful

  • @srichakraarunmakthal395
    @srichakraarunmakthal395 3 years ago

    Sir, I'm a big fan of yours. Can you please make a video on a chatbot using NLP?

  • @mbahsamuel3506
    @mbahsamuel3506 1 year ago

    Is it necessary to train a neural network even if we are using a word2vec model?

  • @unsharma9229
    @unsharma9229 4 years ago +1

    Sir, can you please share how to classify documents?

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago

      There is a video on hotel review sentiment analysis on my channel; you can watch it, and the same concept will apply to classifying documents as well.

  • @kanchisushma171
    @kanchisushma171 3 years ago

    Nice explanation. Doubt:
    Say we train the model using 8k observations and pickle it for further use, and then get new data of 2k observations that the model needs to be trained on. When we unpickle and reuse the same machine learning model we stored before, should we train on just the newly available 2k observations, or on the whole 8k+2k?

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      Thanks Kanchi. If you want to retrain the model, the previously saved object is of no use; you need to retrain the model from the beginning using the 8k+2k data.

  • @230489shraddha
    @230489shraddha 2 years ago

    Nice video ... one doubt: what is the difference between Keras embedding and word2vec embedding?

    • @bay-bicerdover
      @bay-bicerdover 1 year ago

      word2vec is not a singular algorithm, rather, it is a family of model architectures

  • @lemoniall6553
    @lemoniall6553 2 years ago

    We can choose 50, 100, 200, or 300 dimensions, but how does it work?

  • @Maryamkhan-lo1hq
    @Maryamkhan-lo1hq 3 years ago

    Nice video. Can you please tell me: if I have my data in a csv file and load it using pandas, how can I implement word2vec and use it with an LSTM?

  • @maYYidtS
    @maYYidtS 3 years ago

    Do both the custom word2vec and the Keras embedding layer use the same methodology (context, target, window size, vector representation)?
    If yes, is there any performance difference between them, and which one is best to use in most cases?

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      Word2Vec, being an unsupervised method, seeks to place words with similar context close together in the embedding space. A Keras Embedding layer, being supervised, finds custom embeddings internally during training.

  • @chandvachhani1660
    @chandvachhani1660 3 years ago

    What should we do if we have a word in our dataset that is not recognized by the word2vec model? In my case it is giving me an error. What is the solution for that?

  • @adityasharma2667
    @adityasharma2667 3 years ago

    How do we handle new words in test data? How will my model predict on new words it has not been trained on, sir?

  • @Rocklee46v
    @Rocklee46v 3 years ago

    Hi, is our model size going to increase if we use a 3.4 GB pre-trained word2vec model, and if it increases, how do we deploy such a big model?

  • @ayuobali9539
    @ayuobali9539 3 years ago

    Hi sir, how can I use embedding + Keras to build a model for plagiarism detection with multiple classes (not plagiarism, near copy, exact copy, etc.)? What are the steps? Very many thanks and regards.

  • @sandeshkharat7020
    @sandeshkharat7020 3 years ago +1

    How can we use word embeddings in a recommendation system?

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      Hi Sandesh, we can create word-based features and run a recommender engine. Watch my recommendation system playlist in the playlists section to know more.

  • @maYYidtS
    @maYYidtS 3 years ago

    What (cloud services or Colab) are we going to use in real time if we have large datasets?

  • @nixonsebastian2892
    @nixonsebastian2892 3 years ago

    We can do the same thing using LSA, right? Getting the most similar words.

  • @praveenkarmakar6218
    @praveenkarmakar6218 3 years ago

    How do we classify the input if we have mixed types of sentences, i.e. not only question-and-answer type sentences?

  • @tpsmahal
    @tpsmahal 4 years ago

    Can you provide the notebook you show in the video as well?

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago +1

      drive.google.com/drive/folders/1XdPbyAc9iWml0fPPNX91Yq3BRwkZAG2M

  • @akshaydileep6460
    @akshaydileep6460 4 years ago

    Hello Sir, I had a task where I had to cluster similar sentences. I used a doc2vec model on the sentences and did k-means clustering. I already had them labeled, so when I cross-checked, the clustering result was quite poor. Any suggestions from your side to make it better? I'm kind of a noob in NLP and machine learning, so it would really help me out. Thank you.

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago

      Hi Akshay, try doing more feature engineering; you can add tf-idf etc.

  • @prasoonsaxena3027
    @prasoonsaxena3027 3 years ago

    How can we use this approach for logistic regression? I tried creating word vectors with dim 300 using word2vec. Theoretically, if I can take the average of each word vector, impute that into our four sentences, and give each sentence a label, then I can do logistic regression on those sentences with the averaged word vectors. But from a coding perspective I don't understand. Can you explain that?

  • @md.sayedbinhasan9
    @md.sayedbinhasan9 3 years ago

    Is this the last video of the NLP series?

  • @hicodeguru
    @hicodeguru 3 years ago +1

    When you say my_vocab_size = 30, but you have only 8 unique words - I did not get it, can you please answer?

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago +1

      Yes - if I had more words in my corpus, the vocab size could go up to 30.

    • @hicodeguru
      @hicodeguru 3 years ago

      @@UnfoldDataScience thanks :)

  • @hashinitheldeniya1347
    @hashinitheldeniya1347 4 years ago +2

    Hi sir, I am doing a hotel review analysis project, and I want to cluster similar sentences and give scores to these clusters using opinion mining (negative, positive, neutral). So is it possible to use word2vec to cluster the sentences into groups? And how is it done?

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago

      Good question. While preparing the training data you can create these three categories and then use word2vec to train the model. Were you asking something else?

  • @seenamosisa1404
    @seenamosisa1404 3 years ago

    I want to use word embeddings for text classification using an RNN; please help me.

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      Sure; however, there are many better models for the same purpose.

  • @VineetSingh-l1t
    @VineetSingh-l1t 8 months ago

    I am getting an error installing word2vec: "failed building wheel for word2vec". Can anyone help resolve it?

  • @lemoniall6553
    @lemoniall6553 2 years ago

    What about fasttext?

  • @gueddahhicham4729
    @gueddahhicham4729 3 years ago

    Hello friend, thank you for this tutorial, but when I try to run the same code as you, I get this error message: Error when checking target: expected dense_21 to have 3 dimensions, but got array with shape (6, 1).
    The parameters padded_docs and labels in your code are not defined!!! Please send me your complete code.
    Thank you

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      drive.google.com/drive/folders/1XdPbyAc9iWml0fPPNX91Yq3BRwkZAG2M

  • @seenamosisa1404
    @seenamosisa1404 3 years ago

    I got this error; how can I fix it?
    AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      See if this helps:
      stackoverflow.com/questions/66868221/gensim-3-8-0-to-gensim-4-0-0

    • @pankajuprety9216
      @pankajuprety9216 3 years ago

      @@UnfoldDataScience but still the word can't be fetched; only the index can be fetched

  • @revof7140
    @revof7140 3 years ago

    I need to understand Sen2Vec

  • @trexmidnite
    @trexmidnite 3 years ago

    What is wordingbedding?

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      Not Sure, Word Embedding is explained in this video.

  • @seenamosisa1404
    @seenamosisa1404 3 years ago

    multi label problem

  • @DavidNganga-g1e
    @DavidNganga-g1e 1 year ago

    Thank you, very nice tutorial. Can I get your email address?
