Introduction to NLP | GloVe Model Explained

  • Published: 9 Feb 2020
  • Learn everything about the GloVe model! I've explained the difference between word2vec and GloVe in great detail. I've also shown how to visualize higher-dimensional word vectors in 2D.
    #nlp #GloVe #machinelearning
    For more videos please subscribe -
    bit.ly/normalizedNERD
    Support me if you can ❤️
    www.paypal.com/paypalme2/suji04
    www.buymeacoffee.com/normaliz...
    NLP playlist -
    • Introduction to NLP
    Source Code -
    github.com/Suji04/NormalizedN...
    References -
    - www.kdnuggets.com/2018/08/wor...
    - nlp.stanford.edu/pubs/glove.pdf
    - machinelearningmastery.com/wh...
    - www.kaggle.com/jeffd23/visual...
    Facebook -
    / nerdywits
    Instagram -
    / normalizednerd
    Twitter -
    / normalized_nerd

Comments • 97

  • @javitxuaxtrimaxion4526
    @javitxuaxtrimaxion4526 2 years ago +2

    Awesome video!! I've just arrived here after reading the GloVe paper and your explanation is utterly perfect. I'll surely come back to your channel whenever I have doubts about Machine Learning or NLP. Good job!

  • @addoul99
    @addoul99 1 year ago +1

    Fantastic summary of the paper. I just read it and I am pleasantly surprised at how much of the paper's math you covered in detail in this video! Great!

  • @alh7839
    @alh7839 2 years ago +1

    Man, your video is great! Best explanation on the whole internet!

  • @sashimidimsums
    @sashimidimsums 3 years ago +5

    Just wanna say that your explanations are awesome. Really helped me understand NLP better than reading a book.

  • @riskygamiing
    @riskygamiing 2 years ago +1

    I was reading the paper and somewhat struggling with what certain parts of the derivation meant or why we needed them, but this video is great. Thanks so much!

  • @revolutionarydefeatism
    @revolutionarydefeatism 2 years ago

    Perfect! Thanks, there are not many useful videos on YouTube.

  • @rumaisalateef784
    @rumaisalateef784 4 years ago +1

    beautifully explained, thank you!

  • @sasna8800
    @sasna8800 3 years ago

    This is the best explanation I have seen for GloVe, thank you a million times.

  • @bhargavaiitb
    @bhargavaiitb 4 years ago

    Thanks for the explanation. Feels like you explained better than the paper itself.

  • @sachinsarathe1143
    @sachinsarathe1143 3 years ago +1

    Very nicely explained, buddy. I was going through many articles but was not able to understand the math behind it. Your video certainly helped. Keep up the good work.

  • @ijeffking
    @ijeffking 4 years ago

    Very well explained. Keep it up! Thank you.

    • @NormalizedNerd
      @NormalizedNerd  4 years ago

      Thank you, more videos are coming :)

    • @ijeffking
      @ijeffking 4 years ago

      @@NormalizedNerd looking forward to......

  • @ToniSkit
    @ToniSkit 1 month ago

    This was great

  • @TheR4Z0R996
    @TheR4Z0R996 4 years ago

    Great explanation, thanks a lot my friend :)

    • @NormalizedNerd
      @NormalizedNerd  4 years ago

      Glad that it helped :D...keep supporting!

  • @arunimachakraborty1175
    @arunimachakraborty1175 3 years ago

    Very good explanation. Thank you :)

  • @psic-protosysintegratedcyb2422
    @psic-protosysintegratedcyb2422 4 years ago

    Good introduction!

  • @fezkhanna6900
    @fezkhanna6900 3 years ago

    Fantastic video

  • @CodeAshing
    @CodeAshing 3 years ago +1

    Bruh you explained well

  • @popamaji
    @popamaji 10 months ago

    This is excellent, but I wish you had also mentioned the training steps: what exactly are the input and output tensors, and what shape do they have?

  • @sarsoura716
    @sarsoura716 3 years ago +1

    Good video, thanks for your efforts. I wish it had spent less time explaining the cost function of the GloVe model and more time on testing word similarity with the trained GloVe vectors.

    • @NormalizedNerd
      @NormalizedNerd  3 years ago +1

      You can copy the code and test it more ;)
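
      As a quick illustration of how one might test word similarity with pretrained GloVe vectors, here is a minimal Python sketch; the file name glove.6B.50d.txt and the example words are assumptions, not taken from the video or its source code:

      import numpy as np

      # Load pretrained GloVe vectors from a plain-text file
      # (assumes glove.6B.50d.txt has been downloaded beforehand).
      def load_glove(path):
          vectors = {}
          with open(path, encoding="utf-8") as f:
              for line in f:
                  parts = line.rstrip().split(" ")
                  vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
          return vectors

      def cosine_similarity(a, b):
          return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

      glove = load_glove("glove.6B.50d.txt")
      print(cosine_similarity(glove["ice"], glove["cold"]))     # related pair: higher similarity
      print(cosine_similarity(glove["ice"], glove["fashion"]))  # unrelated pair: lower similarity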

  • @vitalymegabyte
    @vitalymegabyte 2 years ago

    Guy, thank you very much, it was a fucking masterpiece that made my 22 minutes at the railway station really profitable :)

  • @parukadli
    @parukadli 1 year ago

    Is the embedding for a word fixed in GloVe, or is it generated each time depending on the dataset used to train the model?

  • @khadidjatahri7428
    @khadidjatahri7428 2 years ago +1

    Thanks for this well-explained video. I have one question, please: can you explain why you take only the numerator portion F(w_i · w_k) and ignore the denominator?

  • @sujeevan9047
    @sujeevan9047 3 years ago

    Can you do a video on the BERT word embedding model? It is also important.

  • @maximuskumar502
    @maximuskumar502 4 years ago

    Nice explanation 👍 One quick question on your video: which software and hardware are you using for the digital board?

    • @NormalizedNerd
      @NormalizedNerd  4 years ago

      Thank you. I use Microsoft OneNote and a basic pen tablet. Keep supporting!

  • @bhrzali
    @bhrzali 3 years ago

    Wonderful explanation! Just a question. Why do we calculate the ratio p(k|ice)/p(k|steam)?

    • @NormalizedNerd
      @NormalizedNerd  3 years ago +1

      The ratio is better at distinguishing relevant words from irrelevant words than the probabilities. And it also discriminates between relevant words. If we didn't take the ratio and work with raw probabilities then the numbers would be too small.
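
      To make the ratio argument concrete, here is a tiny sketch with made-up counts (the numbers are illustrative, not from the paper) showing how P(k|ice)/P(k|steam) separates relevant from irrelevant context words:

      # Hypothetical co-occurrence counts, invented purely for illustration.
      cooc = {
          "ice":   {"solid": 190, "gas": 7,   "water": 300, "fashion": 2},
          "steam": {"solid": 4,   "gas": 120, "water": 290, "fashion": 2},
      }

      def p(k, i):
          # P(k | i) = X_ik / X_i, where X_i is the total count for word i.
          return cooc[i][k] / sum(cooc[i].values())

      for k in ["solid", "gas", "water", "fashion"]:
          print(f"P({k}|ice)/P({k}|steam) = {p(k, 'ice') / p(k, 'steam'):.2f}")

      # Large ratio   -> k relates to "ice" (solid)
      # Small ratio   -> k relates to "steam" (gas)
      # Ratio near 1  -> k relates to both (water) or to neither (fashion)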

  • @edwardrouth
    @edwardrouth 4 years ago +1

    Nice work! Just subscribed (y) :) Just a quick question out of curiosity: are "GloVe" and "Poincare GloVe" the same model?
    All the best for your channel.

    • @NormalizedNerd
      @NormalizedNerd  4 years ago

      Thank you, man!
      No, they are different. Poincare GloVe is a more advanced approach. In normal GloVe, the words are embedded in Euclidean space. But in Poincare GloVe, the words are embedded in hyperbolic space! Although the latter one uses the basic concepts of the original GloVe.

    • @edwardrouth
      @edwardrouth 4 years ago

      @@NormalizedNerd It's totally worth subscribing to your channel. Looking forward to new videos from you on DS.
      Btw, I am also from West Bengal, currently in Germany ;)

    • @NormalizedNerd
      @NormalizedNerd  4 years ago

      @@edwardrouth Oh great! Nice to meet you. More interesting videos are coming ❤️

  • @momona4170
    @momona4170 2 years ago

    I still don't quite understand the part where ln(X_i) was absorbed by the biases, please enlighten me.

  • @md.tarekhasan2206
    @md.tarekhasan2206 3 years ago

    Can you please make videos on ELMo, fastText, and BERT as well? It would be helpful.

  • @Sarah-ik8tt
    @Sarah-ik8tt 2 years ago

    Hello, thank you for your explanation. Can you please send me the Google Colab link asap?

  • @parukadli
    @parukadli 3 years ago

    Nice explanation... which is better, GloVe or word2vec?

    • @NormalizedNerd
      @NormalizedNerd  3 years ago

      That depends on the dataset. I recommend trying both.

  • @ccuuttww
    @ccuuttww 9 months ago

    p(Love, I) = 2/3?

  • @trieunguyenhai49
    @trieunguyenhai49 4 years ago +1

    Thank you so much, but shouldn't X_{love} be equal to 4, not 3?

    • @NormalizedNerd
      @NormalizedNerd  4 years ago

      @TRIỀU NGUYỄN HẢI
      Thanks for pointing this out. Yes X_{love} = 4.
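
      For anyone recomputing these counts, here is a minimal Python sketch of a symmetric co-occurrence count. The corpus is the video's example sentences, but the window size of 1 and the tokenization are assumptions, so the exact numbers may differ from the blackboard:

      from collections import defaultdict

      sentences = [["I", "love", "NLP"], ["I", "love", "to", "make", "videos"]]
      window = 1  # assumed context window

      X = defaultdict(lambda: defaultdict(int))
      for tokens in sentences:
          for i, w in enumerate(tokens):
              for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                  if i != j:
                      X[w][tokens[j]] += 1

      print(dict(X["love"]))          # neighbours of "love" with their counts
      print(sum(X["love"].values()))  # X_love = 4 under these assumptions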

  • @longhoang5137
    @longhoang5137 2 years ago +1

    i laughed when you said 2+1+1=3 xD

    • @NormalizedNerd
      @NormalizedNerd  2 years ago

      LOL XD

    • @alh7839
      @alh7839 2 years ago +1

      I was looking for this comment ^^

    • @xh1501
      @xh1501 2 years ago

      same here lol

  • @eljangoolak
    @eljangoolak 1 year ago

    quackuarance metrics? I don't understand what that is

  • @SAINIVEDH
    @SAINIVEDH 3 years ago

    @ 19:13. That is a weighting function because log(X_ij) becomes undefined when X_ij is zero and the equation goes crazy. More details at
    towardsdatascience.com/light-on-math-ml-intuitive-guide-to-understanding-glove-embeddings-b13b4f19c010

    • @NormalizedNerd
      @NormalizedNerd  3 years ago

      The article says f(X_ij) prevents log(X_ij) from being NaN which is not true.
      f(X_ij) actually puts an upper limit on co-occurrence frequencies.

  • @Nextalia
    @Nextalia 3 years ago

    I fail to see where the vectors come from... :-( I follow the whole explanation without any problem, but... once you define J, where are the vectors coming from? Is there any neural network involved? Same problem when reading the article or any other explanations. They all try to explain where that J function comes from, and then, magically, we have vectors we can compare to each other :-(
    Any help on that would be greatly appreciated. Thanks!

    • @NormalizedNerd
      @NormalizedNerd  3 years ago +1

      The authors introduced the word vectors very subtly.
      Here's the deal: at 9:50, we assume that there exists a function F which takes the word vectors and produces a scalar quantity!
      And no, we don't have neural networks here. Everything is based on the co-occurrence matrix.

    • @Nextalia
      @Nextalia 3 years ago

      @@NormalizedNerd Thanks for your answer. I found a publication that explains very well what to do after "discovering" that function: thesis.eur.nl/pub/47697/Verstegen.pdf
      I was somehow sure that GloVe was based on neural networks (as word2vec is), but that is not the case. However, it is a bit like a neural network, since the way the vectors are created is similar to the way the weights of an NN are trained: stochastic gradient descent.

    • @SwapravaNath
      @SwapravaNath 2 years ago

      The vectors are actually the parameters that one is optimizing over. Actually, the objective function J should have been written with its arguments being the vector representations of the words -- which are the optimization variables. For certain choices of the F function, e.g., softmax, the optimization becomes mathematically easy. And then it is just a multivariable optimization problem, and a natural algorithm to solve it is gradient descent (and more).
      Ref: ruclips.net/video/ERibwqs9p38/видео.html [Stanford course on NLP]
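
      Since the question of where the vectors come from keeps coming up, here is a minimal, illustrative training sketch in Python (not the author's or Stanford's code; the toy matrix and hyperparameters are made up): the word vectors and biases are just randomly initialized parameters, updated by gradient descent on the weighted least-squares cost J.

      import numpy as np

      # Toy symmetric co-occurrence matrix for 5 words; the values are invented.
      X = np.array([[0, 2, 1, 0, 1],
                    [2, 0, 3, 1, 0],
                    [1, 3, 0, 2, 1],
                    [0, 1, 2, 0, 2],
                    [1, 0, 1, 2, 0]], dtype=float)

      V, dim, x_max, alpha, lr = X.shape[0], 10, 100.0, 0.75, 0.05
      rng = np.random.default_rng(0)

      # These vectors and biases ARE the parameters being optimized.
      W     = rng.normal(scale=0.1, size=(V, dim))   # word vectors
      W_ctx = rng.normal(scale=0.1, size=(V, dim))   # context vectors
      b, b_ctx = np.zeros(V), np.zeros(V)

      def f(x):  # weighting function from the paper
          return (x / x_max) ** alpha if x < x_max else 1.0

      for epoch in range(200):
          for i, j in zip(*np.nonzero(X)):  # only nonzero X_ij contribute to J
              diff = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(X[i, j])
              g = f(X[i, j]) * diff         # gradient of 0.5 * f * diff^2 w.r.t. the inner term
              dWi, dWj = g * W_ctx[j], g * W[i]
              W[i]     -= lr * dWi
              W_ctx[j] -= lr * dWj
              b[i]     -= lr * g
              b_ctx[j] -= lr * g

      embeddings = W + W_ctx   # the paper sums word and context vectors
      print(embeddings.shape)  # (5, 10): one vector per word, ready for cosine similarity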

  • @83vbond
    @83vbond 3 years ago +3

    Good explanation. It got too technical for me after the middle, but then the code and the graph clarified things. Just one thing: you keep calling the pipe | symbol 'slash' -- "j slash i", "k slash ice", etc. -- which isn't accurate (I think you would know this if you have studied all this). It's better to say 'given', as in "j given i", which is how it's actually read, or just say 'pipe' after explaining once that this is what the symbol is called. 'Slash' is used to mean division, and also to mean 'one or the other', neither of which applies here, and the symbol isn't a slash anyway. This can cause confusion for some viewers.

    • @NormalizedNerd
      @NormalizedNerd  3 years ago

      Yes, pipe would be a better choice.

    • @jibbygonewrong2458
      @jibbygonewrong2458 3 years ago

      It's Bayes. Anyone exposed to stats understands w/o the verbiage.

    • @TNTsundar
      @TNTsundar 3 years ago +1

      You should read that as “probability of i GIVEN j”. The pipe symbol is read as ‘given’.

  • @robinshort6430
    @robinshort6430 1 year ago

    Often X_ij is zero, and in those cases ln(X_ij) goes to minus infinity. How do you treat this issue?

    • @NormalizedNerd
      @NormalizedNerd  1 year ago +1

      Good point. So, here's how they tackled the problem.
      They defined the weighting function f like this:
      f(X_ij) =
      (X_ij/X_max)^alpha [if X_ij < X_max]
      1 [otherwise]
      So you see when X_ij = 0, f(X_ij) is 0. That means the whole cost term becomes 0. We don't even need to compute ln(X_ij) in this case.
      They addressed two problems with f.
      1) not giving too much importance to the word pairs that cooccur frequently.
      2) avoiding ln(0)
      I hope this makes sense. Please tell me if anything is not clear.
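
      As a quick sketch (not the official implementation), the weighting function can be written directly in Python; x_max = 100 and alpha = 3/4 are the defaults reported in the paper:

      def weighting(x, x_max=100.0, alpha=0.75):
          # f(X_ij): equals 0 at X_ij = 0, grows as (X_ij / x_max)^alpha, and is capped at 1.
          return (x / x_max) ** alpha if x < x_max else 1.0

      for x in [0, 1, 10, 100, 1000]:
          print(x, weighting(x))
      # weighting(0) == 0, so the whole f(X_ij) * (... - ln X_ij)^2 term drops out
      # of the cost and ln(0) never has to be evaluated.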

    • @robinshort6430
      @robinshort6430 1 year ago

      @Normalized Nerd This is true only assuming that zero times infinity is zero! Just kidding, I just want to point out that computing zero times infinity gives (rightly) an error in NumPy, so I have to write this as an if condition.
      Everything else is clear, thank you very much for your great work and for your answer!

    • @robinshort6430
      @robinshort6430 1 year ago

      @@NormalizedNerd is X_max a hyperparameter?

  • @dodoblasters
    @dodoblasters 2 years ago +1

    5:50
    2+1+1=3?

  • @kekecoo5681
    @kekecoo5681 3 years ago

    Where did e come from?

    • @NormalizedNerd
      @NormalizedNerd  3 years ago

      e^x follows our condition.
      e^(a-b) = e^a/e^b
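
      Spelling the step out a little more (a sketch following the paper's derivation, with the same notation):

      % We need F to turn a difference of dot products into a ratio of probabilities:
      F\big((w_i - w_j)^\top \tilde{w}_k\big) = \frac{P_{ik}}{P_{jk}}
      \quad\Longrightarrow\quad F(a - b) = \frac{F(a)}{F(b)} .
      % The exponential satisfies this, F(x) = e^x, which gives
      w_i^\top \tilde{w}_k = \log P_{ik} = \log X_{ik} - \log X_i ,
      % and the \log X_i term is the one that later gets absorbed into the bias b_i.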

  • @bikideka7880
    @bikideka7880 3 years ago

    Good explanation, but please use a bigger cursor; a lot of YouTubers miss this.

  • @WahranRai
    @WahranRai 2 years ago +1

    Your examples are not related: "I love NLP..." and P(k|ice), etc.
    It would be useful to use the same sentences...

  • @u_luana.j
    @u_luana.j 2 years ago

    5:50 ..?

  • @atomic7680
    @atomic7680 3 years ago +2

    G-Love 😂

    • @NormalizedNerd
      @NormalizedNerd  3 years ago

      Haha...Exactly what I thought when I learned the word for the first time!

  • @sakibahmed2373
    @sakibahmed2373 4 years ago

    Hello there,
    First of all, thank you for adding such informative videos to help beginners in the DS field. I am trying to reproduce the code from GitHub for the Stanford GloVe model. Link ---> github.com/stanfordnlp/GloVe
    The problem is that if I execute all the statements as mentioned in the "Readme", I get the respective files it should produce, "cooccurrence.bin" & "vocab.txt". The latter does have the list of words with frequencies, but the former is empty, and no error is even reported in the console. To me this is very weird and I don't understand what I am doing wrong. Could you please help me with this?
    N.B.: I am new to ML and still learning!
    Best Regards.

    • @NormalizedNerd
      @NormalizedNerd  4 years ago +1

      "cooccurrence.bin" should contain the word vectors. Make sure that the training actually started. You should see logs like...
      vector size: 50
      vocab size: 71290
      x_max: 10.000000
      alpha: 0.750000
      05/08/20 - 06:02.16AM, iter: 001, cost: 0.071222
      05/08/20 - 06:02.45AM, iter: 002, cost: 0.052683
      05/08/20 - 06:03.14AM, iter: 003, cost: 0.046717
      ...
      I'd suggest you try this on Google Colab once.

    • @sakibahmed2373
      @sakibahmed2373 4 years ago

      @@NormalizedNerd Hi, thank you for your response.
      I never tried Colab before. But what I noticed in Colab is that I have to upload notebook files, which I can't see in the GloVe project that I cloned. However, I am using an online editor, "repl.it". First I ran the "make" command, which created the "build" folder, and subsequently "./demo.sh". Running this script creates a "cooccurrence.bin" file, but as I mentioned earlier, it's empty. Did I miss something here? I am sure I am missing something very small and important 😒 Below are the logs from the terminal..
       make
      mkdir -p build
      gcc -c src/vocab_count.c -o build/vocab_count.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      gcc -c src/cooccur.c -o build/cooccur.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      src/cooccur.c: In function ‘merge_files’:
      src/cooccur.c:180:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&new, sizeof(CREC), 1, fid[i]);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      src/cooccur.c:190:5: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&new, sizeof(CREC), 1, fid[i]);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      src/cooccur.c:203:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&new, sizeof(CREC), 1, fid[i]);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      gcc -c src/shuffle.c -o build/shuffle.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      src/shuffle.c: In function ‘shuffle_merge’:
      src/shuffle.c:96:17: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&array[i], sizeof(CREC), 1, fid[j]);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      src/shuffle.c: In function ‘shuffle_by_chunks’:
      src/shuffle.c:161:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&array[i], sizeof(CREC), 1, fin);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      gcc -c src/glove.c -o build/glove.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      src/glove.c: In function ‘load_init_file’:
      src/glove.c:86:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&array[a], sizeof(real), 1, fin);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      src/glove.c: In function ‘glove_thread’:
      src/glove.c:182:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result]
      fread(&cr, sizeof(CREC), 1, fin);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      gcc -c src/common.c -o build/common.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      gcc build/vocab_count.o build/common.o -o build/vocab_count -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      gcc build/cooccur.o build/common.o -o build/cooccur -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      gcc build/shuffle.o build/common.o -o build/shuffle -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
      gcc build/glove.o build/common.o -o build/glove -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic
       ./demo.sh
      mkdir -p build
      --2020-05-08 17:04:13-- mattmahoney.net/dc/text8.zip
      Resolving mattmahoney.net (mattmahoney.net)... 67.195.197.75
      Connecting to mattmahoney.net (mattmahoney.net)|67.195.197.75|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 31344016 (30M) [application/zip]
      Saving to: ‘text8.zip’
      text8.zip 100%[======>] 29.89M 1.97MB/s in 15s
      2020-05-08 17:04:29 (1.95 MB/s) - ‘text8.zip’ saved [31344016/31344016]
      Archive: text8.zip
      inflating: text8
      $ build/vocab_count -min-count 5 -verbose 2 < text8 > vocab.txt
      BUILDING VOCABULARY
      Processed 17005207 tokens.
      Counted 253854 unique words.
      Truncating vocabulary at min count 5.
      Using vocabulary of size 71290.
      $ build/cooccur -memory 4.0 -vocab-file vocab.txt -verbose 2 -window-size 15 < text8 > cooccurrence.bin
      COUNTING COOCCURRENCES
      window size: 15
      context: symmetric
      max product: 13752509
      overflow length: 38028356
      Reading vocab from file "vocab.txt"...loaded 71290 words.
      Building lookup table...table contains 94990279 elements.
      Processing token: 200000./demo.sh: line 43: 114 Killed $BUILDDIR/cooccur -memory $MEMORY -vocab-file $VOCAB_FILE -verbose $VERBOSE -window-size $WINDOW_SIZE < $CORPUS > $COOCCURRENCE_FILE

    • @NormalizedNerd
      @NormalizedNerd  4 years ago +1

      @Sakib Ahmed repl is probably not a good idea for DL stuff. Try to use Colab/Kaggle. You can directly clone the GitHub repo in Colab. I've created a Colab notebook. Run it yourself. It works perfectly!
      colab.research.google.com/drive/1BA-GRHQOsXrYwmkalQyejsnVE8zmoyH2?usp=sharing

    • @sakibahmed2373
      @sakibahmed2373 4 years ago

      ​@@NormalizedNerd Thank you so much ! It really worked... 😊 (y)

    • @NormalizedNerd
      @NormalizedNerd  4 years ago +1

      @@sakibahmed2373 Do share this channel with your friends :D Enjoy machine learning.

  • @BloggerMnpr
    @BloggerMnpr 7 months ago

    .

  • @TheMurasaki1
    @TheMurasaki1 3 years ago

    "I love to make videos"
    Sorry to say this, but is that correct English?

    • @kaustavdatta4748
      @kaustavdatta4748 3 years ago +3

      Not the best English. But the model doesn't care as it will learn whatever you (or the dataset) teach it. The author's English doesn't impact the explanation of the model's workings.

  • @Shopinspoco
    @Shopinspoco 2 months ago

    man said "G-love" models 😮‍💨🤣

  • @harshitatiwari8019
    @harshitatiwari8019 3 years ago

    Reduce the number of ads. There's an ad like every minute. Google has made YouTube a money-sucking machine. So irritating.

  • @Schaelpy
    @Schaelpy 7 months ago

    Good video, but the wrong pronunciation of GloVe is killing me, man.

    • @ToniSkit
      @ToniSkit 1 month ago

      You mean the right ❤