5:40 Turning a sentence into vector input with a bag of words or vocabulary: each word maps to a number. Without a word embedding, the numbers don't capture semantic meaning.
11:40 After one-hot encoding, we use an embedding matrix to get the embedding.
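To make the lookup at 11:40 concrete, here is a minimal sketch (the vocabulary size, embedding dimension, and word-to-index mapping are made up for illustration): multiplying a one-hot vector by the embedding matrix simply selects one row of that matrix, which is why frameworks can implement it as a plain lookup.

```python
import torch

# Toy sizes: 4-word vocabulary, 3-dimensional embeddings (made up for illustration).
vocab_size, embed_dim = 4, 3
embedding_matrix = torch.arange(vocab_size * embed_dim, dtype=torch.float32).reshape(vocab_size, embed_dim)

word_index = 2                       # e.g., "sun" mapped to index 2 (hypothetical mapping)
one_hot = torch.zeros(vocab_size)
one_hot[word_index] = 1.0

# Multiplying the one-hot vector by the embedding matrix selects one row ...
via_matmul = one_hot @ embedding_matrix
# ... which is exactly the same as a direct row lookup.
via_lookup = embedding_matrix[word_index]

print(torch.equal(via_matmul, via_lookup))  # True
```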
Thank you for your lot of efforts in putting it in slide and making presentations.. I have been following deep learning path. It's really helpful
The conceptual clarity in these videos is astonishing. I am surely going to purchase your book now. One little thing though, you might have noticed, at 7:22, that the number of columns in the one-hot vector should have been eleven (for indices 0 to 10) instead of ten. Also, at 23:11 the rows chosen for "is" and "shining" should be interchanged (the lookup for "is" should be row no. 3 and for "shining" it should be row no. 5).
Besides, I have rarely seen anyone explain steps 3 and 4 so beautifully. You have my gratitude for that.
Glad to hear the video is useful overall! And great catch, I totally miscounted here and there is one index missing! Haha, in practice, in a general one-hot encoding context, it is common to drop one column because of redundancy. Or in other words, you can deduce the 11th column from the other 10. But yeah, that's not what I did here and it was more like a typo! Thanks for mentioning, I wish there was a way to edit videos on YT ;)
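As a side note on the redundancy argument above, here is a minimal NumPy sketch (with a made-up vocabulary of 4 indices, not the one from the video): after dropping the first column, that index becomes the all-zeros vector, yet every index can still be told apart. scikit-learn's OneHotEncoder, for example, offers a drop='first' option for exactly this.

```python
import numpy as np

# Toy example with a made-up vocabulary of 4 indices.
vocab_size = 4
full = np.eye(vocab_size, dtype=int)    # standard one-hot encoding: 4 columns
dropped = full[:, 1:]                   # drop the first (redundant) column: 3 columns

print(full)
print(dropped)   # index 0 is now the all-zeros vector, but every index is still unique
```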
Hi @SebastienRaschka, thank you for this amazing video. I just wanted to mention that steps 3 and 4 are actually quite important to cover, because some problems, like the one that got me to find this video, are way too complex and need a deeper understanding of what happens behind the curtains. Anyway, I'm so grateful for your efforts, and keep up the good work!!
22:16 Is it created randomly, or is there some rule to create the embedding matrix? Thank you.
Good question! Usually it's initialized from random values. It's basically a fully-connected layer. But since the inputs are sparse, PyTorch implements a special Embedding layer to make computations more efficient. But in a conceptual way, you can think of it as a fully connected layer that is randomly initialized and then learned.
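A minimal PyTorch sketch of this equivalence (the vocabulary size and embedding dimension are made up): an nn.Embedding lookup gives the same result as a bias-free fully connected layer applied to the one-hot vector; the Embedding layer just skips the sparse matrix multiplication.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 10, 4
embedding = nn.Embedding(vocab_size, embed_dim)     # weights are randomly initialized

# Bias-free fully connected layer sharing the same (transposed) weights.
linear = nn.Linear(vocab_size, embed_dim, bias=False)
with torch.no_grad():
    linear.weight.copy_(embedding.weight.T)

word_index = torch.tensor([3])
one_hot = torch.nn.functional.one_hot(word_index, vocab_size).float()

# Embedding lookup and one-hot matrix multiplication give the same vector.
print(torch.allclose(embedding(word_index), linear(one_hot)))  # True
```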
It is not random. Its purpose is to represent words as vectors in such a way that similar words result in similar vectors in the vector space (and different words in dissimilar vectors). You can search for `word2vec` on Google and see how to train it; that's a very common way to "vectorize" words.
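For reference, a minimal sketch of training word2vec embeddings with gensim (assuming gensim 4.x, where the dimensionality parameter is called vector_size; the toy corpus below loosely reuses the example sentence from the video):

```python
from gensim.models import Word2Vec

# Toy tokenized corpus; real training needs far more text.
sentences = [
    ["the", "sun", "is", "shining"],
    ["the", "weather", "is", "sweet"],
    ["the", "sun", "is", "bright"],
]

model = Word2Vec(sentences, vector_size=8, window=2, min_count=1, epochs=50)

print(model.wv["sun"])               # learned 8-dimensional vector for "sun"
print(model.wv.most_similar("sun"))  # nearest words in the learned vector space
```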
9:00 Why are the one-hot vectors for "the", "sun", and "shining" one-indexed, while the one for "the" is zero-indexed? Just a mistake?
Definitely a mistake, he pointed them out.
Hi, great videos! Question: in the slide, shouldn't the one-hot vector have 11 positions for the vocabulary, which includes the special tokens? I only see 10 slots in the one-hot encoding on the slides.
Good catch! I probably dropped one column (it is relatively common, because one of the columns will always be redundant. I.e., if all 10 columns are 0, it implies that the 11th column has the 1)
@SebastianRaschka That would mean that the representation for index 10 (padding) would be all 0s, because the 1 would be in the 11th column, which is now dropped?
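If the 11th column really is the one that gets dropped, then yes, that index is represented by the all-zeros vector. A small NumPy sketch (assuming an 11-entry vocabulary with padding at index 10, as in the discussion above):

```python
import numpy as np

# 11-entry vocabulary (indices 0-10); drop the last column as redundant.
vocab_size = 11
one_hot = np.eye(vocab_size, dtype=int)[:, :-1]   # 11 rows, 10 columns

print(one_hot[10])   # [0 0 0 0 0 0 0 0 0 0] -> the padding index becomes all zeros
print(one_hot[3])    # every other index still has a single 1 in a unique position
```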
Excellent explanation, my man.
I see some Bob Marley reference there!
Haha, you are the first and only one who noticed :)
Hehe I was singing along when I saw that.
And thank you very much for this informative video!
@SaimonThapa Glad you were having fun ^^