You are using your "stay at home" time for the best.
Nice content, guys, keep it going...
My issue with grokking embeddings in general, as here: we set out for a single token to predict the next token, so how come, for example, "a e i o u" (the vowels) end up grouped together when they generally don't come one after the other?
Wonderful example btw, I love that you start from letters and build on it.
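One way to see why that grouping can still emerge, sketched in Python (the toy word list is made up, not from the video): characters get similar embeddings when they are followed by similar characters, not when they follow each other.

from collections import Counter

# Made-up toy corpus, just to illustrate the point.
words = ["bat", "bet", "bit", "bot", "but"]

# For each character, count which characters FOLLOW it.
following = {}
for w in words:
    for a, b in zip(w, w[1:]):
        following.setdefault(a, Counter())[b] += 1

# The vowels never follow one another here, but they are all followed by the
# same thing ('t'), so their next-character distributions are identical.
# A next-character model gains nothing from telling them apart, so training
# tends to push their embedding vectors close together.
print(following["a"], following["u"])   # Counter({'t': 1}) Counter({'t': 1})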
Nice
good job
Nice video! Thanks. One question though: how did you determine that the output of that embedding layer should be of size 2?
Two dimensions are easier to plot; that's the only reason.
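A minimal PyTorch sketch of that point (the vocabulary size and the letter-to-index mapping here are just illustrative, not taken from the video):

import torch
import torch.nn as nn

# 26 lowercase letters, each mapped to a learned 2-D vector.
# embedding_dim=2 is only so every letter becomes an (x, y) point you can
# scatter-plot; a larger value would work just as well for prediction.
emb = nn.Embedding(num_embeddings=26, embedding_dim=2)

idx = torch.tensor([2])      # 'c', assuming 'a'=0, 'b'=1, ...
print(emb(idx).shape)        # torch.Size([1, 2])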
In the case of the two characters, each character is transformed before the concatenation: EMB1, EMB2.
My question: do they both go through the same matrix?
If yes: can you explain how backpropagation is done?
If not: we would have two different representations for each character.
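If the question is whether EMB1 and EMB2 come from the same matrix: in the usual setup, yes, there is a single embedding matrix shared by both positions, and backprop simply accumulates gradients into whichever rows were looked up. A minimal PyTorch sketch of that idea (the sizes and example indices are made up, not the video's code):

import torch
import torch.nn as nn

emb = nn.Embedding(26, 2)      # ONE matrix, shared by both character positions
fc = nn.Linear(4, 26)          # concat of two 2-D embeddings -> next-char logits

context = torch.tensor([[0, 4]])               # e.g. the two characters 'a', 'e'
e1 = emb(context[:, 0])                        # EMB1, looked up in the shared matrix
e2 = emb(context[:, 1])                        # EMB2, looked up in the same matrix
logits = fc(torch.cat([e1, e2], dim=1))

target = torch.tensor([8])                     # e.g. next character 'i'
loss = nn.functional.cross_entropy(logits, target)
loss.backward()

# Backprop puts a gradient only on the rows that were looked up: row 0 via the
# first position, row 4 via the second. If both positions had held the same
# character, that single row would simply receive the sum of the two gradients.
print(emb.weight.grad[0], emb.weight.grad[4])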