I highly appreciate a refined tutorial like this. A lot of stuff in books and blogs is pure garbage. Watch the Attention for RNN Seq2Seq Models videos to understand this one better.
Very easy to understand this hard topic. Thanks a lot.
One thing seems to have been left unmentioned:
--In the attention layers, if the output of a head is d-dimensional and we have l heads, the context vectors will be ld-dimensional.
--The dense layers reduce this back to d dimensions, so the dense layers must have ld inputs and d outputs.
Otherwise, @5:51 doesn't make sense. (A small numeric sketch of this dimension bookkeeping follows below.)
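To illustrate the point above, here is a minimal NumPy sketch (variable names are mine, not the video's): each of the l heads emits a d-dimensional context vector per position, concatenation gives l*d features, and the dense/output layer maps l*d back to d.

```python
import numpy as np

l, d, seq_len = 8, 64, 10                          # l heads, d dims per head (illustrative values)

# One d-dimensional context vector per head, per position.
head_outputs = [np.random.randn(seq_len, d) for _ in range(l)]

# Concatenating the heads along the feature axis gives l*d features per position.
concat = np.concatenate(head_outputs, axis=-1)
assert concat.shape == (seq_len, l * d)

# The dense layer therefore needs l*d inputs and d outputs to get back to d dimensions.
W = np.random.randn(l * d, d)
b = np.zeros(d)
projected = concat @ W + b
assert projected.shape == (seq_len, d)
```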
Great explanation!!
Thanks for the video. Really detailed and informative. I'm still not sure how the two input sequences are combined to give the output sequence in the decoder. Can you recommend some material to consult for this?
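Not the video's code, but for anyone stuck on the same point: in the standard Transformer the decoder combines the two sequences through cross-attention, where the queries come from the decoder's own (target) sequence and the keys/values come from the encoder output. A minimal NumPy sketch, with all shapes and names purely illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 64
enc_out = np.random.randn(12, d)   # encoder output: 12 source positions
dec_in  = np.random.randn(7, d)    # decoder states: 7 target positions

Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
Q = dec_in  @ Wq                   # queries from the decoder sequence
K = enc_out @ Wk                   # keys from the encoder sequence
V = enc_out @ Wv                   # values from the encoder sequence

scores  = Q @ K.T / np.sqrt(d)     # (7, 12): each target position scores every source position
context = softmax(scores) @ V      # (7, d): per-target mixture of encoder information
assert context.shape == (7, d)
```

For a written reference, the original "Attention Is All You Need" paper (Vaswani et al., 2017) and Jay Alammar's "The Illustrated Transformer" blog post both walk through this encoder-decoder attention step.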
❤❤ great