The best explanation on this topic. When it comes to the deeper math of neural networks, good videos are rare; most cover only shallow architectures and applications. This video is both clear and complete; it is a gem.
i honestly think this may well be the best video going through the basic mechanics (math / concepts) of a plain vanilla RNN that i've ever seen (& i'm a bit of a connoisseur)
Looking at the loss function: how can L3 be a scalar, as the video says? L3 = (1/2)*(y3 - y3_hat)^2, and (y3 - y3_hat) is a vector (from what the video says), so L3 can't be a scalar and has to be a vector itself, right? Then again, I might be missing something here.

Also, what kind of multiplication goes in the "..." between (y3 - y3_hat) ... h3 to get a matrix? V is a matrix if and only if y3_hat is a vector, because y3_hat = V*h3 and h3 is a vector. Unless we are talking about y3 = sum(V*h3), where you sum all the elements of the vector V*h3 (a matrix times a vector can only produce a vector), or you introduce another set of weights as a vector, say S, and compute y3_hat*S; since y3_hat is a vector and S is also a vector, that product gives a scalar.

Any answer is appreciated. If I'm wrong in my thought process, please point it out and I will also appreciate it. Thanks.
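A quick NumPy sketch (my own, not from the video) showing how both pieces work out: if L3 = (1/2)‖y3 - y3_hat‖² is taken as the squared norm (sum of squared components), it is a scalar; and the gradient of L3 with respect to V is the outer product of the error vector with h3, which is a matrix with the same shape as V. That outer product is the multiplication that goes in the "...".

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((4, 3))   # output weights: 4 outputs, 3 hidden units
h3 = rng.standard_normal(3)       # hidden state at t = 3
y3 = rng.standard_normal(4)       # target vector

y3_hat = V @ h3                               # matrix-vector product -> vector of length 4
L3 = 0.5 * np.sum((y3 - y3_hat) ** 2)         # sum of squares -> a single scalar
print(L3.ndim == 0)                           # True: L3 is a scalar

# Gradient w.r.t. V: outer product of the error vector and h3
dL_dV = np.outer(y3_hat - y3, h3)             # shape (4, 3), same as V
print(dL_dV.shape == V.shape)                 # True

# Finite-difference check of one entry of the gradient
eps = 1e-6
Vp = V.copy(); Vp[1, 2] += eps
L_p = 0.5 * np.sum((y3 - Vp @ h3) ** 2)
print(abs((L_p - L3) / eps - dL_dV[1, 2]) < 1e-4)  # True
```

So the loss is scalar because the squared components are summed, and the "..." is an outer product, not a dot product.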
To explain such a complex topic like BPTT with such ease and clarity, props to you, sir. Truly an IIT professor!
One of the best Lecture series on RNN. Thank you so much.
Love from Pakistan
Best video. So easily explained. The only one that explains the math. Amazing professor.
Best video for understanding BPTT (better than any Coursera or other YouTube/blog explanation)
Awesome and really clear explanation. That made things so simple. Thank you Sir!
This is the best explanation on BPTT!! Thank you so much, Sir
Thank you very much sir for this detailed video. Now I am clear with BPTT
Always Loved the academic videos. They go much more into mathematics which is really helpful to understand the concept better. Great explanation.
this is the best video on BPTT on the internet.
Explained it very well! Now I really understand why it is THROUGH time.
Thank you so much sir! This helped me understand BPTT once and for all.
Clearly explained with complete Math. Thanks Sir!!
The best explanation !!
Is he adding a matrix to a vector? At minute 23:00 he does [ h + W*(dh/dW) ].
That can't be done. Can anyone help me interpret it?
Hi, what playlist does this video belong to? I want to watch more related video.
One of the best, thanks a lot
I understand now. Thank you so much.
Sir, please give some lessons to Dr. Vanapati.
Thanks a lot. Clearly explained.
Sir, are the vanishing gradient and exploding gradient problems in BPTT caused by the recursion involved in calculating the gradients of W and U?
Yes, vanishing and exploding gradients are still a problem with RNNs.
∂ht/∂hk involves t - k multiplications, so the weight w is effectively multiplied by itself t - k times.
If w < 1 the total becomes too small (vanishing), and if w > 1 it blows up (exploding).
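A tiny numeric illustration of that point (my own sketch, not from the video): treating the recurrent weight as a single scalar w, the gradient factor over t - k steps is just w multiplied by itself that many times.

```python
# Scalar illustration of the vanishing/exploding gradient factor in BPTT.
def gradient_factor(w: float, steps: int) -> float:
    """Product of w with itself `steps` times, i.e. w ** steps."""
    factor = 1.0
    for _ in range(steps):
        factor *= w
    return factor

print(gradient_factor(0.9, 50))  # ~0.005 -- the gradient vanishes
print(gradient_factor(1.1, 50))  # ~117   -- the gradient explodes
```

In a real RNN w is a matrix and the relevant quantity is its largest singular value, but the same shrink-or-blow-up behaviour applies.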
is d(Vh)/dh supposed to be V^T, the transpose of V???
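A numeric check (my own sketch, not from the video) can settle this: the Jacobian of Vh with respect to h is V itself, but in backpropagation the upstream gradient gets multiplied by V^T, which is where the transpose shows up.

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.standard_normal((4, 3))
h = rng.standard_normal(3)
g = rng.standard_normal(4)        # some upstream gradient dL/dy, with y = V h

# Backprop rule: dL/dh = V^T @ (dL/dy)
grad_h = V.T @ g

# Finite-difference check against the scalar L(h) = g . (V h)
eps = 1e-6
fd = np.zeros(3)
for j in range(3):
    hp = h.copy(); hp[j] += eps
    fd[j] = (g @ (V @ hp) - g @ (V @ h)) / eps

print(np.allclose(grad_h, fd, atol=1e-4))  # True
```

So yes: the transpose appears once you chain the upstream gradient through the linear map.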
guy said i will not go into the math of it
What was the missing value in the first derivative that you didn't mention then? The homework one.
It should be -(y3 - y3')h3^T; that outer product is what makes it a matrix.
Why assume that y hat uses a linear activation? Aren't sigmoid, tanh, and softmax more popular?
It's just for demonstration purposes, so the equations don't get complicated (you don't need to carry a g'(h) term).
@@anjelpatel36 But then why, when it comes to h3, can't we assume that g is a linear function? Thanks