Training RNNs - Loss and BPTT

  • Published: 11 Jan 2025

Comments • 33

  • @shuozhang429
    @shuozhang429 3 years ago +10

    The best explanation on this topic. When it comes to deeper math of neural networks, good videos are rare. Most are about shallow architecture and application. This video is both clear and complete, it is a gem.

  • @aterribleyoutuber9039
    @aterribleyoutuber9039 1 year ago +2

    To explain such a complex topic like BPTT with such ease and clarity, props to you, sir. Truly an IIT professor!

  • @khubaibraza8446
    @khubaibraza8446 4 years ago +8

    One of the best lecture series on RNNs. Thank you so much.
    Love from Pakistan

  • @nehalsonawane1042
    @nehalsonawane1042 2 years ago

    Best video. So easily explained. The only one that explains the math. Amazing professor.

  • @anuragsharma4196
    @anuragsharma4196 3 years ago

    Best video for understanding BPTT (better than any Coursera or other YouTube/blog explanation)

  • @nihalm9322
    @nihalm9322 4 years ago +3

    Awesome and really clear explanation. That made things so simple. Thank you Sir!

  • @vivekmankar5823
    @vivekmankar5823 3 years ago

    This is the best explanation on BPTT!! Thank you so much, Sir

  • @YuvarajuMaddiboina
    @YuvarajuMaddiboina 1 year ago

    Thank you very much sir for this detailed video. Now I am clear with BPTT

  • @vinaykumardaivajna5260
    @vinaykumardaivajna5260 1 year ago

    Always loved the academic videos. They go much deeper into the mathematics, which really helps in understanding the concept. Great explanation.

  • @Vivekagrawal5800
    @Vivekagrawal5800 2 years ago

    This is the best video on BPTT on the internet.

  • @EzraSchroeder
    @EzraSchroeder 3 years ago

    i honestly think this may well be the best video going through the basic mechanics (math / concepts) of a plain vanilla RNN that i've ever seen (& i'm a bit of a connoisseur)

  • @ankitseth5676
    @ankitseth5676 4 years ago

    Explained it very well! Now I really understand why it is THROUGH time.

  • @bt08b004
    @bt08b004 4 years ago +1

    Thank you so much sir! This helped me understand BPTT once and for all.

  • @paninilal8322
    @paninilal8322 1 year ago

    Clearly explained with complete Math. Thanks Sir!!

  • @RajkumarDarbar
    @RajkumarDarbar 1 year ago +1

    The best explanation!!

  • @Moustacheru
    @Moustacheru 3 years ago

    Is he adding a matrix to a vector? At minute 23:00 he writes [ h + W*(dh/dW) ].
    That can't be done. Can anyone help me interpret it?

  • @yildirimkocoglu3449
    @yildirimkocoglu3449 3 years ago

    Looking at the loss function, how can L3 be a scalar, as the video says?
    L3 = (1/2).*(y3-y3_hat)^2, and (y3-y3_hat) is a vector (from what the video says), so L3 can't be a scalar and has to be a vector itself, right?
    Then again, I might be missing something here. Anyway, what kind of multiplication stands in for the "..." between (y3-y3_hat) and h3 to get a matrix? V is a matrix if and only if y3_hat is a vector, because y3_hat = V*h3 and h3 is itself a vector (a matrix times a vector can only produce a vector). The alternatives I can think of: either y3_hat = sum(V*h3), where you sum all the elements of the vector V*h3, or you introduce another set of weights that is a vector, say "S", and multiply y3_hat by S; since y3_hat and S are both vectors, that gives a scalar output.
    Any answer is appreciated. If my thought process is wrong, please point it out; I will appreciate that too.
    Thanks.
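
One possible reading of the loss in the question above, offered as an assumption rather than as the video's exact notation: if the square means the squared Euclidean norm of the error vector, then L3 is a scalar even though y3 - y3_hat is a vector. A minimal pure-Python sketch with made-up sizes and numbers (2 outputs, 3 hidden units; the helper `matvec` is hypothetical):

```python
# Assumption: L3 = 0.5 * ||y3 - y3_hat||^2 (sum of squared components).
# All sizes and values below are made up for illustration.
V = [[0.5, -1.0, 0.25],
     [1.5, 0.0, -0.5]]     # 2 x 3 output weight matrix
h3 = [1.0, 2.0, 4.0]       # hidden state, length 3
y3 = [1.0, 0.0]            # target, length 2


def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


y3_hat = matvec(V, h3)                       # a vector: one entry per output
err = [y - yh for y, yh in zip(y3, y3_hat)]  # still a vector
L3 = 0.5 * sum(e * e for e in err)           # summing the squares gives a scalar

print(y3_hat, err, L3)
```

Under this reading, V stays a matrix and y3_hat stays a vector; only the final sum over components collapses the error to a single number.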

  • @phucnguyenphi5625
    @phucnguyenphi5625 3 years ago

    Hi, what playlist does this video belong to? I want to watch more related videos.

  • @arashfatehi9971
    @arashfatehi9971 4 years ago

    One of the best, thanks a lot

  • @SC-ss8vb
    @SC-ss8vb 4 years ago

    I understand now. Thank you so much.

  • @lovesingh6455
    @lovesingh6455 1 year ago

    Sir, please give some lessons to Dr. Vanapati.

  • @haroldprabhu4440
    @haroldprabhu4440 4 years ago

    Thanks a lot. Clearly explained.

  • @kartikpodugu
    @kartikpodugu 5 years ago +2

    Sir, are the vanishing and exploding gradient problems in BPTT caused by the recursion involved in calculating the gradients of W and U?

    • @vinodhkumarbaskaran228
      @vinodhkumarbaskaran228 5 years ago

      Yes, vanishing and exploding gradients are still a problem with RNNs.

    • @ericklestrange6255
      @ericklestrange6255 4 years ago +1

      ∂h_t/∂h_k involves t - k multiplications, so the weight w is multiplied by itself t - k times.
      If |w| < 1, the product becomes too small; if |w| > 1, the opposite happens and it becomes too large.
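
The repeated-multiplication argument in this thread can be sketched numerically. This is a simplified scalar illustration, assuming a single recurrent weight w and ignoring the activation derivative; the function name `gradient_factor` is hypothetical:

```python
def gradient_factor(w, steps):
    """Multiply w by itself `steps` times: a scalar stand-in for
    dh_t/dh_k with t - k = steps (activation derivative ignored)."""
    factor = 1.0
    for _ in range(steps):
        factor *= w
    return factor


# Compare a weight below 1, exactly 1, and above 1 over 20 time steps.
for w in (0.5, 1.0, 1.5):
    print(w, gradient_factor(w, 20))
```

With these made-up values the factor shrinks toward zero for w = 0.5 and grows into the thousands for w = 1.5, which is the vanishing and exploding behaviour the replies describe.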

  • @zijiewang4662
    @zijiewang4662 3 years ago

    Is d(Vh)/dh supposed to be V^T, the transpose of V?
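
One way to check the question above numerically, as a sketch under assumptions: take the loss L = 0.5 * ||y - V h||^2 and made-up values for V, h and y. The Jacobian of V h with respect to h is V itself, but the gradient of the scalar loss with respect to h comes out as V^T times the upstream gradient, which is where the transpose appears. The helper names are hypothetical:

```python
def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def loss(V, h, y):
    """Assumed loss: half the squared Euclidean norm of y - V h."""
    return 0.5 * sum((yi - yh) ** 2 for yi, yh in zip(y, matvec(V, h)))


def grad_h(V, h, y):
    """Analytic gradient dL/dh = V^T * delta, with delta = dL/dy_hat = -(y - y_hat)."""
    delta = [-(yi - yh) for yi, yh in zip(y, matvec(V, h))]
    return [sum(V[i][j] * delta[i] for i in range(len(V))) for j in range(len(h))]


def numeric_grad_h(V, h, y, eps=1e-6):
    """Central-difference estimate of dL/dh, one component at a time."""
    grad = []
    for j in range(len(h)):
        h_plus, h_minus = h[:], h[:]
        h_plus[j] += eps
        h_minus[j] -= eps
        grad.append((loss(V, h_plus, y) - loss(V, h_minus, y)) / (2 * eps))
    return grad


# Made-up example values: 2 outputs, 3 hidden units.
V = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]]
h = [1.0, 2.0, 4.0]
y = [1.0, 0.0]
print(grad_h(V, h, y))
print(numeric_grad_h(V, h, y))
```

The two printed vectors should agree up to finite-difference error, and both have the length of h rather than the length of y.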

  • @hamzaaslam1999
    @hamzaaslam1999 6 months ago

    guy said i will not go into the math of it

  • @ericklestrange6255
    @ericklestrange6255 4 years ago

    What was the missing value in the first derivative that you didn't mention, then? The homework one.

    • @zijiewang4662
      @zijiewang4662 3 years ago

      It should be -(y3-y3')h3^T; this makes it a matrix.
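
The expression in this reply can be checked against a finite-difference gradient. The sketch below assumes the loss L3 = 0.5 * ||y3 - V h3||^2 with a linear output y3' = V h3, and uses made-up numbers; the helper names are hypothetical:

```python
def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def loss(V, h, y):
    """Assumed loss: half the squared Euclidean norm of y - V h."""
    return 0.5 * sum((yi - yh) ** 2 for yi, yh in zip(y, matvec(V, h)))


def grad_V(V, h, y):
    """Analytic gradient -(y - y_hat) h^T: an outer product with V's shape."""
    y_hat = matvec(V, h)
    return [[-(yi - yh) * hj for hj in h] for yi, yh in zip(y, y_hat)]


def numeric_grad_V(V, h, y, eps=1e-6):
    """Central-difference estimate of dL/dV, one entry at a time."""
    grad = []
    for i in range(len(V)):
        row = []
        for j in range(len(V[0])):
            V_plus = [r[:] for r in V]
            V_minus = [r[:] for r in V]
            V_plus[i][j] += eps
            V_minus[i][j] -= eps
            row.append((loss(V_plus, h, y) - loss(V_minus, h, y)) / (2 * eps))
        grad.append(row)
    return grad


# Made-up example values: 2 outputs, 3 hidden units.
V = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]]
h3 = [1.0, 2.0, 4.0]
y3 = [1.0, 0.0]
print(grad_V(V, h3, y3))
print(numeric_grad_V(V, h3, y3))
```

Because the error is a column and h3^T is a row, their outer product has one entry per weight in V, which is why the result is a matrix.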

  • @joshualee3172
    @joshualee3172 4 years ago

    Why assume that y hat is a linear function here? Aren't sigmoid, tanh, and softmax more popular?

    • @anjelpatel36
      @anjelpatel36 4 years ago +3

      It's just for demonstration purposes, so the equations do not get complicated (you don't need to mention g'(h)).

    • @rachelxue1743
      @rachelxue1743 3 years ago

      @@anjelpatel36 But then why, when it comes to h3, can't we assume that g is a linear function? Thanks
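
To make the g'(h) term in this thread concrete, here is a scalar sketch. It assumes a one-unit recurrence h_t = tanh(w * h_prev + u * x) with made-up values; the names `step`, `w` and `u` are hypothetical stand-ins, not the video's notation. With tanh, the derivative with respect to the previous hidden state picks up the extra factor (1 - h_t^2), whereas a linear g would leave just w:

```python
import math


def step(h_prev, x, w, u):
    """One scalar RNN step with a tanh activation (assumed form)."""
    return math.tanh(w * h_prev + u * x)


def dstep_dh_prev(h_prev, x, w, u):
    """Analytic dh_t/dh_prev = g'(a) * w, where g = tanh and g'(a) = 1 - tanh(a)^2."""
    h = step(h_prev, x, w, u)
    return (1.0 - h * h) * w


def numeric_dstep(h_prev, x, w, u, eps=1e-6):
    """Central-difference estimate of dh_t/dh_prev."""
    return (step(h_prev + eps, x, w, u) - step(h_prev - eps, x, w, u)) / (2 * eps)


# Made-up example values.
h_prev, x, w, u = 0.3, 1.0, 0.8, 0.5
print(dstep_dh_prev(h_prev, x, w, u), numeric_dstep(h_prev, x, w, u))
```

The sketch only shows where the activation derivative enters the chain rule; it does not settle which activation the lecture intends for the hidden state.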
