Attention Mechanisms in Recurrent Neural Networks (RNNs) - IGGG

  • Published: Sep 10, 2024
  • ==\ Subscribe and like if you found this video useful! /==
    Paper: Neural Machine Translation by Jointly Learning to Align and Translate, arxiv.org/pdf/...
    Note: I am not the author of the paper.
    Talk @ IGGG Reading Group, Laval University, May 19, 2017, www2.ift.ulaval...
    My project in which I use deep LSTMs without attention mechanisms: Human Activity Recognition (HAR), github.com/gui...

Comments • 14

  • @onlyphants • 3 years ago

    Awesome video.. finally, I understand better how Bahdanau attention is implemented.. thank you for the visualizations.

  • @jugsma6676 • 6 years ago

    Nice compilation of several papers. Easy to understand all related papers at once.

  • @IgorAherne • 6 years ago +2

    Hey Guillaume - please help me understand the following thing (you are my only hope :D)
    Yes, we have the mini neural network which serves context to the decoder, meaning it needs a fixed number of weights to train properly. However, the LSTM by definition allows an unconstrained number of timesteps - how do the two fit together? Our mini-network's matrix size would vary every time the source sequence (from the bidirectional LSTM) changes in length...

    • @GuillaumeChevalier • 6 years ago +2

      The mini neural network has fixed weights, and the shape of that weight matrix never changes. Those weights are reused to compute attention features along the length axis of the source encoder at every decoder timestep. So the trick is to apply the mini neural net as many times as there are source timesteps; it is the resulting tensor that has a variable shape, dynamic not only in batch size but also in the number of timesteps, while the attention features' dimension stays static. How to implement dynamic computations like these depends on your deep learning framework (see the sketch below).
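      As a minimal NumPy sketch of that idea (an illustration, not the speaker's code; all names and sizes below are assumptions): the additive-attention scoring network keeps fixed-shape weights W_a, U_a and v_a, and those same weights are simply reused over however many source timesteps the encoder produced, so only the length axis of the resulting score tensor varies.

      ```python
      # Bahdanau-style additive attention with fixed-shape weights reused
      # over a variable number of source timesteps (illustrative sketch).
      import numpy as np

      rng = np.random.default_rng(0)
      enc_dim, dec_dim, attn_dim = 8, 6, 5         # arbitrary demo sizes

      # Fixed-shape parameters of the "mini neural network" (alignment model).
      W_a = rng.normal(size=(enc_dim, attn_dim))   # projects encoder states
      U_a = rng.normal(size=(dec_dim, attn_dim))   # projects the decoder state
      v_a = rng.normal(size=(attn_dim,))           # scoring vector

      def attention_context(encoder_states, decoder_state):
          """encoder_states: (T_src, enc_dim) for any T_src; decoder_state: (dec_dim,)."""
          # The same W_a, U_a, v_a are applied at every source timestep.
          scores = np.tanh(encoder_states @ W_a + decoder_state @ U_a) @ v_a  # (T_src,)
          weights = np.exp(scores - scores.max())
          weights /= weights.sum()                      # softmax over the T_src axis
          return weights @ encoder_states               # context vector, (enc_dim,)

      # Source sequences of different lengths reuse the exact same parameters;
      # only the softmax axis changes size, the output shape stays (enc_dim,).
      for T_src in (4, 9):
          h_enc = rng.normal(size=(T_src, enc_dim))
          s_dec = rng.normal(size=(dec_dim,))
          print(T_src, attention_context(h_enc, s_dec).shape)
      ```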

  • @flamingxombie • 6 years ago

    Interestingly, Tacotron does something similar with CBHG. They also use 'highway' layers by Srivastava.

  • @FernandoOliveira-kt3eh • 5 years ago

    Hi, Mr Guillaume. Thanks for the great video and explanation. But I have a doubt: suppose I'm mapping variable-length sequences S, each with length Ls. With attention mechanisms, could you force the output to have length Ls as well? Or does that not make sense? Thank you in advance. (I don't know if you still look at this video anymore.)

    • @GuillaumeChevalier • 5 years ago

      With a multi-head self-attention mechanism you can do that (see the sketch below).
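      A minimal sketch (my illustration, not code from the talk; names and sizes are assumptions) of why self-attention preserves the sequence length: queries, keys and values are all projections of the same length-Ls input, so the attended output is again an (Ls, d_model) matrix, one output position per input position.

      ```python
      # Multi-head self-attention over a length-Ls sequence: output length == Ls.
      import numpy as np

      rng = np.random.default_rng(1)
      Ls, d_model, n_heads = 7, 16, 4
      d_head = d_model // n_heads

      x = rng.normal(size=(Ls, d_model))        # one variable-length input sequence
      W_q, W_k, W_v, W_o = (0.1 * rng.normal(size=(d_model, d_model)) for _ in range(4))

      def softmax(z, axis=-1):
          z = z - z.max(axis=axis, keepdims=True)
          e = np.exp(z)
          return e / e.sum(axis=axis, keepdims=True)

      def multi_head_self_attention(x):
          T = x.shape[0]                                 # sequence length Ls
          def split(h):                                  # (T, d_model) -> (n_heads, T, d_head)
              return h.reshape(T, n_heads, d_head).transpose(1, 0, 2)
          q, k, v = split(x @ W_q), split(x @ W_k), split(x @ W_v)
          attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (n_heads, T, T)
          heads = attn @ v                                            # (n_heads, T, d_head)
          return heads.transpose(1, 0, 2).reshape(T, d_model) @ W_o   # (T, d_model)

      print(multi_head_self_attention(x).shape)  # (7, 16): same length as the input
      ```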

  • @pranithachinnu3772 • 3 years ago

    Hello, I saw the human activity recognition project in your GitHub account, and I wanted to do it for my final-year project. Can you please explain that project in detail? It would help us a lot. Thank you.

  • @pjoshi_15 • 5 years ago

    Good one!

  • @flamingxombie • 6 years ago

    Good talk!

  • @shwetagargade1877 • 5 years ago

    Can you please share the slides (PPT)?

  • @saurabhinorange • 7 years ago

    Very good.