L19.3 RNNs with an Attention Mechanism

  • Published: 23 Dec 2024

Comments • 19

  • @ah-rdk
    @ah-rdk 2 days ago +1

    The most complete video I found for Attention Mechanism! Thank you very much, sir.

    • @SebastianRaschka
      @SebastianRaschka  1 day ago

      Glad you found it useful! Btw I have an even more complete blog post you might like :) Understanding and Coding Self-Attention, Multi-Head Attention, Cross-Attention, and Causal-Attention in LLMs (magazine.sebastianraschka.com/p/understanding-and-coding-self-attention)

  • @koiRitwikHai
    @koiRitwikHai 2 years ago +3

    At 17:42, I think the two items (the ones going into the pink "Neural Net" box) should be S_{t'-1} and h_{t},
    because otherwise e_{t,t'} would depend solely on t. Then why even call it e_{t,t'}? You could simply call it e_t.

    • @ricardogomes9528
      @ricardogomes9528 1 year ago

      I think it should be S_{t-1} (as it is) and h_{t'}, because in the formula below t' ranges all the way up to T, which is the maximum index of the encoder time steps. Am I wrong?

    • @borutsvara7245
      @borutsvara7245 1 year ago

      Yes, I think it should be h_{t'}, otherwise the t' dependency does not make sense. But then you would also need to run the yellow RNN on the entire sentence to get S_{t-1} and compute the attention, which does not make much sense either. Also, the next slide has an h_{t'}, which may indicate a correction attempt. (A sketch of this computation follows at the end of this thread.)

    • @Prithviization
      @Prithviization 8 months ago

      Slide 29 here : sebastianraschka.com/pdf/lecture-notes/stat453ss21/L19_seq2seq_rnn-transformers__slides.pdf
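
To make the notation in this thread concrete, here is a minimal, self-contained sketch of an additive (Bahdanau-style) alignment score e_{t,t'} = a(S_{t-1}, h_{t'}) and the resulting attention weights. The tensor names, toy dimensions, and the one-hidden-layer scoring network (W_s, W_h, v) are illustrative assumptions, not the exact parameterization from the slides.

```python
import torch

hidden_dim = 16
T_enc = 5  # number of encoder time steps

s_prev = torch.randn(hidden_dim)       # decoder state S_{t-1}
h = torch.randn(T_enc, hidden_dim)     # encoder states h_{1}, ..., h_{T}

# the small "neural net" box: a one-hidden-layer scoring MLP (additive attention)
W_s = torch.randn(hidden_dim, hidden_dim)
W_h = torch.randn(hidden_dim, hidden_dim)
v = torch.randn(hidden_dim)

e = torch.tanh(s_prev @ W_s + h @ W_h) @ v      # energies e_{t,1..T}, shape (T_enc,)
alpha = torch.softmax(e, dim=0)                  # attention weights, sum to 1
context = (alpha.unsqueeze(1) * h).sum(dim=0)    # context vector c_t, shape (hidden_dim,)
print(e.shape, alpha.shape, context.shape)
```

Note that each e_{t,t'} here depends on both the decoder state S_{t-1} and a specific encoder state h_{t'}, which is why the score carries both indices.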

  • @LoveinPortofino1
    @LoveinPortofino1 2 years ago +2

    Thanks for the very detailed explanation. In the graph you show that S_{t-1} and c_{t} go into the calculation of S_{t}. However, we also need y_{t-1}. In the original paper,
    the formula is S_{t} = f(S_{t-1}, y_{t-1}, c_{t}), which is why I did not quite understand the computation graph for S_{t}.
    Is the formula below correct?
    S_{t} = sigmoid(Weight_{hidden state} * S_{t-1} + Weight_{context} * c_{t} + Weight_{input} * y_{t-1})
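
For reference, a minimal sketch of one decoder update that feeds in all three inputs from the paper's formula S_t = f(S_{t-1}, y_{t-1}, c_t). The plain tanh cell and the weight names below are simplifying assumptions (the original paper uses a gated recurrent unit), so this illustrates the dependency structure rather than the exact update.

```python
import torch

hidden_dim, embed_dim = 16, 8
s_prev = torch.randn(hidden_dim)   # S_{t-1}
y_prev = torch.randn(embed_dim)    # embedding of the previous output token y_{t-1}
c_t = torch.randn(hidden_dim)      # context vector c_{t} from the attention step

W_s = torch.randn(hidden_dim, hidden_dim)   # Weight_{hidden state}
W_y = torch.randn(embed_dim, hidden_dim)    # Weight_{input}
W_c = torch.randn(hidden_dim, hidden_dim)   # Weight_{context}

# all three inputs enter the update of the new decoder state
s_t = torch.tanh(s_prev @ W_s + y_prev @ W_y + c_t @ W_c)
print(s_t.shape)  # torch.Size([16])
```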

  • @Amapramaadhy
    @Amapramaadhy 2 years ago +1

    Thanks for the great content.
    I find the "time step" terminology confusing. Might we call it "next item in the sequence" instead?

  • @mahaaljarrah3236
    @mahaaljarrah3236 3 years ago +3

    Thank you very much, this was really helpful.

  • @abubakarali6399
    @abubakarali6399 3 years ago +1

    When you sum up all the attention weights, the aggregation function gives a single result rather than telling you which word is more important.
    How does this aggregation function remember the attention weight of every word?

    • @SebastianRaschka
      @SebastianRaschka  3 years ago +2

      Good point. The attention weights basically give you the importance of a word. (Here I am using "word" loosely; it also means the word's representation as a real-valued vector.)
      When you aggregate, you weight the "relevant" words more strongly via these weights. As you hinted, you squash the attention-weighted words into a single vector, but this is still more "powerful" than a regular RNN. In a regular RNN, you carry the words along iteration by iteration, so information from early words may be forgotten. Say you are at word 10 of an input sentence: with the attention-weighted version, you can still have a high weight on word 1, and via the aggregation function that word will then have a high influence at time step 10 (see the sketch below).
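
A small numerical sketch of the aggregation described above, with made-up attention weights: even though the weighted words are squashed into a single context vector, a large weight on word 1 lets that word dominate the result at a late decoding step.

```python
import torch

h = torch.randn(10, 16)        # encoder states for a 10-word input sentence
alpha = torch.zeros(10)
alpha[0] = 0.9                 # high attention weight on word 1
alpha[1:] = 0.1 / 9            # remaining weight spread over the other words

# the aggregation: a single context vector, but word 1 dominates it
context = (alpha.unsqueeze(1) * h).sum(dim=0)
print(torch.allclose(context, 0.9 * h[0] + (0.1 / 9) * h[1:].sum(dim=0)))  # True
```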

  • @736939
    @736939 2 years ago +1

    What is the meaning of a bidirectional RNN? Why exactly is this type used for attention?

    • @SebastianRaschka
      @SebastianRaschka  2 years ago +4

      A bidirectional RNN sounds fancier than it really is. You can think of a standard RNN that you use on the input sentence as usual. Then, you use it again on the sentence with the words in reversed order. Then, you concatenate the two representations: the one from the forward sentence and the one from the reversed sentence. Why? I guess it's because you want to capture more of the context: for some words, the relevant words come before them, and for some they come after. (There is a small sketch of this at the end of this thread.)

    • @736939
      @736939 2 года назад

      @@SebastianRaschka Thank you professor.
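
A minimal sketch of the "run forward, run reversed, concatenate" idea described above. The two separate nn.RNN modules and the toy dimensions are illustrative assumptions; PyTorch's built-in bidirectional=True option packages the same idea.

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim, T = 8, 16, 5
x = torch.randn(1, T, embed_dim)               # one toy sentence, batch_first layout

rnn_fwd = nn.RNN(embed_dim, hidden_dim, batch_first=True)
rnn_bwd = nn.RNN(embed_dim, hidden_dim, batch_first=True)

out_fwd, _ = rnn_fwd(x)                        # states from reading left to right
out_bwd, _ = rnn_bwd(torch.flip(x, dims=[1]))  # states from reading right to left
out_bwd = torch.flip(out_bwd, dims=[1])        # re-align with the original word positions

h = torch.cat([out_fwd, out_bwd], dim=-1)      # concatenated states, shape (1, T, 2*hidden_dim)
print(h.shape)
```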

  • @Mvkv4L
    @Mvkv4L 1 year ago

    Hi Sebastian. I hope you're doing well. I have a question about the attention weights (alpha) and the energies (e) and I was hoping you would help. What are the shapes of alpha and e? Are they vectors or scalars?
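
For what it's worth, in the standard Bahdanau formulation each individual e_{t,t'} and alpha_{t,t'} is a scalar; collecting them over all encoder positions t' = 1, ..., T gives one length-T vector per decoder step t. A toy sketch with assumed shapes:

```python
import torch

T_enc = 5                             # number of encoder time steps
e_t = torch.randn(T_enc)              # e_{t,1}, ..., e_{t,T}: one scalar per encoder position
alpha_t = torch.softmax(e_t, dim=0)   # alpha_{t,1}, ..., alpha_{t,T}: non-negative, sums to 1
print(e_t.shape, alpha_t.shape, float(alpha_t.sum()))
```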

  • @EngineeredFemale
    @EngineeredFemale 1 year ago

    Thanks. The video was clear.

  • @yosimadsu2189
    @yosimadsu2189 1 year ago

    Thanks. More detailed, but not the best. It lacks actual values being calculated. Still confusing, though.

  • @nayanrainakaul9813
    @nayanrainakaul9813 7 months ago

    Thanks kween
