Transformer Model (2/2): Build a Deep Neural Network (1.25x speed recommended)

  • Published: 17 Nov 2024

Comments • 6

  • @temporarychannel4339 · 2 years ago

    I highly appreciate a refined tutorial like this. A lot of stuff in books and blogs is pure garbage. Watch the Attention for RNN Seq2Seq Models videos to understand this one better.

  • @phuctranchi7898 · 3 years ago · +3

    You make this hard topic very easy to understand. Thanks a lot.

  • @sahhaf1234 · 2 years ago · +7

    One thing seems to have been left unmentioned (see the sketch below the comment thread):
    -- in the attention layers, if the output of each head is d-dimensional and there are l heads, the concatenated context vectors are ld-dimensional;
    -- the dense layers reduce them back to d dimensions, so each dense layer must have ld inputs and d outputs.
    Otherwise, the step at 5:51 doesn't make sense.

  • @shashwathpunneshetty1260 · 1 year ago

    Great explanation!!

  • @JoshuaOwoyemi · 3 years ago · +2

    Thanks for the video. Really detailed and informative. I'm still not sure how the two input sequences are combined to give the output sequence in the decoder (see the cross-attention sketch below the comment thread). Can you recommend some material to consult on this?

  • @rongwang6142 · 1 year ago

    ❤❤ great
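
A minimal PyTorch sketch of the dimensionality argument in @sahhaf1234's comment above: each of the l heads emits a d-dimensional context vector, concatenating them gives ld dimensions, and the dense layer that follows projects back to d. The sizes below are illustrative assumptions, not values taken from the video.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not taken from the video.
l, d = 8, 64                 # l attention heads, each producing a d-dimensional output
batch, seq_len = 2, 10

# One d-dimensional context vector per head, per sequence position.
head_outputs = [torch.randn(batch, seq_len, d) for _ in range(l)]

# Concatenating along the feature axis gives an l*d-dimensional vector per position...
concat = torch.cat(head_outputs, dim=-1)   # shape: (batch, seq_len, l*d)

# ...so the dense layer that follows must have l*d inputs and d outputs.
dense = nn.Linear(l * d, d)
context = dense(concat)                    # shape: (batch, seq_len, d)
print(concat.shape, context.shape)         # [2, 10, 512] -> [2, 10, 64]
```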
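
On @JoshuaOwoyemi's question about combining the two sequences in the decoder: in the standard Transformer, the decoder's cross-attention layer takes its queries from the decoder states and its keys and values from the encoder output, so each target position reads from the whole source sequence. Below is a hedged sketch using PyTorch's built-in nn.MultiheadAttention; all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not taken from the video.
d_model, n_heads = 64, 8
batch, src_len, tgt_len = 2, 12, 9

enc_out = torch.randn(batch, src_len, d_model)     # encoder output (source sequence)
dec_states = torch.randn(batch, tgt_len, d_model)  # decoder self-attention output

# Cross-attention: queries from the decoder, keys and values from the encoder.
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
out, attn_weights = cross_attn(query=dec_states, key=enc_out, value=enc_out)

print(out.shape)           # (batch, tgt_len, d_model): one vector per target position
print(attn_weights.shape)  # (batch, tgt_len, src_len), averaged over heads by default
```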