Transformer - Part 6 - Decoder (1): testing and training

  • Published: 9 Sep 2024

Comments • 15

  • @exxzxxe · 2 years ago · +3

    Lennart, you are the YouTube Wizard of Transformers!

  • @zbynekba · 1 year ago · +2

    Hi Lennart, where did you get all the details you are presenting here? I mean, have you perhaps studied/analyzed the source code of an existing implementation of transformer-based models?
    I haven't found this detailed an explanation anywhere else. Bravo! And thank you.

  • @anthonytafoya3451 · 3 years ago · +2

    Thank you sir! Awesome video :)

  • @leiqin111 · 1 year ago

    The best transformer video!

  • @taozeng6664 · 2 years ago · +1

    Thank you for your awesome video!!!

  • @victormachadogonzaga1898 · 2 years ago · +1

    Awesome!

  • @shikharsrivastava4150 · 2 years ago · +2

    thank you, sir

  • @jeremykenn · 5 months ago

    Does 5:45-8:15 refer to the old RNN training method? And hence the next video covers the real transformer decoder?

  • @somayehseifi8269 · 2 years ago · +1

    Thanks for your great explanations. I have a question: during testing, is the input to the encoder fed in just once, with these three steps then repeated? Or is the input to the encoder fed in again at each step? And during testing we don't have any masking, right? The masks are equal to None?

    • @lennartsvensson7636 · 2 years ago · +2

      That looks like several questions. :) 1. The output from the encoder can be precomputed and used repeatedly. 2. We use the same architecture during training and testing, that is, there is no masking in the encoder, but one of the self-attention layers in the decoder is masked.
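
      To make this answer concrete, here is a minimal sketch of greedy decoding with a precomputed encoder output. The interface (model.encode, model.decode, model.generator) and the token ids (bos_id, eos_id) are hypothetical stand-ins for whatever implementation the video follows, not a definitive API.

          import torch

          def greedy_decode(model, src, bos_id, eos_id, max_len=50):
              # 1. The encoder runs exactly once; its output ("memory") is
              #    reused at every decoding step below.
              memory = model.encode(src)                    # hypothetical interface

              ys = torch.tensor([[bos_id]])                 # (batch=1, tgt_len=1)
              for _ in range(max_len):
                  # 2. Causal mask: position i may only attend to positions <= i.
                  L = ys.size(1)
                  tgt_mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)

                  # 3. Re-run the decoder on everything generated so far and
                  #    keep only the prediction at the last position.
                  out = model.decode(ys, memory, tgt_mask)
                  logits = model.generator(out[:, -1])
                  next_token = logits.argmax(dim=-1, keepdim=True)

                  ys = torch.cat([ys, next_token], dim=1)   # append and repeat
                  if next_token.item() == eos_id:
                      break
              return ys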

    • @somayehseifi8269 · 2 years ago

      @@lennartsvensson7636 As I understand it, we use masking to enable parallelization in the decoder, but during testing there is no parallelization and everything is done one token at a time. I would be thankful if you could explain this a little. Another question: if I want training, testing, and validation sets, since validation is done within the training epochs, should I run it like training or like inference? Since inference in these networks is different, I don't know whether validation should be done in parallel in both encoder and decoder (like training), or in parallel in the encoder while the decoder runs like testing. I know it's a lot of questions, sorry :))

    • @lennartsvensson7636 · 2 years ago

      @@somayehseifi8269 First, the reason that we use masking during testing is that the weights are selected to work well for an architecture that contains masking. If we were to change the architecture, we would need to re-train the weights using the new architecture (without masking) to obtain good performance, but that would be computationally very expensive (especially since that architecture does not allow us to parallelize training). Second, you are free to perform validation as you see fit. However, the simplest alternative is arguably to evaluate the model’s ability to perform one-step prediction and perform validation in the same way that we trained the network. I believe almost everyone performs validation like this.
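
      In code, "perform validation in the same way that we trained the network" amounts to teacher-forced one-step prediction: the whole target sequence is fed in at once, the same causal mask is applied, and every position is scored in parallel. A minimal sketch, using the same hypothetical model interface as above:

          import torch
          import torch.nn as nn

          def one_step_prediction_loss(model, src, tgt, pad_id):
              # Teacher forcing: shift the target so position t predicts token t+1.
              tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]

              # Same causal mask as in training; all positions run in parallel.
              L = tgt_in.size(1)
              tgt_mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)

              memory = model.encode(src)                    # hypothetical interface
              logits = model.generator(model.decode(tgt_in, memory, tgt_mask))

              return nn.functional.cross_entropy(
                  logits.reshape(-1, logits.size(-1)),      # (batch * L, vocab)
                  tgt_out.reshape(-1),                      # (batch * L,)
                  ignore_index=pad_id,                      # don't score padding
              )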

    • @somayehseifi8269 · 2 years ago

      @@lennartsvensson7636 Thank you for your answer. So if I used a source mask and target mask in training, I do the same for testing as well, right? For validation, you mean it is done exactly like training: both encoder and decoder work in parallel. In testing, however, the decoder does not work in parallel. Did I get that right?

    • @lennartsvensson7636 · 2 years ago

      @@somayehseifi8269 The parts "I do the same for testing" and "exactly like training" sound correct to me.
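
      To make the "source mask and target mask" part of this exchange concrete: one common construction (an illustrative sketch; shapes and True-means-allowed conventions vary between implementations) is that the source mask only hides padding, while the target mask combines the padding mask with the causal constraint. The same masks are then used in training, validation, and testing.

          import torch

          def make_masks(src, tgt, pad_id):
              # Source mask: True where attention is allowed (non-padding tokens).
              src_mask = (src != pad_id).unsqueeze(1)                  # (batch, 1, src_len)

              # Target mask: padding AND causal, so position i attends only
              # to non-padding positions j <= i.
              L = tgt.size(1)
              causal = torch.tril(torch.ones(L, L, dtype=torch.bool))  # (L, L)
              tgt_mask = (tgt != pad_id).unsqueeze(1) & causal         # (batch, L, L)
              return src_mask, tgt_mask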