Transformer - Part 6 - Decoder (1): testing and training

  • Published: 9 Sep 2024

Comments • 15

  • @exxzxxe · 2 years ago · +3

    Lennart, you are the YouTube Wizard of Transformers!

  • @zbynekba · 1 year ago · +2

    Hi Lennart, where did you get all the details you are presenting here? I mean, have you perhaps studied/analyzed the source code of an existing implementation of transformer-based models?
    I haven't found this detailed an explanation anywhere else. Bravo! And thank you.

  • @anthonytafoya3451 · 3 years ago · +2

    Thank you sir! Awesome video :)

  • @leiqin111 · 1 year ago

    The best transformer video!

  • @taozeng6664 · 2 years ago · +1

    Thank you for your awesome video!!!

  • @victormachadogonzaga1898 · 2 years ago · +1

    Awesome!

  • @shikharsrivastava4150 · 2 years ago · +2

    thank you, sir

  • @jeremykenn · 5 months ago

    Does 5:45-8:15 refer to the old RNN training method? And hence the next video covers the real transformer decoder?

  • @somayehseifi8269 · 2 years ago · +1

    Thanks for your great explanations. I have a question: during testing, is the input to the encoder fed in just once, with these three steps then repeated? Or is the input to the encoder fed in again at each step? And during testing we don't have any masking, right? The masks are equal to None?

    • @lennartsvensson7636 · 2 years ago · +2

      That looks like several questions. :) 1. The output from the encoder can be precomputed and used repeatedly. 2. We use the same architecture during training and testing, that is, there is no masking in the encoder, but one of the self-attention layers in the decoder is masked.
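
      To make this answer concrete, here is a minimal sketch of greedy decoding with a precomputed encoder output. The interface (model.encode, model.decode, model.generator) and the token ids (bos_id, eos_id) are hypothetical stand-ins for whatever implementation the video follows, not a definitive API.

          import torch

          def greedy_decode(model, src, bos_id, eos_id, max_len=50):
              # 1. The encoder runs exactly once; its output ("memory") is
              #    reused at every decoding step below.
              memory = model.encode(src)                    # hypothetical interface

              ys = torch.tensor([[bos_id]])                 # (batch=1, tgt_len=1)
              for _ in range(max_len):
                  # 2. Causal mask: position i may only attend to positions <= i.
                  L = ys.size(1)
                  tgt_mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)

                  # 3. Re-run the decoder on everything generated so far and
                  #    keep only the prediction at the last position.
                  out = model.decode(ys, memory, tgt_mask)
                  logits = model.generator(out[:, -1])
                  next_token = logits.argmax(dim=-1, keepdim=True)

                  ys = torch.cat([ys, next_token], dim=1)   # append and repeat
                  if next_token.item() == eos_id:
                      break
              return ys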

    • @somayehseifi8269 · 2 years ago

      @@lennartsvensson7636 As I understand it, we use masking to enable parallelization in the decoder, but during testing there is no parallelization and everything is done one token at a time. I would be thankful if you could explain this a little. Another question: if I want training, testing, and validation sets, since validation is done within the training epochs, should I run it like training or like inference? Since inference in these networks is different, I don't know whether validation should be done in parallel in both encoder and decoder (like training), or in parallel in the encoder while the decoder runs like testing. I know it's a lot of questions, sorry :))

    • @lennartsvensson7636 · 2 years ago

      @@somayehseifi8269 First, the reason that we use masking during testing is that the weights are selected to work well for an architecture that contains masking. If we were to change the architecture, we would need to re-train the weights using the new architecture (without masking) to obtain good performance, but that would be computationally very expensive (especially since that architecture does not allow us to parallelize training). Second, you are free to perform validation as you see fit. However, the simplest alternative is arguably to evaluate the model’s ability to perform one-step prediction and perform validation in the same way that we trained the network. I believe almost everyone performs validation like this.
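
      In code, "perform validation in the same way that we trained the network" amounts to teacher-forced one-step prediction: the whole target sequence is fed in at once, the same causal mask is applied, and every position is scored in parallel. A minimal sketch, using the same hypothetical model interface as above:

          import torch
          import torch.nn as nn

          def one_step_prediction_loss(model, src, tgt, pad_id):
              # Teacher forcing: shift the target so position t predicts token t+1.
              tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]

              # Same causal mask as in training; all positions run in parallel.
              L = tgt_in.size(1)
              tgt_mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)

              memory = model.encode(src)                    # hypothetical interface
              logits = model.generator(model.decode(tgt_in, memory, tgt_mask))

              return nn.functional.cross_entropy(
                  logits.reshape(-1, logits.size(-1)),      # (batch * L, vocab)
                  tgt_out.reshape(-1),                      # (batch * L,)
                  ignore_index=pad_id,                      # don't score padding
              )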

    • @somayehseifi8269 · 2 years ago

      @@lennartsvensson7636 Thank you for your answer. So if I used a source mask and target mask in training, I do the same for testing as well, right? For validation, you mean it is done exactly like training: both encoder and decoder work in parallel. In testing, however, the decoder does not work in parallel. Did I get that right?

    • @lennartsvensson7636 · 2 years ago

      @@somayehseifi8269 The parts "I do the same for testing" and "exactly like training" sound correct to me.
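
      To make the "source mask and target mask" part of this exchange concrete: one common construction (an illustrative sketch; shapes and True-means-allowed conventions vary between implementations) is that the source mask only hides padding, while the target mask combines the padding mask with the causal constraint. The same masks are then used in training, validation, and testing.

          import torch

          def make_masks(src, tgt, pad_id):
              # Source mask: True where attention is allowed (non-padding tokens).
              src_mask = (src != pad_id).unsqueeze(1)                  # (batch, 1, src_len)

              # Target mask: padding AND causal, so position i attends only
              # to non-padding positions j <= i.
              L = tgt.size(1)
              causal = torch.tril(torch.ones(L, L, dtype=torch.bool))  # (L, L)
              tgt_mask = (tgt != pad_id).unsqueeze(1) & causal         # (batch, L, L)
              return src_mask, tgt_mask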