This is probably the most succinct and clear explanation of Transformers that I have found so far. Thanks for sharing!
Nice work!
As always, Rachel makes things a lot easier to understand. Thank you so much!
aw, ty so much! :) -R
Good work, brief but clear
Very clear, concise! Thank you!
Very nicely explained. Great work!
Thank you Rachel
This was great. Succinct explanation and describing the pros and cons. Thank you!
Very well explained keep up the good work!
Very helpful! Thanks a lot
Thanks for the explanation! I still have a question: at 1:45, is the left side the output and the right side the input? That seems contrary to the previous narrative, where we evolve in time from left to right.
Wish I had any teacher like you.
😣
Great work
Question: you say that the drawback is the training part. What if I use a pre-trained model from the Hugging Face website? Is there a flaw in a pre-trained model?
I would say it's more a drawback of transformers compared to other architectures. If you're using a pre-trained model then you're right, it's not something you especially need to worry about. :) -R
@@RasaHQ ah yes okay thank you
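For anyone curious how reusing a pre-trained model looks in practice, here's a minimal sketch assuming the Hugging Face `transformers` library and `bert-base-uncased` as an example checkpoint (any checkpoint from the Hub would work similarly); no training from scratch is involved:

```python
# Minimal sketch: load a pre-trained transformer from the Hugging Face Hub
# and run a single forward pass with the downloaded weights.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Transformers are easy to reuse.", return_tensors="pt")
outputs = model(**inputs)                      # forward pass, no training needed
print(outputs.last_hidden_state.shape)         # (batch, seq_len, hidden_size)
```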
This was great, thank you!
Hi, I am wondering whether the decoder is more computationally expensive than the encoder, or whether they are at the same level?
In the original transformer architecture, the decoder is slightly more expensive (in terms of # of parameters) because each of the decoders inside of it has an additional attention layer.
@@RasaHQ Thanks, how much more would you roughly expect? 2x? 10x? 50x?
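One rough way to see the difference yourself: the sketch below (assuming PyTorch's built-in layers with illustrative sizes of d_model=512 and 8 heads, not necessarily the exact configuration from the original paper) compares parameter counts for a single encoder layer vs. a single decoder layer, where the decoder layer carries the extra cross-attention block:

```python
# Minimal sketch: compare parameter counts of one encoder layer vs. one decoder layer.
import torch.nn as nn

def n_params(module):
    """Count the parameters in a module."""
    return sum(p.numel() for p in module.parameters())

enc_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)  # self-attention + feed-forward
dec_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8)  # adds a cross-attention block

print("encoder layer params:", n_params(enc_layer))
print("decoder layer params:", n_params(dec_layer))
print("ratio: %.2fx" % (n_params(dec_layer) / n_params(enc_layer)))
```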
short and informative
Thank you very much
So my searching ends here... or does it just start?
Wow ❤️
Can you do a video detailing how self-attention and multi-head attention work?
It's not the same format, but there's an algorithm whiteboard video that goes into more detail: ruclips.net/video/yGTUuEx3GkA/видео.html
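If it helps in the meantime, here's a minimal sketch of scaled dot-product self-attention in PyTorch (illustrative shapes only; multi-head attention just runs several such heads in parallel, each with its own projections, and concatenates the results):

```python
# Minimal sketch: single-head scaled dot-product self-attention.
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v       # project input to queries/keys/values
    scores = q @ k.T / (k.shape[-1] ** 0.5)   # scaled dot-product similarity
    weights = F.softmax(scores, dim=-1)       # attention weights over positions
    return weights @ v                        # weighted sum of the values

seq_len, d_model, d_k = 5, 16, 8
x = torch.randn(seq_len, d_model)
out = self_attention(x,
                     torch.randn(d_model, d_k),
                     torch.randn(d_model, d_k),
                     torch.randn(d_model, d_k))
print(out.shape)  # torch.Size([5, 8])
```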
thank you good lady
Probably the best summary video of TL. I followed her on Twitter: twitter.com/rctatman
Awesome! She should try working for kaggle :/
does she actually have hair? Nice video, btw.
F F
It is better if you reduce your speed!
Thanks for the feedback. If you like, you can reduce the speed of the video yourself on your local device: support.google.com/youtube/answer/7509567?co=GENIE.Platform%3DAndroid&hl=en
@@RasaHQ Do you think I don't know that! Try reducing it yourself and watch the quality.
No need to be so rude. In my opinion it sounds fine with the speed reduced to 0.75x. Also, there are closed captions.