Transformers for beginners | What are they and how do they work

AssemblyAI

165K views


Published: 27 Nov 2024

Comments • 160

@ashermai2962 2 years ago ⁺⁴⁹
This channel deserves more views and likes
@AssemblyAI 2 years ago ⁺⁴
Thank you Asher!
@sarc007 1 year ago ⁺³
I agree
@pierluigiurru962 1 year ago ⁺²⁶
This is the clearest explanation of transformers I’ve found so far, and I have personally seen many while trying to wrap my head around them. No skimming over details. Very well done!
@andybrice2711 7 months ago ⁺⁵
Positional encodings are not that weird when you think of them as being similar to the hands on a clock: it's a way of representing arbitrarily long periods of time, within a confined space, with smooth continuous movement and no sudden jumps.
Picture the tips of clock hands. Their vertical position follows a sine wave, their horizontal position follows a cosine wave. And we add precision with more hands moving at different speeds.
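The clock-hand picture above can be sketched in a few lines of Python. This is a minimal illustration of the sinusoidal scheme from the "Attention Is All You Need" paper, not code from the video; the function name `positional_encoding`, the tiny `d_model=8`, and the printed positions are assumptions chosen for readability.

```python
import math

def positional_encoding(position, d_model=8, base=10000.0):
    """Return the sinusoidal encoding of one position as a list of floats.

    d_model=8 is an illustrative assumption; the paper uses 512.
    """
    encoding = []
    for i in range(0, d_model, 2):
        # Each (sin, cos) pair acts like one clock hand.
        # Larger i means a slower hand (a longer wavelength).
        angle = position / (base ** (i / d_model))
        encoding.append(math.sin(angle))  # vertical position of the hand's tip
        encoding.append(math.cos(angle))  # horizontal position of the hand's tip
    return encoding[:d_model]

# Print the encodings of the first few positions.
for pos in range(4):
    print(pos, [round(x, 3) for x in positional_encoding(pos)])
```

Every value stays between -1 and 1 no matter how large the position gets, which is the "confined space" property the comment describes.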
@PeterKoman 2 years ago ⁺³
Finally a transformer video that actually explains the theory in an understandable way. Many thanks.
@AssemblyAI 2 years ago ⁺²
That's great to hear, thank you Peter!
@malayali_thaaram 1 year ago
Yes!!! I agree! Finally!
@Zulu369 1 year ago ⁺⁸
This video is the best technical explanation I have seen in years. Although Transformers are a breakthrough in the field of NLP, I am convinced that they do not completely and satisfactorily describe the way humans process language.
For all civilizations, spoken language predates written language in communications. Those who do not read and write still communicate clearly with others. This means humans do not represent natural language in their brains in terms of words, syntax and position of tokens but rather in terms of symbols, images and multimedia shows that make up stories we relate to.
Written language comes only later as an extra layer of communication to transparently express these internal representations that we carry within ourselves. If AI is able to access and decode these internal representations, then the written language, the extra layer, becomes a lot easier to understand, organize, and put on paper with simple techniques rather than using these intricate Transformers that I consider temporary and unnatural ways of describing natural languages.
@rokljhui864 1 year ago
Your idea is represented above, in words, existing separately from your mind. Surely most intelligence is contained within written language, mathematical expression and images.
@Zulu369 1 year ago
@@rokljhui864 As I explained above, written words make up THE extra layer that is actually not necessary once you learn more persuasive communication techniques.
@evetsnilrac9689 1 year ago ⁺¹
@@rokljhui864 "Surely" is not how you start an intelligent hypothesis.
You must explain the rationale for your belief since it is not at all readily apparent that the intelligence to process written language was not already in our brains so that we could conceive of and learn written language.
@evetsnilrac9689 1 year ago ⁺¹
This is a crucial point to understand for all of us interested in fully harnessing what we perceive to be the true potential of this technology.
I would start with the Adamic symbol-based language.
@testing3562 1 year ago ⁺¹⁷
I am a programmer, I have created many tools that were actually very useful. I even claim that I have 10+ years of experience. But I feel very bad to realize that I am so dumb that I did not understand anything after the first 10 minutes of the video.
@s14vc 8 months ago
They explain it with apples and pears, but it is actually a very mathematical and elaborate process. If you're not the kind of person who easily remembers how the sine and cosine functions work and does matrix multiplication for fun, it is just a little bit harder to get.
@moonlight-td8ed 7 months ago ⁺¹
BRUH JUST REWATCH IT AGAIN... THE VIDEO IS A 10/10
@rodneykingston6420 3 months ago
I too am a programmer. I don't know why everyone is saying this is such a great video. To understand all the information that is so rapidly and breezily dispensed by the speaker would clearly require a vast pre-existing knowledge of the subject. It is NOT for beginners.
@testing3562 3 months ago
@@rodneykingston6420 I am not a beginner, yet I did not understand any of the intricate details of these concepts. But I realised that we need not bother to go that deep into any concept in order to create an innovative product. Based on my own experience, I was depressed at first; then, despite my ignorance of these topics, I continued to create really useful working tools, and the learning happens automatically. Life is short and we need not spend the best part of it learning every new budding technology that might fail. It is more than sufficient to know only the part that is required for the specific situation. The lesson learned the hard way is that too much AI is just a distraction and a waste of time.
@ajithsdevadiga1603 2 months ago
@@rodneykingston6420 You got it right. You need to check Jay Alammar's blog on transformers, where he has explained it clearly and in detail. After reading that you will get a clear picture, and later you might want to check the research paper, which I am sure is not easy to understand at once. A couple of tutorials and blogs on this subject will make sense only if you have prior knowledge of neural networks and the math behind them.
@ajithsdevadiga1603 2 months ago
The first time I saw this video, I was like, what is this lady talking about? Then reading Jay Alammar's blog on transformers gave me a clear picture of the underlying math. Rewatching this video after doing a bit of reading will actually help you connect the dots.
@dooseobkim2100 1 year ago ⁺³
You are my savior: I can actually get ready to read all of those AI-related papers that I was completely unaware of. I was stuck at the part of my thesis where I have to provide the theoretical background of ChatGPT. As a business student I’m super grateful to learn this computer science knowledge through your short lecture👍👍
@reshamgaire4188 1 year ago ⁺⁴
Finally found a perfect video that cleared up all my confusion. Thank you so much ma'am, may god bless you 🙏
@akintilotimileyin6202 2 months ago
This is the best explanation of transformer architecture I have seen so far. Thanks.
@Yaddu143 1 year ago ⁺⁴
I really want you to talk about attention. Thank you, you shine in this video.
@stevemassicotte4068 1 year ago ⁺¹⁶
@16:14, the binary table is wrong, there are two sevens.
The second column should start with 8 and not a second 7.
Attention is all you need ;)
Thanks for the video !
@BCSEbadulIslam 9 months ago
Came here to comment the same 👍
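For reference, the binary table the comments above are checking can be regenerated with a short sketch. The 4-bit width and the 0 to 15 range are assumptions made here for illustration; the slide in the video may use a different layout.

```python
def binary_table(count=16, width=4):
    """Return the zero-padded binary strings for 0 .. count-1."""
    return [format(n, f"0{width}b") for n in range(count)]

rows = binary_table()
for n, bits in enumerate(rows):
    print(f"{n:2d}  {bits}")
```

Each number appears exactly once, so the entry after 7 (0111) is 8 (1000), as the comment says.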
@salamander7715 1 year ago
Seeing all the comments of people saying that this video made things simple just makes me feel stupid ahah! This video is amazing and the explanations are great, but I can't say I've understood more than 35% of the concepts. I'll have to watch this several times for sure
@geekyprogrammer4831 2 years ago ⁺¹
This high quality video deserves a lot more views!
@AssemblyAI 2 years ago
Thank you!
@imagnihton2 2 years ago ⁺¹²
This made the concept sound incredibly simple compared to some other sources... Amazing!
@AssemblyAI 2 years ago ⁺¹
Great to hear, thank you!
@mohamadhasanzeinali3674 2 years ago ⁺⁴
I saw numerous videos about the Transformer architecture. In my opinion, your video is the best among them. Appreciate that.
@AssemblyAI 2 years ago ⁺¹
Thank you, that is great to hear. :)
@nikhilshrestha4711 1 year ago ⁺⁴
really love how you described the model. easier to understand 🙌
@AssemblyAI 1 year ago
Glad it was helpful!
@moeal5110 1 year ago ⁺¹
This is the most clear and resourceful video I've seen. Thank you for your hard work and for sharing these resources
@vivekpetrolhead 11 months ago
Best explanation for beginners I've seen besides statquest
@shubham-pp4cw 2 years ago ⁺¹
Clear explanation of quite a complex topic, explained easily in a short period of time
@AssemblyAI 2 years ago
Glad to hear you liked it!
@sivad2895 1 year ago ⁺¹
The best video on transformer architecture with great explanations and charming presentation.
@moonlight-td8ed 7 months ago
cleanest and most informative video ever.. covered whole attention is all you need paper in 19 mins.. damn.. thank you MISRA TURP and assembly ai
@StoriesFable 7 months ago
I've been watching a lot of videos on Transformers, but this is exactly what I wanted. Thank You So Much Ma'am. And also AssemblyAI.
@SAM-t9r9b 1 year ago
Overall I liked the video a lot. I just do not think it is enough to understand the whole concept. In particular, the masked multi-head attention layer was missing, as was how the actual output of the model is created (translation etc.)
@lexflow2319 2 years ago ⁺³
I don't understand why there are 6 decoders and encoders. The diagram shows 1 each. Also, what is the output that goes as input to the decoder? Is that the last output from the final softmax?
@abinav92 1 year ago
Best video on intro to transformers!!!
@abooshehrian 2 months ago
great summary. One of the densest videos I've watched yet it was explained with so many examples. Thank you!
@nikhil182 1 year ago
Thank you so much!💓this has to be the best introduction video to Transformers. We are planning to use Transformers for our Video Processing project.
@AssemblyAI 1 year ago ⁺¹
Glad it was helpful!
@yourshanky 1 year ago ⁺¹
Excellent explanation !! Sharp and clear. Thanks for sharing this.
@otsogileonalepelo9610 1 year ago ⁺³
Just WOW! You broke down these concepts nicely. Thank you. Live long and prosper 🖖🖖
@AssemblyAI 1 year ago
Thank you!
@MrTheyosyos 1 year ago ⁺¹
"attentions for beginners" will be great :)
@anandanv2361 1 year ago
The way you explained the concept was awesome. It is very easy to follow.👍
@donevo1 1 year ago ⁺¹
very nice presentation! At 12:18 you say that attention is on 8 words. From reading the paper I think that attention is on ALL the words, and 8 is the number of heads: each word vector (D=512) is split into 8, i.e. the vector dimension in each head is 64.
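The dimension bookkeeping in the comment above (512 split across 8 heads gives 64 per head) can be checked with a small sketch. The helper `split_heads` and the dummy vector are hypothetical; in the paper each head gets its 64-dimensional queries, keys and values from its own learned projection, so plain slicing here only illustrates the sizes, not the actual computation.

```python
D_MODEL = 512    # model (embedding) dimension used in the paper
NUM_HEADS = 8    # number of attention heads used in the paper
HEAD_DIM = D_MODEL // NUM_HEADS  # per-head dimension

def split_heads(vector, num_heads=NUM_HEADS):
    """Slice one word vector into num_heads equal chunks (illustration only)."""
    head_dim = len(vector) // num_heads
    return [vector[h * head_dim:(h + 1) * head_dim] for h in range(num_heads)]

# A dummy word vector standing in for a real 512-dimensional embedding.
word_vector = [float(i) for i in range(D_MODEL)]
heads = split_heads(word_vector)
print(len(heads), "heads of size", len(heads[0]))
```

Every head still attends over all the words in the sequence; the number 8 only controls how many attention patterns are computed in parallel.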
@bysedova 1 year ago
Please make a detailed video about self-attention! Thank you for your explanation! I like that you haven't used difficult math terms and have tried to explain things with easy material.
@bdoriandasilva 2 years ago ⁺¹
Great video with a clear explanation. thank you!
@talktovipin1 10 months ago
Very nice explanation. Incorporating animations into the images while explaining would enhance comprehension and make it even more beneficial.
@rokljhui864 1 year ago
Interesting. Sounds like a Fourier transform: obtaining a frequency distribution from a time series reveals the underlying frequency components and amplitudes. Are you essentially distilling the 'word cycles' from the sentences to obtain meaning from the word patterns across different word combination lengths (from a single word to many thousands)? And does optimising the predictability of the next word automatically optimise for the appropriate word combination lengths that align with actual meaning, i.e. are understanding 'peaks' optimised similarly to the fundamental frequencies in a Fourier transform?
@helgefredriksen 5 months ago
Hi, could anyone explain how the Feed Forward part of the transformer learns? How does the loss function work? By masking out some of the input from the self-attention part during training and then comparing the real value with the predicted value?
@mbrochh82 1 year ago
I wish someone would explain how exactly the backpropagation works and what values exactly get nudged and tweaked during learning (and by which means)
@DAVIDBYANSI-g1o 1 year ago
Thank you for the presentation, it has been so insightful. I wish you made a video about the word embeddings of the transformers. Thanks
@AssemblyAI 1 year ago
Great suggestion!
@kellenswain2049 1 year ago
11:06 from reading the paper, 64 is not the square root of the length of QKV vectors, it looks like it is d_model/h where h is the number of heads used in multihead attention. And so then I assume d_model is the length of the QKV vectors?
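To make the numbers in the comment above concrete: in the paper d_model = 512 and h = 8, so each head's query and key vectors have length d_k = d_model / h = 64, and the dot-product scores are divided by sqrt(d_k) = 8. Below is a minimal pure-Python sketch of scaled dot-product attention under those values. The function names and the three toy tokens are assumptions made for illustration, and the learned projections that produce Q, K and V are left out.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of equal-length vectors."""
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    outputs = []
    for q in queries:
        # One score per key: dot product divided by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

d_model, h = 512, 8
d_k = d_model // h
print("d_k =", d_k, "scale =", math.sqrt(d_k))

# Three toy "tokens" of length d_k (hypothetical values).
tokens = [[1.0] * d_k, [0.0] * d_k, [-1.0] * d_k]
out = scaled_dot_product_attention(tokens, tokens, tokens)
print(len(out), len(out[0]))
```

So 64 is the per-head vector length and 8 is its square root, which matches the commenter's reading of the paper.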
@GeorgeZoto 1 year ago
Great description of transformers at both a low and a high level, thank you for creating this useful resource :)
@dannown 1 year ago
This is a really lovely video -- very specific and detailed, but also followable. Thanks!
@AssemblyAI 1 year ago
Glad it was helpful!
@near_. 1 year ago
What's the purpose of output embedding?? What are we feeding in that???
@rufus9322 1 year ago
Thank you for your video 🤗
How can I understand more details about the word embedding method in the Transformer model?
@AddisuSeteye 1 year ago ⁺¹
Amazing explanation. I can't wait to watch your explanation on another AI related topic.
@AssemblyAI 1 year ago
More to come!
@maryammoradbeigi6690 1 year ago
Incredible explanation on the transformer... Amazing video. Thanks a lot
@AssemblyAI 1 year ago
Glad you liked it!
@pyaephyo3633 1 year ago
i love it.
Your explanation is easy to understand.
@kartikgadad9285 1 year ago
Thanks for explaining Transformers, can we have a video on Embeddings? It seems super interesting. The Positional Encoding part was difficult to understand, as it was covered only at an abstract level. Can we find a better video on positional encoding?
@amitsingh7684 7 months ago
very nicely explained with clear details
@juliennoel3061 9 months ago
hi! oh yeah please a specific video on 'attention' 🙂 - And also : 'great job you are doing! Congrats! Thumbs !!'
@sanketdeshmukh491 2 years ago
Thank You for the in-depth explanation. Kudos!!!
@AssemblyAI 2 years ago
You're very welcome!
@guimaraesalysson 1 year ago ⁺¹
Is there any video about the attention mechanism?
@AssemblyAI 1 year ago ⁺²
Not yet but it's a good idea!
@carlosroquesuarezgurruchag8681 1 year ago
Thx for the time. The explanation is very clear
@jayanthAILab 1 year ago
Great work ma'am. You made it simple to understand.
@actorjohanmatsfredkarlsson2293 1 year ago
Great video. I’m missing how the attention layers (queries, keys and values) and the output weights are trained. Also, what was the values matrix for?
@MrAmgadHasan 1 year ago
They are trained just like any neural network: we have a loss function that compares the model's output with the desired output, and then this loss is propagated backwards to the weights and biases and we use gradient descent to update the weights.
Look up "back propagation" for more info, or just look up "how neural networks are trained".
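The training loop described in the reply above can be sketched on a deliberately tiny model. This is not a transformer: it fits a single weight and bias with hand-derived gradients, and the function `train`, the data, the learning rate and the step count are all assumptions chosen for illustration. In a real transformer the same loop updates the query, key and value projection matrices, with the gradients computed automatically by a deep learning framework.

```python
def train(inputs, targets, lr=0.1, steps=500):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(steps):
        # Forward pass: the model's current outputs.
        preds = [w * x + b for x in inputs]
        # Backward pass: gradients of mean((pred - target)^2) w.r.t. w and b.
        grad_w = sum(2 * (p - t) * x for p, t, x in zip(preds, targets, inputs)) / n
        grad_b = sum(2 * (p - t) for p, t in zip(preds, targets)) / n
        # Gradient descent update: nudge each parameter against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

inputs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1
w, b = train(inputs, targets)
print(round(w, 3), round(b, 3))
```

The only "feedback" needed is the desired output for each input; there is no separate ground truth for the individual weights.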
@VaibhavPatil-rx7pc 1 year ago
smile and learn and clean explanation!!!
@amigospot 2 years ago
Nice video for a fairly complex architecture!
@AssemblyAI 2 years ago
Thanks Hyder! - Mısra
@kalyandey5195 10 months ago
Awesome!! crystal clear explanation!!!
@hosseinsafari7514 25 days ago
Thank you for this good explanation. please talk about attention.
@krishnakumarik208 1 year ago
VERY GOOD EXPLANATION.
@amparoconsuelo9451 1 year ago
I have read books and watched videos on Transformers. I still don't understand Transformers. I want to order from Amazon an assembly Transformer kit, work on it and have a Transformer I understand the way I understand how Lotus 123 and Wordstar were created.
@devraj241 1 year ago
great video, well explained!
@_joshwalter_ 1 year ago
This is phenomenal!
@wasifrock687 1 year ago
very well explained. thank you!
@AssemblyAI 1 year ago
Glad it was helpful!
@wenshufan 1 year ago
Thank you for explaining the transformer in detail. However, I still don't get how you train the Q, K, V matrices. The attention mechanism is calculated from them. What type of feedback/truth can one use to train those matrix values then?
@rodi4850 2 years ago
best explanation!
@goelnikhils 2 years ago
Amazing Explanation. Wow. Thanks a lot
@hussainsalih3520 1 year ago
amazing, keep doing these amazing tutorials :)
@archowdhury007 1 year ago ⁺¹
Beautifully explained. Loved it. First time I understood the transformer model so easily. Great work. Please keep creating more such content. Thanks.
@niyatisrivastava4-yearb.te820 11 months ago
best explanation
@ilkeasal7622 4 months ago
amazing explanation!
@0Tyr 2 years ago
Very informative channel, and well presented..
@AssemblyAI 2 years ago
Thank you! - Mısra
@thebiggerpicture__ 2 years ago
Great video. Thanks!
@AssemblyAI 2 years ago
You're welcome :)
@Techie-time 3 months ago
Complete clarity, only when you know the subject 70%.
@6001navi 1 year ago
awesome explanation
@RewanSallam-z3c 1 year ago
great work, may allah bless you and guide you 🥰🥰😍😍
@ankit9401 2 years ago ⁺²
You are awesome and I appreciate your efforts. After watching your video, I can say now I understand the transformer architecture.
I have a query. According to the original BERT paper, two objectives are used during training: Masked Language Model and Next Sentence Prediction. Are these training objectives present in the original or all transformer models, or are they specifically used for BERT?
I hope you make a video to explain attention and the BERT model in future 😊
@AssemblyAI 2 years ago ⁺¹
Great to hear the video was helpful Ankit! These are not the tasks that were in the original transformer model. But I think they are not specific to BERT. Other architectures also use same/similar tasks to train their models. We have a BERT video in the channel by the way. Here it is: ruclips.net/video/6ahxPTLZxU8/видео.html
- Mısra
@strongsyedaa7378 2 years ago
@@AssemblyAI
So instead of using RNN & LSTM we directly use Transformers?
@keithwins 11 months ago
Thank you that was excellent
@AbhinandanTete 4 months ago
thank you great explanation❤‍🔥
@andersonsystem2 3 years ago ⁺³
Good video
@AssemblyAI 3 years ago ⁺¹
Glad you enjoyed it :)
@wp1300 1 year ago
13:35 Positional encoding
@abrahamowos 2 years ago
A question @ 11:30 : if for instance the values v are really large and you multiply them by the results from the softmax layer, won't the resulting weighted sum be too high after adding them together?
@JackoMcW 1 year ago ⁺¹
I'm not sure I understand your question or what you mean by "too high," but consider that all of those softmax values will be
@RAZZKIRAN 1 year ago
thank u
@AssemblyAI 1 year ago
You're welcome!
@robl39 1 year ago ⁺³
What is disappointing about this video is that you have to know about or understand 50 other concepts first
@JayTheMachine 1 year ago
thank you soo much, damn, love your explanations
@EmanueleOlivetti 11 months ago
Around 16:00 the binary representation repeats 7 twice, so the right part of the binary encoded numbers is incorrect
@titusfx 1 year ago
I'm still concerned that all these papers don't have any mathematical rigour: there isn't one theorem, there is nothing. And it works....🤯 I can't imagine what the results will be when the rigour starts coming in. I'm starting to believe that deep learning is Physics for knowledge 😅
@nogur9 1 year ago
Thanks :)
@snedunuri2946 2 months ago
Too much detail at the beginning. “6 encoders” or “query/value vectors” mean very little. I recommend using a running example which introduces those concepts as needed and with the proper context
@manjz7hm 11 months ago
You explained well, but my brain is not digesting it 😂
@juanpimentel4567 5 months ago
Why are there 6 encoders and 6 decoders? Someone please explain.
@DivyanshuBhoyar-j6e 1 year ago
easiest explanation.
@roshanverma1123 1 year ago
Great simplified content! Thanks! Btw, you look beautiful!
@nikbl4k 6 months ago
great video, very interesting
@nirmesh44 10 months ago
make attention video
@M7mdal7aj 1 year ago
thanks but the explanation is not detailed enough. but nice explanation for the positional embedding. thanks
@denwo1982 9 months ago
Chatgpt “explain this video to me as if I was an 8 year old”
@davidespinosa1910 2 months ago
Why are transformers so complicated ? Why not use a simple network that processes the entire context window at once ?
@AverageOrdinaryEverydaySupDog 5 months ago
☝ question.... hmmm.. what?
