Attention mechanism: Overview

  • Published: 25 Aug 2024
  • This video introduces you to the attention mechanism, a powerful technique that allows neural networks to focus on specific parts of an input sequence. Attention is used to improve the performance of a variety of machine learning tasks, including machine translation, text summarization, and question answering.
    Enroll in this course on Google Cloud Skills Boost → goo.gle/436ZFPR
    View the Generative AI Learning path playlist → goo.gle/LearnG...
    Subscribe to Google Cloud Tech → goo.gle/Google...

Comments • 68

  • @googlecloudtech
    @googlecloudtech  1 year ago +3

    Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

  • @PapiJack
    @PapiJack 5 months ago +17

    Great video. One tip: include some sort of pointer so you can direct the viewer's attention towards a particular part of the slide. It would help in following your explanation of the information-dense slides.

  • @llewsub
    @llewsub 1 year ago +30

    confusing

  • @alexanderscoutov
    @alexanderscoutov 9 months ago +27

    4:08 "H_b"... I could not find H_b here :-( I don't understand what are the H_d7 entities in the diagram. So confusing.

    • @aqgi7
      @aqgi7 5 months ago +1

      I think she meant H_d, with d for decoder. H_d7 would be the 7th hidden state produced by the decoder. But it's not clear why H_d7 appears three times (or more).

  • @user-tq2og8wo7m
    @user-tq2og8wo7m 7 months ago +12

    Aside from some mistakes, the inversion mechanism is not clear here. Where in the final slide is it shown? All I see is a correct order of words. It would be great to visualize where and how the reordering occurs.

  • @KumR
    @KumR 2 months ago

    Felt like being explained in person. Thanks a lot.

  • @cy12343
    @cy12343 1 year ago +61

    So confusing...😵‍💫

    • @alileo1578
      @alileo1578 9 months ago

      Yeah, many, many of these concepts depend on neural networks and on learning parameters with back-propagation.

    • @JohnDoe-pq8yw
      @JohnDoe-pq8yw 3 months ago

      This takes place after the base model is trained, and there are fine-tuning mechanisms as well, so this is not confusing at all; it is part of the background on LLMs.

  • @samuelqueiroz156
    @samuelqueiroz156 11 months ago +16

    Still not clear to me. How does the network know which hidden state should have the higher score?

    • @unknown-otter
      @unknown-otter 10 months ago +10

      I guess the answer you were looking for is the following: the same way the network knows how to classify digits, for example. It learns by optimizing a loss function through backprop. So attention is not a magic thing that connects inputs with outputs, but just a mechanism for a network to learn what it needs to attend to.
      One cool thing is that you can think of an attention head as a fully connected layer whose weights change based on the input. While a normal fully connected layer has fixed weights and processes any data with them, an attention head first calculates what would be most beneficial in that input data and then runs it through a fully connected layer!
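
      A minimal sketch of that idea (hypothetical NumPy code, not from the video), using plain dot-product scoring:

      import numpy as np

      def softmax(x):
          e = np.exp(x - x.max())
          return e / e.sum()

      def attend(query, hidden_states):
          # query: decoder state, shape (d,); hidden_states: encoder states, shape (T, d)
          scores = hidden_states @ query       # one score per input time step
          weights = softmax(scores)            # input-dependent attention weights
          context = weights @ hidden_states    # weighted sum of the hidden states
          return context, weights

      # The scoring is trained end-to-end by backprop like any other layer, which is
      # how the network "knows" which hidden state should get the higher score.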

  • @for-ever-22
    @for-ever-22 5 months ago +1

    Thanks to the creator. Will be coming back to this video, which is amazing and well detailed.

  • @devloper_hs
    @devloper_hs 2 months ago +1

    For those pointing out that alpha isn't present: it's the "a", actually. It's just a coefficient that, when multiplied by the hidden vector, produces the attention contribution.

  • @aakidatta
    @aakidatta 9 months ago +8

    I watched it almost 4 times and still can't figure it out. Where is alpha on the slide at 3:58?

  • @manjz7hm
    @manjz7hm 8 months ago +15

    Google should give attention to simplifying the content for the public; I couldn't completely get the concept.

  • @Udayanverma
    @Udayanverma 10 months ago +6

    Where is alpha in this whole diagram?! Why do you guys make it more difficult than it is?

  • @BR-hi6yt
    @BR-hi6yt 1 year ago +5

    Thanks for the hidden states, very clear.

  • @abhiksaha3451
    @abhiksaha3451 9 months ago +7

    Is this video made by a generative AI 😂?

  • @richardglady3009
    @richardglady3009 11 months ago +8

    Very complex concepts that were well presented. I may not understand everything (I didn't, but that is a reflection of my ignorance), yet the overall picture of what occurred is clear. Thank you.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 9 months ago +4

    Where’s the alpha on the slide?

  • @changliu7553
    @changliu7553 4 days ago

    Why do you go from "the cat eat the mouse" to "black cat eat the mouse"? Is this a mistake? Thanks.

  • @ChargedPulsar
    @ChargedPulsar 6 months ago +2

    I think these tutorials are thrown onto the internet to further slow down and confuse people. The video explains nothing. It will only make sense to people who already know the attention mechanism.

  • @ipurelike
    @ipurelike 11 months ago +12

    Too high-level, not enough detail... where are the dislikes?

  • @arkaganguly1
    @arkaganguly1 2 months ago +2

    Confusing video. Very difficult to follow

  • @mushroom4533
    @mushroom4533 4 months ago +1

    Hard to understand the final slide....

  • @KiranMundy
    @KiranMundy 9 months ago +8

    Very helpful video, but I got confused at one point and am hoping you can help clarify some points.
    At timestamp 4:14 you talk of "alpha" representing the attention weight at each time step. I don't see any "alpha" onscreen, so I'm a bit confused. Is "alpha" a weight that gets adjusted with training and indicates how important that particular word is at time step 1 of the decoding process?
    I'm also not completely clear on the difference between a hidden state and weights; could you explain this?
    It would help if, while explaining, you pointed to the value you're referring to onscreen, and if you could clarify that when you talk about the time step, you mean the first decoder time step (is that right?).

    • @NetworkDirection
      @NetworkDirection 8 months ago +3

      I assume by 'alpha' she means 'a'

    • @m1erva
      @m1erva 6 months ago

      The hidden state is the network's activation (its output vector) at each word's time step; the weights are the learned parameters that produce it.
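
      To make that weights-vs-hidden-states distinction concrete, a small illustrative sketch (a hypothetical RNN-style cell in NumPy, not the video's exact model):

      import numpy as np

      d_in, d_h = 4, 8
      W = np.random.randn(d_h, d_in)   # weights: learned parameters, fixed after training
      U = np.random.randn(d_h, d_h)
      b = np.zeros(d_h)

      def rnn_step(x_t, h_prev):
          # the hidden state h_t is an activation, recomputed for every new input
          return np.tanh(W @ x_t + U @ h_prev + b)

      # The attention weights (alpha) are different again: scores computed at run time
      # from the hidden states and normalized to sum to 1, one per encoder time step.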

  • @yashvander
    @yashvander 1 year ago +11

    Just a quick question: I'm not able to wrap my head around how the encoder gets the decoder hidden state annotated as Hd.

    • @kartikthakur-ql9yn
      @kartikthakur-ql9yn 1 year ago +1

      The encoder doesn't get the decoder hidden states... it's the opposite.

    • @MrAmgadHasan
      @MrAmgadHasan 1 year ago +1

      What happens is: the encoder encodes the input and passes it to the decoder. For each time step in the output, the decoder gets the encoder's hidden states for all input time steps, stacked as a matrix. It then calculates the attention weights over them.
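
      A rough sketch of one such decoder step (illustrative NumPy, shapes and names invented for this example):

      import numpy as np

      T, d = 5, 8                      # encoder time steps, hidden size
      H_enc = np.random.randn(T, d)    # all encoder hidden states, stacked as a matrix
      h_dec = np.random.randn(d)       # current decoder hidden state

      scores = H_enc @ h_dec           # one score per encoder time step
      alpha = np.exp(scores - scores.max())
      alpha /= alpha.sum()             # attention weights (the "a"/alpha on the slide)
      context = alpha @ H_enc          # context vector used to produce this output word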

    • @thejaniyapa3660
      @thejaniyapa3660 1 year ago

      @@MrAmgadHasan Thanks for the explanation. Then how are the encoder hidden states said to be associated with each word (3:26)? Each should cover the part of the sentence before the nth word plus the nth word.

  • @user-tq2og8wo7m
    @user-tq2og8wo7m 7 months ago +4

    Aside from some mistakes, it is still not clear to me how the inversion mechanism operates. All I can observe is an already correctly ordered sequence of words. It would be great to visualize where and how the reordering occurs.

  • @dariovicenzo8139
    @dariovicenzo8139 7 months ago +3

    Just a waste of time and memory for YouTube servers.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 9 months ago +2

    I think you are introducing an interesting angle that hasn’t been presented before. Thanks.

  • @iGhostr
    @iGhostr 8 months ago +2

    confusing😢

  • @kartikpodugu
    @kartikpodugu 7 months ago

    I think this is an explanation of the general attention mechanism, not of attention in transformers.

  • @interweb3401
    @interweb3401 22 days ago

    Not clear!

  • @thinkmath4270
    @thinkmath4270 1 month ago

    It started well but fizzled out as it progressed. Unnecessarily confusing. Anyway, good attempt.

  • @user-ez9ex8hx6v
    @user-ez9ex8hx6v 8 months ago +1

    Ok got it watched thank you yeah

  • @franktaylor7978
    @franktaylor7978 28 days ago

    It should be "The black cat ..".

  • @saurabhmahra4084
    @saurabhmahra4084 10 months ago +62

    You are the example why everyone should not start making youtube videos. You literally made a simple topic look complex.

    • @jiadong7873
      @jiadong7873 5 months ago +1

      agree

    • @Dom-zy1qy
      @Dom-zy1qy 5 months ago +16

      Disagree heavily. For me, this was more palatable than other videos I'd seen on the subject.
      Don't see the point of needlessly harsh criticism.

    • @Omsip123
      @Omsip123 2 months ago +1

      You are the example why commenting should be disabled

    • @Omsip123
      @Omsip123 2 months ago +2

      Besides, you probably meant to write "not everyone should" instead of "everyone should not" but that might be too complex too.

    • @baluandhavarapu
      @baluandhavarapu 1 month ago +2

      That's an incredibly rude thing to say. And I disagree

  • @gg-ke6mw
    @gg-ke6mw 8 months ago

    This is so confusing.
    Why are Google courses so difficult to understand?

  • @user-su9pg1jo4x
    @user-su9pg1jo4x 7 months ago

    4:04 There is no alpha, but there is an "a" in the sum on the left.

  • @tamurchoudary3452
    @tamurchoudary3452 4 months ago +1

    Regurgitating spoon-fed knowledge… Google has fallen behind.

  • @user-ez9ex8hx6v
    @user-ez9ex8hx6v 8 months ago

    Yeah okay watched

  • @muskduh
    @muskduh 1 year ago

    thanks

  • @julius3005
    @julius3005 6 months ago

    The explanation is poor; they hide a large number of the processes.

  • @primeentelechy
    @primeentelechy 1 month ago

    I'm sorry, but this video is complete rubbish. An incoherent explanation that is unlikely to help anyone. Plus a number of little errors that just should not be there in such a short video, let alone one from one of the world's most prominent tech companies. Even the example "English sentence" chosen isn't actually a valid English sentence 🤦‍♂️

  • @kislaya7239
    @kislaya7239 6 months ago

    This is a poor video for someone who does not know this topic.

  • @user-ep1tz8gv8f
    @user-ep1tz8gv8f 11 months ago +5

    confusing

  • @yahavx
    @yahavx 1 year ago +15

    confusing

    • @Shmancyfancy536
      @Shmancyfancy536 5 months ago

      You’re not gonna learn it in 5 min

  • @sergeykurk
    @sergeykurk 10 months ago +3

    confusing

  • @vedansharts8274
    @vedansharts8274 1 year ago +5

    confusing

  • @dimitrisparaschakis3280
    @dimitrisparaschakis3280 11 months ago +3

    confusing

  • @VikasDubeyg
    @VikasDubeyg 1 year ago +9

    confusing