Transformers | Stanford CS224U Natural Language Understanding | Spring 2021

  • Published: Jan 24, 2025

Comments • 4

  • @fezkhanna6900 · 2 years ago · +1

    This is the most notable video about attention I have come across so far. Thank you for uploading this.

  • @AhtiAhde · 2 years ago

    Is there a mistake at 09:50?
    "You have 12 or 24 attention heads"; shouldn't that be 12 or 24 layers with as many attention heads as the length of tokens in the input / output sequence?
    Also, this is VERY well done lecture series! We will probably have our own NLU course at our university based on these materials! This is a huge service for the next generation of natural language related data scientists!

    • @lywang5304 · 2 years ago

      I think, going by the original paper, both the number of attention heads and the number of layers are architectural hyperparameters and need not depend on the number of tokens, though the paper does give its own choices.
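
A minimal sketch of the point in the reply above, assuming PyTorch and its nn.MultiheadAttention module (not part of the lecture materials): the head count is a fixed property of the layer, chosen when the model is built, so the same module runs unchanged over sequences of any length.

```python
# Sketch (assumes PyTorch is installed): head count is a hyperparameter,
# independent of how many tokens are in the input sequence.
import torch
import torch.nn as nn

embed_dim, num_heads = 768, 12  # e.g., BERT-base uses 12 heads per layer
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

for seq_len in (5, 50, 500):               # very different input lengths
    x = torch.randn(1, seq_len, embed_dim)  # (batch, tokens, embedding)
    out, weights = attn(x, x, x)             # self-attention over the sequence
    # The output keeps the sequence length; the head count never changes.
    print(seq_len, out.shape, weights.shape)
```

The attention weight matrices grow with the sequence length (each token attends over all tokens), but the number of heads and the number of stacked layers stay whatever the designer picked.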