L19.5.2.3 BERT: Bidirectional Encoder Representations from Transformers

  • Published: 27 Oct 2024

Comments • 5

  • @lily-qs7cr · 2 years ago +1

    thanks for all of your transformers videos :)

  • @billykotsos4642 · 2 years ago

    Quick question… the output of BERT is several word embeddings, right? So does that mean they need to be concatenated/added/averaged to create the feature vector that is passed to the MLP classifier?
    What is the most standard method these days?

    • @payam-bagheri · 1 year ago +1

      I have been trying to understand how a sentence embedding (one embedding vector for the whole sentence) is generated by BERT. What I've understood so far is as follows (both options are sketched in code below):
      - The [CLS] token that is added to the beginning of the sentence (assuming you give the model a sentence and want the embedding vector as the output) passes through the network (BERT) and evolves in the same way as any other token, and its final hidden state is used as a representative embedding for the whole sentence.
      - Alternatively, the max/mean along each dimension of the per-token embedding vectors is used as the sentence embedding.
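      A minimal sketch of both pooling options, assuming the Hugging Face `transformers` and `torch` packages and the `bert-base-uncased` checkpoint (an illustration, not code from the video):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint for illustration; any BERT-style encoder would work similarly.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "BERT produces one embedding per token."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch, seq_len, hidden_size): one vector per token.
token_embeddings = outputs.last_hidden_state

# Option 1: take the final hidden state of the [CLS] token (position 0).
cls_embedding = token_embeddings[:, 0, :]                # (batch, hidden_size)

# Option 2: mean-pool over the real tokens, masking out padding positions.
mask = inputs["attention_mask"].unsqueeze(-1).float()    # (batch, seq_len, 1)
mean_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

print(cls_embedding.shape, mean_embedding.shape)         # both: torch.Size([1, 768])
```

      For classification, either pooled vector can then be fed to an MLP head; libraries such as sentence-transformers default to mean pooling, while the original BERT fine-tuning setup uses the [CLS] representation.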

  • @peregudovoleg · 6 months ago

    17:55 — two years ago, 300M parameters were "quite large". Look at us now: the 405B-parameter Llama 3.1, more than a 1000x increase. What about the next 2 years...