Language Learning with BERT - TensorFlow and Deep Learning Singapore

  • Published: 10 Jan 2025

Comments • 19

  • @rohitdhankar360 · 5 years ago +3

    @10:05 - Excellent explanation of Byte-Pair Encoding, thanks.

  • @autripat · 5 years ago +8

    The presenter says these models "do not use RNNs" (correct) and instead "they use CNNs" (incorrect: there are no convolution kernels). They use simple linear transformations of the form XWᵀ + b.

    • @jaspreetsahota1840 · 5 years ago +2

      You can model convolution operations with transformers.

    • @mouduge · 5 years ago +8

      IMHO, that's debatable. Indeed, think about what happens when you apply the same dense layer to each input in a sequence: you're effectively running a 1D convolutional layer with kernel size 1. If you're familiar with Keras, try building a model with:
      TimeDistributed(Dense(10, activation="relu"))
      then replace it with this:
      Conv1D(10, kernel_size=1, activation="relu")
      You'll see that it gives precisely the same result (assuming you use the same random seeds); a runnable sketch of this comparison follows below.
      Since the Transformer architecture applies the same dense layers across all time steps, you can think of the whole architecture as a stack of 1D convolutional layers with kernel size 1 (plus, of course, the important multi-head attention part, which is a different beast altogether).
      Granted, it's not the most typical CNN architecture, which usually uses fairly few convolutional layers with kernel size 1, but still, it's not really an error to say the Transformer is based on convolutions. I think Martin's goal was mostly to highlight the fact that, contrary to RNNs, every time step gets processed in parallel.
      Just my $.02! :))
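
      A minimal, runnable sketch of that equivalence (the input shape, layer sizes, and weight values below are illustrative assumptions, not something from the talk):

      import numpy as np
      from tensorflow.keras import layers

      x = np.random.rand(2, 5, 8).astype("float32")  # (batch, time steps, features)

      dense = layers.TimeDistributed(layers.Dense(10, activation="relu"))
      conv = layers.Conv1D(10, kernel_size=1, activation="relu")
      dense(x), conv(x)  # call once so both layers build their weights

      # Give both layers identical parameters; Conv1D stores its kernel with an
      # extra leading axis of length kernel_size=1.
      kernel = np.random.rand(8, 10).astype("float32")
      bias = np.zeros(10, dtype="float32")
      dense.set_weights([kernel, bias])
      conv.set_weights([kernel[np.newaxis, :, :], bias])

      print(np.allclose(dense(x), conv(x)))  # True: the per-timestep transforms match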

  • @daewonyoon · 5 years ago +5

    Thank you. This summary/introduction is very, very helpful.

  • @hiyassat · 5 years ago +3

    Can we have a link to the slides, please?

  • @archywillhe1379 · 4 years ago

    Wow, Engineers SG sure has come a long way, ha! Great talk.

  • @mkpandey4909 · 5 years ago +1

    Where can I get this PPT? Please share the link.

  • @monart4210 · 4 years ago

    Could I extract word embeddings from BERT and use them for unsupervised learning, e.g. topic modeling? :)

  • @chirpieful · 5 years ago +2

    Very good updates for NLP enthusiasts.

  • @MegaBlizzardman · 5 years ago +1

    Very clear and helpful talk

  • @janekou2482 · 5 years ago

    Does BPE also work well for non-English languages like Chinese and French?

  • @zingg7203 · 4 years ago

    BERT uses WordPiece; ALBERT uses SentencePiece (a small sketch comparing the two follows below).
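
    A minimal sketch of the difference, using the Hugging Face transformers library rather than anything from the talk (the checkpoint names are assumptions, and the exact subword splits depend on the vocabulary):

    from transformers import AlbertTokenizer, BertTokenizer

    bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")   # WordPiece vocabulary
    albert_tok = AlbertTokenizer.from_pretrained("albert-base-v2")  # SentencePiece model

    text = "Language learning with BERT"
    print(bert_tok.tokenize(text))    # WordPiece pieces; continuations are marked with '##'
    print(albert_tok.tokenize(text))  # SentencePiece pieces; word starts are marked with '▁'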

  • @zingg7203 · 4 years ago +1

    How is it CNN-based?

  • @pr22345 · 5 years ago +1

    Very informative.

  • @xiaochengjin6478 · 6 years ago

    Very nice speech!

  • @revolutionarybitnche · 5 years ago

    thank you!

  • @ishishir · 5 years ago

    Nice!

  • @chriscannon303 · 4 years ago

    What in God's name are you talking about? What is an LSTM chain? I came here because I need to know I'm writing the correct content for my website, and I haven't a fucking clue what the hell you are on about.