xLSTM: Extended Long Short-Term Memory

  • Published: 18 Jan 2025

Comments • 5

  • @gabrielmongaras  8 months ago +1

    Forgot to mention: you just stack sLSTM/mLSTM layers like a transformer, as usual 😏
    The sLSTM uses a transformer-like block and the mLSTM uses an SSM-like block, which can be seen in Section 2.4. A rough sketch of the stacking pattern follows below.
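
    A minimal sketch of that stacking idea (assuming hypothetical placeholder modules, not the paper's reference code; only the residual, transformer-style wiring is the point):

    ```python
    import torch
    import torch.nn as nn

    class PlaceholderCell(nn.Module):
        # Stand-in for the actual sLSTM/mLSTM cell internals (see the paper, Sec. 2).
        def __init__(self, dim):
            super().__init__()
            self.proj = nn.Linear(dim, dim)

        def forward(self, x):
            return torch.tanh(self.proj(x))

    class ResidualBlock(nn.Module):
        # Pre-norm residual wrapper: the same wiring a transformer block uses.
        def __init__(self, dim, cell):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.cell = cell

        def forward(self, x):
            return x + self.cell(self.norm(x))

    def build_xlstm_stack(dim, pattern="msmm"):
        # One block per character: 'm' -> mLSTM-style, 's' -> sLSTM-style.
        # Both are placeholders here; only the stacking pattern is illustrated.
        return nn.Sequential(
            *[ResidualBlock(dim, PlaceholderCell(dim)) for _ in pattern]
        )

    x = torch.randn(2, 16, 64)    # (batch, seq_len, dim)
    y = build_xlstm_stack(64)(x)  # same shape out: torch.Size([2, 16, 64])
    ```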

  • @acasualviewer5861  8 months ago

    Is it as slow to train as LSTMs and RNNs are? A major benefit of Transformers is faster, parallelized training; I would assume xLSTMs are constrained by their sequential nature.

    • @gabrielmongaras  8 months ago

      Yep, it should still be slow to train. I don't see any way to make one of the cells parallel like a transformer, since the cells are so complicated. (A toy illustration of the sequential bottleneck is sketched below.)
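
      A toy illustration (an assumption for illustration, not code from the video or paper) of why recurrence trains sequentially: each hidden state depends on the previous one, so the time loop cannot be collapsed into one parallel pass over the whole sequence the way attention can.

      ```python
      import torch

      def recurrent_scan(x, W_h, W_x):
          # x: (seq_len, dim); the hidden state h is threaded through time.
          h = torch.zeros(x.shape[1])
          outs = []
          for t in range(x.shape[0]):  # inherently sequential dependency
              h = torch.tanh(W_h @ h + W_x @ x[t])
              outs.append(h)
          return torch.stack(outs)

      dim = 8
      x = torch.randn(32, dim)
      W_h, W_x = torch.randn(dim, dim), torch.randn(dim, dim)
      print(recurrent_scan(x, W_h, W_x).shape)  # torch.Size([32, 8])
      ```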

  • @-slt  8 months ago +1

    The constant movement of the screen makes my head (and surely many others') explode. Please move a little less, and zoom in and out less; it helps the viewer focus on the text and your explanation. Thanks. :)

    • @gabrielmongaras  8 months ago

      Thanks for the feedback! I'll keep this in mind next time I'm recording.