New xLSTM explained: Better than Transformer LLMs?

  • Published: 30 Sep 2024
  • JUST days ago a new alternative to transformer LLMs was published: xLSTM, and in particular mLSTM. The Matrix Long Short-Term Memory (mLSTM) network is an advanced variation of the traditional Long Short-Term Memory (LSTM) model. The core idea of mLSTM is an "accumulated covariance" memory update combined with exponential gating functions. I explain it in detail in this video and compare it to the classical attention mechanism.
    The actual performance can't be independently evaluated at the moment, since the research paper was just published. I will keep you informed.
    mLSTM differentiates itself by employing a matrix-based memory: instead of the scalar memory cell of the classical LSTM, it maintains a matrix-valued state that is updated with an outer-product ("covariance") rule, controlled by exponential input and forget gates and a sigmoid output gate. This configuration allows the mLSTM to store and retrieve information using matrix operations, facilitating a more intricate interaction between the inputs and the recurrent network's hidden states.
    One of the most significant innovations of mLSTM is its ability to capture and represent more complex relationships and dependencies within the data. By using a matrix to represent its memory state, mLSTM can encapsulate relationships across multiple dimensions of the input simultaneously, increasing the network's representational power and computational efficiency, especially for tasks involving high-dimensional data such as natural language processing and multivariate time-series analysis. This matrix approach not only deepens the interaction with the data within each cell but also allows the network to model interactions across different features of the data (a minimal code sketch of the update rule follows after the credits below).
    All rights w/ authors:
    xLSTM: Extended Long Short-Term Memory
    arxiv.org/pdf/...
    #airesearch
    #ai
    #newtechnology
  • Science
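
For the technically curious, here is a minimal NumPy sketch of a single mLSTM recurrence step, following my reading of the update rule in the xLSTM paper. The function and variable names are my own; the query/key/value projections and the gate values are assumed to be produced upstream by learned linear layers (omitted here).

    import numpy as np

    def mlstm_step(C, n, q, k, v, i_gate, f_gate, o_gate):
        # One mLSTM step (illustrative sketch, not the authors' code).
        # C: (d, d) matrix memory; n: (d,) normalizer state.
        # q, k, v: (d,) query/key/value projections of the current token.
        # i_gate, f_gate: scalar input/forget gates (the input gate is
        # exponential in the paper); o_gate: (d,) sigmoid output gate.
        d = k.shape[0]
        k = k / np.sqrt(d)  # key scaling, as in attention
        # "Accumulated covariance": add the outer product of value and key.
        C = f_gate * C + i_gate * np.outer(v, k)
        n = f_gate * n + i_gate * k
        # Read from the matrix memory with the query; the normalizer term
        # keeps outputs stable despite the unbounded exponential gate.
        h_tilde = C @ q / max(abs(n @ q), 1.0)
        return C, n, o_gate * h_tilde

    # Toy usage with random projections:
    d = 4
    C, n = np.zeros((d, d)), np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(3):
        q, k, v = rng.normal(size=(3, d))
        C, n, h = mlstm_step(C, n, q, k, v,
                             i_gate=np.exp(0.5), f_gate=0.9, o_gate=np.ones(d))

Note the contrast with attention: the per-step state (C, n) has a fixed size, so inference cost does not grow with sequence length, and because the memory update has no hidden-to-hidden nonlinearity, training can be parallelized, as discussed in the paper.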

Comments • 12

  • @first-thoughtgiver-of-will2456 3 months ago +1

    this just makes me want to innovate off mamba

  • @propeacemindfortress 4 months ago +1

    nice, my favorite timeseries staple gets an upgrade 😄
    awesome find, and big big thanks for sharing

  • @wiktorm9858 4 months ago +1

    Is there a ready-made pytorch implementation of this?

  • @denishclarke4470 3 months ago

    Hey, please provide the slides

  • @davidhauser7537 4 months ago

    very cool

  • @timothywcrane 4 months ago

    I hope this resets the audio industry as well. LSTMs are great for melody prediction etc... I wonder how this new modeling will be applicable and expandable in scope.

    • @Dom-zy1qy 4 months ago

      I haven't had much luck creating a good model to predict melodies. Any resources you recommend?

    • @timothywcrane 4 months ago

      @Dom-zy1qy check out @ValerioVelardoTheSoundofAI

  • @thedoctor5478 4 months ago

    woh woh. did you forget to say a little something at the beginning of the video?

    • @thomasmitchell2514 4 months ago +1

      Hahaha my wife rolls her eyes when I say it along with him after gleefully clicking on a new upload 😅
      Also I can’t help echoing “beautiful” out loud even with headphones on 😂

    • @JonathanYankovich 4 months ago

      He said it :)

    • @和平和平-c4i 4 months ago

      @thomasmitchell2514
      What are you all talking about? What is the funny part? All I see is machine learning stuff...