Efficient Self-Attention for Transformers

  • Published: Oct 1, 2024
  • The memory and computational demands of the original attention mechanism grow quadratically with sequence length, rendering it impractical for longer sequences.
    However, various methods have been developed to reduce the attention mechanism's complexity. In this video, we'll explore some of the most prominent models that address this challenge; a short code sketch contrasting quadratic and linear attention appears below.
    #transformers
    Link to the activation function video:
    A Review of 10 Most Popular Activation Functions in Neural Networks
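    To make the quadratic bottleneck concrete, below is a minimal PyTorch sketch contrasting standard softmax attention, which materializes an n × n score matrix, with a simple kernelized linear-attention approximation in the spirit of the methods the video surveys. The feature map phi (elu + 1, as in Katharopoulos et al., 2020) and all sizes are illustrative assumptions, not this video's exact formulation.

    ```python
    import torch
    import torch.nn.functional as F

    n, d = 1024, 64                      # sequence length and head dim (illustrative)
    Q, K, V = (torch.randn(n, d) for _ in range(3))

    # Standard attention: the (n, n) score matrix costs O(n^2) time and memory.
    scores = (Q @ K.T) / d ** 0.5        # shape (n, n) -- quadratic in n
    out_full = torch.softmax(scores, dim=-1) @ V

    # Linear attention: replace softmax(QK^T) with phi(Q) phi(K)^T so the
    # product can be re-associated as phi(Q) @ (phi(K)^T V), never forming
    # an (n, n) matrix. phi(x) = elu(x) + 1 keeps the weights positive.
    phi = lambda x: F.elu(x) + 1
    KV = phi(K).T @ V                                    # (d, d), independent of n
    normalizer = phi(Q) @ phi(K).sum(0, keepdim=True).T  # (n, 1) softmax-like denom
    out_linear = (phi(Q) @ KV) / normalizer              # approximate attention output
    ```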

Comments • 11

  • @brianlee4966 • 6 months ago +1

    Thank you so much

  • @benji6296 • 3 months ago +1

    What would be the advantage of these methods vs. FlashAttention? FlashAttention speeds up the computation and is an exact computation, whereas most of these methods are approximations. I would like, if possible, to see a video explaining other attention types such as PagedAttention and FlashAttention. Great content :)

    • @PyMLstudio • 3 months ago +1

      Thank you for the suggestion! You're absolutely right. In this video, I focused on purely algorithmic approaches, not hardware-aware solutions like FlashAttention. FlashAttention is an IO-aware exact attention algorithm that uses tiling to reduce memory reads/writes between GPU memory levels, which results in significant speedups without sacrificing model quality (a rough sketch of the tiling idea follows below).
      I appreciate your input and will definitely consider making a video explaining FlashAttention!
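      As a rough illustration of the tiling idea described above, here is a hedged sketch of exact attention computed with an online softmax over key/value blocks, so the full (n, n) score matrix never exists at once. The block size and the plain-PyTorch loop are simplifying assumptions; real FlashAttention fuses this logic into a GPU kernel that keeps each tile in SRAM.

      ```python
      import torch

      def attention_tiled(Q, K, V, block=128):
          """Exact softmax attention, processed block-by-block over K/V."""
          n, d = Q.shape
          scale = d ** -0.5
          out = torch.zeros_like(Q)
          row_max = torch.full((n, 1), float("-inf"))  # running max per query row
          row_sum = torch.zeros(n, 1)                  # running softmax denominator

          for start in range(0, K.shape[0], block):
              Kb, Vb = K[start:start + block], V[start:start + block]
              s = (Q @ Kb.T) * scale                   # (n, block) partial scores
              new_max = torch.maximum(row_max, s.max(-1, keepdim=True).values)
              rescale = torch.exp(row_max - new_max)   # correct earlier partial sums
              p = torch.exp(s - new_max)
              row_sum = row_sum * rescale + p.sum(-1, keepdim=True)
              out = out * rescale + p @ Vb
              row_max = new_max

          return out / row_sum

      # Agrees with the naive quadratic computation up to floating-point error:
      Q, K, V = (torch.randn(512, 64) for _ in range(3))
      ref = torch.softmax((Q @ K.T) / 64 ** 0.5, dim=-1) @ V
      assert torch.allclose(attention_tiled(Q, K, V), ref, atol=1e-4)
      ```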

    • @PyMLstudio • 1 month ago

      Thanks for the suggestion! I made a new video on FlashAttention:
      FlashAttention: Accelerate LLM training
      ruclips.net/video/LKwyHWYEIMQ/видео.html
      I would love to hear your comments and any other suggestions you may have.

  • @javadkhataei970 • 10 months ago +1

    Very informative. Thank you!

    • @PyMLstudio • 10 months ago

      Glad it was helpful!

  • @pabloealvarez • 10 months ago +1

    Good explanation, very clear.

    • @PyMLstudio • 10 months ago +1

      Thank you for the nice comment! Glad you find the videos useful!

  • @buh357 • 5 months ago

    You should include axial attention and axial position embeddings; they're simple yet work great on images and video.

    • @PyMLstudio • 4 months ago +1

      Thanks for the suggestion; yes, I agree. I have briefly described axial attention in the vision transformer series (a brief sketch of the idea follows at the end of this thread):
      ruclips.net/video/bavfa_Rr2f4/видео.htmlsi=0SB9Yc_0SasafhJN

    • @buh357 • 4 months ago

      @PyMLstudio that's awesome, thank you!
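    For context on the axial attention mentioned in this thread, here is a minimal sketch of the idea: instead of full 2-D self-attention over an H × W feature map, which scores all HW positions against each other at O((HW)^2) cost, attention is applied along rows and then along columns, for O(HW(H + W)). The use of PyTorch's built-in scaled_dot_product_attention (PyTorch ≥ 2.0) and the omission of Q/K/V projections are simplifications for illustration, not the channel's implementation.

    ```python
    import torch
    import torch.nn.functional as F

    def axial_attention(x):
        """Row-then-column self-attention over an (H, W, d) feature map.

        Each axial pass attends along one axis only, so the score matrices
        are (W, W) and (H, H) instead of the (H*W, H*W) matrix of full
        2-D attention. Q/K/V projections are omitted for brevity.
        """
        H, W, d = x.shape
        # Attend along rows: each of the H rows is a length-W sequence.
        x = F.scaled_dot_product_attention(x, x, x)   # H acts as a batch dim
        # Attend along columns: transpose so each column is a sequence.
        xt = x.transpose(0, 1)                        # (W, H, d)
        xt = F.scaled_dot_product_attention(xt, xt, xt)
        return xt.transpose(0, 1)                     # back to (H, W, d)

    feat = torch.randn(16, 16, 64)   # illustrative 16x16 map, 64 channels
    out = axial_attention(feat)
    print(out.shape)                 # torch.Size([16, 16, 64])
    ```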