Efficient Self-Attention for Transformers

  • Published: Oct 1, 2024
  • The memory and computational demands of the original attention mechanism grow quadratically with sequence length, rendering it impractical for longer sequences.
    However, various methods have been developed to reduce the attention mechanism's complexity. In this video, we'll explore some of the most prominent models that address this challenge; a short code sketch contrasting quadratic and linear attention appears below.
    #transformers
    Link to the activation function video:
    A Review of 10 Most Popular Activation Functions in Neural Networks
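    To make the quadratic bottleneck concrete, below is a minimal PyTorch sketch contrasting standard softmax attention, which materializes an n × n score matrix, with a simple kernelized linear-attention approximation in the spirit of the methods the video surveys. The feature map phi (elu + 1, as in Katharopoulos et al., 2020) and all sizes are illustrative assumptions, not this video's exact formulation.

    ```python
    import torch
    import torch.nn.functional as F

    n, d = 1024, 64                      # sequence length and head dim (illustrative)
    Q, K, V = (torch.randn(n, d) for _ in range(3))

    # Standard attention: the (n, n) score matrix costs O(n^2) time and memory.
    scores = (Q @ K.T) / d ** 0.5        # shape (n, n) -- quadratic in n
    out_full = torch.softmax(scores, dim=-1) @ V

    # Linear attention: replace softmax(QK^T) with phi(Q) phi(K)^T so the
    # product can be re-associated as phi(Q) @ (phi(K)^T V), never forming
    # an (n, n) matrix. phi(x) = elu(x) + 1 keeps the weights positive.
    phi = lambda x: F.elu(x) + 1
    KV = phi(K).T @ V                                    # (d, d), independent of n
    normalizer = phi(Q) @ phi(K).sum(0, keepdim=True).T  # (n, 1) softmax-like denom
    out_linear = (phi(Q) @ KV) / normalizer              # approximate attention output
    ```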

Comments • 11

  • @brianlee4966 • 6 months ago +1

    Thank you so much

  • @benji6296 • 3 months ago +1

    What would be the advantage of these methods vs. FlashAttention? FlashAttention speeds up the computation and is an exact computation, whereas most of these methods are approximations. I would like, if possible, to see a video explaining other attention types such as PagedAttention and FlashAttention. Great content :)

    • @PyMLstudio • 3 months ago +1

      Thank you for the suggestion! You're absolutely right. In this video, I focused on purely algorithmic approaches, not hardware-aware solutions like FlashAttention. FlashAttention is an IO-aware exact attention algorithm that uses tiling to reduce memory reads/writes between GPU memory levels, which results in significant speedups without sacrificing model quality (a rough sketch of the tiling idea follows below).
      I appreciate your input and will definitely consider making a video explaining FlashAttention!
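      As a rough illustration of the tiling idea described above, here is a hedged sketch of exact attention computed with an online softmax over key/value blocks, so the full (n, n) score matrix never exists at once. The block size and the plain-PyTorch loop are simplifying assumptions; real FlashAttention fuses this logic into a GPU kernel that keeps each tile in SRAM.

      ```python
      import torch

      def attention_tiled(Q, K, V, block=128):
          """Exact softmax attention, processed block-by-block over K/V."""
          n, d = Q.shape
          scale = d ** -0.5
          out = torch.zeros_like(Q)
          row_max = torch.full((n, 1), float("-inf"))  # running max per query row
          row_sum = torch.zeros(n, 1)                  # running softmax denominator

          for start in range(0, K.shape[0], block):
              Kb, Vb = K[start:start + block], V[start:start + block]
              s = (Q @ Kb.T) * scale                   # (n, block) partial scores
              new_max = torch.maximum(row_max, s.max(-1, keepdim=True).values)
              rescale = torch.exp(row_max - new_max)   # correct earlier partial sums
              p = torch.exp(s - new_max)
              row_sum = row_sum * rescale + p.sum(-1, keepdim=True)
              out = out * rescale + p @ Vb
              row_max = new_max

          return out / row_sum

      # Agrees with the naive quadratic computation up to floating-point error:
      Q, K, V = (torch.randn(512, 64) for _ in range(3))
      ref = torch.softmax((Q @ K.T) / 64 ** 0.5, dim=-1) @ V
      assert torch.allclose(attention_tiled(Q, K, V), ref, atol=1e-4)
      ```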

    • @PyMLstudio • 1 month ago

      Thanks for the suggestion! I made a new video on FlashAttention:
      FlashAttention: Accelerate LLM training
      ruclips.net/video/LKwyHWYEIMQ/видео.html
      I would love to hear your comments and any other suggestions you may have.

  • @javadkhataei970 • 10 months ago +1

    Very informative. Thank you!

    • @PyMLstudio • 10 months ago

      Glad it was helpful!

  • @pabloealvarez • 10 months ago +1

    Good explanation, very clear.

    • @PyMLstudio • 10 months ago +1

      Thank you for the nice comment! Glad you find the videos useful!

  • @buh357 • 5 months ago

    You should include axial attention and axial position embeddings; they're simple yet work great on images and video.

    • @PyMLstudio • 4 months ago +1

      Thanks for the suggestion; yes, I agree. I have briefly described axial attention in the vision transformer series (a brief sketch of the idea follows at the end of this thread):
      ruclips.net/video/bavfa_Rr2f4/видео.htmlsi=0SB9Yc_0SasafhJN

    • @buh357 • 4 months ago

      @PyMLstudio that's awesome, thank you!
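    For context on the axial attention mentioned in this thread, here is a minimal sketch of the idea: instead of full 2-D self-attention over an H × W feature map, which scores all HW positions against each other at O((HW)^2) cost, attention is applied along rows and then along columns, for O(HW(H + W)). The use of PyTorch's built-in scaled_dot_product_attention (PyTorch ≥ 2.0) and the omission of Q/K/V projections are simplifications for illustration, not the channel's implementation.

    ```python
    import torch
    import torch.nn.functional as F

    def axial_attention(x):
        """Row-then-column self-attention over an (H, W, d) feature map.

        Each axial pass attends along one axis only, so the score matrices
        are (W, W) and (H, H) instead of the (H*W, H*W) matrix of full
        2-D attention. Q/K/V projections are omitted for brevity.
        """
        H, W, d = x.shape
        # Attend along rows: each of the H rows is a length-W sequence.
        x = F.scaled_dot_product_attention(x, x, x)   # H acts as a batch dim
        # Attend along columns: transpose so each column is a sequence.
        xt = x.transpose(0, 1)                        # (W, H, d)
        xt = F.scaled_dot_product_attention(xt, xt, xt)
        return xt.transpose(0, 1)                     # back to (H, W, d)

    feat = torch.randn(16, 16, 64)   # illustrative 16x16 map, 64 channels
    out = axial_attention(feat)
    print(out.shape)                 # torch.Size([16, 16, 64])
    ```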