Sliding Window Attention (Longformer) Explained

  • Published: 29 Jul 2024
  • In this video we talk about sliding window attention, dilated sliding window attention, and global + sliding window attention, as introduced in the Longformer paper. We take a look at the main disadvantage of the classical attention mechanism introduced in the Transformer paper (i.e. its quadratic time complexity) and how sliding window attention proposes to solve this issue. A short illustrative code sketch of the sliding window idea is included after the description below.
    References
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    Transformer Self-Attention Mechanism Explained: • Transformer Self-Atten...
    "Longformer: The long-document transformer" paper: arxiv.org/abs/2004.05150
    Related Videos
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    BART model explained: • BART Explained: Denois...
    Why Language Models Hallucinate: • Why Language Models Ha...
    Grounding DINO, Open-Set Object Detection: • Object Detection Part ...
    Detection Transformers (DETR), Object Queries: • Object Detection Part ...
    Wav2vec2: A Framework for Self-Supervised Learning of Speech Representations - Paper Explained: • Wav2vec2 A Framework f...
    Transformer Self-Attention Mechanism Explained: • Transformer Self-Atten...
    How to Fine-tune Large Language Models Like ChatGPT with Low-Rank Adaptation (LoRA): • How to Fine-tune Large...
    Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained: • Multi-Head Attention (...
    LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p: • LLM Prompt Engineering...
    Contents
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    00:00 - Intro
    00:26 - Original attention mechanism
    00:50 - Sliding window attention
    01:56 - Dilated sliding window attention
    02:40 - Global + Sliding window attention
    03:31 - Outro
    Follow Me
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    🐦 Twitter: @datamlistic
    📸 Instagram: @datamlistic
    📱 TikTok: @datamlistic
    Channel Support
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    The best way to support the channel is to share the content. ;)
    If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
    ► Patreon: / datamlistic
    ► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
    ► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
    ► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
    ► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a
    #slidingwindowattention #longformer #attentionmechanism
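    Code Sketch
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    A minimal, illustrative sketch (not code from the video or the Longformer repository) of the sliding window idea: each token attends only to positions within w steps of itself, so the number of scored pairs grows as O(n*w) instead of O(n^2). The function name sliding_window_attention and the window size w are placeholders chosen for illustration.

    # Illustrative sketch only: sliding window attention via a banded mask.
    import numpy as np

    def sliding_window_attention(Q, K, V, w=2):
        """Scaled dot-product attention where each position attends only to
        positions within +/- w of itself (the sliding window idea)."""
        n, d = Q.shape
        # Dense (n, n) scores are computed here for clarity; a real
        # implementation computes only the band to get the O(n*w) benefit.
        scores = Q @ K.T / np.sqrt(d)
        idx = np.arange(n)
        band = np.abs(idx[:, None] - idx[None, :]) <= w  # banded (windowed) mask
        scores = np.where(band, scores, -np.inf)         # drop out-of-window pairs
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V

    # Tiny usage example with random token embeddings
    rng = np.random.default_rng(0)
    Q = K = V = rng.normal(size=(8, 16))  # 8 tokens, model dim 16
    print(sliding_window_attention(Q, K, V, w=2).shape)  # (8, 16)

    Note that this sketch still builds the full score matrix and then masks it; the point of the Longformer variants (sliding, dilated sliding, global + sliding) is that only the windowed entries need to be computed at all.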

Comments • 4

  • @datamlistic
    @datamlistic  3 months ago +2

    "Transformer Self-Attention Mechanism Explained" video link: ruclips.net/video/u8pSGp__0Xk/видео.html

  • @miketoreno8371
    @miketoreno8371 3 months ago

    Best

  • @mutantrabbit767
    @mutantrabbit767 3 months ago

    this was super helpful, thanks so much!

    • @datamlistic
      @datamlistic  3 months ago

      You're welcome! Happy to hear you found it helpful! :)