Just wanted to say you are doing the community such a great service and contribution. Thank you!
It's not really Internal RAG, it's more of internal summarization - similar to RWKV (the mechanism is different though).
RAG would require the model to retrieve from an effectively unbounded DB rather than from a finite summarized state. This method would very likely fail at LiM tasks, similar to the one tried in a previous video (with instructions buried in the middle of a block of unrelated text). The model would have to know that the instruction is going to matter more than specific details from the text passage (and the same concept applies to retrieving specific details). That also means this method may fail at copying from outside the current block, similar to Mamba variants (and for the same reason).
So, essentially, it's a key-value memory network baked into an LLM?
It's a summarization state, constructed from the outer product of the block K-V vectors. So each block of size S has K and V matrices of size S×d, and they form a d×d "summary" of the K-V state for that block. Then the next block can "query" into that d×d state using a linear attention mechanism, which is added to the local self-attention (within the block). Essentially, a fancy hybrid model like Jamba, just implemented differently, but it should have similar pitfalls. At least the summarization state here is of size d×d rather than 1×(a*d), where a
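For concreteness, here is a minimal NumPy sketch of the mechanism as described above: one block's K-V pairs are compressed into a d×d outer-product summary, and the next block reads that summary with linear attention and adds the result to its block-local softmax attention. The feature map `phi`, the normalizer term, and the single-head, two-block setup are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def phi(x):
    # Hypothetical non-negative feature map (ELU(x)+1 style), a common choice
    # in linear attention; the actual map in the paper may differ.
    return np.where(x > 0, x + 1.0, np.exp(x))

def block_summary(K, V):
    # K, V: (S, d) for one block. Summing the outer products phi(k_i) v_i^T
    # over the block gives a (d, d) summary state plus a (d,) normalizer.
    return phi(K).T @ V, phi(K).sum(axis=0)

def linear_attention_read(Q, summary, k_norm):
    # Queries from the current block read the carried-over summary using the
    # standard linear-attention form: phi(q)^T S / (phi(q)^T z).
    num = phi(Q) @ summary                    # (S, d)
    den = phi(Q) @ k_norm[:, None] + 1e-6     # (S, 1)
    return num / den

def local_attention(Q, K, V):
    # Ordinary softmax attention restricted to the current block
    # (causal masking omitted for brevity).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

# Toy usage: block 1 is summarized into a d x d state; block 2 attends
# locally and adds a linear-attention read of that state.
S, d = 4, 8
rng = np.random.default_rng(0)
K1, V1 = rng.normal(size=(S, d)), rng.normal(size=(S, d))
Q2, K2, V2 = (rng.normal(size=(S, d)) for _ in range(3))

summary, k_norm = block_summary(K1, V1)
out = local_attention(Q2, K2, V2) + linear_attention_read(Q2, summary, k_norm)
print(out.shape)  # (4, 8)
```

Note how the memory cost of the carried state is O(d²) regardless of how many tokens it summarizes, which is exactly why retrieval of specific details from earlier blocks is lossy.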
I wish I had a good enough level of math to understand how those formulas are derived.
Virtual Multiport Memory?
With their DeepMind arm, I'm thinking they'll reach organic/organic-analog computing first. Imagine if states and events were global - global tx/rx. A chemical solution.
Shame on Google for assisting the war machine with their tech. "Don't be evil"