great as always!
Please keep up the good work, videos like these are incredibly helpful!
As someone who self-studies ML, your channel, along with similar others (Yannic Kilcher's, AI coffee talks), is such a great source of insight and education. Hopefully one day I'll have the honour of doing my PhD with you.
🙏
ahh been binging these since yesterday. really gives me a lot to think about regarding ways to extend the usual transformer architecture for different purposes.
thank you Professor!
- from sunny syracuse
Glad they're useful! Was just in Syracuse last month. Beautiful time of the year.
Thanks for the great video! This is a clearer explanation of long-context extension than any other source I've seen.
One confusion I have with your video is that you connect high-frequency rotation subspaces with short lengths (which I assume means small values of n-m). Why is there such a connection? In particular, why would tweaking a high-frequency rotation be especially impactful on shorter gaps?
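To make the question concrete, here's a quick numpy sketch I put together (my own toy numbers, not anything from the video) of how the rotation angle in each RoPE subspace scales with the gap n-m:

```python
import numpy as np

# Toy RoPE frequencies: theta_i = base^(-2i/d) for the i-th 2D subspace (illustrative values)
d, base = 128, 10000.0
i = np.arange(d // 2)
theta = base ** (-2 * i / d)      # i=0 is the highest frequency, i=d/2-1 the lowest

for gap in [1, 4, 16, 256]:       # relative distance n - m
    angles = gap * theta          # rotation angle applied to each subspace
    print(f"n-m={gap:4d}  highest-freq angle={angles[0]:8.2f} rad  "
          f"lowest-freq angle={angles[-1]:.5f} rad")
```

The highest-frequency subspace sweeps through whole radians even at n-m=1, while the lowest-frequency one barely moves until the gap is in the hundreds, which is the connection I'm trying to understand.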
Awesome video. Very informative. Thanks a lot!
No, I have NOT memorized the equation for self-attention (I can barely read the math in these papers) 😅.
But this has shown me what methods are effective and when.
Great visualizations. Fantastic work.
Think of it like F=ma or E=mc^2. It's just "softmax(QKᵀ)V" all the way down these days.
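If anyone wants that one line spelled out, here's a minimal numpy sketch (toy shapes, single head, no masking or scaling tricks beyond the standard 1/sqrt(d_k)):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n, n) similarity matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted average of value vectors

n, d = 4, 8                                           # toy sequence length and head dim
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(attention(Q, K, V).shape)                       # (4, 8)
```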
What would also be interesting is to see how fp precision affects long context.
HF, for example, is quite selective about when the model uses F32 and when it uses the base dtype. In MistralRotaryEmbedding, the sin/cos arguments are computed in F32, so at least there are >64K distinct values, but even then it depends on inv_freq (length dim/2), which is cast to F32.
I suspect that when there are more positions than a single value's precision can distinguish, the model will be more affected by this kind of "noise" as the number of tokens grows. (Though with limited VRAM that's not exactly a problem worth worrying about.)
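A rough sketch of the effect I mean (my own toy setup, not HF's actual code path; dim and base are just illustrative):

```python
import numpy as np

# Toy check of how rotary-angle rounding error grows with position
# (illustration only, not MistralRotaryEmbedding's exact computation).
dim, base = 128, 1e6
inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))              # float64 reference frequencies

for pos in [1_000, 64_000, 1_000_000]:
    angles_f32 = np.float32(pos) * inv_freq.astype(np.float32)       # what an F32 path would compute
    angles_f64 = pos * inv_freq                                      # high-precision reference
    err = np.abs(np.sin(angles_f32.astype(np.float64)) - np.sin(angles_f64)).max()
    print(f"pos={pos:>9}  max |sin error| = {err:.2e}")
```

Since the angle is position * inv_freq, float32's roughly constant relative error turns into an absolute rotation error that grows with position, which is the kind of "noise" I'm imagining at very long contexts.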