Have loved your work since the Annotated Transformer, thank you for sharing. Very clear explanation.
I definitely think attention in some form will survive even into a refined future Mamba model, due to its powerful ability to capture high-dimensional representations.
Hi, a correction: the number of parameters for the transformer is 361M, not the 261M stated in this video, as shown in the paper.
Great presentation!! Thank you!!
Nice work. I see the models on Hugging Face. Is there also a GitHub repo or notebooks to train or run inference on them?
Thank you for a great presentation, Sasha @srush_nlp. You mentioned that MambaByte is still behind token-based models. I wonder: is Mamba theoretically inferior to token-based transformers, or is it just a matter of discovering best practices and tricks?
Charformer (which is mentioned in the paper; gradient-based tokenizers sound like pog) evaluates itself on multilingual tasks. It would be interesting to see how RNNs behave there. RWKV can easily jump to the wrong language (the model card mentions that a space at the end of the prompt can "upset the tokenizer").
Also, can we go lower? MambaBit when? Just imagine: a vocab size of 2. 😱
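Just to make the bit-level idea concrete, here is a tiny sketch (plain Python; the MSB-first bit packing is only my assumption, not necessarily what an actual MambaBit would use) of how much longer sequences get with a two-symbol vocabulary:

```python
# Rough illustration: a "vocab size of 2" model sees the UTF-8 bytes of the
# prompt unpacked into individual bits, so sequences get 8x longer than
# byte-level ones (MSB-first bit order is an assumption, not MambaBit's spec).
text = "The cat can never"
data = text.encode("utf-8")

bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

print(len(data))   # 17 positions at the byte level
print(len(bits))   # 136 positions at the bit level
print(bits[:8])    # 'T' == 0x54 -> [0, 1, 0, 1, 0, 1, 0, 0]
```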
Although it's an interesting idea, I don't understand how that would be beneficial. Humans neither understand nor produce output in bits. I think it's more reasonable for each token to be the smallest semantic/graphical unit (a sememe/grapheme), and a bit holds no semantic or graphical information.
@donnychan1999 Humans also don't produce output in bytes: there is no reason for 'кот' to take twice as many "thought units" as 'cat'.
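To put numbers on that, a quick byte-count check (nothing model-specific, just the raw UTF-8 encoding):

```python
# Raw UTF-8 byte counts behind the 'кот' vs 'cat' comparison.
for word in ("cat", "кот"):
    raw = word.encode("utf-8")
    print(f"{word}: {len(word)} characters, {len(raw)} bytes")

# cat: 3 characters, 3 bytes
# кот: 3 characters, 6 bytes  (Cyrillic letters are 2 bytes each in UTF-8)
```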
Well, meanwhile I decided to be the change I want to see in the world and published Maykeye/MambaBit on HF after torturing my laptop for 10 hours.
"The cat can never" -> "The cat can never many be my father,
Or else and the good many be my father,
In the good many lord, and my father come."
This is so cursed. Yet it's much better than I expected.
Great work and presentation! Have you also compared MambaByte to the baselines on any downstream tasks/benchmarks?
Great work!
Very impressive.