Byte Latent Transformer - BLT explained (Entropy of Next Byte, META)
- Published: 10 Feb 2025
- In-depth explanation of the new Byte Latent Transformer (BLT) architecture for token-free transformers. Without a tokenizer, the byte-patching function has to be defined at the local level via an entropy-based prediction of the next byte. The video also explains the inner workings of the local Encoder, including its causal local attention and the cross-attention mechanism that pools bytes into latent patches (see the sketch below).
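Below is a minimal sketch of the entropy-based patching idea, assuming a small byte-level language model that returns a probability distribution over the next byte; the function names and the global-threshold rule shown here are illustrative, not taken from the paper's code.

```python
import math
from typing import Callable, List, Sequence

def byte_entropies(
    data: bytes,
    next_byte_probs: Callable[[bytes], Sequence[float]],
) -> List[float]:
    """Shannon entropy of the predicted next-byte distribution at each position.

    `next_byte_probs(prefix)` is an assumed interface: a small byte-level
    language model (the "entropy model") returning a length-256 probability
    vector for the byte that follows `prefix`.
    """
    entropies = []
    for i in range(len(data)):
        p = next_byte_probs(data[:i])              # distribution over byte i
        entropies.append(-sum(q * math.log(q) for q in p if q > 0.0))
    return entropies

def entropy_patches(data: bytes, entropies: List[float], threshold: float) -> List[bytes]:
    """Start a new patch whenever the predicted next-byte entropy exceeds a
    global threshold (one of the patching rules described in the paper);
    the threshold value itself is a tunable hyperparameter."""
    patches, start = [], 0
    for i, h in enumerate(entropies):
        if i > start and h > threshold:            # high uncertainty -> boundary
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches
```

The intuition: where the small model is uncertain (high entropy), a new patch begins, so hard-to-predict positions get their own boundaries while easily predicted byte runs are grouped into longer patches for the latent transformer.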
All rights w/ authors:
"Byte Latent Transformer: Patches Scale Better Than Tokens"
Artidoro Pagnoni, Ram Pasunuru, Pedro Rodriguez, John Nguyen, Benjamin Muller, Margaret Li, Chunting Zhou, Lili Yu, Jason Weston, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Ari Holtzman, Srinivasan Iyer
FAIR at Meta, Paul G. Allen School of Computer Science & Engineering, University of Washington, University of Chicago
#transformer
#airesearch
#meta
#tokenization
#languagemodel - Science
Please note: with the automatic dubbing from RUclips/Google you hear a synthetic voice in your regional language. To hear my original voice in English, switch to "Default" or "English" in the settings. Thank you.
Byte-level LLMs are obviously the way forward for that first round of training where you're predicting 1..n tokens given the prefix, particularly for multi-language models. Tokenization is clearly a hack, like in the dark ages of image neural networks, where we would hand-craft feature detection kernels.
Brother, you are amazing.
Thank you for doing this.
I would love to see a follow up paper that explores adding another layer to create patches of patches. Then maybe the "Large Concept Model" idea can finally be realized with good performance. Fun to think about!
Thank you so much for covering this paper! I had been thinking about this specific implementation for a year, and I believe it's a significant step towards a truly general learning architecture that minimizes hand-crafted human priors.
very very cool
i'm having a plantbased BLT right now
I think the entropy formula should be p_x*log(1/p_x) = - p_x*log(p_x).
Where did the ‘-’ go?
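For reference, the standard Shannon entropy of the next-byte distribution does carry the minus sign; the form with log(1/p) simply absorbs it. Written out (standard notation, not a transcription of the paper's exact equation), with V the set of 256 byte values:

```latex
H(x_i) \;=\; -\sum_{v \in V} p\left(x_i = v \mid x_{<i}\right) \log p\left(x_i = v \mid x_{<i}\right)
       \;=\; \sum_{v \in V} p\left(x_i = v \mid x_{<i}\right) \log \frac{1}{p\left(x_i = v \mid x_{<i}\right)}
```

Since 0 ≤ p ≤ 1, each log p term is non-positive, so the leading minus sign is what keeps H(x_i) ≥ 0.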
BLT seems the way to go in an ideal world, but there are definitely problems with it. I think tokenizers have accomplished tremendous work, and we got to this state thanks to improvements in vocabulary size and tokenization mechanisms; from this point on, though, we may have the technology and resources to try BLT on a model (I still don't think it would work that much better).
Can you expand on the 'definitely problems' with it?
Can you clarify that the pre-training will have to use the BLT embeddings? I.e., unless models pre-trained using BLT start appearing on huggingface or elsewhere, we mere mortals will not be able to take advantage of this new method?
Amen
What do you mean? I can't seem to make sense of your comment
Does the small transformer use BPE? And then, in H(x_i), is it computing the cross-entropy? 26:13
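For context, a hedged reading of the paper: the entropy model is a small transformer trained directly on bytes (no BPE vocabulary), and H(x_i) is the entropy of its predicted next-byte distribution rather than a cross-entropy against the byte that actually occurs. A minimal sketch of the difference, assuming `p` is a length-256 probability vector from such a model (names are illustrative):

```python
import math
from typing import Sequence

def entropy(p: Sequence[float]) -> float:
    """Uncertainty of the model's own next-byte distribution
    (this is what the entropy-based patching uses)."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def cross_entropy(p: Sequence[float], observed_byte: int) -> float:
    """Loss against the byte that actually occurred
    (this is what training a byte LM minimizes)."""
    return -math.log(p[observed_byte])
```

Entropy depends only on the predicted distribution, so it can be computed before the next byte is seen, which is what makes it usable for deciding patch boundaries.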
Bacon Lettuce Tomato