Lesson 2: Byte Pair Encoding in AI Explained with a Spreadsheet

  • Published: 10 Feb 2025
  • In this tutorial, we delve into the concept of Byte Pair Encoding (BPE) used in AI language processing, employing a practical and accessible tool: the spreadsheet.
    This video is part of our series that aims to simplify complex AI concepts using spreadsheets. If you can read a spreadsheet, you can understand the inner workings of modern artificial intelligence.
    🧠 Who Should Watch:
    Individuals interested in AI and natural language processing.
    Students and educators in computer science.
    Anyone seeking to understand how AI processes language.
    🤖 What You'll Learn:
    Tokenization Basics: An introduction to how tokenization works in language models like ChatGPT.
    Byte Pair Encoding (BPE): Detailed walkthrough of the BPE algorithm, including its learning phase and application in language data tokenization.
    Spreadsheet Simulation: A hands-on demonstration of GPT-2's tokenization process via a spreadsheet model.
    Limitations and Alternatives: Discussion on the challenges of BPE and a look at other tokenization methods.
    🔗 Resources:
    Learn more and download the Excel sheet at spreadsheets-a...
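    As a rough illustration of the BPE learning phase the video walks through (this is a generic sketch of the algorithm, not the video's spreadsheet formulas or OpenAI's exact implementation), the core loop can be written in a few lines of Python: count every adjacent symbol pair in the corpus, merge the most frequent pair into a new symbol, and repeat.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merges from a toy corpus.

    `words` maps each word (a tuple of symbols) to its frequency.
    Returns the learned merge rules (most frequent first) and the
    final segmented vocabulary.
    """
    vocab = dict(words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs,
        # weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair wins
        merges.append(best)
        # Replace every occurrence of the best pair with one merged symbol.
        new_vocab = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges, vocab

# Toy corpus: "low" appears 5 times, "lower" twice.
corpus = {tuple("low"): 5, tuple("lower"): 2}
merges, vocab = learn_bpe(corpus, 2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
```

    After two merges, "low" has collapsed into the single token "low", while "lower" is segmented as "low" + "e" + "r" — exactly the behavior that lets BPE represent frequent strings compactly while still covering rare words.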

Comments • 19

  • @rshaikh05
    @rshaikh05 1 month ago

    wow! what an explanation. it cannot get better than this. thank you so much.

  • @lewisbentley3177
    @lewisbentley3177 11 months ago +3

    This is amazing! Thank you for making this. Really looking forward to your next video about text and position embeddings

  • @michaelmalone7614
    @michaelmalone7614 11 months ago +4

    The quality of this video has blown my mind! Really great stuff and looking forward to your next video. Thank you!

  • @somdubey5436
    @somdubey5436 6 months ago

    seeing this video made me feel I'm not even close to knowing Excel. Your understanding of the concepts is really deep, as implementing something like GPT-2 in Excel requires a thorough understanding of all of them. Hats off to you.

  • @paulwillis88
    @paulwillis88 10 months ago +1

    This is an incredible video. The way you describe these advanced AI concepts is awesome. I'd love to see more, maybe on the GAN technology that Sora uses

  • @BryanSeigneur0
    @BryanSeigneur0 10 months ago

    So clear! Great instruction!

  • @fastler2000
    @fastler2000 10 months ago

    Incredibly brilliant. Words fail me. Thank you for sharing this, it helps me enormously in understanding AI.
    On a side note: how masterfully you handle Excel. Unbelievable!

  • @jjokela
    @jjokela 10 months ago

    Really cool stuff, it really helps me to understand how the BPE works. Looking forward to your follow-up videos!

  • @rpraver1
    @rpraver1 1 year ago +4

    Very good explanation, are you going to go into positional embedding?

    • @Spreadsheetsareallyouneed
      @Spreadsheetsareallyouneed 11 months ago +7

      thank you! yes just haven't had time to get around to it yet. Embeddings will be the next video. Not sure if I'll do token and positional embeddings in the same video or will break it up into two parts.

  • @MStrong95
    @MStrong95 10 months ago +1

    Large language models and AI in general seem to do a good job of compressing input and then turning it back into an approximation of that input. Is this a byproduct of neural networks in general, or just of specific subsets? Could you make a large language model, or a set of purpose-built AIs, suited to various compression situations that more often than not perform better than current compression algorithms?

  • @Cal1fax
    @Cal1fax 11 months ago +1

    I learned a ton from your video

  • @defface777
    @defface777 10 months ago +1

    Very cool! Thanks

  • @ricp
    @ricp 11 months ago +1

    great explanations

  • @lordadamson
    @lordadamson 10 months ago

    amazing work. please keep going :D

  • @kennrich213
    @kennrich213 10 months ago

    Great video. How exactly are the scores calculated from the possible pairs? You mentioned a standard filter — could you explain more?

  • @JohnSmith-he5xg
    @JohnSmith-he5xg 10 months ago

    Why does having a large embedding table matter? Can't it just be treated as a lookup into the table (which should be pretty manageable regardless of size)? Do we really have to perform the actual matrix multiply?

  • @magickpalms4025
    @magickpalms4025 11 months ago +1

    Why did they include reddit usernames in the training? Considering the early days where people would have extreme/offensive names as a meme, that is just asking for trouble.
