ruclips.net/video/tOhWpF5-_z4/видео.html - here the beam length is 2.
ruclips.net/video/tOhWpF5-_z4/видео.html - here the beam length is 3.
ruclips.net/video/tOhWpF5-_z4/видео.html - here the beam length is 6?
Why do we take the top 6 (num_beams * 2), as mentioned here: ruclips.net/video/tOhWpF5-_z4/видео.html?
Also, in ruclips.net/video/tOhWpF5-_z4/видео.html, with 'boy' as input, 'and' and 'who' had the highest probability (you chose the top 2),
but with 'dog' as input only 'who', i.e. the top 1, was chosen?
Are you picking the top 3 across the outputs with inputs 'boy', 'dog', and 'woman'?
In the code example, the beam size is 3, but the batch size is 2. That's why it appears we have 6 sequences at a time, and this illustrates how beam search is combined with batching.
Regarding your question about taking the top 3: we are taking the top 3 beams overall, and they may correspond to any beams from the previous iteration (it's not necessarily a 1-to-1 correspondence). So we might use 2 candidates from the beam ending in "boy", 1 from the beam ending in "dog", and 0 from the beam ending in "woman".
Hope this clarifies things!
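To make that selection concrete, here is a minimal pure-Python sketch of one beam search step (toy numbers and names, not the actual Hugging Face code): every live beam is expanded by its candidate next tokens, and the top num_beams hypotheses are kept across all expansions combined, so one previous beam may contribute several survivors and another may contribute none.

```python
import math

def beam_step(beams, next_token_logprobs, num_beams):
    """One beam search step: expand every beam by every candidate token,
    then keep the top `num_beams` hypotheses across ALL expansions."""
    candidates = []
    for tokens, score in beams:
        last = tokens[-1]
        for token, logp in next_token_logprobs[last].items():
            # Sequence log-prob accumulates: score + log P(token | beam)
            candidates.append((tokens + [token], score + logp))
    # Global top-k: survivors need not come one-per-previous-beam.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:num_beams]

# Hypothetical next-token distributions, loosely echoing the video's example.
logprobs = {
    "boy":   {"and": math.log(0.5), "who": math.log(0.4), "ran": math.log(0.1)},
    "dog":   {"who": math.log(0.6), "and": math.log(0.2), "ran": math.log(0.2)},
    "woman": {"who": math.log(0.3), "and": math.log(0.3), "ran": math.log(0.4)},
}

beams = [(["boy"], math.log(0.5)),
         (["dog"], math.log(0.3)),
         (["woman"], math.log(0.2))]
top3 = beam_step(beams, logprobs, num_beams=3)
# Here 2 survivors come from the "boy" beam, 1 from "dog", and 0 from "woman".
for tokens, score in top3:
    print(tokens, round(score, 3))
```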
nicely explained!
It seems no KV cache is used in the implementation. How can beam search be made compatible with the KV cache to make it more efficient?
I didn't mention it in this video, but the KV cache is supported in the Hugging Face implementation (and is turned on by default) -- it is controlled by the use_cache parameter.
I just read the Hugging Face transformers implementation. Sure, it does support the KV cache; however, beam search in transformers is implemented by simply expanding the batch size. I suspect this is not very efficient, especially for memory, since nothing is reused here; even the KV cache for the prompts from the prefill phase is not reused. Do you know of any implementation that is more mature or optimized? Thanks a lot! @@EfficientNLP
You can pass your custom past_key_values by doing a forward pass once and loading it into generate. @@feixyzliu5432
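As a rough illustration of that idea (a toy sketch with made-up function names, not the transformers API): the expensive prompt forward pass that builds the KV cache runs once, and the resulting cache is shared by all beams, so only the per-beam decode work is repeated.

```python
PREFILL_CALLS = 0

def prefill(prompt):
    """Stand-in for the expensive prompt forward pass that builds the KV cache."""
    global PREFILL_CALLS
    PREFILL_CALLS += 1
    return {"kv": list(prompt)}  # toy "cache": just the prompt tokens

def decode_step(kv_cache, beam_tokens):
    """Stand-in for one decode step that extends a beam using the shared cache."""
    return kv_cache["kv"] + beam_tokens

prompt = ["the", "quick"]
cache = prefill(prompt)            # prefill runs ONCE for the prompt
beams = [["brown"], ["red"], ["lazy"]]
extended = [decode_step(cache, b) for b in beams]  # cache shared by all 3 beams
print(extended)
```

The point of the sketch is only the call pattern: one prefill, many decode steps reusing its result, rather than redoing the prompt work once per beam.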
Hey, great video! I just wanted to ask: what are you using as a debugger to get the intermediate values of the variables? Looks very interesting...
I used PyCharm for this video, but most modern IDEs should have a similar feature.
@@EfficientNLP thank you so much!
Very well explained!
What IDE is this?
This is PyCharm, but VS Code has similar debugging functionality.