FlashAttention - Tri Dao | Stanford MLSys #67

  • Published: 27 Dec 2024

Comments • 14

  • @anishbhanushali • 1 year ago +10

    22:08 (basics of attention + the memory hierarchy in GPUs up to here) is where the actual explanation starts

  • @TheAIEpiphany • 1 year ago +2

    btw at 28:10 the animation got the order wrong compared to the paper's Algorithm 1: the inner loop should go over queries, not over values (see the sketch below)
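
    A minimal NumPy sketch of the loop nesting in question, matching the paper's Algorithm 1 (outer loop over K/V blocks, inner loop over Q blocks). The block sizes, the pure-NumPy setting, and keeping the running output unnormalized until the end are simplifying assumptions for illustration, not the actual CUDA kernel:

        import numpy as np

        def flash_attention_forward(Q, K, V, block_q=64, block_kv=64):
            # Single-head attention computed block by block; Q, K, V are
            # float arrays of shape (N, d).
            N, d = Q.shape
            scale = 1.0 / np.sqrt(d)
            O = np.zeros_like(Q)            # running, unnormalized output
            l = np.zeros(N)                 # running softmax denominator
            m = np.full(N, -np.inf)         # running row-wise max

            for j in range(0, N, block_kv):      # OUTER loop: K/V blocks
                Kj, Vj = K[j:j + block_kv], V[j:j + block_kv]
                for i in range(0, N, block_q):   # INNER loop: Q blocks
                    Qi = Q[i:i + block_q]
                    S = (Qi @ Kj.T) * scale      # score block (stays "in SRAM")

                    # Online-softmax update: rescale old stats to the new max.
                    m_new = np.maximum(m[i:i + block_q], S.max(axis=1))
                    P = np.exp(S - m_new[:, None])
                    corr = np.exp(m[i:i + block_q] - m_new)

                    l[i:i + block_q] = corr * l[i:i + block_q] + P.sum(axis=1)
                    O[i:i + block_q] = corr[:, None] * O[i:i + block_q] + P @ Vj
                    m[i:i + block_q] = m_new

            return O / l[:, None]           # normalize once at the end

    Either nesting is mathematically valid; the comment's point is only that Algorithm 1 in the paper puts the query blocks on the inner loop.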

  • @for-ever-22 • 9 months ago +2

    These videos are amazing

  • @rfernand2 • 1 year ago +2

    Great work and presentation. Where else could this be applied?

  • @shuminghu • 1 year ago

    Why does tiling reduce HBM-to-SRAM transfers? Or is it through pipelining, so that transfer time overlaps more with compute? (See the rough accounting below.)
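
    A rough way to see the first-order effect, as element counts rather than a performance model; the sizes (N, d, the K/V block height) are assumptions for illustration:

        # Naive attention materializes S = Q @ K.T in HBM: S is written, read
        # back for the softmax, written again as P, and read again for P @ V.
        N, d, B_kv = 4096, 64, 128      # seq length, head dim, K/V block rows
        n_kv_blocks = N // B_kv

        naive = 3 * N * d + 4 * N * N + N * d

        # Tiled attention keeps each score block in SRAM: K and V are read
        # once, while Q is re-read and the running output re-written once per
        # K/V block (ignoring smaller terms).
        tiled = 2 * N * d + 2 * n_kv_blocks * N * d + N * d

        print(f"naive: {naive / 1e6:.1f}M elements of HBM traffic")
        print(f"tiled: {tiled / 1e6:.1f}M elements of HBM traffic")  # ~4x less

    In this accounting the N x N score matrix simply never touches HBM in the tiled version; any overlap of transfers with compute would be on top of that saving.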

  • @denizlarson8862 • 1 year ago +2

    good research and nicely explained

  • @kawingchan • 1 year ago +1

    I am not familiar at all with CPU or GPU architecture, so I naturally wonder how much of this also applies to Apple GPUs (MPS). It was mentioned this is already in PyTorch, but I doubt it even gets activated on MPS. I would love to know, maybe at a high level, how it might (if possible) be ported to Apple GPUs, which have this unified memory thing.
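
    One way to poke at this from Python, sketched as a probe rather than a guarantee: PyTorch exposes attention through F.scaled_dot_product_attention, which dispatches to a fused FlashAttention-style kernel where one exists for the device; whether MPS gets a fused kernel or a fallback depends on the PyTorch version, which is exactly the question above. The tensor shapes here are arbitrary assumptions:

        import torch
        import torch.nn.functional as F

        # Run the fused attention entry point on MPS if available, else CPU.
        device = "mps" if torch.backends.mps.is_available() else "cpu"
        q, k, v = (torch.randn(1, 8, 2048, 64, device=device) for _ in range(3))

        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        print(out.shape, out.device)  # which kernel ran is an internal choice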

  • @xianbiaoqi7009 • 1 year ago

    Good idea and nice talk.

  • @brandomiranda6703 • 1 year ago

    ML for theorem proving would also benefit from longer sequences! Referencing a lemma proved in 300 BC...

  • @aamirmirza2806 • 1 year ago

    Really nice, well explained.

  • @sskhdsk • 1 year ago

    simple and effective

  • @JazevoAudiosurf • 1 year ago

    well explained

  • @deepanshusingh2527 • 1 year ago

    Is this utilised in inference as well? How fast is it compared to a naive implementation?