Efficient LLM Inference with SGLang, Lianmin Zheng, xAI

  • Published: 22 Dec 2024
  • In this Advancing AI 2024 Luminary Developer Keynote, Dr. Lianmin Zheng introduces SGLang, a high-performance serving framework optimized for inference with LLMs and vision-language models.
    SGLang’s core techniques include RadixAttention for improved KV cache reuse and jump-forward decoding for faster grammar-guided decoding (minimal sketches of both ideas follow this description). Additional optimizations, such as low-overhead CPU scheduling and torch-native enhancements (e.g., torch.compile and torchao), further improve efficiency. Benchmark results show that SGLang outperforms other state-of-the-art inference engines.
    As an open-source project with broad adoption, SGLang is also deployed for production serving at xAI.
    Speaker: Lianmin Zheng, xAI
    Gain access to AMD developer tools and resources.
    www.amd.com/en...
    The information contained in this video represents the view of AMD or the third-party presenter as of the date presented. AMD and/or the third-party presenters have no obligation to update any forward-looking content in the above presentations. AMD is not responsible for the content of any third-party presentations and does not necessarily endorse the comments made therein. GD-84.
    © 2024 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, AMD Instinct, and combinations thereof are trademarks of Advanced Micro Devices, Inc.
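
The keynote highlights RadixAttention, which reuses the KV cache across requests that share a token prefix (for example, a common system prompt). The Python sketch below illustrates only the underlying prefix-matching idea, assuming an illustrative PrefixTree/Node API and a kv_handle placeholder; it is not SGLang's actual data structure, which additionally manages GPU memory blocks, reference counts, and eviction.

```python
# Minimal sketch of the prefix-reuse idea behind RadixAttention, NOT SGLang's
# implementation: a radix-style tree maps token prefixes to cached KV entries,
# and a new request reuses the KV cache of its longest matching prefix instead
# of recomputing it. All names (PrefixTree, kv_handle, ...) are illustrative.

from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # token id -> child Node
    kv_handle: object = None                      # stand-in for a cached KV block


class PrefixTree:
    def __init__(self):
        self.root = Node()

    def insert(self, tokens, kv_handle):
        """Record that the KV cache for this token sequence is available."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, Node())
        node.kv_handle = kv_handle

    def longest_prefix(self, tokens):
        """Return (matched_length, kv_handle) for the longest cached prefix."""
        node, best_len, best_kv = self.root, 0, None
        for i, t in enumerate(tokens):
            if t not in node.children:
                break
            node = node.children[t]
            if node.kv_handle is not None:
                best_len, best_kv = i + 1, node.kv_handle
        return best_len, best_kv


if __name__ == "__main__":
    tree = PrefixTree()
    system_prompt = [1, 2, 3, 4]       # tokens of a shared system prompt
    tree.insert(system_prompt, "kv_block_0")

    request = [1, 2, 3, 4, 9, 10]      # new request sharing that prefix
    n, kv = tree.longest_prefix(request)
    # Only request[n:] needs prefill; the first n tokens reuse the cached KV.
    print(f"reuse {n} cached tokens via {kv}, prefill {len(request) - n} new")
```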
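Jump-forward decoding speeds up grammar-guided generation by emitting grammar-forced text verbatim and calling the model only where the grammar leaves a real choice. The sketch below assumes a toy template format with {gen} markers and a stand-in fake_model, both hypothetical; SGLang's actual implementation operates on a compiled grammar over tokens rather than string templates.

```python
# Minimal sketch of the jump-forward idea in grammar-guided decoding, NOT
# SGLang's implementation. Literal template text is forced by the "grammar"
# and is appended without any model calls; only {gen} spans are decoded.

TEMPLATE = '{"name": "{gen}", "age": {gen}}'


def fake_model(prefix, stop):
    """Stand-in for an LLM call that decodes one free-form value.

    A real engine would run token-by-token until `stop` text is reached;
    here we just return canned values for illustration.
    """
    return "Alice" if prefix.endswith('"name": "') else "30"


def jump_forward_decode(template, model):
    output = ""
    parts = template.split("{gen}")
    for i, literal in enumerate(parts):
        output += literal                    # jump forward: forced verbatim
        if i < len(parts) - 1:
            stop = parts[i + 1][:1] or None  # next forced char ends the span
            output += model(output, stop)    # decode only the free span
    return output


print(jump_forward_decode(TEMPLATE, fake_model))
# -> {"name": "Alice", "age": 30}
```

The speedup comes from the forced spans: every character the grammar fully determines costs zero model forward passes, which matters for structured outputs such as JSON with long fixed keys and punctuation.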
