Ronald S. Bultje - Low-level wizardry in dav1d

Demuxed

619 views


Published: Feb 1, 2024
In a world of datacenter virtualization and high-level languages, today we take a peek behind the curtain and watch how the Wizard of Oz does his magic: let's dive into the world of low-level optimizations in dav1d, VideoLAN's software AV1 decoder.
First, we will dig into dav1d's AVX512 optimizations for Intel's most recent CPUs: Ice Lake and Tiger Lake. Historically, video encoders and decoders have struggled to benefit from Intel's AVX512 instruction set. The wider vectors should in theory improve performance, but prior to Ice Lake, the associated clock-frequency penalties and the lack of interesting new instructions meant that most applications saw little gain. Ice Lake brought significant reductions in those clock-frequency penalties, and new video codecs (like AV1) use bigger block sizes: ideal conditions for leveraging AVX512's wider vectors. More importantly, by MacGyvering the new cross-lane shuffle, multiply-accumulate and cryptographic instructions supported in Ice Lake's AVX512 subset, we've reached speedups of up to 3x in our new AVX512 functions compared to their AVX2 counterparts. Overall, we see a 10% speedup in a fully optimized AVX512 decoder vs. using AVX2 instructions on the same machine.
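The gap between "up to 3x" per function and 10% overall is what Amdahl's law predicts when only part of the runtime is accelerated. The following is a minimal sketch of that arithmetic, not code from dav1d: the function names `overall_speedup` and `accelerated_fraction` are hypothetical, and treating 3x as a uniform speedup is an assumption, since the abstract gives it as a peak figure.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction `f` of the runtime
 * (0 <= f <= 1) is accelerated by a factor `s` (s > 0) and the rest
 * runs at its original speed. */
double overall_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

/* Inverse of the formula above: the fraction of the runtime that must
 * be accelerated by a factor `s` to reach a given overall speedup. */
double accelerated_fraction(double overall, double s)
{
    assert(overall >= 1.0 && s > 1.0);
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s);
}
```

Under that uniform-3x assumption, `accelerated_fraction(1.1, 3.0)` comes out at roughly 0.14, i.e. about 14% of the AVX2 decoder's runtime would sit in the functions that got faster. Because 3x is the best case rather than the average, the real share of time spent in AVX512-enhanced functions is presumably larger, with a smaller average gain.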
Second, we describe our redesigned threading model. Current tile- or frame-threading models need many resources (threads, memory) to saturate a limited number of cores, scale poorly across cores of different throughput (like big.LITTLE) and depend on bitstream features that negatively affect compression. Our new threading model scales regardless of the specific features present in the bitstream, requires limited system resources and is ideal for the big.LITTLE core combinations that are popular in today's mobile devices. With this design, dav1d can keep all your cores busier than a barista during a Demuxed break, on ARM as well as x86. 720p real-time AV1 software playback on the large majority of Android devices out in the wild is now a reality.
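One way to see why a fixed split of work scales poorly across cores of different throughput is a small scheduling simulation. The sketch below is illustrative only and is not dav1d's scheduler: the functions `static_split_time` and `shared_queue_time`, the per-core `speed` values, and the assumption that all tasks cost the same are hypothetical. It compares dividing tasks evenly up front with letting each idle core pull the next task from a shared queue.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Static split: tasks are divided evenly among workers up front.
 * Each task costs one unit of work; speed[w] is units per time step.
 * Returns the time at which the slowest worker finishes. */
double static_split_time(size_t n_tasks, const double *speed, size_t n_workers)
{
    assert(n_workers > 0 && n_workers <= MAX_WORKERS);
    double finish = 0.0;
    for (size_t w = 0; w < n_workers; w++) {
        assert(speed[w] > 0.0);
        /* Spread any remainder over the first few workers. */
        size_t share = n_tasks / n_workers + (w < n_tasks % n_workers ? 1 : 0);
        double t = (double)share / speed[w];
        if (t > finish)
            finish = t;
    }
    return finish;
}

/* Shared queue: each task goes to whichever worker becomes idle first,
 * so faster workers naturally end up taking more tasks. */
double shared_queue_time(size_t n_tasks, const double *speed, size_t n_workers)
{
    assert(n_workers > 0 && n_workers <= MAX_WORKERS);
    double busy_until[MAX_WORKERS] = {0};
    for (size_t t = 0; t < n_tasks; t++) {
        size_t next = 0;
        for (size_t w = 1; w < n_workers; w++)
            if (busy_until[w] < busy_until[next])
                next = w;
        assert(speed[next] > 0.0);
        busy_until[next] += 1.0 / speed[next];
    }
    double finish = 0.0;
    for (size_t w = 0; w < n_workers; w++)
        if (busy_until[w] > finish)
            finish = busy_until[w];
    return finish;
}
```

With two "big" cores at speed 2.0, two "little" cores at speed 1.0 and 120 equal tasks, the static split finishes after 30 time units because the little cores become the bottleneck, while the shared queue finishes after 20. Real decoder tasks have dependencies and uneven costs, which this model leaves out.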
This talk was presented at Demuxed '23, a conference for video nerds in San Francisco featuring amazing talks like this one.
Science

Comments • 2

@TheoneandonlyRAH 2 months ago
congrats!!
@BobHannent 4 months ago ⁺⁵
Here's a question: Is the dependence on Assembly an indication that compilers are insufficient or that higher level languages aren't capable of describing things in ways that they can be compiled efficiently? If it's the compiler, what would it take to make a higher level compiler to approach the efficiency of assembly?
