Looking to Listen: Audio-Visual Speech Separation (SIGGRAPH 2018)

  • Published: 16 Apr 2018
  • The video accompanying our paper: "Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation".
  • Science

Comments • 26

  • @nolanjshettle · 5 years ago · +8

    As we all know, the human brain is incredibly well suited for this sort of task. On the sports debate example where it shows the combined audio, then each part separate, I watched the combined part three times. Once without looking, and I couldn't understand much of what either was saying. Then I watched it while looking at the right and left speakers to do my own audio isolation. I can focus on each and know what they're saying; however, THIS DOES A BETTER JOB. I can more clearly understand each of them while listening to its isolation than I can doing it myself. PLUS it frees me from focusing on their mouths. INCREDIBLE. Another superhuman neural network; I am repeatedly amazed by this.

    • @DakotaJones-nn2oi · 8 months ago

      With all due respect: No. the fuck. it's not. This is like a fever dream where I'm able to see how sound happens in my head. It's insane. How do human beings function? It's a mess, the whole thing. No semblance of consistency or constancy.

    • @DakotaJones-nn2oi · 8 months ago

      I can barely even tell who's talking with all the motion.

  • @PaulDoesIt · 5 years ago · +15

    This is crazy good. The implementation possibilities.. wow

    • @maxraider49 · 5 years ago

      Pavel Lelin is there an app for that

    • @PaulDoesIt · 5 years ago · +1

      Max Raider not yet

  • @freekbeta · 5 years ago · +19

    Can you make it available in real life when I argue with my wife? I will pay gold :)

  • @stefchristensen47 · 3 years ago

    Miki, this is amazing work. I really appreciate that you added the "Comparison with audio-only" section. That greatly helps us understand how much better audio-video is than audio only. Appreciate all your work, man. :)

  • @BryanSteacy · 6 years ago · +2

    This is amazing. I really hope to see this in the hands of consumers at some point in the future. Or at the very least law enforcement. Could prove invaluable for anything from home videos all the way up to professional uses.

  • @iwozzy · 5 years ago · +4

    Amazing! Where can we try it ourselves?

  • @billherreid9661 · 6 years ago · +5

    Even after cleanup, YouTube's auto-generated captions render 4:14 as "I hope you do" (should be "OK Google").
    I would have thought this was the one phrase it would get right!

    • @sirbughunter · 5 years ago

      Well... Isn't this kind of proof that Google (at least nowadays) doesn't advertise itself above what belongs to the law and order of countries? Doesn't this actually mean that Google tries to get rid of any bias in its algorithms?

  • @AtasiNarksri · 6 years ago · +1

    Great job guys! Keep going :)

  • @canlin2189 · 11 months ago

    Hopefully this technology will be made freely available for personal use, TYVM!

  • @antoniotech2000 · 6 years ago · +1

    Well done! This resolves various issues in audio!

  • @JDLeeArt · 10 months ago

    I imagine this would be extremely beneficial to those CIA/NSA types. If you could monitor large crowds and focus in on individual conversations, and combine this with some sort of real-time keyword monitoring...

  • @KrishnaDN · 5 years ago

    Is that Bill Freeman @2:55 ?

  • @ONDANOTA · 5 years ago · +5

    Finally Italy will see a new age XD

  • @easycuttv · 5 years ago · +2

    Will it be used on YouTube?! When?

    • @subjectnamehere3023 · 5 years ago

      It probably uses a lot more processing power than the current method; I'd expect at least 3 months to 1 year for further refinement.

    • @sirbughunter · 5 years ago

      @@subjectnamehere3023 Well... Google I/O 2019 is in May. So until then they have time to refine it, and then introduce it in beta for the public :)

  • @NeWx89 · 6 years ago · +2

    Oh man, it would be cool if this could be done with music. Separating both vocals and different instruments.

    • @fleecemaster · 5 years ago · +2

      Already been done

    • @johneygd · 5 years ago

      PhonicMind and iZotope RX 7 are the answer for this.

  • @triplemmm3 · 2 years ago

    Send paper link

  • @omerbenbaron6433 · 4 years ago

    I'd be glad to get your email so I can contact you.