as we all know, the human brain is incredibly well suited for this sort of task. On the sports debate example where it shows the combined audio, then each part separate, I watched the combined part three times. Once without looking, and I couldn't understand much of what either was saying. Then I watched it while looking at the right and left speakers to do my own audio isolation. I can focus on each and know what they're saying, however, THIS DOES A BETTER JOB. I can more clearly understand each of them while listening to its isolation than I can doing it myself. PLUS it frees me from focusing on their mouth. INCREDIBLE. Another superhuman neural network, I am repeatedly amazed by this.
With all due respect: No. the fuck. it's not. This is like a fever dream where I'm able to see how sound happens in my head. It's insane. How do human beings function? It's a mess, the whole thing. No semblance of consistency or constancy.
Miki, this is amazing work. I really appreciate that you added the "Comparison with audio-only" section. That greatly helps us understand how much better audio-video is than audio only. Appreciate all your work, man. :)
This is amazing. I really hope to see this in the hands of consumers at some point in the future. Or at the very least law enforcement. Could prove invaluable for anything from home videos all the way up to professional uses.
Even after cleanup, RUclips's auto-generated captions render 4:14 as "I hope you do" (it should be "OK Google"). I would have thought this was the one phrase it would get right!
Well... Isn't this kind of proof that Google (at least nowadays) doesn't put itself above the laws of individual countries? Doesn't this actually mean that Google tries to get rid of any bias in its algorithms?
I imagine this would be extremely beneficial to those CIA/NSA types. If you could monitor large crowds and focus in on individual conversations, and combine this with some sort of real-time keyword monitoring...
I can barely even tell who's talking with all the motion.
This is crazy good. The implementation possibilities.. wow
Pavel Lelin is there an app for that
Max Raider not yet
Can you make it available in real life when I argue with my wife? I will pay gold :)
Amazing! Where can we try it ourselves?
Great job guys! Keep going :)
Hopefully this technology will be open for free for personal use, TYVM!
well Done! Resolved various issues in audio!
Is that Bill Freeman @2:55 ?
finally Italy will see a new age XD
It will be used on RUclips?! When?
It probably uses a lot more processing power than the current method. I'd expect at least 3 months to a year of further refinement.
@@subjectnamehere3023 Well... Google I/O 2019 is in May. So until then they have time to refine it and then introduce it in beta for the public :)
Oh man, would be cool if this could be done with music. Separating both vocals and different instruments.
Already been done
Phonicmind and iZotope RX7 are the answer for this.
Send paper link
I'd be happy to get your email so we can get in touch.