Looking to Listen: Audio-Visual Speech Separation (SIGGRAPH 2018)

  • Published: 16 Apr 2018
  • The video accompanying our paper: "Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation".
  • Science

Comments • 26

  • @nolanjshettle · 5 years ago · +8

    As we all know, the human brain is incredibly well suited for this sort of task. On the sports debate example where it shows the combined audio, then each part separate, I watched the combined part three times. Once without looking, and I couldn't understand much of what either was saying. Then I watched it while looking at the right and left speakers to do my own audio isolation. I can focus on each and know what they're saying; however, THIS DOES A BETTER JOB. I can more clearly understand each of them while listening to its isolation than I can doing it myself. PLUS it frees me from focusing on their mouths. INCREDIBLE. Another superhuman neural network; I am repeatedly amazed by this.

    • @DakotaJones-nn2oi · 8 months ago

      With all due respect: No. the fuck. it's not. This is like a fever dream where I'm able to see how sound happens in my head. It's insane. How do human beings function? It's a mess, the whole thing. No semblance of consistency or constancy.

    • @DakotaJones-nn2oi · 8 months ago

      I can barely even tell who's talking with all the motion.

  • @PaulDoesIt · 5 years ago · +15

    This is crazy good. The implementation possibilities.. wow

    • @maxraider49 · 5 years ago

      Pavel Lelin is there an app for that

    • @PaulDoesIt · 5 years ago · +1

      Max Raider not yet

  • @freekbeta · 5 years ago · +19

    Can you make it available in real life when I argue with my wife? I will pay gold :)

  • @stefchristensen47 · 3 years ago

    Miki, this is amazing work. I really appreciate that you added the "Comparison with audio-only" section. That greatly helps us understand how much better audio-video is than audio only. Appreciate all your work, man. :)

  • @BryanSteacy · 6 years ago · +2

    This is amazing. I really hope to see this in the hands of consumers at some point in the future. Or at the very least law enforcement. Could prove invaluable for anything from home videos all the way up to professional uses.

  • @iwozzy · 5 years ago · +4

    Amazing! Where can we try it ourselves?

  • @billherreid9661 · 6 years ago · +5

    Even after cleanup, YouTube's auto-generated captions render 4:14 as "I hope you do" (should be "OK Google").
    I would have thought this was the one phrase it would get right!

    • @sirbughunter · 5 years ago

      Well... Isn't this kind of proof that Google (at least nowadays) doesn't advertise itself above what belongs to the law and order of countries? Doesn't this actually mean that Google tries to get rid of any bias in its algorithms?

  • @AtasiNarksri · 6 years ago · +1

    Great job guys! Keep going :)

  • @canlin2189 · 11 months ago

    Hopefully this technology will be made freely available for personal use, TYVM!

  • @antoniotech2000 · 6 years ago · +1

    Well done! This resolves various issues in audio!

  • @JDLeeArt · 10 months ago

    I imagine this would be extremely beneficial to those CIA/NSA types. If you could monitor large crowds and focus in on individual conversations, and combine this with some sort of real-time keyword monitoring...

  • @KrishnaDN · 5 years ago

    Is that Bill Freeman @2:55 ?

  • @ONDANOTA · 5 years ago · +5

    Finally Italy will see a new age XD

  • @easycuttv · 5 years ago · +2

    Will it be used on YouTube?! When?

    • @subjectnamehere3023 · 5 years ago

      It probably uses a lot more processing power than the current method; I'd expect at least 3 months to 1 year for further refinement.

    • @sirbughunter · 5 years ago

      @@subjectnamehere3023 Well... Google I/O 2019 is in May. So until then they have time to refine it, and then introduce it in beta for the public :)

  • @NeWx89 · 6 years ago · +2

    Oh man, it would be cool if this could be done with music. Separating both vocals and different instruments.

    • @fleecemaster · 5 years ago · +2

      Already been done

    • @johneygd · 5 years ago

      PhonicMind and iZotope RX 7 are the answer for this.

  • @triplemmm3 · 2 years ago

    Send paper link

  • @omerbenbaron6433 · 4 years ago

    I'd be glad to get your email so I can contact you.