Google's AI Clones Your Voice After Listening for 5 Seconds! 🤐

  • Published: Nov 11, 2019
  • ❤️ Check out Weights & Biases here and sign up for a free demo: www.wandb.com/papers
    The shown blog post is available here: www.wandb.com/articles/fundam...
    📝 The paper "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" and audio samples are available here:
    arxiv.org/abs/1806.04558
    google.github.io/tacotron/pub...
    An unofficial implementation of this paper is available here. Note that this was not made by the authors of the original paper and may contain deviations from the described technique - please judge its results accordingly! github.com/CorentinJ/Real-Tim...
    🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
    Alex Haro, Anastasia Marchenkova, Andrew Melnychuk, Angelos Evripiotis, Anthony Vdovitchenko, Benji Rabhan, Brian Gilman, Bryan Learn, Christian Ahlin, Claudio Fernandes, Daniel Hasegan, Dennis Abts, Eric Haddad, Eric Martel, Evan Breznyik, Geronimo Moralez, James Watt, Javier Bustamante, John De Witt, Kaiesh Vohra, Kasia Hayden, Kjartan Olason, Levente Szabo, Lorin Atzberger, Lukas Biewald, Marcin Dukaczewski, Marten Rauschenberg, Matthias Jost, Maurits van Mastrigt, Michael Albrecht, Michael Jensen, Nader Shakerin, Owen Campbell-Moore, Owen Skarpness, Raul Araújo da Silva, Rob Rowe, Robin Graham, Ryan Monsurate, Shawn Azman, Steef, Steve Messina, Sunil Kim, Taras Bobrovytsky, Thomas Krcmar, Torsten Reil.
    / twominutepapers
    Splash screen/thumbnail design: Felícia Fehér - felicia.hu
    Károly Zsolnai-Fehér's links:
    Instagram: / twominutepapers
    Twitter: / karoly_zsolnai
    Web: cg.tuwien.ac.at/~zsolnai/
    #VoiceCloning #Google
  • Science

Comments • 2.1K

  • @Neftegna 4 years ago +9093

    imagine getting a call from AI claiming to be you

    • @einekartoffel2490 4 years ago +824

      Imagine finding out that they are actually the real you.

    • @That_Awesome_Guy1 4 years ago +159

      @@einekartoffel2490 You could just ask them a question only the real you would know.

    • @yiboliang8338 4 years ago +263

      @@That_Awesome_Guy1 And then you realise they know everything about you because they are from NSA .

    • @martiddy 4 years ago +22

      It will be awesome if this could be implemented to predict the voice of a person just by looking at 5 seconds of video/gif.

    • @spuriousc 4 years ago +76

      Dwight, At 8 a.m. today someone poisons the coffee. Do not drink the coffee. More instructions will follow. Cordially, Future Dwight.

  • @Addsomehappy 4 years ago +3710

    Now you don't even have to read your scripts, just put them through this thing

    • @dariuszrdk 4 years ago +717

      U think he isn't doing it already ? (~ ̄▽ ̄)~

    • @TwoMinutePapers 4 years ago +849

      @@dariuszrdk Psst! 😄

    • @NguyenDuy-jd6sm 4 years ago +214

      Plot twist he already used this to make this video

    • @PRANSHU49 4 years ago +192

      @@TwoMinutePapers Wait you guys think he wrote the whole script by himself? No Sir. Open AI released the full version of GPT-2 yesterday, pretty sure he used it to write most of the script. Tell me I am wrong @Two Minute Papers ;)

    • @scruff8072 4 years ago +7

      Fellini would've loved this

  • @ginsan8198 4 years ago +544

    This paper: We nailed it!
    Voice actors around the world: *trembling in fears*

    • @jeromyperez5532 4 years ago +34

      Probably not. Text to speech is never going to have the same kind of vocal control as a quality actor will. Speech to speech is the AI voice acting future. Check out Respeecher. If anything it will mean that the industry will have two different types of actors: timbre actors, who will probably get paid on a sliding scale depending on their number of 'credits' to date, and performance actors, who are hired for their acting talent instead of just their timbre and register. I think overall, if speech-to-speech technology is improved enough, we'll be able to see acting taken to an entirely new level in terms of quality.

    • @tlatosmd 3 years ago +4

      @@jeromyperez5532 THIS! It's not gonna put voice actors out of work, as you can clone a voice but you can't program good acting. I can't wait to resurrect dead voice actors this way.

    • @ilovecairns5181 3 years ago

      Jeromy Perez it’s a joke

    • @ginsan8198 3 years ago

      @@jeromyperez5532 Thanks for the wonderful information. I was just joking tho. Haha..
      But yeah, looking forward to the implementation of such technology.

    • @Jacen_Rockwell 3 years ago +1

      Why only "voice actors"? Similar dystopian to this can of scorpions, is quite capable of rendering the botoxed face of every Hollywood narcissist (even more) irrelevant.
      By rendering their image and making them actually seem MORE human (actually that's not too hard but still...). And that's where the positive spin ends. I'm thinking about the potential 100% "solving" of murders "caught on camera", that kind of tomfoolery.
      God, I sent a friend I haven't seen for a while *a single selfie* last week. With some AI app from the Play Store, she sent me an animated skit of me singing a lousy Katy Perry song...
      Lip sync, eye movement, head movement from different angles...And it's BONE CHILLINGLY IMMACULATE.
      I don't like where any of this is heading...

  • @cks2020693 4 years ago +1449

    "What's your credentials?"
    "I've been a voice actor for about 20 years, pretty well respected in the industry"
    "Ok, you are hired, now if you would just kindly read this into the microphone"
    (5 seconds later)
    "Ok your job is finished, nice knowing you"

    • @m88c96 4 years ago +39

      😢

    • @4zdr456 4 years ago +16

      F

    • @trinidad17 4 years ago +116

      "Can you please talk about yourself?" (5 seconds later) "Ok sorry, we'll get back to you if we're interested" (voice stolen)

    • @sciencehistoryandentertain734 4 years ago +48

      Yeah, they could hire Tom Hanks for like 10 minutes and sign a contract that includes the use of his voice/manipulation and image. Your next cartoon movie could have the biggest star voices... Or could use the voice of dead famous people using samples.

    • @tristenarctician6910 4 years ago

      F

  • @wailrimouche1171 4 years ago +2743

    RIP voice recognition security commands.

    • @NguyenDuy-jd6sm 4 years ago +287

      This research raises a serious concern about security and identification. Just like with deepfakes, I think there will be a war between those who make fake AI and those who try to detect it. Embraced

    • @singatias 4 years ago +80

      People will make an AI to detect cloned voice just like people made an AI to detect deep fakes.

    • @Bobstew68 4 years ago +60

      @@singatias The problem with that is that you can't inspect a voice sample in its "original" resolution, like you can with image and video, pixel by pixel. The sample will be recorded with a microphone before it's inspected, so there's imperfect information. I believe deepfake detection relies on this fine grained resolution.

    • @wailrimouche1171 4 years ago +9

      @@singatias well that means more processing power required for speech recognition in small scale embedded systems GREAT...

    • @AsphaltAntelope 4 years ago +8

      1-7-3-4-6-7-3-2-1-4-7-6-Charlie-3-2-7-8-9-7-7-7-6-4-3-Tango-7-3-2-Victor-7-3-1-1-7-8-8-8-7-3-2-4-7-6-7-8-9-7-6-4-3-7-6

  • @rolithesecond 4 years ago +2136

    Missed opportunity: Revealing at the end that your voice was synthesized throughout the video.

    • @bennemann 4 years ago +250

      Throughout the video would be too obvious. The REAL way to do it would be to silently switch it halfway and see if the spectator notices!

    • @anduro7448 4 years ago +9

      Agreed

    • @rolithesecond 4 years ago +69

      @@bennemann How about switching after....5 seconds xD

    • @tokutickler 4 years ago +75

      @@bennemann Luckily, this YouTuber already has a somewhat robotic voice.

    • @a8552bc 4 years ago +7

      sam1370 listen closer.

  • @ppmaks668 4 years ago +2524

    Finally I can make my crush say "I love you"

  • @stevenlewis9234 4 years ago +91

    The Terminator: [impersonating John's voice] Hey Janelle, what's wrong with Wolfie? I can hear him barking.
    T-1000 impersonating Janelle: Wolfie's fine, honey, Wolfie's just fine. Where are you?
    The Terminator: [hangs up the phone] Your foster parents are dead.

  • @MrFloRolf 4 years ago +1099

    Imagine going to an audition for voice acting and after 5 Seconds the judges kick you out but a year later you hear yourself in the movie.

    • @zedg7473 4 years ago +95

      Yeah that sounds lawsuit worthy. I mean singing 1 second of a song is enough for copystrike.

    • @eelnai2503 4 years ago +3

      like in Bojack Horseman

    • @Scientificmethods 4 years ago +16

      @@zedg7473 Unbelievably it's legal unless you've copyrighted your voice (which isn't possible), songs have melodies or lyrics which you can copyright.

    • @trinidad17 4 years ago +3

      @@Scientificmethods Even after the BTF2 fiasco where they used molds from the face of an actor to impersonate him? But yeah I can see how they could have left out voice impersonation, but likeness is not ok iirc.

    • @Zaire82 4 years ago +5

      @@Scientificmethods Is that something you confirmed for this exact scenario?
      Does it not count as exploiting the property of others for personal benefit? Or possibly identity fraud?

  • @omarcortes88 4 years ago +976

    So, basically, this is how Terminators mimic other people's voices.

    • @sadderwhiskeymann 4 years ago +10

      and TNG Data!!

    • @anrwlias 4 years ago +11

      That was my first thought, too.

    • @sadderwhiskeymann 4 years ago

      @@anrwlias terminator or Data?

    • @recklessroges 4 years ago +4

      Terminator mimicry was what I thought of first.

    • @Menaceblue3 4 years ago +25

      @@DiceKrispy
      Wolfie is fine dear... when are you coming home John?

  • @allisonjuno7654 4 years ago +204

    imagine if at the end of the video he revealed the entire voiceover was an AI

    • @annabrenda8694 2 years ago

      He'd be demonetized if he revealed he was an AI

  • @auzzgamesoi9947 4 years ago +294

    Expect from the internet:
    Shrek Trilogy with all voice actors replaced by Hitler.

    • @mimitsunekitkat 4 years ago +33

      Shrek Trilogy but Shrek's voice is swapped with Donkey's

    • @bucket4255 3 years ago +10

      @@mimitsunekitkat Shrek trilogy but Shrek's voice is replaced by a motorcycle.

    • @CrazyCrayfish 3 years ago +2

      @@bucket4255 motercycle sounds 10 hr loop but the sounds are shreks scripts done in a crude impersonation by donkey

    • @bucket4255 3 years ago +2

      @@CrazyCrayfish Motorcycle sounds 1 decade loop but the sounds are shrek's screams played by Donkey with a pitchfork and fire.

    • @DoctorNemmo 3 years ago +2

      Shrek trilogy but it's all yaoi voices

  • @colonelgraff9198 4 years ago +2970

    Add this to deepfakes. Nothing bad can happen from that...

    • @forsakenbins8835 4 years ago +262

      Everything what can be done, will be done.

    • @mzflighter6905 4 years ago +312

      @Solve Everything Imagine declaring a war as a president. Imagine faking sextapes.
      Pretty terrifying
      Fortunately a bit fun tho

    • @arminneashrafi2846 4 years ago +66

      Since this technology is known to cause these effects, we will scrutinize evidence even more harshly.

    • @millsdickson8498 4 years ago +173

      People are naive in how they take these things lightly. It will get real when someone is accused of something and the authorities can produce a video where the accused is seen/heard admitting their guilt (police cam video, interrogation videos, etc.). It won't be long before these technologies mature enough to be used outside of the testing environment...

    • @llejk 4 years ago +118

      @@millsdickson8498 It won't be long until any video evidence is useless because you can get perfect videos of everything. This means we reset our trust network to somewhere around the 19th century...

  • @xumx 4 years ago +1621

    Pretty sure this episode is synthesised by an AI.

    • @457Deniz457 4 years ago +39

      Yes do you hear that perfect accent ?!? :D :)

    • @danielforrest3871 4 years ago +6

      I feel the same way.

    • @PromethorYT 4 years ago +38

      It would have been so funny if it was and using that technique. And then at the end of the video tell us the whole episode was synthesized haha.

    • @Mark-zg4ky 4 years ago +15

      hey @twominutepapers you should do a video with an AI reading your script and not tell us until the next episode. Then reveal it!

    • @XxXforestx 4 years ago +2

      I agree! It was driving me so nuts I could only concentrate on that.

  • @brampedgex1288 4 years ago +87

    Imagine getting a spam call in the voice of your friend, and after you speak for 5 seconds they immediately hang up, and suddenly people use your voice everywhere

    • @halfsni6804 3 years ago +3

      Awesome Idea! thy buddy.

    • @brentfisher902 3 years ago +3

      Patience is a virtue... They already have the call where they have a reel-to-reel recorder going when they call up and ask you "Can you hear me now?" and your instinct will be to say "Yes", and they make copies of that tape, and with razor blades and splicing tape and rulers they have your voice signing up for all sorts of financial misery...

    • @khakipeach2128 2 years ago

      @@halfsni6804 aw hell nah

  • @FengXingFengXing 4 years ago +254

    They don't say WHICH human languages were used for training. Can the AI make a person's voice say sentences in different human languages?

    • @10_Bit 4 years ago +32

      I don't know much, but my educated guess is that it takes the voice and speech mannerisms (accent), and whatever language the machine outputs will have that accent, or the machine is set to that specific language.

    • @fromfareast3070 3 years ago +6

      you have datasets of the unofficial implementation

    • @usingforposting 3 years ago +7

      The point is, you need to have that Language Voice Sources for the AI training. Once you have it, Ai can train itself, after that AI can synthesize the voice.

    • @tylerblackwater6137 3 years ago +6

      They trained 2 models on 2 English datasets (US accent, LibriSpeech, and British accent, VCTK). To be clear, one model is trained on the LibriSpeech dataset and the other one is trained on the VCTK dataset. These models only synthesize English words/sentences. However, it can take in voice recordings of different languages, and the synthesized English words will most likely NOT sound like natural English.
      Finally, to answer your question, it seems that it hasn't been done yet, but it seems very possible for this AI model to create sentences in different human languages. Only small changes need to be made to the model's design to be able to say sentences in one different language. Then, give that model voice recordings in that one language to train with. Boom, give it a recording in that language and you got yourself a model that can replicate that recording's natural voice in that language.
      I glossed over various meticulous details, but overall, that's what needs to be done to make this model talk in a different language. It's definitely something that's easier said than done, but yea :)

    • @fanfare100 3 years ago +1

      It can, but the algorithms for one language are not always good for another. It would need to be built and tested for all of the target languages. These prototypes are built in the languages that they will likely first use.

  • @ItsTonyCO 4 years ago +682

    This model was trained with ~20K voice recordings. Imagine Facebook training it with 2 Billion.

    • @alexreitler 4 years ago +94

      The real power of that would be the ability of the AI to talk in almost any language

    • @farifurido 4 years ago +8

      @SpinazFou i am inevitable

    • @louis.bodota 4 years ago +41

      Why force 2 billion people to train AI when you can use already existing recordings from Google Assistant

    • @jeromyperez5532 4 years ago +6

      @@louis.bodota Force?

    • @ygorgallina2691 4 years ago +4

      And twitter introducing voice tweets.

  • @syafsanai 4 years ago +1079

    So I can hear my voice in another language. All with the correct native accent?!

    • @photelegy 4 years ago +129

      Would be awesome to try!
      But not good news for dubbing voice actors.

    • @TheSwaroopB 4 years ago +112

      @syafsanai, Nope, not with this paper and its implementation. Check out the samples at the end of the page (link is from this video's description). They try the exact same thing you mentioned: google.github.io/tacotron/publications/speaker_adaptation/

    • @cuentafake140 4 years ago +7

      Probably not, but I wonder what would happen if the input was in a different language (like spanish)

    • @syafsanai 4 years ago +5

      @@carlosquintero4957 Cool, thanks for the link.

    • @louiserocks1 4 years ago +12

      Yeah I really wanna use this to hear what my wife's voice would sound like if she spoke English

  • @Certio0 3 years ago +1

    This is far better than current commercial solutions, amazing.

  • @erik-fisher 4 years ago +1

    I just found your channel. The video is great.

  • @muaawiyahtucker 4 years ago +117

    Skynet, friends. It's already here!!! Remember that scene in T2 where John Connor's stepparents were killed, and John Connor calls to speak to his stepmom? But she's already dead, and the T-1000 was speaking for her.

    • @muaawiyahtucker 4 years ago +1

      gu4t4f4c thanks.

    • @Johnnymoo 4 years ago +3

      “Wolfy’s just fine...”

    • @artysanmobile 4 years ago

      Arabic Courses I hate to disappoint you, but that was... ahem... a movie.

    • @Ebendejongh1 4 years ago

      I was thinking of the exact same scene

    • @frtard 4 years ago +1

      ​@@artysanmobile Just because something is in a movie doesn't mean that it can't exist in reality. The terrifying part isn't the technology itself, it's how lightly people dismiss it as fantasy. Well, it's not. We've been slowly advancing technology so even if it's the _tiniest infinitesimal_ improvement, it's still making that movie look more and more like a documentary. It's only a matter of time.

  • @__-tz6xx 4 years ago +900

    Is the video narrated by this AI trained on Károly Zsolnai-Fehér's voice?

    • @joshbreidinger2616 4 years ago +102

      It should have been. Would have been such a mind fuck.

    • @chrisbartha2002 4 years ago +9

      Heyyy that's a Hungarian name

    • @AndreyAntonchik 4 years ago +8

      I would never be able to spell his name

    • @matthiasvancampen3770 4 years ago +2

      Other than the 5s recording, it also needs to have the full text in speech. Not sure whether this can be AI-generated speech as well...

    • @jovijona1425 4 years ago +2

      Fortunately thanks to the accent AI unable to copy. Instant BSOD.

  • @deeelmore4560 4 years ago

    ur videos bring me joy and u deserve a hug

  • @IapetusII 4 years ago +18

    I'm just imagining what this can do for SFM and GMod animations, since this could enable previously inaccessible voice lines.

    • @delta2317 4 years ago

      if the creator is willing to pay 30 dollars a month for a non monetized video, or 499 a month if they tried to monetize it.

  • @herbin45 4 years ago +535

    Now this is pretty freaky, i feel like this might be used to scam people.

    • @martiddy 4 years ago +131

      Or even worse, to create political misinformation.

    • @merowing8588 4 years ago +68

      I believe you misspelled "will be", sir. ;)

    • @NoOne-fe3gc 4 years ago +29

      Invest in tinfoil hats, I predict a market boom

    • @mittamoa 4 years ago +20

      At least we have a reason to come off social media and talk face to face again.

    • @cerebralm 4 years ago +2

      @@mittamoa man oh man... how do we do this. :'( humanity MUST be incentivized to return to IRL. but how?

  • @Raudaschl 4 years ago +281

    Somebody the other day was asking if there was a way they could run an online sales seminar without actually having to present the whole thing - they just wanted to answer questions at the end. I think I just found their solution...

    • @nabhanq 4 years ago +5

      @@untitled795 in a few years everyone will have it average joes can make deep fakes of pictures easily

  • @starzandearth 4 years ago

    You're the only channel I enjoy binging.

  • @vikaskumarojha8616 2 years ago

    Thanks for bringing out our attention to such good research.

  • @woofcaptain8212 4 years ago +57

    Wtf, the synthesis is so good that I might not even suspect it was synthesized if I wasn't told.

    • @sciencecompliance235 4 years ago

      Not if you knew them well. There's no way it could begin to replicate someone's personality in such a short amount of time.

    • @unfetteredparacosmian 4 years ago +1

      @@sciencecompliance235 But that's not the purpose of this, is it? In terms of voice acting and other potential applications, it's pretty much perfect.

    • @sciencecompliance235 4 years ago

      @@unfetteredparacosmian Sure, I was just thinking of trying to fool someone into thinking it was someone they knew, either personally or a famous personality.

  • @GglSux 4 years ago +238

    This is another one that isn't to hard to find a bit "scary"...

    • @clivebonham9944 4 years ago +2

      If you use it with deepfake could be bad.

    • @Nick_fb 4 years ago +2

      Karol's channel is the scariest on youtube

    • @xanniegaming8760 4 years ago +1

      too*

    • @Gambit771 4 years ago +1

      Thank god we live in the age of cancel culture so that we can ban anything that is a bit 'scary'.

    • @alessandromangaXD 4 years ago +1

      Also quite literally. On the website, made by the same team I think, there are other (more recent?) examples, made I think with different datasets and algorithms (I honestly have no idea how all of this works). Some of them show faults and failures of the synthesis program, and I swear to god, it sounds like a person being possessed by a demon. Also it's so impressive. I can see, a few years from now, when the technology is openly available, a huge boom of audiobooks. You won't even need someone to dub it, just feed it to the AI. And can you imagine how personal assistants will be five years from now? This shit is creepy and awesome at the same time.

  • @bailahie4235 1 year ago +1

    With some adaptations, this also seems great for translating human speech from one language to another by combining it with an automatic text translation tool. E.g. translating a podcast from English to Spanish while preserving the voices of the speakers: first translate the content with Google Translate, then let a text-to-speech engine, trained with this approach on the voices of the original speakers, read the translation out loud.

  • @sasufreqchann 4 years ago +161

    Can I get a singing AI ? I want to do some crazy tracks without singing myself . Or imagine recreating Michael Jackson's voice !

    • @papayagurl9275 4 years ago +26

      That’s what vocaloid is for

    • @BlackBloodCombatClub 4 years ago +12

      UTAU is your best bet since you can use custom voices to make a UTAUloid, but it's rather difficult and not very advanced.

    • @FunnyParadox 4 years ago +13

      There is UTAU and Vocaloid, but they are not AI-based. For an AI-based voice synthesizer, see Synth V, which is half AI-based (the rendering is AI-based), or NEUTRINO, where absolutely everything is AI-based ^^

    • @saosaqii5807 4 years ago +1

      If it’s garbage in then it’s garbage out

    • @davidwuhrer6704 3 years ago

      Yes, you can. It has been done.

  • @Corey_Brandt 4 years ago +522

    So technically I could use this to clone Hitler’s normal speaking voice and make it sound like he was testifying at the Nuremberg trials.

    • @Corey_Brandt 4 years ago +55

      DeKleinsteCools 20th century alternate history is kind of my thing.

    • @boi9433 4 years ago +79

      shakira songs with hitler voice

    • @nickelpence 4 years ago +10

      That's the best use I found yet scrolling down the comments😂👍

    • @llejk 4 years ago +7

      And you could add the characteristics that have been missed by 1940s microphones and recordings, and add the slightly better 1950s recording sound.

    • @Corey_Brandt 4 years ago +3

      llejk you’d have to train a new network to correct for that: use whatever 1940s recording equipment was used in tandem with modern equipment, then train it to correct the errors in old recordings to make them sound modern.

  • @grantanderson5100 4 years ago +143

    Few more steps and we’ll be able to swap actors out of movies and replace them.
    Can’t wait to watch the Incredible Hulk with Mark Ruffalo and Solo with Harrison Ford.

    • @kebomueller732 4 years ago +6

      It has already been done! Look at ctrl shift face on YouTube. Schwarzenegger is so brilliantly done. It's only short clips though.

    • @Exeros 4 years ago +10

      Grant Anderson yeah imagine improved deep fakes and this.

    • @ge2719 4 years ago

      @@kebomueller732 Not with the voice though. To anyone who knows what an actor sounds like, the voice being wrong is obvious.

    • @kebomueller732 4 years ago

      @@ge2719 Look at: Schwarzenegger in the coin toss.. The voice is really amazing.

    • @jendabekCZ 4 years ago +1

      @@kebomueller732 The problem is that 2 actors would act the same scene differently, based on his complex personality ... so if you just swap the faces it will look creepy - not usable. So I don't think this will work for any characters where good acting is important.

  • @green8026 2 years ago

    This may be the best channel of all time for scientists and entrepreneurs!

  • @katomiccomics202 4 years ago +22

    AI voice replication plus deep-fakes are going to be overpowered.

  • @benjaminbrady2385 4 years ago +153

    Researchers make a perfect AI for killbots
    Two minute papers: *What a time to be alive!*

    • @Kira-just 4 years ago +6

      @Hernando Malinche Really good use case

    • @Zharque 4 years ago +6

      "What a time to be alive!"
      Not for long unfortunately.

    • @polarnyne 4 years ago +1

      @Hernando Malinche Skyrim modders like this.

    • @simjianxian 4 years ago

      this is so exciting for rpgs

  • @lordkekz4 4 years ago +13

    I love that you put in a pop-up when you said hold on to your papers! (0:36)

  • @zacknetic1463 3 years ago +10

    Was half expecting him to reveal that the whole voiceover was the AI output.

  • @lavarsch 4 years ago +37

    Imagine something like Skyrim where you can choose to use your own voice: just read a 5-second text, and for the rest of the game you hear your own voice.

    • @jacksnacc6145 3 years ago +3

      Stfu Biden furry.

    • @dinnercat1142 3 years ago +3

      That would be cool!

    • @lavarsch 3 years ago

      @@jacksnacc6145 Careful, or i UwU you.

    • @jacksnacc6145 3 years ago

      @@lavarsch I fear no man.... But that Lavars... It scares me.

    • @lavarsch 3 years ago

      @@jacksnacc6145 hahaha :'3

  • @codingstation7741 4 years ago +184

    Two minute papers: What a time to be alive!
    Ai: Just wait until I replace you.

    • @616Metalhead616 4 years ago +6

      I think it already did it, and we are the AI now in a Simulation in a dream of a flying turtle. Or wait what?

    • @ItsMeFacu 4 years ago

      AI: Yeah... alive...

    • @defaultaccount9522 4 years ago

      No..... Skynet.......

  • @rtificial8292 4 years ago +275

    You know whats really cool about this.
    The recordings google has on you.
    /Ha ha I'm in danger/

    • @4zdr456 4 years ago +7

      Oh fuck....

    • @OmniMC 4 years ago +2

      I may be wrong but I'm pretty sure Google legally has to delete all data they have on you upon request, and if not you can sue them

    • @jameswalker199 4 years ago +5

      Omni
      Google might delete it, but don't forget that they're an American company, so the NSA can just threaten that they aren't being patriotic enough and that it would be unamerican of them to not hand over all those voice recordings the moment they come in.

    • @davidwuhrer6704
      @davidwuhrer6704 3 years ago

      @@jameswalker199 No need to threaten. It's their legal responsibility.

    • @NelRogge
      @NelRogge 3 years ago +2

      The research for this was most likely done with those recordings. These are the types of things they collect it for :)

  • @aleksandari.7834
    @aleksandari.7834 4 years ago +6

    Perfect! I need Stephen Fry to read my audio books. :)

  • @mattrommel9521
    @mattrommel9521 3 years ago +14

    I've been inspired to make a voice encoding/decoding workflow from watching your previous videos, so this one was very exciting.
    My interest isn't in copying voices, but making changes to a person's voice (changing his pitch, vibrato, vowel, placement, etc). If successful, it would serve as a singing coach and a next-gen autotune device.
    Has there been any other relevant research in that area?

  • @roua.
    @roua. 4 years ago +20

    This kind of tech would be really useful for translating movies into other languages while keeping the original actor's voice, i hope to see some of this soon. Thanks for the great video.

  • @gorgolyt
    @gorgolyt 4 years ago +5

    May I say that, in addition to the "wow factor" of the final results at the start of the video, your more detailed expositions of the technical details of the papers are very much appreciated.

  • @OmarAli-mw9gq
    @OmarAli-mw9gq 4 years ago +6

    Imagine the potential for fan games/animation voice lines etc

  • @ykVORTEX
    @ykVORTEX 3 years ago +5

    2019: AI needs 5 sec to learn your voice
    2025: getting your DNA while driving a car at 60mph from a CCTV footage

  • @TechnoMinarchist
    @TechnoMinarchist 4 years ago +316

    Indie and visual novel developers could really use this to improve their games without having to hire voice actors.

    • @djr4yman
      @djr4yman 4 years ago +62

      Also for games that have a lot of NPCs like Skyrim or similar. One would only need to pay an actor for some seconds and then have the voices forever

    • @Corey_Brandt
      @Corey_Brandt 4 years ago +111

      That raises questions as to who has the rights to your voice.

    • @TechnoMinarchist
      @TechnoMinarchist 4 years ago +42

      @@Corey_Brandt We could probably get the AI to tweak the voice to whatever we like. Perhaps blend voices to create hybrids.

    • @lampuhijau9900
      @lampuhijau9900 4 years ago +18

      yeah but it means other people will lose their jobs and positions.

    • @TechnoMinarchist
      @TechnoMinarchist 4 years ago +32

      @@lampuhijau9900 Ultimately this is the future for all jobs. AI will replace all jobs one day even scientists will get replaced. Governments should figure out what their plans are for the future because there won't be a job market by the end of the century.

  • @sigmata0
    @sigmata0 4 years ago +14

    "My voice is my passport"
    AI: "I let myself in... thanks"

  • @Alexrider02
    @Alexrider02 4 years ago +1

    Could we use this to make voice command software work nearly perfectly? As in, instead of having to program it to understand every individual word, or gradually get better over time as you use it, could you just speak a 5-second phrase and have it synthesize your voice, checking what you say against its own synthesized version in order to confirm accuracy?

  • @bbglas007
    @bbglas007 3 years ago +1

    This is probably the best voice cloning software I have seen so far. Even newer ones aren't as good as this demo

    • @ResembleAI
      @ResembleAI 3 years ago +1

      Challenge accepted ;)

  • @sunnywung6395
    @sunnywung6395 4 years ago +93

    why do i feel like we're constantly playing with fire?

    • @furinick
      @furinick 4 years ago +13

      Ever since we invented fire and spears

    • @a.person1805
      @a.person1805 4 years ago +7

      Because playing with fire is fun and educational. If you survive.

    • @mjt1517
      @mjt1517 4 years ago +1

      Don't be a puss.

    • @Audiostoke1
      @Audiostoke1 4 years ago

      @@furinick yep for the most part worked out ok

    • @OnEiNsAnEmOtHeRfUcKa
      @OnEiNsAnEmOtHeRfUcKa 4 years ago +2

      You wouldn't be eating cooked food if we hadn't.

  • @Gotinha123
    @Gotinha123 4 years ago +230

    “How many hours do we need?”
    “No.”
    Well that's not awkward at all

    • @smallbluemachine
      @smallbluemachine 4 years ago +5

      How much training did the network need?
      Yes.

    • @allstar4065
      @allstar4065 2 years ago

      If you remove the word 'how' in the question, the answer kinda makes more sense.
      Many hours do we need?
      No

  • @ssabykoops
    @ssabykoops 4 years ago +2

    Best use for this in video games would be to have the voice actors address the player as their chosen name, like let's say for example Skyrim's NPCs calling you the name you typed at the character creation screen. Immersion +100%

  • @user-sb3ds9om4c
    @user-sb3ds9om4c 4 years ago

    Your content is amazing!

  • @maxdmn99
    @maxdmn99 4 years ago +25

    "Wolfie's fine, honey, Wolfie's just fine"

    • @mikebordeaux8218
      @mikebordeaux8218 4 years ago +2

      max d ... your parents are dead.

    • @bassplayer807
      @bassplayer807 3 years ago

      Something’s wrong, she’s never this nice, lol

  • @KeithKritselis
    @KeithKritselis 4 years ago +20

    Interesting... Is there a "quick fox lazy dog" phrase for phonemes?

  • @cold_static
    @cold_static 3 years ago +3

    "Hey Janelle, what's wrong with Wolfie? I can hear him barking."

  • @yeong126
    @yeong126 4 years ago

    It's getting closer and closer, I'd love to see it with my eyes

  • @DogsaladSalad
    @DogsaladSalad 4 years ago +5

    wish i knew of more channels like this. def one of my favorites

    • @ayy2193
      @ayy2193 4 years ago

      Singularityprosperity is another one, not a whole lot of similarities but it's a good AI channel

  • @ayy2193
    @ayy2193 4 years ago +149

    Plot twist: this channel has been narrated by an AI this whole time

    • @rkan2
      @rkan2 4 years ago +11

      He does sound a bit robotic now that I think about it.

    • @DigitalXrisXros
      @DigitalXrisXros 4 years ago +4

      🤣🤔😱

    • @matthew8153
      @matthew8153 4 years ago +2

      Dawn
      Ok M night Shyamalan

    • @616Metalhead616
      @616Metalhead616 4 years ago +2

      Plot twist twist: We all here in the comments are nothing more than AI.

    • @yahyaahmed8145
      @yahyaahmed8145 4 years ago

      @@616Metalhead616 plz don't bring that Elon Musk theory here

  • @nurbekfayziev
    @nurbekfayziev 4 years ago +3

    I was waiting for the AI-cloned voice, then realised that I had already heard both of them. Didn't notice anything. That's scary perfect!

  • @slicedpage
    @slicedpage 2 years ago

    just great, not only do we need two-step authentication for aspects of the internet we now need it for real life.

  • @0MVR_0
    @0MVR_0 4 years ago +16

    augmented or specialized autoencoders and Boltzmann machines are getting more powerful these days
    question the narrative that such architecture is outdated or primitive

  • @DrZaius3141
    @DrZaius3141 4 years ago +5

    Holy crap. We need to protect voice actors/actresses right now or otherwise they will be bullied out of existence. Imagine a video game where you just have to provide a small voice sample and your character will be fully voiced from there on - with your own voice. Brilliant and amazing, but it also means that developers might pay voice actors for 5sec of dialogue (at least for the minor parts) and synthesize the rest, unless it gets properly put in law.

    • @michaelkochalka3251
      @michaelkochalka3251 4 years ago

      Good luck having it "properly put in law", unless the law is done by some A.I., otherwise we will take just too much time arguing and deciding that by the time they properly regulate it, 100x more things would require attention.
      The way we are doing politics needs a complete overhaul otherwise we will slowly descend into chaos.

    • @connormccluskey9103
      @connormccluskey9103 4 years ago +2

      Or fix the underlying problem of us creating useless jobs and start implementing UBI because 90% of us will not have jobs in the next century.

  • @teroblepuns
    @teroblepuns 4 years ago

    Finally, some great-sounding navigation voices

  • @marknorville9827
    @marknorville9827 4 years ago

    All of this is scary but also exciting at the same time. Scary as in a lot of harm and damage can come out of this. However, you could bring back movie stars, do new movies with their faces and voices, potentially even singers and comedians as well.

  • @willrope5839
    @willrope5839 4 years ago +35

    I replayed 0:02-0:04 three times and the furniture in my room started floating.

    • @rsahu609
      @rsahu609 3 years ago

      This cracked me hard 😂😂😂

  • @ArnaudMEURET
    @ArnaudMEURET 4 years ago +5

    “Wolfie’s fine, honey, Wolfie’s just fine”

  • @TheHumanSystem
    @TheHumanSystem 3 years ago

    The Human System approves this content. Good work two minute papers!

  • @_vox
    @_vox 4 years ago

    your channel is incredible

  • @UonBoat
    @UonBoat 4 years ago +54

    Oh so now I can sound like two minute papers :D
    ...or you can sound like me?

    • @Iosaiv
      @Iosaiv 4 years ago +2

      read this in his voice.

    • @bonbonpony
      @bonbonpony 4 years ago +1

      @@Iosaiv I do that all the time with many people's voices, which means my brain can do that, which means it's possible. That's what I always thought is possible. And now it really becomes possible.

    • @Iosaiv
      @Iosaiv 4 years ago

      @@bonbonpony yeah I've done that more often as well. Can be very fun. :)

  • @zakuro8532
    @zakuro8532 4 years ago

    This is making your own UTAU VB way more easy...

  • @SCAlex_Musician
    @SCAlex_Musician 2 years ago

    That also poses major security concerns. Identity theft is one example. IVR systems that ask for vocal permission to do things over the phone should also be updated security-wise to prevent that.

  • @BombalurinaAI
    @BombalurinaAI 4 years ago +3

    With this technology, my parents can finally tell me how much they love me!

  • @P8qzxnxfP85xZ2H3wDRV
    @P8qzxnxfP85xZ2H3wDRV 4 years ago +5

    This episode would have blown my mind, if you revealed in the end that the voice-over actually was synthesized from a 5 second sample of your voice.

  • @roadblok6713
    @roadblok6713 4 years ago

    This technology amongst a few others is what I think could eventually force and require every human to wear a personalised piece of (perhaps integrated) technology. For use of identification but also clarification in circumstances where you can imagine this will get ugly. In court is my first thought. The only possible way I can think to dispute a clear video showing you committing a crime and/or verbally confessing is to have something on you always recording, tracking, communicating with nearby people to show you weren't there at that time etc.
    Amazing but also scary stuff. What a time to be alive.

  • @hyperchango
    @hyperchango 3 years ago

    Each and every one of your videos fill me with an existential dread for our future as a species.

  • @luck3949
    @luck3949 4 years ago +8

    Károly, please, make it happen : generate one of the episodes with AI already! Show us that we've passed beyond the singularity.

  • @CuriousAndCuriouser1865
    @CuriousAndCuriouser1865 3 years ago +7

    This is weird, imagine making someone you know is dead speak to you.

  • @Exe3D
    @Exe3D 2 years ago

    wow i remember this, two years, two new papers right !

  • @SlyHikari03
    @SlyHikari03 3 years ago

    This is cool.
    There needs to be one for making a Utau/Vocaloid out of vocal clips.
    I wanna make a vocaloid of my voice!

  • @WangleLine
    @WangleLine 4 years ago +3

    WHAT ON EARTH!?
    This is the first time I looked at the results and my jaw fell off my head

  • @ohjein
    @ohjein 4 years ago +18

    when they get Zizek right I'll be impressed

    • @watermelonhead8054
      @watermelonhead8054 4 years ago +2

      you'd need to add an RNG that gives each of the words spoken a 5% chance to be followed by *snrff*

  • @marcusrudd6675
    @marcusrudd6675 3 years ago

    Whoooaaaaaaa, actually how does that work. That is beyond comprehension. Oh wait, you just got to the part where you explain it... So it builds crazy banks of every talking style then profiles someone into a pre-learnt set of vocals. That's insane. What a time to be alive indeed.

  • @ntkbnd7897
    @ntkbnd7897 3 years ago

    So finally the "Case Closed"-Tie can be built now!!!

  • @DriesduPreez
    @DriesduPreez 4 years ago +4

    ...aaand now that scene from Mission Impossible 3 has been justified. Now we only need face capture and costume prosthetics 3D printing to be nailed.

  • @generalfishcake
    @generalfishcake 4 years ago +9

    I wasn't the only one listening for irregularities in Károly's voice, was I?

  • @slurpii4142
    @slurpii4142 3 years ago

    As a producer, I'm loving it

  • @sergejzr
    @sergejzr 4 years ago

    Amazing work!

  • @sarchoj
    @sarchoj 4 years ago +4

    Waiting for the Facebook viral "Hear how your voice sounds like when simulated by a machine"

  • @NithinJune
    @NithinJune 3 years ago +68

    Most voice cloning apps: You have to read this specific script that's 1 hour long and sign this written contract and verbally confirm that you are the person
    This thing: haha 5 second clip go brrrrr

  • @amosnimos
    @amosnimos 4 years ago +2

    I can think of scary application for this... and some funny creative one. Each time you give a hammer, the user can use it, to break or to build.

  • @rasikamayalu2883
    @rasikamayalu2883 4 years ago

    Wow this is amazing ⭐️

  • @MinerMovie
    @MinerMovie 4 years ago +33

    Im totally making a JARVIS that sounds like morgan freeman

    • @RBLXProd
      @RBLXProd 4 years ago

      acapela-box.com/AcaBox/index.php There is an old man feature in this website, It almost sounds like Morgan Freeman with a smooth sentence reading AI

  • @468erpeashooter9
    @468erpeashooter9 3 years ago +18

    It's incredible, but all the synthesized voices sound like they've been diagnosed with a new level of depression.

    • @kipchickensout
      @kipchickensout 2 years ago

      so you say people won't notice I'm actually just using my trained network?

  • @WillakaPlaeground
    @WillakaPlaeground 3 years ago

    thanks, please do more audio and music related papers

  • @ThatExplo
    @ThatExplo 2 years ago +2

    I've been trying to install this for the past 3 hours

  • @abdulhfhd
    @abdulhfhd 4 years ago +4

    bruhhhhhhhhhhh
    I need this so bad, I want someone to read EPUB books for me