Deepfake AI voice clone: 30min vs 8hrs of training (Descript Overdub demo)

  • Published: 14 Sep 2021
  • I updated my Descript Overdub AI voiceover tool with 8 hours of training audio of my voice, and WOW did it make a difference! You've got to check this out.
    3,670
  • Science

Comments • 118

  • @lachlanmoore2345
    @lachlanmoore2345 1 year ago +83

    I am planning on using this tool to get my friends arrested by confessing to crimes they didn't do!

    • @sun-eye
      @sun-eye 1 year ago +3

      Hey, that's a cool little trick.

    • @billagond9209
      @billagond9209 1 year ago

      Something like this happened on Discord, never believe anything you see online.

    • @generationzee13
      @generationzee13 1 year ago

      Hell Nah

    • @thisguyIoI
      @thisguyIoI 1 year ago

      💀

    • @Ayplus
      @Ayplus 1 year ago

      That's what friends are for

  • @OwlofAsia
    @OwlofAsia 1 year ago +20

    Short but to the point, just what I wanted. Thank you

  • @allenhuffman
    @allenhuffman 2 years ago +20

    That’s a really good demo of the technology. Back when I had a bunch of podcast shows, this would have greatly reduced the time I spent editing them each week. I can only imagine where it will be in ten years.

    • @IntenseInvestor
      @IntenseInvestor 2 years ago +2

      Can probably just think about the words and they will appear....

  • @FusionThunder
    @FusionThunder 2 years ago +24

    It's pretty good! I wonder what it will sound like after 100 hours

  • @hasan7786
    @hasan7786 2 years ago

    Whooooo! Thanks for these videos. Answered all my questions before I pulled the trigger.

  • @dougjohnson1517
    @dougjohnson1517 2 years ago +5

    Sounds like I could replace audio book readers I hate with my favorite narrators. And if it's a little flat, good! Because what I hate is overacting.

  • @gonzcasa
    @gonzcasa 2 years ago +10

    mind blown, this is a game changer for a lot of businesses

  • @formulavicio4273
    @formulavicio4273 2 years ago +1

    Lol, everything was text-to-speech, man, this is awesome. I have problems creating videos since my house is small and noisy with several people around, so this is awesome!

  • @ricebeansrockroll882
    @ricebeansrockroll882 2 years ago +7

    It still sounds a bit "crunchy".
    But I'm not sure I would have thought you were a computer as much as using a bad microphone.

  • @tomdchi12
    @tomdchi12 2 years ago +10

    Yikes! I didn't notice on first viewing (or hearing)! Second time through, I picked up where the system mushes some sounds, and that the intonation is flat, but overall... scary good.

  • @ArkyonVeil
    @ArkyonVeil 2 years ago +15

    Great improvement over the last test. Though it's still quite noticeable when someone has heard your voice before and is wearing headphones. The artifacts like a slight electronic tinge and the unnatural inflection kind of reveal the whole charade.
    HOWEVER: If someone isn't all that used to your voice or expecting the fakery, I imagine this could go unnoticed in the vein of a lackluster take and a spotty microphone.
    Technology is improving fast; it's likely that by 2022 most of these issues will have been ironed out.

    • @RealJamesArcher
      @RealJamesArcher  2 years ago +4

      Yeah, the artifacts are one of the biggest giveaways, but I don't think it'll take them long to sort that out. It's amazing to me how realistic they're getting the intonation. It's only a matter of time.

    • @mnomadvfx
      @mnomadvfx 2 years ago

      "If someone isn't all that used to your voice or expecting the fakery, I imagine this could go unnoticed in the vein of a lackluster take and a spotty microphone"
      ^^ This 100%.
      Not just the microphone, but also the audio compression which is far from perfect in every single encoding - especially if you are experiencing frame drops causing audio glitches at the same time.

    • @onliemovie8994
      @onliemovie8994 8 months ago

      It was obvious that this wasn’t your voice but still impressive honestly

  • @Masterhunter325
    @Masterhunter325 2 years ago

    This is sick!)) I couldn't tell you used Overdub for the entire video)))

  • @chrismarzarella2106
    @chrismarzarella2106 1 year ago +3

    I've been trying to improve the robotic voice that it gave me to begin with, but I can't seem to find the setting for uploading my podcasts, which contain an abundance of my natural voice. Can you do a video on how to do this?

  • @CassiusTheShow
    @CassiusTheShow 1 year ago

    Hey James -- great video. What would you recommend today is the best way to train a model that I have 10 hours of interview audio with?
    For a documentary I'm working on -- I want to feed it audio of a professional actor performing a monologue I wrote, and use the model to overdub the documentary subject's voice onto.

  • @ikemreacts
    @ikemreacts 1 year ago +1

    This video deserves the follow just because it is so clever.

  • @georged822
    @georged822 1 year ago +1

    I noticed it right away, it reminds me of listening to low bit rate audio. But the 8hrs def made your voice sound higher rez. Maybe after 400 hours it will sound realistic?

  • @RootoonsEkim
    @RootoonsEkim 1 year ago +1

    Something I plan to do with Overdub is to clone my high voice for a character in my show I have, so when I get older and cannot do that anymore I will be able to use overdub to keep the voice in store.

  • @BungieStudios
    @BungieStudios 1 year ago

    Very accurate. I could tell though probably because I am wearing headphones and also expected it based on the subject material. Slight gaps and no breaths in the audio. However, if I didn't know any better it would fool me.

  • @takiatravels
    @takiatravels 1 year ago +1

    Sounds Awesome, I've added about 12mins. Is it best to make a new dub with new longer audio or add new audio by editing the existing one?

  • @jartinte
    @jartinte 1 year ago

    Awesome! Do you have a guide to a foundational model for training voices?

  • @jasonwood999
    @jasonwood999 1 year ago +1

    Wow...I hadn't noticed... this is fantastic

  • @troylee5205
    @troylee5205 2 years ago +2

    that was a good one James... Question, are you able to download the audio and put it elsewhere, let's say i wanna edit on Davinci, instead of on Descript's video editor?

  • @HowTechTo
    @HowTechTo 2 years ago +8

    That's pretty good! Did you use your own audio for the 8 hours or did you use their script? Not 100% clear if you have to use their script or not. I submitted 10 minutes and... it's not enough

  • @HansCNelson
    @HansCNelson 1 year ago

    Is there a way to train an overdub voice on a specific speaker once speaker labels have been applied to the video?

  • @VaibhavShewale
    @VaibhavShewale 2 years ago +2

    i tried to train but it never worked for me

  • @ericstephenvorm
    @ericstephenvorm 1 year ago

    Impressive results!

  • @LaTigerGenesis
    @LaTigerGenesis 2 years ago

    lovin' your channel, hombre!

  • @dobishs
    @dobishs 1 year ago +1

    would this work when using someone else's voice but I need to input words from another language?

  • @atranimecs
    @atranimecs 1 year ago

    wow it's gotten way better.

  • @NightDocs
    @NightDocs 2 years ago +20

    It was pretty immediately apparent it was AI from the beginning. I’m actually kind of surprised because my personal voice clone on Descript sounds a little better while in yours I’m still hearing some artifacts.
    Also, the 8 hours of voice training was probably a little wasteful because the voice trainer only accepts about an hour of training, as far as I remember which means it probably took the first hour and discarded the rest.
    The improvement was almost entirely because their algorithm was recently updated and improved, not necessarily because you gave it more training
    Edit: saw your other video was only a few months ago so the extra training did probably help. It’s also been awhile since I read about the max training time so they may have changed that shrug 🤷🏼‍♂️
    Great video either way!

    • @Revontuletband
      @Revontuletband 2 years ago +1

      I think that different voices will get different results, well, just because they are different. For instance, I feel like a lot of quirks in James's clone are coming from the fact that his real voice has a lot of vocal fry. It's kind of a natural distortion so it's probably not good for the AI. But anyway, it's very interesting to watch how things improve!

    • @mnomadvfx
      @mnomadvfx 2 years ago +3

      I feel like using compressed audio as training data isn't the best start however much more time it takes to upload uncompressed audio.

  • @gfreezy619
    @gfreezy619 1 year ago

    This is exciting and scary at the same time

  • @GingerBooth
    @GingerBooth 1 year ago

    Great demo!

  • @themaskio4804
    @themaskio4804 1 year ago

    Awesome. But how do you train the AI with the voice you want it to learn?

  • @TheAaronalden
    @TheAaronalden 2 years ago +18

    I could tell immediately because I was listening for it. Still very impressive! I think within a year they could have it perfected. I'm not sure what makes it sound off, it just has kind of a digital glitch like when you set spoken words to autotune.

    • @Daniel_WR_Hart
      @Daniel_WR_Hart 2 years ago +2

      Same, but only because I was partially expecting to get punk'd. If this was anyone else's video on a different topic, I would have assumed they had a sore throat.

  • @instacoachhim
    @instacoachhim 1 year ago

    Hey, I'd like to ask about a small problem with Descript: wrong pronunciation. Have you had it before? Do you know how to fix it? Thank you

  • @TheADCRogue_YT__
    @TheADCRogue_YT__ 10 months ago

    As soon as you played the first version and said you were going to compare I realized the entire video had been dubbed

  • @realjgerard
    @realjgerard 1 year ago

    Crazy how this was a year ago… my have we come a long way…

  • @ivirlei
    @ivirlei 2 years ago

    Incredible video!

  • @FERNANDOPENAS
    @FERNANDOPENAS 8 months ago

    Will Descript imitate my accent as well or only my voice pitch and tone?

  • @BAWalks
    @BAWalks 11 months ago

    Mind Blown.

  • @augustine_
    @augustine_ 2 years ago

    Does this mean we only need to upload our voice ID statement + the audio of our podcast/anything and not the script given by them for the initial voice overdub setup? Or do we first upload the voice ID statement plus their 30 minute transcript to get an overdub voice, then for a more accurate overdub, upload another file with the voice ID statement and our podcast audio? Waiting for your reply.

    • @RealJamesArcher
      @RealJamesArcher  2 years ago +2

      Hey, Augustine, it pretty much means you just need the voice ID statement and then whatever audio you can pull together. I just took all the raw recordings from my past video shoots and stitched them together in a single audio file, and that worked for me!

  • @roastking860
    @roastking860 1 year ago

    I'm trained as hell, let's go. I could totally tell the difference

  • @LiveWellUkraine
    @LiveWellUkraine 1 year ago +1

    Creepy... and cool. (Which is how good tech starts.) The question is James... will you use this power for good or evil?

  • @illiniry
    @illiniry 1 year ago

    I have lots of audio files of my deceased father, can I use descript to clone his voice or can I only train it by repeating their phrases? Please help, I would really like to bring my dad back thanks.

  • @matthewfuller9760
    @matthewfuller9760 2 years ago +3

    From the owner: "While you can edit Projects offline, you still need an internet connection to transcribe audio." Does this mean I cannot record output from my own voice without an internet connection once the project files have been generated? Transcription refers to the conversion of speech to text. I want to be able to type and have it output my voice without being online.
    In other words, they don't have a general model of my voice. Every new word is novel and must be computed using their servers otherwise it would take forever?

    • @RealJamesArcher
      @RealJamesArcher  2 years ago

      I'm not positive, but I would assume that both the text-to-speech and speech-to-text require round trips to the server because they're both pretty processor intensive.

    • @matthewfuller9760
      @matthewfuller9760 2 years ago

      @@RealJamesArcher yep. I am thinking the same way.

  • @oz4549
    @oz4549 11 months ago

    This will be huge in the adult industry

  • @marcs7847
    @marcs7847 2 years ago

    Sick!

  • @user-hb3zm3nj8j
    @user-hb3zm3nj8j 1 year ago

    Can I create digital audio in Arabic?

  • @Thatsmessedupman
    @Thatsmessedupman 1 year ago

    Yup. I knew instantly your voice was AI, even with the training. There is an underlying gravelly sound in the voice with a hint of warping and an electronic feel.

  • @IntenseInvestor
    @IntenseInvestor 2 years ago

    Wild....going to try this on my channel lol

  • @davidcovington901
    @davidcovington901 2 years ago +1

    Thanks for all the hard-work investigation and the Buy rating. Hoping my old laptop is up to specs for using it, because I plan on becoming addicted, to rest my voice.
    Will we ever hear your voice live again?

    • @RealJamesArcher
      @RealJamesArcher  2 years ago +2

      Oh yes, I don't expect to actually use this much on a day-to-day basis. There's no substitute for the real human voice and the subtle distinctions it can make. I'll probably use this for occasional patching up or repairing something I said wrong, but not much else. I still plan to shoot my videos the old fashioned way!

  • @randyrektor
    @randyrektor 2 years ago +4

    I love this, but I hate this, but I love this.. you know?

    • @RealJamesArcher
      @RealJamesArcher  2 years ago +3

      I feel the same! It's very clever, but it's a disturbing road to be on.

  • @founderedmoney
    @founderedmoney 9 months ago

    First time I’ve said wow out loud to an ai tool

  • @huynhdanghaiau
    @huynhdanghaiau 2 years ago

    I want to learn, you can open this good knowledge class

  • @TechieSewing
    @TechieSewing 2 years ago +2

    To be fair it doesn't sound much like your voice or your own sound arrangement, all that combination of mic proximity, echo and so on. But it does sound like a human voice :)

  • @singlesightart
    @singlesightart 2 years ago

    That is so awesome

  • @Omnikam
    @Omnikam 1 year ago

    This could give back someone's voice lost to cancer

  • @Madbeef878
    @Madbeef878 2 years ago +1

    Wow! As great as this tool would be for content creators, I can see it 100% being a 'must have', for the criminally minded. "Hello Mr Archer, How are you doing today? I'm just calling you about your bank account......"

  • @annabrenda8694
    @annabrenda8694 2 years ago

    I noticed RIGHT FROM THE BEGINNING that you were using the AI

  • @joshualopez9259
    @joshualopez9259 2 years ago

    I could tell it wasn't really you like 5 words in, still has a tone to it that lets you know but not bad, but doesn't help that it made every last word of a sentence you say sound so low and goes down.

  • @matthewfuller9760
    @matthewfuller9760 2 years ago

    Once trained by your voice for 8 hours, can you then use the tool offline? I imagine not, right?

    • @mnomadvfx
      @mnomadvfx 2 years ago +1

      Indeed.
      I would be wary of doing this in the first place.
      It's one thing for people like celebrities that have tens of thousands of hours of their voice on record due to their public exposure - but for the average joe not wanting their identity stolen this could potentially be dangerous.

  • @gonzcasa
    @gonzcasa 2 years ago

    soon you'll be able to make your own music with an artist that you want

    • @RealJamesArcher
      @RealJamesArcher  2 years ago

      Yeah, there are definitely some weird ethical concerns here. This particular company requires the training voice to recite a verbal contract, but there are ways to get around that (like having an impersonator do it) and nothing stopping people from downloading and using their own software on any training data they want. Weird times ahead.

  • @Anurania
    @Anurania 2 years ago +1

    Not good enough to replace voice actors yet but maybe within five years. I'm thinking mostly in terms of video games where we want characters to have an endless amount of things to say.

  • @andersonsystem2
    @andersonsystem2 2 years ago

    wow! OMG

  • @relaxbro5605
    @relaxbro5605 2 years ago +1

    It was obvious from the beginning BUT I wonder how much better it would get if you let autotune work on it🤔 do you know an audio engineer who could do this? Would love to see/ hear how this turns out. Maybe with autotune it would be even harder to tell the difference.

  • @sun-eye
    @sun-eye 1 year ago

    Yeah, I could tell from the very beginning that your voice sounded robotic. I would suggest using it for longer. Maybe, a couple of days because the difference between the 30 minute and the 3 hour is very big.

  • @shondmichael1363
    @shondmichael1363 2 years ago

    Wow.

  • @alistair21
    @alistair21 2 years ago

    sounds way better than mine which is still pretty shit after providing an hour of training.

  • @angelicearthling
    @angelicearthling 2 years ago

    It's pretty good, but it still has that robotic tone to it. I could tell it wasn't your actual voice from the beginning.

  • @aliaskennedy7897
    @aliaskennedy7897 1 year ago

    I was telling myself something was wrong with the audio in this video, haha

  • @radedjordjevic8638
    @radedjordjevic8638 2 months ago

    not work

  • @CP-dl4nc
    @CP-dl4nc 2 years ago +2

    It is really good but if someone already knows your voice they will detect the "machine" quality immediately. The tell is the lack of inflection and pitch that is directly connected to the context of the words. I see it as a tool but no substitute (yet) for an actual human.

    • @RealJamesArcher
      @RealJamesArcher  2 years ago +1

      Absolutely agree. The human touch makes all the difference.

    • @matthewfuller9760
      @matthewfuller9760 2 years ago +1

      @@RealJamesArcher It's 90% of the way there to replacing humans some of the time in video games.

  • @TomerGamerTV
    @TomerGamerTV 1 year ago

    1:41 at the second that the video started i already knew there was something wrong going on with your voice

  • @LostInTech3D
    @LostInTech3D 2 years ago

    sounds like you with the flu phoning into work 😂
    I can see this being bad news for us regarding those horrible text+speech videos that exist on youtube.

    • @RealJamesArcher
      @RealJamesArcher  2 years ago

      Or audiobooks. I can't imagine trying to listen to a whole audiobook with an AI voice, because even a great one would still be...awkward.

    • @mnomadvfx
      @mnomadvfx 2 years ago

      "I can see this being bad news for us regarding those horrible text+speech videos that exist on youtube"
      Au contraire - the current ones are terrible.
      This will at least give them a nice upgrade - unfortunately as they improve it will become harder to tell a real one from a fake one.
      Someone could literally just

  • @murrayr100
    @murrayr100 2 years ago +1

    It sounds distorted and emotionally flat. Like a real person who is having equipment problems. It's really not good enough for podcasting. It might be alright for some short edits though.

  • @keisaboru1155
    @keisaboru1155 1 year ago

    sounds exactly like your broken mic hahaha xD

  • @ktestable
    @ktestable 2 years ago

    damn..

  • @moneymentor_channel
    @moneymentor_channel 12 hours ago

    not good anyway. The quality of the sound is bad, the voice cloning was decent.

  • @redferdroyale9510
    @redferdroyale9510 2 years ago

    Yeah, it's much better

  • @chumleyk
    @chumleyk 1 year ago

    Too many channels using fake audio. Your one included.. It sounded fake from the beginning. Soon everyone will get sick of it once they know the signs and render all the libraries of videos using it as trash.

  • @tsechee
    @tsechee 2 years ago

    Does it support Chinese?

    • @strukitru
      @strukitru 11 months ago

      AI doesn't really care about the language you speak. The only thing you should mind is that you want to train the model in the language you want it to speak in. The AI works with the phonetics of your input, not a databank of words of a given language. And since Chinese sounds different than for example English .. u get the idea.

  • @breehimself
    @breehimself 2 years ago +2

    mindblown.gif

  • @patrickstavros7429
    @patrickstavros7429 2 years ago +2

    congrats you just gave away your voice for free. You cannot stop them from using your voice in a podcast, commercial, product or service that you do not agree with or align with. The best part is you get no money in return, hence the ROYALTY-FREE term used in the agreement. You are a podcaster, you are making money off your YouTube channel, why on earth would you let your voice go for free?
    8.2 License to User Content. We claim no ownership rights in your User Content. You hereby grant to us a nonexclusive, royalty-free, sublicensable, worldwide license to access, reproduce, distribute, process, publish, display, perform, adapt, modify, analyze, and otherwise use the User Content to provide, maintain, and improve Descript and the Descript technology, without compensation to you, provided that our use of any Projects you create is subject to the usage limitations and confidentiality obligations set forth in Section 9 below.

    • @enriquemontero74
      @enriquemontero74 2 years ago +1

      is a simple voice lol

    • @percythefisherman
      @percythefisherman 2 years ago +1

      This is worrying. You have highlighted a very legitimate problem that for paid voice over artists and actors is a minefield. I feel that the producer of this video should at least let us know his opinion on this.

    • @mnomadvfx
      @mnomadvfx 2 years ago

      @@percythefisherman It's exactly why the initial enthusiasm for deepfake tech waned so quickly in academia.
      Just like with the stem cell problem some time ago.
      The ethics problem reared its ugly head, people started pointing fingers, legislation started restricting what they could do and funding dried up - the academics are too afraid to push the tech forward for fear of losing funding they need to do research.

  • @kbuss10
    @kbuss10 2 years ago +1

    what on earth is the point??? just use your own voice if you need to, it is much less work... this is not deepfake! deepfake would be if you TALK into it, and then it converts so it sounds like Dirty Harry, Obama, or who you train it to. if you do that from text you'd still have to adjust the voiceover's timing which is an extremely tedious process basically unviable