Deepfake AI voice clone: 30min vs 8hrs of training (Descript Overdub demo)
- Published: Sep 14, 2021
- I updated my Descript Overdub AI voiceover tool with 8 hours of training audio of my voice, and WOW did it make a difference! You've got to check this out.
3,670 Science
I am planning on using this tool to get my friends arrested by confessing to crimes they didn't do!
Hey, that's a cool little trick.
Something like this happened on Discord; never believe anything you see online.
Hell Nah
💀
That's what friends are for
Short but to the point, just what I wanted. Thank you
That’s a really good demo of the technology. Back when I had a bunch of podcast shows, this would have greatly reduced the time I spent editing them each week. I can only imagine where it will be in ten years.
Can probably just think about the words and they will appear....
It's pretty good! I wonder what it will sound like after 100 hours.
Whooooo! Thanks for these videos. Answered all my questions before I pulled the trigger.
Sounds like I could replace audio book readers I hate with my favorite narrators. And if it's a little flat, good! Because what I hate is overacting.
mind blown, this is a game changer for a lot of businesses
Lol, everything was text-to-speech? Man, this is awesome. I have trouble creating videos because my house is small and noisy, with several people around, so this is awesome!
It still sounds a bit "crunchy".
But I'm not sure I would have thought you were a computer as much as using a bad microphone.
Yikes! I didn't notice on first viewing (or hearing)! Second time through, I picked up where the system mushes some sounds, and that the intonation is flat, but overall... scary good.
Great improvement over the last test. Though it's still quite noticeable when someone has heard your voice before and is wearing headphones. The artifacts like a slight electronic tinge and the unnatural inflection kind of reveal the whole charade.
HOWEVER: If someone isn't all that used to your voice or expecting the fakery, I imagine this could go unnoticed in the vein of a lackluster take and a spotty microphone.
Technology is improving fast; it's likely that by 2022 most of these issues will have been ironed out.
Yeah, the artifacts are one of the biggest giveaways, but I don't think it'll take them long to sort that out. It's amazing to me how realistic they're getting the intonation. It's only a matter of time.
"If someone isn't all that used to your voice or expecting the fakery, I imagine this could go unnoticed in the vein of a lackluster take and a spotty microphone"
^^ This 100%.
Not just the microphone, but also the audio compression which is far from perfect in every single encoding - especially if you are experiencing frame drops causing audio glitches at the same time.
It was obvious that this wasn’t your voice but still impressive honestly
This is sick!)) I couldn't tell you used Overdub for the entire video)))
I've been trying to improve the robotic voice that it gave me to begin with, but I can't seem to find the setting for uploading my podcasts, which contain an abundance of my natural voice. Can you do a video on how to do this?
Hey James -- great video. What would you recommend today is the best way to train a model that I have 10 hours of interview audio with?
For a documentary I'm working on -- I want to feed it audio of a professional actor performing a monologue I wrote, and use the model to overdub the documentary subject's voice onto.
This video deserves the follow just because it is so clever.
I noticed it right away, it reminds me of listening to low bit rate audio. But the 8hrs def made your voice sound higher rez. Maybe after 400 hours it will sound realistic?
Something I plan to do with Overdub is to clone my high voice for a character in my show I have, so when I get older and cannot do that anymore I will be able to use overdub to keep the voice in store.
Very accurate. I could tell though probably because I am wearing headphones and also expected it based on the subject material. Slight gaps and no breaths in the audio. However, if I didn't know any better it would fool me.
Sounds Awesome, I've added about 12mins. Is it best to make a new dub with new longer audio or add new audio by editing the existing one?
Awesome! Do you have a guide for a foundational model to train voices?
Wow...I hadn't noticed... this is fantastic
that was a good one James... Question, are you able to download the audio and put it elsewhere, let's say i wanna edit on Davinci, instead of on Descript's video editor?
That's pretty good! Did you use your own audio for the 8 hours or did you use their script? Not 100% clear if you have to use their script or not. I submitted 10 minutes and... it's not enough
I submitted about 45 minutes. It's not enough.
Is there a way to train an overdub voice on a specific speaker once speaker labels have been applied to the video?
i tried to train but it never worked for me
Impressive results!
lovin' your channel, hombre!
Thank you! 🙌
would this work when using someone else voice but I need to input words from another language?
wow its gotten way better.
It was pretty immediately apparent it was AI from the beginning. I’m actually kind of surprised because my personal voice clone on Descript sounds a little better while in yours I’m still hearing some artifacts.
Also, the 8 hours of voice training was probably a little wasteful because the voice trainer only accepts about an hour of training, as far as I remember which means it probably took the first hour and discarded the rest.
The improvement was almost entirely because their algorithm was recently updated and improved, not necessarily because you gave it more training
Edit: saw your other video was only a few months ago so the extra training did probably help. It’s also been a while since I read about the max training time so they may have changed that 🤷🏼‍♂️
Great video either way!
I think that different voices will get different results, well, just because they are different. For instance, I feel like a lot of quirks in James's clone are coming from the fact that his real voice has a lot of vocal fry. It's kind of a natural distortion, so it's probably not good for the AI. But anyway, it's very interesting to watch how things improve!
I feel like using compressed audio as training data isn't the best start however much more time it takes to upload uncompressed audio.
This is exciting and scary at the same time
Great demo!
Awesome. But how do you train the AI with the voice you want it to learn?
I could tell immediately because I was listening for it. Still very impressive! I think within a year they could have it perfected. I'm not sure what makes it sound off, it just has kind of a digital glitch like when you set spoken words to autotune.
Same, but only because I was partially expecting to get punk'd. If this was anyone else's video on a different topic, I would have assumed they had a sore throat.
Hey, I'd like to ask about a small problem with Descript: have you run into wrong pronunciations before? Do you know how to fix it? Thank you.
As soon as you played the first version and said you were going to compare I realized the entire video had been dubbed
Crazy how this was a year ago… my have we come a long way…
Incredible video!
Will Descript imitate my accent as well or only my voice pitch and tone?
Mind Blown.
Does this mean we only need to upload our voice ID statement plus the audio of our podcast (or anything else), and not the script they provide for the initial Overdub voice setup? Or should we first upload the voice ID statement plus their 30-minute transcript to get the Overdub voice, and then, for a more accurate Overdub, upload another file with the voice ID statement and our podcast audio? Waiting for your reply.
Hey, Augustine, it pretty much means you just need the voice ID statement and then whatever audio you can pull together. I just took all the raw recordings from my past video shoots and stitched them together in a single audio file, and that worked for me!
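For anyone wanting to try the stitching step described above, here is a minimal Python sketch using only the standard library. It assumes the raw recordings are uncompressed WAV files that all share the same channel count, sample width, and sample rate; the function name and file names are hypothetical, and nothing here is specific to Descript.

```python
import wave
from pathlib import Path


def concat_wavs(paths, out_path):
    """Join WAV files end to end; all inputs must share one audio format."""
    paths = [Path(p) for p in paths]
    if not paths:
        raise ValueError("need at least one input file")
    fmt = None  # (channels, sample width in bytes, sample rate) of the first file
    with wave.open(str(out_path), "wb") as out:
        for p in paths:
            with wave.open(str(p), "rb") as src:
                this_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if fmt is None:
                    # The first file defines the output format.
                    fmt = this_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif this_fmt != fmt:
                    raise ValueError(f"{p} has a different format: {this_fmt} vs {fmt}")
                # Copy the raw audio frames unchanged.
                out.writeframes(src.readframes(src.getnframes()))
    return out_path
```

Calling something like `concat_wavs(["shoot1.wav", "shoot2.wav"], "training.wav")` would produce a single file. Recordings in other formats (MP3, different sample rates) would need converting first, which this sketch does not attempt.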
I'm trained as hell, let's go. I could totally tell the difference
Creepy... and cool. (Which is how good tech starts.) The question is James... will you use this power for good or evil?
I have lots of audio files of my deceased father, can I use descript to clone his voice or can I only train it by repeating their phrases? Please help, I would really like to bring my dad back thanks.
From the owner: "While you can edit Projects offline, you still need an internet connection to transcribe audio." Does this mean I cannot record output from my own voice without an internet connection once the project files have been generated? Transcription refers to the conversion of speech to text. I want to be able to type and have it output my voice without being online.
In other words, they dont have a general model of my voice. Every new word is novel and must be computed using their servers otherwise it would take forever?
I'm not positive, but I would assume that both the text-to-speech and speech-to-text require round trips to the server because they're both pretty processor intensive.
@RealJamesArcher yep. I am thinking the same way.
This will be huge in the adult industry
Sick!
Can I create digital audio in Arabic?
Yup. I knew instantly your voice was AI, even with the training. There is an underlying gravelly sound in the voice with a hint of warping and an electronic feel.
Wild....going to try this on my channel lol
Thanks for all the hard-work investigation and the Buy rating. Hoping my old laptop is up to specs for using it, because I plan on becoming addicted, to rest my voice.
Will we ever hear your voice live again?
Oh yes, I don't expect to actually use this much on a day-to-day basis. There's no substitute for the real human voice and the subtle distinctions it can make. I'll probably use this for occasional patching up or repairing something I said wrong, but not much else. I still plan to shoot my videos the old fashioned way!
I love this, but I hate this, but I love this.. you know?
I feel the same! It's very clever, but it's a disturbing road to be on.
First time I’ve said wow out loud to an ai tool
I want to learn; could you offer a class on this?
To be fair it doesn't sound much like your voice or your own sound arrangement, all that combination of mic proximity, echo and so on. But it does sound like a human voice :)
That is so awesome
This could give back someone's voice lost to cancer
Wow! As great as this tool would be for content creators, I can see it 100% being a 'must have', for the criminally minded. "Hello Mr Archer, How are you doing today? I'm just calling you about your bank account......"
I noticed RIGHT FROM THE BEGINNING that you were using the AI
I could tell it wasn't really you like 5 words in. It still has a tone to it that lets you know, but it's not bad. It doesn't help that it made the last word of every sentence sound so low and drop off.
Once trained by your voice for 8 hours, can you then use the tool offline? I imagine not, right?
Indeed.
I would be wary of doing this in the first place.
It's one thing for people like celebrities that have tens of thousands of hours of their voice on record due to their public exposure - but for the average joe not wanting their identity stolen this could potentially be dangerous.
soon you'll be able to make your own music with an artist that you want
Yeah, there are definitely some weird ethical concerns here. This particular company requires the training voice to recite a verbal contract, but there are ways to get around that (like having an impersonator do it) and nothing stopping people from downloading and using their own software on any training data they want. Weird times ahead.
Not good enough to replace voice actors yet but maybe within five years. I'm thinking mostly in terms of video games where we want characters to have an endless amount of things to say.
wow! OMG
It was obvious from the beginning BUT I wonder how much better it would get if you let autotune work on it🤔 do you know an audio engineer who could do this? Would love to see/ hear how this turns out. Maybe with autotune it would be even harder to tell the difference.
Yeah, I could tell from the very beginning that your voice sounded robotic. I would suggest using it for longer. Maybe, a couple of days because the difference between the 30 minute and the 3 hour is very big.
Wow.
sounds way better than mine which is still pretty shit after providing an hour of training.
It's pretty good, but it still has that robotic tone to it. I could tell it wasn't your actual voice from the beginning.
I was telling myself something wrong with the audio in this video , haha
not work
It is really good but if someone already knows your voice they will detect the "machine" quality immediately. The tell is the lack of inflection and pitch that is directly connected to the context of the words. I see it as a tool but no substitute (yet) for an actual human.
Absolutely agree. The human touch makes all the difference.
@RealJamesArcher It's 90% of the way there to replacing humans some of the time in video games.
1:41 at the second that the video started i already knew there was something wrong going on with your voice
sounds like you with the flu phoning into work 😂
I can see this being bad news for us regarding those horrible text+speech videos that exist on youtube.
Or audiobooks. I can't imagine trying to listen to a whole audiobook with an AI voice, because even a great one would still be...awkward.
"I can see this being bad news for us regarding those horrible text+speech videos that exist on youtube"
Au contraire - the current ones are terrible.
This will at least give them a nice upgrade - unfortunately as they improve it will become harder to tell a real one from a fake one.
Someone could literally just
It sounds distorted and emotionally flat. Like a real person who is having equipment problems. It's really not good enough for podcasting. It might be alright for some short edits though.
Perfect summation
sounds exactly like your broken mic hahaha xD
damn..
not good anyway. The quality of the sound is bad, the voice cloning was decent.
Yeah, it's much better
Too many channels using fake audio. Your one included.. It sounded fake from the beginning. Soon everyone will get sick of it once they know the signs and render all the libraries of videos using it as trash.
Does it support Chinese?
AI doesn't really care about the language you speak. The only thing you should mind is that you want to train the model in the language you want it to speak in. The AI works with the phonetics of your input, not a databank of words of a given language. And since Chinese sounds different than for example English .. u get the idea.
mindblown.gif
Congrats, you just gave away your voice for free. You cannot stop them from using your voice in a podcast, commercial, product or service that you do not agree with or align with. The best part is you get no money in return, hence the ROYALTY-FREE term used in the agreement. You are a podcaster, you are making money off your YouTube channel, why on earth would you let your voice go for free?
8.2 License to User Content. We claim no ownership rights in your User Content. You hereby grant to us a nonexclusive, royalty-free, sublicensable, worldwide license to access, reproduce, distribute, process, publish, display, perform, adapt, modify, analyze, and otherwise use the User Content to provide, maintain, and improve Descript and the Descript technology, without compensation to you, provided that our use of any Projects you create is subject to the usage limitations and confidentiality obligations set forth in Section 9 below.
is a simple voice lol
This is worrying. You have highlighted a very legitimate problem that for paid voice over artists and actors is a minefield. I feel that the producer of this video should at least let us know his opinion on this.
@percythefisherman It's exactly why the initial enthusiasm for deepfake tech waned so quickly in academia.
Just like with the stem cell problem some time ago.
The ethics problem reared its ugly head, people started pointing fingers, legislation started restricting what they could do and funding dried up - the academics are too afraid to push the tech forward for fear of losing funding they need to do research.
what on earth is the point??? just use your own voice if you need to, it is much less work... this is not deepfake! deepfake would be if you TALK into it, and then it converts so it sounds like Dirty Harry, Obama, or whoever you train it on. if you do that from text you'd still have to adjust the voiceover's timing, which is an extremely tedious process, basically unviable