How come people like you make smart comments about something but can't muster the smallest amount of brain cells? You wonder why I'm asking this? Really? OK, I will not act like you did; I will act smarter. 1. Not everyone has knowledge of code or Python-based systems. 2. Many would like to use this and are used to having a simple installer ready to go. 3. Just because they don't know about it doesn't mean they are unworthy of wishing for such a tool for their work or purpose, even if they can't use it without a simple installer. 4. How come people like you overlook such simple things? How? And how come you think everyone is a pro at computing and using code? It looks to me like you tried to make a smart comment, but at the same time you failed at a simple thought process.
@@richardwatterstan Another stupid answer. Are you family? lol. 1. I do know code. 2. Learned it years ago. 3. I wasn't talking about myself, but on behalf of all the other casual users. THINK twice and use your brain. Same mistake...
Hello, if you are not a coder and have no clue how to work with PyTorch, how can you use these tools? I'm a video editor; I want to rebuild audio for dropped connections on streams.
I got the message below when I ran python demo_toolbox.py, even though I had already installed ffmpeg: "Librosa will be unable to open mp3 files if additional software is not installed. Please install ffmpeg or add the '--no_mp3_support' option to proceed without support for mp3 files."
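For anyone hitting this: librosa hands mp3 decoding off to an external binary, so "installed somewhere" isn't enough; ffmpeg has to be findable on the PATH of the shell you launch the toolbox from. A quick sketch to check (the helper name is just illustrative):

```python
import shutil

# librosa decodes mp3 through its audioread backend, which shells out to
# ffmpeg (or avconv). If neither binary is on PATH, mp3 loading fails even
# though ffmpeg is "installed" elsewhere on the machine.
def mp3_support_available():
    return any(shutil.which(tool) is not None for tool in ("ffmpeg", "avconv"))

print(mp3_support_available())
```

If this prints False in the same terminal you run the demo from, add the ffmpeg bin directory to PATH (or reinstall it via your package manager / conda) and reopen the terminal.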
I've been getting a ton of errors and eventually gave up. I got it to actually give me a result at first, albeit a very bad one: it essentially just farted at me. But now I can't get it to work at all. Too much of a headache to try and fix.
DID IT!!!!!! Had errors, so hours of troubleshooting. Use Anaconda with a virtual environment; there's a command for getting CUDA, and the others, all working from the virtual env!
Running this tool is cool, but haphazard. I get a lot of long pauses, "breathing", lost audio, and garbled audio, even though I have good samples. Do the samples HAVE to conform to 5 seconds, or can you use longer ones? Again, this works, but I want to make the clones sound more natural. If anyone has tips on improving playback, please post them here. Looking for things to do during this unending layover at home due to C-19.
I'm having the same issue as you: long pauses, missing words, garbled audio, and that weird "blowing on the microphone" noise. I haven't been able to find a fix yet.
Another question: could you use something like this to take speech input in one language, and then use a piece of text in another language to get speech output in that language? That function would be even more interesting.
So basically you have already trained the models on English, and you now use transfer learning so that any voice speaking English can be used to generate cloned speech?
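That's essentially it: a speaker encoder squeezes a few seconds of reference audio into a fixed-size embedding, and the synthesizer and vocoder condition on that vector, so cloning a new English voice just means swapping the vector. A toy sketch of the data flow (every stage here is a stub standing in for a neural network; none of these names are the repo's actual API):

```python
# Toy sketch of the SV2TTS-style pipeline; each function is a stub standing
# in for a trained neural network.
def embed_speaker(ref_audio):
    # speaker encoder: variable-length audio -> fixed-size speaker embedding
    mean = sum(ref_audio) / len(ref_audio)
    return [mean] * 4  # pretend 4-dimensional embedding

def synthesize(text, embedding):
    # synthesizer: (text, speaker embedding) -> mel-spectrogram frames
    return [[ord(ch) * e for e in embedding] for ch in text]

def vocode(mel_frames):
    # vocoder: mel frames -> waveform samples
    return [sum(frame) for frame in mel_frames]

embedding = embed_speaker([0.1, 0.2, 0.3])  # cloning = swapping this vector
wav = vocode(synthesize("hi", embedding))
print(len(wav))  # one toy sample per input character here
```

The point of the sketch is the decoupling: the synthesizer and vocoder never see the reference audio directly, only the embedding, which is why an unseen speaker works zero-shot.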
I find this really awesome and I would like to try it out myself, but I'm new to Python and all this stuff. I know I'm probably just missing something really dumb, but is there any tutorial/guide that could teach someone who has never used a program like this before? I just want to make sure I'm not missing something so obvious it hasn't been described; I mean everything from opening the program to generating the TTS.
I wonder how the law and copyright holders would react if you, for example, trained the software on famous actors' voices from movies and used the generated audio commercially in your own products.
I run a lot of DnD and other tabletop games. I have voice-augmenting software with notes like "Mountain Troll", "Squirrel Folk", etc., so I can change my voice in real time while I speak for NPCs that interact with characters. With all these embeddings, spectrograms, etc.: would I be able to use X amount of audio of Morgan Freeman (Audio A), then record my own voice saying the exact same thing at the same pace (Audio B), and have it log all the differences between them, so that when I talk in real time it applies all those changes and my voice comes out sounding like Audio A?
What I'm looking for is a video showing how to clone the voice of a person who passed away. I only have one video where she says 3 words, and I want the audio to have her say something like "I will love you forever, my little one". It's for a gift.
Thank you! This finally works, but "Dataset", "Speaker", and "Utterance" are all greyed out. Also, I have no option for "pretrained" under any sub-heading: I have "Encoder" under encoder, "Synthesizer" under synthesizer, and "Vocoder" and "Griffin-Lim" under vocoder. Can you help me sort that out please? I have downloaded LibriSpeech and unpacked it to a folder within the root directory, but I have no option to select it anywhere that I can see. UPDATE: 2 things. I needed to pass this argument: [python demo_toolbox.py -d ], and then I had to remember that the GZ file had to be unzipped, and THEN the resulting TAR file had to be unzipped too. 😕 ...But now my noob ass has actually unlocked this thing, and it's working, even the dataset voices! I STILL have no option for "Pretrained" anywhere, so I don't know why others have that, but that's pretty much the last thing I've seen that I'm lacking.
@@fischy0339 That bit of coding that's within the "[ ]" in my response is the bit that you need to copy/paste into a command prompt, and hit "enter". That passes an argument that allows the program to access those functions. You also need to make sure that the GZ is unzipped to a TAR file, and then that has to be unzipped also. It has now been a while since I did this myself, so hopefully it will help you!
Hello Sir. I tried to install your tool from your GitHub link, but I'm facing some issues with PyTorch and tensorflow-gpu. I even tried to modify requirements.txt, but to no avail. Can you please post a video on the installation of this great tool? Thanks in advance!
Issues with TensorFlow usually relate to the CUDA driver. You MUST have the CUDA toolkit installed, with a driver at 9.0 or higher. I ran into this while trying to use an old, supposedly CUDA-enabled video card, and it failed. I then switched to a Dell laptop running native Ubuntu (no VirtualBox VMs; it must be native) and got it to work pretty easily once I figured out all the tools that needed to be downloaded. Also, don't edit the requirements.txt file; there is no need. In addition, you don't need the audio sample library, as that is 5.6 GB of wasted HD space. Just make sure your WAV files are of good quality with little to no background noise. I used TV news anchors and it worked pretty well.
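A quick way to see whether the toolbox will actually find a GPU before diving into driver debugging; this sketch degrades gracefully when torch isn't installed at all (the helper name is just illustrative):

```python
import importlib.util

# Returns True only when torch is importable AND its CUDA runtime can see
# a working device (i.e. the driver/toolkit combination is actually usable).
def cuda_ready():
    if importlib.util.find_spec("torch") is None:
        return False  # torch isn't even installed
    import torch
    return torch.cuda.is_available()

print(cuda_ready())
```

If this prints False with torch installed, the mismatch is almost always between the CUDA version torch was built against and the installed driver, which matches the symptoms described above.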
Super cool. If you use longer input samples, do you get a cleaner output? Some of the output sounds like it's breaking up or glitchy... maybe extra reverb in the sound. I'm not sure how to describe it.
But can you have it work in real time with a microphone, instead of typing what I want to say? It's not real-time if I can't talk into my microphone and have it come out in that voice.
"Real-time" in this context means the ability to type and have the algorithm generate speech on the fly. This is not an app for your phone; this is just an example of the algorithm, which can be downloaded from GitHub. It is meant for machine learning developers and students who use Python.
Hi Corentin, I want to use the Real-Time Voice Cloning tool on Google Colab: colab.research.google.com/github/ak9250/Real-Time-Voice-Cloning/blob/master/Real_Time_Voice_Cloning.ipynb But Google Colab doesn't support interactive GUI elements, so I must use the cloning tool on the command line, without the GUI. Can you give me some advice? Best regards!
Hello! I installed everything as instructed, but I cannot figure out how to open the application as in the video; there are no exe files. Thanks in advance for the answer!
This is really clever. It would be neat if you could have a speaker talk into it instead of text-to-speech; that way you could capture emotion and inflection too. But I get it, that's hard to do.
Amazing. This could be really useful for people with Lou Gehrig's disease etc., if the model could be trained or 'banked' before serious symptoms appear. I wonder if anyone has already done any work in that area.
When I click "Vocode only" or "Synthesize and vocode" using the pretrained vocoder, this happens: cuda runtime error (100): no CUDA-capable device is detected at ../aten/src/THC/THCGeneral.cpp:50
Wondering, what would be the best way to clone voice timbre without TTS? I've tried Real-Time-Voice-Cloning, but it seems to only render TTS text in the target voice, with no emotion whatsoever. I would like to record a phrase in my own voice and then re-encode it as if spoken by the target person, keeping my original emotions and inflections.
@Vegan Pete But will it be able to apply those emotions at the exact places where I need them, and not by some "typical behavior"? An example. I want to have a sentence that sounds authoritative and patronizing and puts accents on specific words. Let's say, I have a large voice library of someone who has been reading different styles - patronizing, normal, depressed etc. I doubt Descript will somehow automagically know which voice style to pick for which sentence, and which words to accentuate. If there was a system that could pick up the emotions and accents for specific text I'm recording, and then apply them directly onto a TTS engine voice, this would make indie game development so much easier - you wouldn't need voice actors to record your phrases, you could record them yourself and then run through the hypothetical "voice changing engine".
Keep at it. I got it to work using a Cuda-enabled Nvidia graphics card on a Dell laptop. You don't need the 5.6 GB voice samples either. It does take practice and patience to get this going.
No, you aren't stupid! I have been trying to install this since July 2019 and still have gotten no help.
@ That's what I hate about GitHub: never a folder with compiled executables you can just start. According to the documentation you need to install more than just Python. You also need libraries and other things like tensorflow-gpu, umap-learn, visdom, webrtcvad, librosa, matplotlib, numpy, scipy, tqdm, sounddevice, Unidecode, inflect, PyQt5, multiprocess and numba. According to the manual you can do all this with just one command, "pip install -r requirements.txt". But "pip" is not a standard Windows command or program, and there is no information about where to download a "pip" executable. Maybe it only works on a different operating system. There are programs that make it easier to start py-files, but it will never work without the required libraries. Very annoying and frustrating. :(
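For what it's worth, pip has shipped with every Python installer since 3.4; on Windows it just may not be on PATH as a bare "pip" command. The PATH-proof way is to call it through the interpreter you already have, which a small sketch can demonstrate:

```python
import subprocess
import sys

# sys.executable is the full path of the running Python interpreter, so
# "python -m pip" works even when a bare "pip" command is not on PATH.
result = subprocess.run(
    [sys.executable, "-m", "pip", "--version"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())
```

So "pip install -r requirements.txt" from the manual becomes "python -m pip install -r requirements.txt" on a stock Windows install, with no extra download needed.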
What a time to be alive !
I see what you did there
@@PokettoMusic I don't understand haha
SubZakk check out the channel “two minute papers”, the creator’s catchphrase is “What a time to be alive!”. Probably my favourite channel on RUclips. Great content in short presentations.
@@pglove Ahh yeah, I watch him and I just now realized he says that. Thank you!
Hold on to your papers
I think this is a unique piece of software. It can potentially give someone who has lost their physical voice the powerful ability to speak fluently, with assistance. You have done a remarkable thing. Well done.
Pretty sure Apple is doing this for iOS 17
Plot twist: This video was narrated by the AI
of course it was!
Always has been
plot twist: this comment was copied from others
@@warker6186 how do u know
this comment is like a year old
Any tutorial on how to do this? From the start?
Wow ! Nice work, nice presentation, congrats !
This is awesome and horrifying at the same time.
I think the best way to get the cloner to produce the tone or structure you want would be to have it use your own voice's inflections and changes in tone, and use those as the guide/instructions for how the cloned voice should speak and sound.
The first two voices are British English in the sample but gain a US accent in the synthesis (though they do admittedly sound like those same people, just putting on an American accent). I'm guessing that's due to the voices of the audiobooks used in some of the training? Very clever though, nice work!
I can confirm this actually does work on Windows 10. I followed the main branch of the GitHub repo.
Question: Can you use something like this to take a voice input, rather than a piece of text?
That way, you can do things like preserving tone and inflection.
It's probably a couple of additional steps, but I would love to have something like that.
You mean by speaking into the mic? That would be scary tho, but also great (scary because people could impersonate high-profile celebrities etc. live, pretending to be them; there would be more catfishing tho)
Is any such toolkit available?
would love this for my dnd characters
did you ever find an answer?
@@J4cobSkiJumping Imagine a horror game like that
Is there an open-source solution that doesn't focus on speed, but rather on quality and that you can clone on a large dataset?
I can only find this as an open-source packet that does voice cloning...
Why not start with a demonstration of how to install and run?
I know, I'm trying to work that out now
:D
use google man
Or you could just follow the guide on the GitHub page. Just don't be lazy.
@@WomboBraker I followed the instructions on GitHub, still having problems running this thing
This is great software and thanks for keeping it free. I was searching for voice cloning software to clone my granny's voice(she is not alive) so i can feel her close to me. Thanks again for such a wonderful tool. I will try it.
Finally I can complete my Detective Conan cosplay
This is amazing and frightening at the same time. Fantastic job!
Is there a simple Windows installer I can run? All I see on GitHub is a lot of individual file thingies. Thanks
This is really freaking cool! My friends are going to get freaked out on discord lol
ahhahaahah
This would help greatly for something like making new voice lines for mods using the vanilla voice actors in Skyrim/Fallout.
Bit of a legal and intellectual-property grey area, though, when you use established actors' voices through this as opposed to the generic voices (i.e. making new lines in the voices of Max von Sydow, Joan Allen, Vladimir Kulich, Michael Hogan, etc. for their respective characters, when they never really recorded them; a machine did it via programming and learning)
Scary when you look at it as it could theoretically bring dead actor's voices back to life with the right samples too. (Like doing another Godfather game without Marlon Brando voicing Vito Corleone)
Well, if you're doing that in the EU it's illegal, because a voice is personal data and is thus protected by the GDPR (RGPD in French).
(Yes, I wanted to use celebrity audio samples too, so I did a bit of research on the legal aspects lol.)
A work around would be to replace all the original voices with completely synthetic ones that could be freely used by the community.
And what if a company only ever uses synthetic voice models in the first place? Would they have exclusive rights to the model? What about a model built using embeddings from the company's model?
You would need a different network for each tone though (narrative, angry, interrogative, etc.)
@@mikerhinos but if you publish it (by playing in a movie) gdpr might not protect it.
Never thought of that! Nice!
I would love to get this on my PC. Sadly, we all get lost after reading Number 1.
HOLYYYYY. I don't even know what to say. The possibilities are endless for this tech.
Like what?
You're one of the good guys, aren't you?
@@1hitkill973 I'm honestly curious about this. I'll love to hear about practical uses of this technology.
I need to test this on GLaDOS
How did it go ?
Yes, please tell us how it went
exactly the reason I am looking for this
did you get it to work?? i cant get it to work :(
@ I tried to train it on GLaDOS, but there is no way you'll get the robotic voice; it just sounds like a really bad human voice :^(
This seems pretty good.
Although I don't much care about the voice synthesis. I want to try to use this as a realtime voicechanger. That's going to take a bit of work.
I'm very interested in this functionality, to voice different characters in animations. Did you ever find a way to do it?
@@HonorMacDonald Well, this already IS a way to do it. The same approach used here could surely be adapted, but I didn't get around to making it. I'm not a programmer, so it would be a lot of work for me.
synthesis is the important part, voice changer can be added later
@@OwenPrescott No, synthesis is already a thing; I don't want to go that route. I wanted to do a realtime voicechanger that goes phoneme by phoneme.
If you do it with synthesis and you're speaking a longer word, your synthesis-based voicechanger has to wait until the whole word has been heard before it synthesizes it, because otherwise it wouldn't know how to pronounce it.
Is it not available in Spanish?
A while back I changed the description of this video to invite whoever is interested in cloning a voice not to use my repo and to instead head over to resemble.ai. That came across as a sellout move, and that wasn't my intention. I've changed the description back.
My initial intention with that message was to keep new people from spending hours trying to set up the project, only to give up or obtain subpar results.
While Resemble does offer a free plan that will let you clone your voice with more naturalness than this project will, purely for legal reasons we cannot allow you to clone someone else's voice without a bit of legal work. This repo lets you do anything you want in that regard, so now I get why people want to use it. I've posted an update here: github.com/CorentinJ/Real-Time-Voice-Cloning/issues/364. I've also included a link in the readme to the installation guide that seems to work for most people.
I would love to figure out how to take what you have done and turn it into a parrot for my robot. I.e., have the robot ask a few questions, record the answers, then use those answers to start talking back to the user in their OWN voice. (These would be scripted questions and answers.) I think that would be an awesome project. It's almost like Terminator: the robot becomes the person!!
@Robert Sapolsky It's a python project, so you will need to run it with a python interpreter.
Does it work with non-English languages?
Does it work with different languages?
Resemble AI seems like a cool company ;)
This Guy doesn't get enough credit. Every implementation of this technology should be paying you 10% at least.
Wow... just wow. I am in awe of your work!
Yeah uh, is there a way to get a pre-compiled executable version of this that doesn't require me to manually install and run Python, or know how to use Python?
Do you want easy, or do you want free? There's plenty of paid services out there if you don't want to put in the minimal work to get this set up
@@markgiroux3442 All the paid services are web-based AFAIK. I want a desktop executable, which is, I assume, how Python *would* work once set up; it's just the setup part that I can't seem to do correctly.
This... This is amazing ! Thank you for creating that tool and for making it so accessible !
Hello! Whenever I try to import an audio file (Urdu language) in sv2tts, I get this error: "can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool".
I have already converted my audio file's datatype to float, int, and complex, but it still doesn't work. I am very disappointed. Please help me as soon as possible. @Corentin Jemine
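In case it helps: that error usually isn't about the samples having the wrong type. It typically means numpy ended up building an object array, for example because clips of different lengths were stacked into one array, and torch cannot convert an object array. A hedged sketch of the failure and the fix:

```python
import numpy as np

# Stacking same-length clips gives a numeric array, but stacking clips of
# DIFFERENT lengths gives dtype=object (an array of Python lists), which
# torch.from_numpy refuses with exactly the "numpy.object_" error above.
ragged = np.array([[0.1, 0.2, 0.3], [0.4, 0.5]], dtype=object)
print(ragged.dtype)  # object

# Fix: convert each clip on its own, as float32, before handing it to torch.
clips = [np.asarray(clip, dtype=np.float32) for clip in ragged]
print(clips[0].dtype)  # float32
```

So the thing to check is how the Urdu files are being loaded and batched, not the per-sample dtype of any single file.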
Is there a standalone version in the form of an installation file, as is done with Stable Diffusion, so the program can be run through an exe shortcut?
Can voices in other languages be used too?
I think you are waiting for Japanese?)
@@NoName-br8pb maybe? ;) But Russian Shrek dub was amazing.
WHAT ARE YOU DOING IN MY SWAMP!?
@@KatouMegumiosu lol
Doesn't look like it could be done easily, but it can be done. See this thread: github.com/CorentinJ/Real-Time-Voice-Cloning/issues/30#issuecomment-507864097
@@KatouMegumiosu Hi, could you please tell me how to run the program? I'm completely stuck. I downloaded the zip from GitHub, but I can't figure out where the exe is))
The dataset, speaker, and utterance things are grayed out for me; why is that?
Also, it generates quite a different picture while vocoding, and the output doesn't really sound like the input sound.
I've synthesized and vocoded a few now, and the result lands further away on the plot than a completely different speaker.
I guess I've set something up wrong, but it's not very clear.
the fact that this piece of technology is hidden behind a wall of technobabble is the only thing barring me from making Cicero from Skyrim recite Insane Clown Posse lyrics. unfortunately for me and almost every other person on planet earth, this is completely unusable. what is code? i don't know. please help
lmao, to get something you need to sacrifice something; no one's gonna give you everything ready to use
@@Hsaelt i'm aware of that, of course. but until someone starts selling a user-friendly version of this, i am going to have to sit here and stew in the fact that my own ignorance in this field is the ONE thing stopping me from making Commander Data say 'fuck'. sitting and shaking my fist at the sky. one day
A question: what do the original audio clips of the voice you want to imitate have to say?
Hello! I'm going crazy trying to figure out if it's possible to use Italian
It's sad that you are saying this isn't worth my or your time, simply because you want to collect some dividend from Resemble? I don't get it; something like Resemble requires a specific set of words to be learned from, which is ridiculous if you are trying to teach an AI anything but your own voice. This is still a fantastic resource if you have the patience to set it up. With a little modification it could learn its own voice and search the internet for, say, a podcast host who consistently has good-quality audio, and over the years get better and better at its own speech, building on minute changes in intonation, pitch and speed. I don't care what your description says, this is bloody brilliant, man
I am searching for a better install tutorial. I need to know which terminal to write everything in. Kinda new at this. Thanks
still works?
This looks like amazing technology, but I wish there were clearer directions on how to use it, as the readme on GitHub is not very clear.
This is a project that assumes you have prior knowledge in programming and in python. It's not a piece of software distributed for everyone to use. I merely added a fancy GUI for the sake of demonstration and for toying around.
Seems very clear to me, Jacob. Like Corentin mentioned to you, this isn't consumer software.
@@Artixou I'm kind of glad it's not just wrapped up for anyone to use.
@@CorentinJemine For people who don't have knowledge of Python but want to use this program: can you please share this GUI?
The project is dependent on you getting a lot of extremely finicky libraries and modules to get it running, so it's unlikely that really anyone aside from the creator can get it up and running.
Would it be possible to use this for other languages besides English?
Perfect for indie game devs who want some voice acting. Not perfect, but good enough.
can you do a tutorial for infusing this technology into a voice Chatbot?
The fact is that the AI tools that exist today existed before
How do you import data sets like you have in the drop-down menu? Mine says "Random".
Let me guess, 3 years later and no progress was made?
Can someone explain to a small brain how to install all of this? I have Python, I just don't understand the steps on GitHub to get PyTorch and everything else.
Very interesting! Do you think it's possible to perform speech-to-speech synthesis, keeping the tone and pitch of the input voice intact in the output? I want to be Solid Snake :O
How do I even get started? The GitHub download has no app to launch or anything. How do I get to that program?
How come whenever something amazing like this is brought to light, it's impossible for anyone else to put it together themselves, even with the given code?
Conspiracy?
ruclips.net/video/nvBF0Rf2m2Y/видео.html
I was able to get it up and running, but I got an error saying I couldn't use mp3 files. Follow every step EXACTLY and make sure you download everything, in addition to something called ffmpeg. THE ONLY EXCEPTION is to leave tensorflow==1.15 in.
IF YOU END UP WITH THE SAME ERROR as I did, just run the demo with [python demo_toolbox.py --no_mp3_support]; you can easily convert mp3 to wav or flac files online. It works, but it's super delicate when it comes to background noise. Make sure you upload clean audio.
I'm running a 980 Ti in my computer, so tensorflow 1.15 worked for me; if it doesn't, uninstall it from the Anaconda environment and use tensorflow 1.14.
This sounds like a lot, and it is a bit complicated, but it's well worth it. It took me 6 hours because of dumb errors, but I got it to work.
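For anyone hitting the same mp3 error: the toolbox decodes mp3 through ffmpeg, so a quick way to sanity-check your setup before launching is to look for ffmpeg on the PATH. The helper below is my own sketch, not part of the repo:

```python
import shutil

def mp3_support_available() -> bool:
    """librosa/audioread fall back to ffmpeg for mp3 decoding, so no
    ffmpeg on the PATH usually means no mp3 support in the toolbox."""
    return shutil.which("ffmpeg") is not None

if mp3_support_available():
    print("ffmpeg found: mp3 input should work")
else:
    # Without ffmpeg, stick to wav/flac and launch with:
    #   python demo_toolbox.py --no_mp3_support
    print("ffmpeg missing: use wav/flac and --no_mp3_support")
```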
How come people like you make smart-sounding comments but can't spare the smallest amount of brain cells? You wonder why I'm asking this? Really? OK, I won't act like you did; I'll act smarter. 1. Not everyone has knowledge of code or Python-based systems. 2. Many would like to use this and are used to having a simple, ready-to-use installer. 3. Just because they don't know about it doesn't mean they aren't worthy of wishing for such a tool for their work or purposes without a simple installer. 4. How come people like you overlook such simple things? How come you assume everyone is a pro at computing and code? It looks to me like you tried to make a smart comment but failed at a simple thought process.
@@CYM80 Learn to code.
@@richardwatterstan Another stupid answer. Are you family? lol. 1. I do code. 2. Have for years. 3. I wasn't talking about myself but on behalf of all the other casuals. THINK twice and use your brain. Same mistake...
I need an installation tutorial 😢
python demo_toolbox.py doesn't open, any solutions?
Hi, same problem for me; it raises issues on line 5 and on other lines in other files. Were you able to solve your problem?
Hello,
If you are not a coder and have no clue how to work with PyTorch, how can you use these tools? I'm a video editor; I want to rebuild audio for dropped connections on streams.
Is there a step by step tutorial on how to get into the interface?
And that's how Terminator discovered that John Connor's adoptive parents were killed by T-1000
Which languages are supported? I didn't find any info about it. I would like to play with Hungarian language and text.
Librosa will be unable to open mp3 files if additional software is not installed.
Please install ffmpeg or add the '--no_mp3_support' option to proceed without support for mp3 files.
I got the above message when running: python demo_toolbox.py
I had already installed ffmpeg.
Wow, it could be used for real-time translation in the speaker's own voice. That's cool.
Hello!
This is an awesome tool!
How can i adapt my own datasets for training? The program allows you to use only those listed in the training section.
Where can I find the voice samples that you use in the example?
This will change my life
Been getting a ton of errors and eventually gave up. I got it to actually give me a result at first, albeit a very bad one; it essentially just farted at me. But now I can't get it to work at all. Too much of a headache to try and fix.
DID IT!!!!!! Had errors, so hours of troubleshooting. Use Anaconda with a virtual environment; there's a command for getting CUDA, and the others all work from the virtual env!
Where did you get the software? I'm still having trouble finding it. Thank you
Could you tell me how to train models?
I'm from Brazil, here you won't find anything like your videos.
Thank you very much for your attention!
Running this tool is cool, but haphazard. I get a lot of long pauses, "breathing", lost audio, and garbled audio. I have good samples. Do the samples HAVE to conform to 5 seconds, or can you use longer ones? Again, this works, but I want to make the clones sound more natural. If anyone has some tips on improving playback, please post them here. Looking for things to do during this unending layover at home due to C-19.
I'm having the same issue as you -- long pauses, missing words, garbled audio, and that weird "blowing on the microphone" noise. Haven't been able to find a fix yet.
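On the 5-second question: the demo's encoder was trained on short utterances, so a common workaround is to trim longer recordings into short clips before loading them. A minimal stdlib sketch (the `trim_wav` helper and the 5-second default are my own assumptions, not part of the toolbox):

```python
import wave

def trim_wav(src: str, dst: str, seconds: float = 5.0) -> float:
    """Copy at most `seconds` of audio from src to dst.

    Returns the duration (in seconds) actually written.
    """
    with wave.open(src, "rb") as r:
        rate = r.getframerate()
        n = min(r.getnframes(), int(seconds * rate))
        frames = r.readframes(n)
        params = r.getparams()
    with wave.open(dst, "wb") as w:
        w.setparams(params)   # nframes is corrected automatically on close
        w.writeframes(frames)
    return n / rate
```

Longer references aren't rejected outright, but short clean clips tend to give more stable embeddings.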
Another question: could you use something like this to take speech input in one language, and then use a piece of text in another language to get speech output in that language? That function would be more interesting.
So basically you have already trained the models on the English language, and are now using knowledge transfer for any voice speaking English to generate the voices?
Yeah
I can't execute pip install -r requirements.txt
Where am I supposed to find that txt file?
I find this really awesome and I would like to try it out myself, but I'm new to Python and all this stuff. I know I'm probably just missing something really dumb, but is there any tutorial/guide that could teach someone who has never used a program like this before? I just want to make sure I'm not missing something so obvious it hasn't been described; I mean everything from opening the program to generating the TTS.
That's incredible. What about different language accents?
I wonder how the law and copyright holders would react if you, for example, trained the software on a famous actor's voice from movies and used the generated audio commercially in your own products.
How can we install the program?
Omg! This is amazing! Very nice work, man!
Where do I download more vocoders?
I run a lot of DnD and other tabletop games. I have voice-augmenting software with presets like "Mountain Troll", "Squirrel Folk", etc., so I can change my voice in real time while I speak for NPCs that interact with characters. With all these embeddings, spectrograms, etc.: would I be able to take X amount of audio of Morgan Freeman (Audio A), then record my own voice saying the exact same thing at the same pace (Audio B), and have it log all the differences between them, so that when I talk in real time it applies those changes and my voice (Audio B) comes out sounding like Audio A?
I'm not sure if that's possible right now. But it will. Crazy things will happen in the next two decades.
@@BlackStarEOP 2 years and no practical progress, feels bad XP I'm sure a lot of progress has actually been made, just it will take A LOT more even :P
What I'm looking for is a video where I can clone the voice of a person who passed away, but I only have one video where she says 3 words, and I want her to say in that audio something like "I will love you forever, my little one". It's for a gift.
the fact that this has already existed 4 years ago
How do I run this? I have the zip, but it's all in parts; I have no idea how this is supposed to be "compiled".
After selecting a speaker, if I click on the Load button it shows: load() takes 1 positional argument but 2 were given
Thank you! This finally works, but "Dataset", "Speaker", and "Utterance" are all greyed out. Also, I have no option for "pretrained" under any sub-heading: I have "Encoder" under encoder, "Synthesizer" under synthesizer, and "Vocoder" and "Griffin-Lim" under vocoder. Can you help me sort that out, please? I have downloaded LibriSpeech and unpacked it to a folder within the root directory, but I have no option to select it anywhere that I can see.
UPDATE: 2 things, I needed to pass this argument: [python demo_toolbox.py -d ], and then I had to remember that the GZ file had to be unzipped, and THEN the resulting TAR file had to be unzipped. 😕...But now my noob ass has actually unlocked this thing, and it's working - even the dataset voices!
I STILL have no option for "Pretrained" anywhere, so I don't know why others have that, but that's pretty much the last thing I've seen that I'm lacking.
Thanks, I've been looking for this solution
Did you ever figure out the pretrained or other vocoder options?
@@michaelsmith4904 Sadly, no. :(
how exactly did you get them working now?
@@fischy0339 That bit of coding that's within the "[ ]" in my response is the bit that you need to copy/paste into a command prompt, and hit "enter". That passes an argument that allows the program to access those functions. You also need to make sure that the GZ is unzipped to a TAR file, and then that has to be unzipped also. It has now been a while since I did this myself, so hopefully it will help you!
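The two-step unzip described above (GZ, then the resulting TAR) can be done in one go, since Python's stdlib handles `.tar.gz` directly. A sketch; the function name and paths are placeholders of mine, not from the repo:

```python
import tarfile
from pathlib import Path

def extract_dataset(archive: str, dest: str) -> Path:
    """Unpack a LibriSpeech-style .tar.gz in one step
    (no separate gunzip-then-untar needed)."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tf:
        tf.extractall(out)
    return out

# Then point the toolbox at the folder containing the datasets, e.g.:
#   python demo_toolbox.py -d <datasets_root>
```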
I think I'm close to opening the toolbox, but I'm getting syntax errors when trying to run the module for requirements.txt in Python 3.7.0 Shell.
Then try installing each segment of the requirements manually via py -m pip install (module name), without brackets.
@@Hsaelt Thanks for the advice, my good man! After countless trial and error, I got it open.
@@SneakiestChameleon Sigh, glad it worked for you at least. I am still trying to make it work.
DOES ANYONE KNOW WHERE TO DOWNLOAD THE TOOLBOX?? I can't seem to find it online and when I downloaded from the link it didn't come with an executable.
you need to clone and setup the github repo
@@notigor325 and where do I find that?
@@OvSpP link in description
It still sounds a bit robotic. You can definitely hear a small difference, but whoa, I love it.
Hello Sir. I tried to install your tool from your GitHub link, but I'm facing some issues with pytorch and tensorflow-gpu. I even tried to modify requirements.txt, but to no avail. Can you please post a video on the installation of this great tool? Thanks in advance!
@Chris Connelly not yet
Hey!!!!
Did you install it in the terminal / cmd?
See, this is a project which was done in PyCharm, and that's the project file.
@@hariprasath9222 Thanks, Hari, for the pointer.
Issues with TensorFlow usually relate to the CUDA driver. You MUST have the CUDA toolkit installed with a driver at 9.0 or higher. I ran into this while trying to use an old, supposedly CUDA-enabled video card, but that failed. I then switched to a Dell laptop running native Ubuntu (no VirtualBox VMs; it must be native) and got it to work pretty easily once I figured out all the tools that needed to be downloaded. Also, don't edit the requirements.txt file; there is no need. In addition, you don't need the audio sample library, as this is 5.6 GB of wasted HD space. Just make sure your WAV files are of good quality with little to no background noise. I used TV news anchors and it worked pretty well.
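On the "good quality WAV" point: a few properties are easy to check up front before feeding a clip to the toolbox. The helper and thresholds below are my own rough rules of thumb, not the toolbox's requirements:

```python
import wave

def check_reference(path: str, min_rate: int = 16000) -> list:
    """Flag common problems with a reference WAV clip."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        duration = w.getnframes() / rate
    issues = []
    if rate < min_rate:
        issues.append(f"sample rate {rate} Hz is low; {min_rate} Hz or more works better")
    if channels != 1:
        issues.append("file is not mono; consider downmixing")
    if duration < 3.0:
        issues.append(f"clip is only {duration:.1f}s; a few clean seconds embed better")
    return issues
```

Background noise still has to be judged by ear (or with a proper audio library); this only catches format-level problems.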
Super cool. If you use longer input samples, do you get cleaner output? Some of the output sounds like it's breaking up or glitchy... maybe extra reverb in the sound. I'm not sure how to describe it.
But can you have it work in real time with a microphone instead of typing what I want to say? It's not real-time if I can't talk into my microphone and have it come out in that voice.
Real-time in this context means the ability to type and let the algorithm generate speech. This is not an app for your phone; this is just a demonstration of the algorithm, which can be downloaded from GitHub. It's meant for machine learning developers and students who use Python.
It can be if you use an NVIDIA Tesla card, but you still have to use a soundboard.
That already exists, but it has a bit of a long delay.
For whatever reason, it doesn't recognize my mic. Tried everything.
Hi Corentin,
I wanna use the Real-Time Voice Cloning tool on Google Colab:
colab.research.google.com/github/ak9250/Real-Time-Voice-Cloning/blob/master/Real_Time_Voice_Cloning.ipynb
But Google Colab doesn't support interactive GUI elements, so I must use the cloning tool on the command line without the GUI. Can you give me some advice?
Best regards!
I would like to do the same thing as you and run it in Colab, but how?
Hello! I installed everything as instructed, but I cannot figure out how to open the application as in the video; there are no exe files. Thanks in advance for the answer!
This is really clever. It would be neat if you could have a speaker talk into it instead of using text-to-speech; that way you could capture emotion and inflection too, but I get that that's hard to do.
Amazing. This could be really useful for people with Lou Gehrig's disease etc if the model could be trained or 'banked' before serious symptoms appear.
I wonder if anyone has already done any work in that area/that regard.
I arrived here exactly with that in mind, probably will be running some experiments soon
I want to replace my voice with a voice of my preference rather than use TTS. Is that even possible?
How do you train these vocoders for a language other than English?
I'm having a hard time finding exactly where I download this software.
go to GitHub and search voice cloning
@@ironaccount5301 where can I get a guide... To install this software?
this isn't Live Mode right?
Is there a way to do this in real time?
Something like speech-to-speech, so you can clone on the fly.
Downloaded the files, but where should I load them? How can I use this software? There's no exe in the folder... I'm lost.
When I click "Vocode only" or "Synthesize and vocode" using the pretrained vocoder, this happens:
cuda runtime error(100) : no CUDA-capable device is detected at ../aten/src/THC/THCGeneral.cpp:50
Make sure your GPU supports cuda.
@@IIIIIIIIIIIIIIlIIIIIIIIIIII Ok, thank you! I will check my cuda
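One defensive pattern for that error, if you end up modifying the scripts yourself (a sketch under the assumption that you control where models and tensors are placed; the toolbox may hard-pin CUDA in some versions):

```python
import torch

def pick_device() -> torch.device:
    """Avoid 'cuda runtime error (100): no CUDA-capable device is detected'
    by falling back to the CPU when no usable GPU is visible."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
```

Synthesis on the CPU is much slower, but it does run.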
Is there any way to change the emotion of the tone of voice?
How did you import the dataset into the toolbox? The browse button only allows searching for music.
Wondering, what would be the best way to clone voice timbre without TTS? I've tried Real-Time-Voice-Cloning but it seems to generate only TTS text to the target voice and has no emotions whatsoever. I would like to record my own phrase with my voice, and then encode it as if spoken by the target person's voice and keep my original emotions and inflections.
@Vegan Pete But will it be able to apply those emotions at the exact places where I need them, and not by some "typical behavior"?
An example. I want to have a sentence that sounds authoritative and patronizing and puts accents on specific words. Let's say, I have a large voice library of someone who has been reading different styles - patronizing, normal, depressed etc.
I doubt Descript will somehow automagically know which voice style to pick for which sentence, and which words to accentuate.
If there was a system that could pick up the emotions and accents for specific text I'm recording, and then apply them directly onto a TTS engine voice, this would make indie game development so much easier - you wouldn't need voice actors to record your phrases, you could record them yourself and then run through the hypothetical "voice changing engine".
I tried to install the entire thing... turned out I'm too stupid to handle Python
Keep at it. I got it to work using a Cuda-enabled Nvidia graphics card on a Dell laptop. You don't need the 5.6 GB voice samples either. It does take practice and patience to get this going.
No, you aren't stupid! I have been trying to install this since July 2019 and still got no help.
@ That's what I hate about GitHub. Never a folder with a compiled executable you can just start. According to the documentation file, you need to install more than just Python: you also need libraries and other things like tensorflow-gpu, umap-learn, visdom, webrtcvad, librosa, matplotlib, numpy, scipy, tqdm, sounddevice, Unidecode, inflect, PyQt5, multiprocess, and numba. According to the manual, you can do this with just one command, "pip install -r requirements.txt". But the program "pip" is not a standard Windows command or program, and there is no information about where to download the "pip" executable. Maybe it only works on a different operating system. There are programs that make it easier to start py-files, but it will never work without the required libraries. Very annoying and frustrating. :(
@@SquirrelMonkeyCom
Because you need to run pip from the Python command prompt, not the Windows command prompt.
@@civismesecret Oh, and then you're done? :-O
Managed to make it work on Windows 10, but Dataset, Speaker, and Utterance are greyed out. I can play FLAC files.