CLONE ANY VOICE WITH AI (GOOGLE COLAB) | 3 MINUTE TORTOISE-TTS TUTORIAL
- Published: Aug 23, 2024
- In this video I'll be teaching you the fundamentals of the open source AI voice cloner tortoise-tts. I'll show you how to use the AI to clone voices in as little as 3 easy steps. If you enjoy learning about AI subscribe because this will be one of many open source AI projects I'll be showing you how to run.
🔗 Links
Google Colab | Free Version of Tortoise - bit.ly/3Hq6HHI
Tortoise TTS Github - github.com/neo...
Audacity Download - bit.ly/3ZEN9X9
🌎 Socials
Twitter - / thecodingbranch
GitHub - github.com/The...
#tortoise #aivoice #aitutorial #voice #googlecolab
you don't need audacity just use ffmpeg from command line:
ffmpeg -i "input.mp3" -segment_time 00:00:10 -f segment "output%03d.wav"
This will convert to wav + split it into 10 second chunks.
How can I do that? I don't understand. Could you help me?
@@LewisLetsPlay I have the location of the file, but where do I put it? in "input.mp3" replacing "input"? Also if it's like a 60 second .mp3 file, would it split each 10 seconds into multiple files or what?
Navigate your terminal to the location of the .mp3 and you won't need to use the full path of the file, just the filename, relative to the current directory you're accessing in the terminal instance.
And yes each "chunk" is its own .wav file
@@LewisLetsPlay aight I managed to get it working, thank you for the help!
I love ffmpeg. A must have for any computer. CLI for the win.
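For anyone who would rather script the ffmpeg step from this thread than type it by hand, here is a minimal Python sketch. It only wraps the command quoted above. The function names `build_split_command` and `split_audio` are made up for illustration, and it assumes ffmpeg is already installed and on your PATH.

```python
import shutil
import subprocess


def build_split_command(input_path, seconds=10, pattern="output%03d.wav"):
    """Build the ffmpeg argument list from the comment above.

    -segment_time takes a duration; plain seconds ("10") are used here
    in place of the "00:00:10" form quoted in the thread.
    """
    return [
        "ffmpeg",
        "-i", input_path,
        "-segment_time", str(seconds),
        "-f", "segment",
        pattern,
    ]


def split_audio(input_path, seconds=10, pattern="output%03d.wav"):
    """Run ffmpeg in the current directory; raises if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_split_command(input_path, seconds, pattern), check=True)
```

Calling `split_audio("input.mp3")` from the folder that holds the file should write output000.wav, output001.wav and so on next to it, one file per chunk.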
"With great power comes great responsibility" I will remember that when using this AI
Imagine the capacity for content creators to create entire episodes, one man studio style. Completely changes the game
Hopefully we are able to get an open source version of voice cloning AI that is very close to or at the same quality of Elevenlabs soon. This looks hopeful!
Couldn't agree with you more! Been keeping my eyes peeled for something like it 👍
I had much better results with Tortoise than with Elevenlabs, especially regarding intonation and speech dynamics!
@@thecodingbranch2479 Elevenlabs only does spoken speech; natural singing voices like the ones in your Tacotron2 link are not possible there!
iirc, Elevenlabs dudes said that their model is based on tortoise.
It already exists
I like this channel - straight to the point, clean explaining, you just earned a new subscriber🎉😊
yah, need more like this. tired of 34 min videos
Would love to see an updated version that doesn't require reinstalling everything every time. It does a great job. I just wish we could use our own PC to speed up the process.
It wouldn't speed up the process unless you have a very good computer.
@@ChasishOnRUclips I do have a good pc I think. But yeah looking forward to an update if you make one.
Just realized - I was only your 30th sub earlier :P YouTube AI algorithm working - you were on my front page under my first few recommendations!
Thank you for that!
I tried this out, it's pretty good. I anticipate how far its capabilities can go.
were you able to download the clip?
Thanks a lot for sharing this product!
It's crazy hard to find something like that for free and easy.
Nice, brother. Content like this is valuable to me.
I pressed play and it just wouldn't work: the first cell won't complete, and the 2nd cell doesn't give an option to upload the clips.
I fixed the error, so you can use it. Let me know if you have any more issues.
Hi, loved your channel. I'm also going to try running it on my own Windows PC, and I'm still looking for more ways to make it sound more natural, with hesitations, sudden rises in pitch, changes of tone to reflect more emotion, etc. I was astounded at how advanced current AI is, since I'm not that up to date with present tech.
Also kudos to its creators for making it available to us all
Love you man, thanks for the tutorial
Thanks. Eleven labs just did the unthinkable so this is perfect. Any deep fake face tutorials?
I was actually going to do a video on deep fakes soon. Great recommendation! Stay tuned!
Excellent video, what if I want to make the voice in other language, for example spanish, do I need to tune something?
Thank you! This is exactly what I wanted.
I am eagerly waiting for a version that accepts the full set of IPA symbols and some kind of tone marks to define the phrases.
Hello, love your video, but I'm not a coder at all and this may be a stupid question: how do you enter your new text? What did you press to load it? It's not changing the text from yours to mine. Thanks if you can help me.
How do you clone 'singing' voices?
I have acapella clips of my son singing I'd like to be able to customize
Mine took 15 min to complete, although I tried to voice clone from and to another language. It's not too bad, but I recommend using it only for English voices and texts.
Did you try greek language?
@@kostas9849 no, i tried to clone a spanish voice. It just sounds like an american trying to speak it. I imagine it'd be like so for all other non- english languages.
@@anagnorisis1522 I wonder if an audio expert could fix the accent and make the cloned audio sound natural...🙄
@@kostas9849 The script has been trained with the english accent. That's why it sounds the way it does when cloning into another language. In order to fix it, for example, to clone a spanish voice, the model must be trained with at least 1000 hours of spanish sounds so it can clone them using your sample voice. Elevenlabs had this problem until recently. Now their custom english voices can mimic any other language realistically.
@@anagnorisis1522 could you tell me what is the maximum number of characters the entry can generate?
I did all of these steps. What should I do next? How do I make an .ipynb file? I didn't get it, please tell me.
Hi, when I run the second code block I don't have the option to upload my files, and it shows "FileExistsError". I believe it's something basic but I don't know what to do. Thank you
Is it possible to clone a voice in real time? I mean building a project that takes voice input in real time and replicates it.
I wonder if this could work with voice clips from video games, anime, movies, and shows to create deepfakes. This technology could lead to full length abridged dubs of anime if capable.
Do it
🤓
@@chlorhexidine2506 Bro's an idiot
It does lol but also it feels wrong doing it
Did a few voices from honkai star rail
Can you change the tone of the voice reading text {e.g. excited, sad, etc}?
hey can you please fix it? its not working. I keep getting "NameError: name 'os' is not defined", "ModuleNotFoundError: No module named 'huggingface_hub'" and "ModuleNotFoundError"
Thank you for this awesome video. I would love for you to make a video on how we can use Colab with video calling.
It worked well with more samples. It's simple: the more voice data you give it, the better the output you get :)
why do i get this error when I run the first block?:
ModuleNotFoundError Traceback (most recent call last)
in
14 import IPython
15
---> 16 from tortoise.api import TextToSpeech
17 from tortoise.utils.audio import load_audio, load_voice, load_voices
18
3 frames
/content/tortoise-tts/tortoise/models/xtransformers.py in
8 from collections import namedtuple
9
---> 10 from einops import rearrange, repeat, reduce
11 from einops.layers.torch import Rearrange
12
ModuleNotFoundError: No module named 'einops'
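A traceback like this one means the notebook's Python environment has no `einops` package installed. As a rough sketch (not the notebook author's actual fix), you can check which of the modules named in these comments are missing before running the import cell. The helper name `missing_modules` is hypothetical, and the pip package names are assumed to match the module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules reported as missing in the comments on this page.
wanted = ["einops", "huggingface_hub"]
missing = missing_modules(wanted)
if missing:
    # Assumption: each pip package is named like its module.
    print("Try installing:", "pip install " + " ".join(missing))
else:
    print("All listed modules are importable.")
```

In a Colab cell the suggested install command would be run with a leading `!`, and the import cell re-run afterwards.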
Same here, it's like it doesn't work at all. I even installed all the dependencies. Getting the same error, plus name "os" is not defined. I may have to install it on my computer. I just wonder if this is safe on my computer.
same error
Same here, same error. I tried to fix it with ChatGPT, but more errors showed up once I fixed the previous one. That link is a complete nightmare; he probably changed something.
same
@@briansxml These notebooks might not work for everyone, either because you don't have the necessary modules installed that others have, or because some cells were updated while this link points to an old notebook that was never updated, which causes errors for newcomers. Most of these YouTube tutorials use private, self-made notebooks by the uploader, each with its own cells and installs. It might work for some. Out of all the tutorials, I found only one notebook that does not give errors, and that is the Uberduck notebook. I could not find a YouTube tutorial for that one, but if you have seen a few of the other tutorials then you should be fine trying it without one. Good luck! You will be fine.
Sorry to bother you, but I'm unable to upload audio. It says "Upload widget is only available when the cell has been executed in the current browser session. Please rerun this cell to enable. "
I click the play button on the left side but then the upload voice widget disappears. Am I doing something wrong?
Big thank man
But how to download voice after generate ?
Hehe, I'm new to this world 😂
hey man I'm getting an error "RangeError: Maximum call stack size exceeded." when I try uploading even just one wav file. Any ideas?
How long is your sound file? 🤔
@@thecodingbranch2479 a few clips between 6-10 seconds, but I tried just uploading one single 8 second clip and whatever I do it gives the same error.
Alright. Thanks for letting me know. I'll go through and test it today and let you know what I come up with.
@@thecodingbranch2479 this is the line that gives the error: "12 for i, file_data in enumerate(files.upload().values()):"
@@thecodingbranch2479 Is there anything to copy from the GitHub repo into my google drive or something I'm missing that could be causing this? It seems these notebooks load themselves without connecting to google drive so I don't get what I'm doing wrong. I tried like 10 different kinds of wavs just to see if it's working.
there are errors like :
ModuleNotFoundError: No module named 'einops'
Hi I had same error. Can you please share the solution in case you debugged it?
I fixed the error, so you can use it. Let me know if you have any more issues.
@@thecodingbranch2479 Thank you
I like it. Which program or method do you recommend to create the highest quality and best reproduced own voice that will read different texts?
This would have been a great video if you started from scratch and explained all of the steps. That is the difference between 36K views and 360K views. Please consider posting both a long and short version and see which one does better.
Thanks for the feedback. I'll give that a shot.
i keep getting an error
NameError: name 'os' is not defined
same
Wonderful video, it's very easy to follow
can it copy intonation as well? I would like to use my own voice but my intonation is not clear.
@codingbranch....do you offer help with the tortoise errors? I have been trying for a month to make a short voice clone snippet of my friend's mother, who just passed away, as a sentimental gift. Please help. Anyone.
I may be able to help. But to be completely honest I would heavily recommend checking out my latest voice cloning video. It is technically a lot easier to accomplish and is far more accurate. That sounds like a very nice gift!
Hi. Thanks for this video. After successfully creating a sentence with a nice voice, I immediately tried another sentence, but it gave the following error: NameError: name 'os' is not defined (message, stack overflow). I wonder if I will have to repeat the process for each text (sentence), I mean, reload and upload everything from the start. The issue would seem to be the memory, but I don't know how to reset the memory for processing a different sentence. Thanks in advance for any help. Regards
Yes unfortunately you have to reinstall everything. It's ridiculous but I can't get it to work otherwise
@@thecodingbranch2479 Many thanks for your response. Does that also happen with the long text version? (of another video from yours). And please just one more question: Are there any symbols or punctuation marks that can serve to make pauses, etc. within a sentence? I mean, apart from the split symbol you already included in the longtext version.
@@redpillmath I'd also like to know that
Awesome video. Could you do a tutorial on cloning a voice and then using a tts library locally to further use that trained model?
You can kind of understand it from the code in the Colab: just copy a few lines here and there and download the folder with your custom voice to keep using it.
@@eplatamx2017 can you go into more detail about what to copy and how to do it?
Browse the generated folder in Colab and look for the one that was made from your audio clips. It has a default name like "New model" (I don't remember exactly), but you can change that in the line of code around the second block. Once you have that folder you can download it and keep it on your computer, then re-upload it in your next Colab session and skip the training part.
@@30secsdaily
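To make the "download the folder" advice a bit more concrete, here is a hedged sketch that zips a voice folder so it can be saved as a single file. The `tortoise/voices/<name>` layout is taken from the traceback elsewhere on this page; the function name `zip_voice_folder` is hypothetical, and the Colab download call in the comment is an assumption about how you would fetch the archive.

```python
import os
import shutil


def zip_voice_folder(voice_folder, archive_base):
    """Zip `voice_folder` into `archive_base`.zip and return the archive path.

    The folder itself is kept as the top-level entry inside the archive,
    so unpacking it later recreates e.g. `my_voice/0.wav`.
    """
    voice_folder = os.path.abspath(voice_folder)
    parent, name = os.path.split(voice_folder)
    return shutil.make_archive(archive_base, "zip", root_dir=parent, base_dir=name)


# Example (paths are illustrative):
# archive = zip_voice_folder("tortoise/voices/my_voice", "my_voice_backup")
# In Colab you could then fetch it with google.colab.files.download(archive).
```

In a later session, unpacking the archive back under `tortoise/voices/` should restore the same clips without uploading them one by one.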
Thanks for the video. How is this compare with Bark?
Great video
I guess this can be used to clone scary story voices as well right?
I've been trying to run the "install package" command and it always stops because it says its missing "einops", does anyone here have any solutions? Do I need to install python for it to run?
Same bud
@@dogepotatogamedev yes, me too. I used ChatGPT to fix it, but it is a nightmare: I fixed that error with some code to paste in, and then other errors showed up. Endless problems. And the uploader here never replies about what is wrong with his link.
I fixed the error, so you can use it. Let me know if you have any more issues.
@@thecodingbranch2479 great thanks a lot man 👌
what do you mean when you say " don't get in trouble." ?
You don't have a terrible voice! Just so you know
Nice video, man. Have you had any luck running this on your own hardware? I've been trying for weeks :(
I haven't run it on a local machine but if I find the time throughout the week. I will give it a shot and let you know.
@@thecodingbranch2479 LOL
Do I have to run the installation process every time I want to use this? Also can I upload an hour worth of training material?
I tried to get it to work without the frequent install, but due to the nature of Google Colab I wasn't able to. If you ran the same code locally you wouldn't have to.
If you read their docs, they recommend uploading 10 second clips. So idk if uploading an entire hour-long clip would end well. You may have to break it up, which is pretty painstaking. Although it may be worth trying just to see if it works. 🤔
@@thecodingbranch2479 I'm gonna try this with voice clips from anime and video game to see if I can make my own deepfakes. It would be great if this could replace Elevenlabs after it got paywalled.
Yeah heads up, ye might run out of RAM from the runtime.
i love you keep doing that
When I click on the second cell I get this:
NameError Traceback (most recent call last)
in ()
9
10 custom_voice_folder = f"tortoise/voices/{CUSTOM_VOICE_NAME}"
---> 11 os.makedirs(custom_voice_folder)
12 for i, file_data in enumerate(files.upload().values()):
13 with open(os.path.join(custom_voice_folder, f'{i}.wav'), 'wb') as f:
NameError: name 'os' is not defined
Any help would be appreciated.
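The traceback shows `os.makedirs` being called before `os` was imported in that session, which is what raises the NameError. A minimal sketch of the same cell logic with the import added is below. It is not the notebook author's actual fix; `prepare_voice_folder` and `save_clips` are hypothetical names, and `uploaded` stands in for the dict of filename to bytes that `files.upload()` returns in Colab. Passing `exist_ok=True` also avoids the FileExistsError other commenters hit when re-running the cell (the "Cannot rely on checking for EEXIST" text appears to be a source comment from inside `os.makedirs` shown in that traceback).

```python
import os  # the missing import behind "name 'os' is not defined"


def prepare_voice_folder(voice_name, root="tortoise/voices"):
    """Create the custom voice folder; safe to call again if it already exists."""
    folder = os.path.join(root, voice_name)
    os.makedirs(folder, exist_ok=True)
    return folder


def save_clips(folder, uploaded):
    """Write uploaded clips as 0.wav, 1.wav, ... like the loop in the traceback.

    `uploaded` is a dict mapping original filename -> raw bytes.
    """
    paths = []
    for i, file_data in enumerate(uploaded.values()):
        path = os.path.join(folder, f"{i}.wav")
        with open(path, "wb") as f:
            f.write(file_data)
        paths.append(path)
    return paths
```

In the notebook, the second argument would be the result of `files.upload()` rather than a hand-built dict.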
Help, it doesn't work this time.
It says "Cannot rely on checking for EEXIST, since the operating system"
What do I do?
How many text characters can I put in? What is the limit?
Im trying to clone my voice in english and spanish but the result is not good.
Thank You 😊
is it necessary to create 10 secs of wav files? can't I upload a single 1 hour length file?
The documentation says roughly 10 second clips but try an hour long clip and let me know how it works!
The session keeps crashing frequently. Is there a way to fix this? + How can it be done faster? Thanks for the amazing video!
is this also suitable for other languages?
Can you actually do something like set up a new TTS System tho? This one doesn't seem to do more than just the dialogue you ask for it to say, I need something that can actually generate a voice to use in the tts I have
I made one voice clone, and when I try to make another one it doesn't work.
🙏 Bro, please help me. What you show in your tutorial is not what appears when we click the Google Colab link in your description. What should we do? Please 🙏🙏🙏🙏 I really need this.....
Did you try to mix several actor voices to create unique voice which sounds natural?
Many Thanks. What was the text you spoke for the audio. Were all 10 clips different text ?
Good question. I read the entry lines for a number of local news articles. But tbh I don't have the best mic and had a warm fire going in the background. So it was far from the perfect setting.
@@thecodingbranch2479 Many Thanks for responding.
Does it work with foreign language? Or just copy the english accent?
Great question 👍 from the research I've been able to do it seems like it doesn't copy foreign accents very well unfortunately. Hopefully they come out with a better version that does!
Hi, I’ve got a question can you use multiple sentences when generating speech?
Im looking for a website or software that changes my voice to Arnold Schwarzenegger so i can sing with his voice.
Hey i have a question, can it clone any other language other than english ?
thanks
Does this work with foreign languages or only English?
thank you it was quick but not high quality i would say mine around 70%
Umm, mine won't work. Well, it did work at first and gave some funny sound I couldn't understand. Then when I tried re-running it from the top, it keeps saying a UTF-8 locale is required. Please help.
I'm having the same problem. Did you ever figure it out?
Hey bro. Where do we go to play the voice when we are done running it? Or is it going to play automatically?
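A workaround that is commonly suggested for this "UTF-8 locale is required" message in Colab is to override what Python reports as the preferred encoding, then re-run the failing cell. This is a hedged sketch of that workaround, not something confirmed by the video or the notebook; it patches the `locale` module for the current session only.

```python
import locale


def force_utf8_locale():
    """Make locale.getpreferredencoding() report UTF-8 for this session."""
    # Accepts the same optional argument as the original function.
    locale.getpreferredencoding = lambda do_setlocale=True: "UTF-8"
    return locale.getpreferredencoding()


print(force_utf8_locale())
```

If the error persists, restarting the runtime and running the cells in order from the top is the other thing worth trying.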
The play button is on the bottom of the page where your text input is
gr8 job! I want a customized cloned voice for my AI companion. How can I add that to my Python project?
For a project like that I would probably use eleven labs AI. Check out my latest video, you may find it easier to use.
Can the VOice Model it generated be downloaded to be used in other TTS?
I tried to put a longer script and it gave a syntax error. do you have any suggestions?
I don't understand. A year or two ago I was looking into Tacotron2 with Nvidia's and Google's efforts, and they were producing incredible cloned voice engines....
But now I decided to have a look at how it's coming along and I'm seeing this tortoise tts... Sure, it looks easy to use and fast, but sorry, the accuracy of what I'm hearing is decades behind what I was hearing with Tacotron2.... What happened?
Machine learning is only as good as the data it can learn from. Google has the most data in the world, but they are greedy so they won't open source anything. TTS was made by an individual who did his best with the data he had. It can actually be a lot more accurate than what I showed in the video. You just need to give it more (and better quality) samples to clone from.
Any speech to speech on Colab?
nice
What's the practical limit of the inputted text? One sentence? 255 characters?
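Nobody in this thread states an actual limit, so treat the following only as a sketch of one way to stay under whatever limit applies: split the text at sentence boundaries and generate each chunk separately. The function name `split_text` is hypothetical, and the default of 255 characters is just the number guessed in the question above, not a documented Tortoise value.

```python
import re


def split_text(text, max_chars=255):
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole in its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be passed to the generation cell in turn and the resulting clips joined afterwards.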
It worked I downloaded a wav file, but how do I create a voice file for reuse
I don't know why, but it's not prompting me to input any audio because an error keeps occurring. I noticed some parts of the Colab looked different than in the video too.
didnt work for me, getting the voice that was originally used. what am I doing wrong? After I run the 'edit output' section and all the WAV's are processed, it says 'saving WAV as .wav', but nothing changes from there. then I run last section and get original voice. I used 8 WAV files.
Does this work on other languages (mainly French, Spanish, Italian, and Russian)
How do I make more than one audio clip? The first one came out great, but when I wanted to change the input, I tried running the 2nd and 3rd code blocks again and it generated the first input.
Then I tried to refresh everything and run the first block again, and an error pops up U_U
Does it only work for a few words? Or can I paste as many words as I want?
Is there an AI tool that will let you change the tone of someone's voice into your own voice, instead of using the method of cloning a voice?
amazing, can i use it for arabic text?
Thank you. Inscribed! Does this work only for English? I suppose so, but... you never know.
Yeah, unfortunately only English for now
The blocks are different for me, I can't get to the part where you upload your files:(
edit: it wont let me upload custom files or generate text if i touch the block where you type what you want the output to say or the quality. even ignoring that i get this error when i try to generate:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
in
1 # Generate speech with the custotm voice.
2 voice_samples, conditioning_latents = load_voice(CUSTOM_VOICE_NAME)
----> 3 gen = tts.tts_with_preset(text, voice_samples=voice_samples, conditioning_latents=conditioning_latents,
4 preset=preset)
5 torchaudio.save(f'generated-{CUSTOM_VOICE_NAME}.wav', gen.squeeze(0).cpu(), 24000)
NameError: name 'text' is not defined
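In a notebook, a NameError like this usually means the cell that assigns the variable has not been run in the current session (for example after a runtime restart). As a small sketch, you can check which of the names used by the generation cell are still undefined before running it. The names in `REQUIRED` are copied from the traceback above; the helper `undefined_names` is hypothetical.

```python
def undefined_names(required, namespace):
    """Return the names from `required` that are not defined in `namespace`."""
    return [name for name in required if name not in namespace]


# Names the generation cell in the traceback relies on.
REQUIRED = ["text", "preset", "CUSTOM_VOICE_NAME", "tts"]

missing = undefined_names(REQUIRED, globals())
if missing:
    print("Run the cells that define these first:", ", ".join(missing))
```

If `text` shows up in the list, running the cell where the output text and preset are typed in should define it.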
Amazing video, but it only lets me do it once; after that I only get errors. Do you know a fix for it?
Idk what I'm doing wrong. When I press run on the second block, the upload button disappears and the code fails.
do you know if it can do speech to speech? for ex: everything you narrated in this video but in a grandma's voice?
No you can't unfortunately
Can someone help? I'm stuck at the second block. It says: "Cannot rely on checking for EEXIST, since the operating system" and after that I can't add files. How do I fix it?
What does “ maximum call stack size exceeded” mean ???
what is the maximum number of characters that can be inserted in the text?
I just wanna make Layne Staley sing Scream of the Butterfly
Getting an error for no module named einops on the first code box.
It looks like he changed something inside that notebook in the past week, so that link is not working properly anymore! He didn't even announce anything about it. Now it is full of errors when you try to work from it. It might not be his fault, but things got changed in it.
@@klaurcschwackerberg1880 its been an error for quite some time now,
@@KirksCountDownsM I gave up on this link, the guy is not responding on comments anymore, but I found this other useful similar link online from someone else that offers the same clone result:
ruclips.net/video/uQ9PYhAsd6w/видео.html
I fixed the error, so you can use it. Let me know if you have any more issues.
Will it work on other languages
OK, I copied the document, then pressed install, uploaded 16 voice samples and generated the audio file, and it is not my voice.
Please help: is there a voice-to-voice cloner (not text to voice) on Google Colab that supports the Arabic language? Please help, it is necessary.
How high quality can you make your audio dataset? Also, what's the recommended amount of data or minimum amount?
I'd recommend 10 minimum
@@thecodingbranch2479 10 minutes?
At least 10 different tracks of data. And the docs recommend 10 second clips.
cassidy sparta.
@@thecodingbranch2479 Can we put in 10,000?