Updated AI Voice Cloning with RVC Inference - Tortoise with RVC Local Installation
- Published: Oct 4, 2024
- Links referenced in the video:
AI Voice Cloning Repo - github.com/Jar...
How to get RVC Voice Models - • How to Get AI Voice Mo...
How to Train a Tortoise Voice - • Local AI Voice Cloning...
RVC/Voice Changer Playlist - • AI Voice Changer
Hardware for my PC:
Graphics Card - amzn.to/3pcREux
CPU - amzn.to/43O66Ir
Cooler - amzn.to/3p98TwX
RAM - amzn.to/3NBAsIq
SSD Storage - amzn.to/42NgMFR
Power Supply (PSU) - amzn.to/430bIhy
PC Case - amzn.to/447499T
Mother Board - amzn.to/3CziMXI
Alternative prebuilds to my PC:
Corsair Vengeance i7400 - amzn.to/3p64r22
MSI MPG Velox - amzn.to/42MnJHl
Cheapest recommended PC:
Cyberpower 3060 - amzn.to/3XjtZoP
Come join The Learning Journey!
Discord - / discord
Github - github.com/Jar...
TikTok - / jarodsjourney
If you found anything helpful, please consider supporting me and the content I am trying to produce!
www.buymeacoff...
The tts program that becomes rvc gives me great happiness.
just when i was starting to miss the gcollab rvc now i finally have it downloaded on my pc thank you so much
PARTY TIME!
Jarod you’re the best! The hero we needed 🎉
Thank you thank you Artholos :)!
Awesome! I've taken a pause on AI for a bit to focus on some other things, but I'm excited for all the neat things waiting for me when I come back to it. Love your content!
All good, gotta do what you gotta do! AI will be here, bigger and better whenever you return to it. Appreciate it!
Thanks mate! You are one of the YouTubers who knows AI voice inside out. I'm a pro dev but not in the AI space. I find all this stuff exciting. I want to explore all the voice-related stuff but have time constraints. It would be great if you could make a Udemy course covering all aspects of text to speech. I would definitely purchase it.
Appreciate it! I've got something in the works so will probably announce it on my channel whenever I get things sorted!
I used to make FL Studio tutorials on my main channel like this, straight to the point and super helpful and effective. Well played
Thank you, I learned a valuable lesson from this video. Don't do something yourself when someone else can do it better.
I feel you... Been struggling with this deepspeed cuda shit trying to make tortoise work lol
This is perfect for me as I am a Filmmaker that is working on a new project that will use some characters from a game that has a Wiki where I can download the voice lines and turn into a model and use this RVC to make them say the story without spending hours in editing myself. Thanks!
If this didn't exist, my idea was to record myself doing the character's lines and use an RVC to make it sound like the characters, but that would be very time consuming and worse. So once again, a huge time saver. As for the RVC I used to clone my own voice, the training stopped working and I couldn't add any more voices, which sucked. So I hope this doesn't have any of the issues I've run into so far.
That is a great idea guys. I’m a filmmaker too. Great vibes!
@@BenjaminTemplar No problem, hope the best for you!
What are the requirements for this? Can it be done on a 2018 Mac mini?
Oh JEEZ. This is simply incredible! I can't stress how impressed I am. You did an amazing job by combining these two technologies together! I mean... damn, god bless IT guys like you. This is outstanding.
I have one question though... is it possible to do proper sound-based text-to-speech in languages other than English? For example, could you select or type the language locale in a new input field before clicking generate, so the system recognizes it's not basic English? Just wondering what some of my favorite characters would sound like in translation, with their voices. :)
2:07 yes I need all those models thank you. It's like when I purchase an audio fx plugin and if it doesn't come with presets... I'm mad. I need some presets to help with my workflow.
Voice models means I have choices for some of my videos where I don't have to train a model if I don't want to.
Oh my god this is so impressive. I've been playing around with MRQ repo for a year now, and I've got some pretty good models out of it but this, I think, is going to take it to a whole other level.
That was a great idea to link the two technologies! I can't wait for a Linux version too :)
I was a bit impatient waiting for the command lol. Thanks! You're awesome!
Thanks for the amazing work you've put in. I'm loving the results of your RVC update. Is there a way to turn off RVC in your audiobook maker and use the output from the new Tortoise RVC instead of the audiobook maker's built-in RVC?
I appreciate it and I'm glad you're finding these things useful. Unfortunately, there isn't. I will have to update the audiobook maker to take that into account, as several things have changed since the initial release, so that you can disable or enable RVC.
Thanks for the support 🙏!
Very useful, may you always be healthy and blessed. Thank you for the tutorial, I am using it
Is it possible if you can make a google colab or paperspace version for people who don't have high end GPUs?
I hope so
I did make it
@@The-Game-Making-Guild Where's that ?
@@JakaJaka-kx2qj I can't send it here because YouTube blocks links
九千 colab.research.google.com/drive/19MWwqyiHSyOtQB8hqrVyajHvDdzUnshd
C:\Windows\System32>runtime\python.exe infer-web.py --pycmd runtime\python.exe --port 7897
The system cannot find the path specified.
Pls how to fix this
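A note on the error above: the command uses the relative path `runtime\python.exe`, and a relative path is resolved against the shell's current directory, which the prompt shows as `C:\Windows\System32`. The small sketch below only illustrates that resolution rule. The install folder `D:\ai-voice-cloning` is a made-up example, not a path from the video.

```python
from pathlib import PureWindowsPath


def resolve_relative(cwd: str, relative: str) -> str:
    """Return the absolute Windows path that `relative` points to
    when the shell's working directory is `cwd`."""
    return str(PureWindowsPath(cwd) / relative)


if __name__ == "__main__":
    # Working directory from the comment above: the runtime is not here.
    print(resolve_relative(r"C:\Windows\System32", r"runtime\python.exe"))
    # Hypothetical install folder: the same relative path now lands inside it.
    print(resolve_relative(r"D:\ai-voice-cloning", r"runtime\python.exe"))
```

If that is the cause, opening the command prompt in the extracted folder (or `cd`-ing into it first) should let the same command find `runtime\python.exe`.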
Would be awesome to see a video where you break down all the slider options and what do they do. Some of them are self-explanatory, but stuff like Filter Radius or CVVP weight or Diffusion Temperature etc. are pretty unclear and it's hard to determine what exactly it is they do. Documentation for RVC and Tortoise TTS also isn't exactly the best, so most of the stuff isn't explained there either, not well, at least.
tysm for this istg ive been wanting to update my model and make new ones for a while- - a very happy vocaloid user
Great work. Thank you. But I have another question: What would have to be changed to be able to use other voice languages?
*Bro, your work is outstanding...* I have one question, please. What do I need to do to use Greek? Even though I loaded my Greek samples and wrote Greek in the text box, it all sounds wrong. 😂😅
I need it too.. @jarods journey
whoa........WHOA! This is hot. I was looking for elevenlabs alternatives and I think this is it. I love local training and having a 4090 (we have the same one! It rules!) helps a ton and I just dont want to use services! This rules! Instant SUB! Cant wait to try this later!
Would be cool to see what can be done if you re-trained your voice model, as you said, and see to what level it's possible to take it in terms of pure quality. Like how good can it get with today's "tech".
Tortoise and RVC already gets really, really close on other voices I've tried and tested. I personally haven't tried mangios still as RVC added rmvpe to it
this is good enough that you mostly don't need to use paid models imo. rvc really fixes the worst aspects of tortoise.
it's then only a matter of training good models for both (specially tortoise)
7zip won't extract it, says unsupported compression type.
Hey man, thanks for this, finally one that works. But is there a video where you teach how to customize emotions for TTS?
Awesome system, thank you! Definitely useful for my RPG, Knights of the Chalice 2.
I've noticed that the RMS setting is significant and the result may be improved with a value higher than 0.5. In one sample, 0.7 removed the 'metallic' aspect of the voice, compared to 0.5. However, there was hardly any difference between 0.7 and 1.0. Not sure yet if 0.7 is the 'best' value for all voices or if that depends on each voice.
I have one big request: please let the name of the output file contain the seed, and maybe also the RMS value. It's nice to be able to recover the seed when you want to tweak a sample. Thank you!!
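The request above could look something like the following sketch. It is not the tool's actual naming code. The function names, the filename pattern, and the `index` counter are all hypothetical and only show how a seed and RMS value can be written into a filename and read back later.

```python
import re
from typing import Optional


def build_output_name(voice: str, seed: int, rms: float, index: int = 0) -> str:
    """Build an output filename that records the seed and RMS value.
    The pattern is an illustration, not the tool's real scheme."""
    return f"{voice}_{index:05d}_seed{seed}_rms{rms:.2f}.wav"


def parse_seed(filename: str) -> Optional[int]:
    """Recover the seed from a filename produced by build_output_name."""
    match = re.search(r"_seed(\d+)_", filename)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    name = build_output_name("me", 1234, 0.7)
    print(name)              # me_00000_seed1234_rms0.70.wav
    print(parse_seed(name))  # 1234
```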
'runtime\python.exe' is not recognized as an internal or external command,
operable program or batch file.
Press any key to continue . . .
I downloaded the zip file, extracted it, clicked start and this popped up. If I press a key it closes. What am I doing wrong?
Same issue here. Did you download ver 3?
ModuleNotFoundError: No module named 'bark'
ModuleNotFoundError: No module named 'vall_e'
RVC options are not showing in the UI ..??
Some guidance please.
doesnt work for me. when i hit generate it gives error message
error No RVC configuration found, check configs folder. If rvc.json does not exist, please change a setting in the RVC area to create one.
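The error text above points at a `configs` folder and an `rvc.json` file. A quick way to see which of the two is missing is sketched below. The layout `<install folder>/configs/rvc.json` is an assumption read off the error message, and `rvc_config_status` is a hypothetical helper, not part of the tool.

```python
from pathlib import Path


def rvc_config_status(install_dir: str) -> str:
    """Report whether configs/rvc.json exists under install_dir.
    The folder layout is assumed from the error message."""
    config_path = Path(install_dir) / "configs" / "rvc.json"
    if not config_path.parent.is_dir():
        return "missing configs folder"
    if not config_path.is_file():
        return "missing rvc.json"
    return "found"


if __name__ == "__main__":
    # "." assumes the script is run from inside the extracted folder.
    print(rvc_config_status("."))
```

If the file is missing, the message itself says what to try: change any setting in the RVC area so the file gets created.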
First of all, thank you so much for everything you share with us at no cost. Could you please suggest the best voice cloning AI for local use? I have an RTX 3060 12GB and a 14th-gen Core i7. Thank you in advance.
Does it work for Spanish?
Thank you for your hard work!
This program is truly amazing. Literally the best of the locally running TTS setups. The quality of the result is outstanding.
I just wonder if one could have some control over the reading speed. I didn't find this in the interface. Do we have some built-in text tags like , or something like this?
Hi Jarod Friend. Please, is this a multi-language PC online app or English only? I followed you earlier and found that the generated IP address takes me to English only. I tried training an Arabic WAV file, but it did not work. Please tell me how to set "ar". Thanks.
Guys, is there a video where a famous voice actor tries this for a long time? I couldn't find a long video like elevenlabs where I could hear a few examples.
Hey Jarod, amazing video! I have a question: is the training for RVC and TTS the same? Or do I have to train a model separately for RVC?
I really appreciate the effort, but for me there are too many tangents and references to old out of date clips that break the workflow making it not possible to get a result.
Great tutorial, thanks !
YOU ARE DOING GODS WORK...
2:44 hot tip. You don't have to do this if you don't want to. But through all my YouTube research about how to make videos, the successful YT channels say the ideal spot to put your self-promo section is about 2/3 of the way through the video, so that you retain more of your audience and get more watch time.
I also enable mid-roll ads and put a timestamp in my editor, and so I know what time to put my single mid-roll ad
This looks great, quick question: does it work with English speech only?
File "ctypes\__init__.py", line 374, in __init__
FileNotFoundError: Could not find module 'C:\Users\ahmad\OneDrive\سطح المكتب\ai-voice-cloning\runtime\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
how i can fix it?
have the same error
Nice work! Great detailed info!
How many languages does it support?
I am interested in Greek as well.
Great video, thanks a bunch!
Thank You Jarod very smooth.
Great work, thanks for sharing your knowledge. Can you make this into a colab version, my gpu is low spec. Thanks
mine says the .wav isnt being recognized
Error opening './voices/me//downloa.wav': Format not recognised.
seems to give me all kinds of errors, first not installing a venv, then OOM, and not loading RVC correctly.
Hey Jarod! This is awesome, thanks for sharing. I'm working on cloning some singers' voices, and I was wondering if it's possible to clone the style of singing (i.e. vibrato) as well? On RVC, it seems to only layer the voice quality itself as opposed to the style of singing so was wondering if that is a viable option.
Tortoise doesn't transfer singing features so not possible there. As for RVC, that's where the index file comes in. It should help to reintroduce aspects of the original training files back into the output
@@Jarods_Journey Got it, thanks!
Awesome work man! The only thing that I don't understand is at 7:26. Is that a separate model you trained using TTS? What's the difference in that and the wav file you selected on the main tab?
If you've trained a voice model in tortoise before, this is for selecting that voice/autoregressive model. If you haven't, then you could disregard this section
Really nice tool, really help full video. Thanks for all.
I followed the instructions carefully but it fails to load the forms😢😢
This is awesome, excellent work! Sorry for the dumb question, but how can I access this from other computers on my network? ip address:7860 is refusing the connection, and I can't seem to figure out why. Disabling the firewall does not fix the problem and I'm a bit stumped.
Many thanks in advance!
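For the network question above, a "connection refused" usually means nothing is listening on the address and port being contacted. A server bound only to 127.0.0.1 accepts connections from the same machine and refuses everything else. Whether this tool binds to loopback by default is an assumption here, not something stated in the video. The sketch below is a plain reachability check using only the standard library. `can_connect` is a hypothetical helper and `192.168.1.50` is a placeholder address.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Run on the PC hosting the UI: checks the loopback address.
    print("loopback:", can_connect("127.0.0.1", 7860))
    # Run with the host PC's LAN address (placeholder value shown).
    print("lan:", can_connect("192.168.1.50", 7860))
```

If the loopback check succeeds but the LAN address fails on the host itself, the server is most likely listening on loopback only, and the fix lies in how the web UI is launched rather than in the firewall.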
Hi. Is this the best voice cloning app out there? I would like to clone my grandma's voice (she has passed away) to make my mom a virtual card. I would like the voice to be perfect, though. Thanks.
Hey there, sorry to hear about that. This software is pretty good, but it's not perfect and requires a lot of audio data. If you don't want to bother with getting all of this setup, I would recommend eleven labs as the best paid option for voice cloning in this case.
i am ok with following the steps. i was wondering if this is the best out there to learn at the moment. one other question please, how do i convert mp3 or wav file into the PHP extension? i have a ton of audios from her in those formats. i didnt find any software online to do that. thanks.@@Jarods_Journey
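On the file-format part of the question above: MP3 recordings can be turned into WAV clips with ffmpeg. This covers only the audio conversion. It does not produce a trained model file, which comes out of training. The sketch assumes ffmpeg is installed and on the PATH. The mono, 22050 Hz defaults are an assumption, not a requirement stated in the video, and both function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 22050) -> list[str]:
    """Build an ffmpeg command that converts src to a mono WAV at sample_rate.
    -y overwrites dst, -ac sets channels, -ar sets the sample rate."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]


def convert_to_wav(src: str, sample_rate: int = 22050) -> str:
    """Convert one audio file to WAV next to the original (requires ffmpeg)."""
    source = Path(src)
    if source.suffix.lower() == ".wav":
        raise ValueError("source is already a .wav file")
    dst = str(source.with_suffix(".wav"))
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return dst


if __name__ == "__main__":
    # Print the command for a placeholder filename without running ffmpeg.
    print(" ".join(build_ffmpeg_command("grandma_01.mp3", "grandma_01.wav")))
```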
how to train it to speak like morgan freeman ?
Thanks for the video! Where can I find documentation about this tool so that I learn what each setting is intended to do? I'm having a hard time trying to control the emotions.
"CUDA out of memory. Tried to allocate 43.81 GiB. GPU 0 has a total capacty of 6.00 GiB of which 2.72 GiB is free. Of the allocated memory 2.33 GiB is allocated by PyTorch, and 16.43 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
Getting this error when i click generate. Any suggestions?
I tried the "Low VRAM" option, but still the same. Also, it correctly detects my GPU during startup with 6 GB RAM.
I got this error too, I struggle to find a solution ! did you find out ?
@@chatsmignons5466 Yes. First, i updated my 7zip to the latest version. Unzipped everything again. Then ran the program. Generated my first prompt by choosing 'random' voice. Then i tried generating with other voices...and it ran fine.
It's because the files in your voice folder are not .wav. Also, upload it in parts, not as one whole audio file.
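A file can be named `.wav` without actually being one, for example an MP3 that was only renamed. A standard WAV file starts with the bytes `RIFF`, followed by a 4-byte size and then `WAVE`. The sketch below checks that header for every `.wav` file in a folder. The function names are hypothetical, and the `voices/me` path is taken from an error message elsewhere in these comments.

```python
from pathlib import Path


def looks_like_wav(path: str) -> bool:
    """Return True if the file starts with a RIFF/WAVE header."""
    with open(path, "rb") as handle:
        header = handle.read(12)
    return len(header) == 12 and header[0:4] == b"RIFF" and header[8:12] == b"WAVE"


def find_fake_wavs(folder: str) -> list[str]:
    """List .wav files in folder whose contents are not RIFF/WAVE."""
    return [
        path.name
        for path in sorted(Path(folder).glob("*.wav"))
        if not looks_like_wav(str(path))
    ]


if __name__ == "__main__":
    # Folder name is an example; point it at your own voice folder.
    print(find_fake_wavs("voices/me"))
```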
Great ! Can we upload a singing vocal recording and render it with a .pth file (RVC model), or is it for speech only ?
Why is a voice sample needed if you have already loaded a model of your voice?
I got error extracting, I've downloaded 7zip. Any idea why?
Me too... Use winrar to extract it.
Hello Jarod, thank you for the sharing, I need a tool like this with posibilities to do an API request to do that in bulk. does this tool allows it ?
I have a question. Instead of bundling all the models and creating a file tens of gigabytes in size, why didn't you simply allow the user to select and download what models you want after installation?
do you have a suggestion for language model that runs fast? I use amethyst-13b-mistral.Q8_0 and it is by far the best local model i've tested, by a landslide, completely different dimension. it is actually comparable to gpt4. but it takes like 90 seconds to generate each reply. it's like a person typing at ~70 wpm. maybe there's a model that's 10 times faster and 50% as good?
You might want to look at 7B parameter models and then at which model loader you're using. If memory serves, exllama2 was the fastest in my testing, I think.
I've been trying to get this up and running, but I keep running into an issue where it keeps trying to use an absurd amount of ram the GPU simply does not have. " Tried to allocate 323.47 GiB. GPU 0 has a total capacty of 12.00 GiB of which -", and so far I'm not having any luck with a fix.
I got the same error
@@Footballmadppl same problem here, any fix?
@@anthony.stepvoy Still not bro
@@Footballmadppl I fixed it. It's because the voice samples are too long and heavy. Cut them into smaller audio files.
@@anthony.stepvoy You are a fucking champion.
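The fix described in this thread, cutting long voice samples into shorter clips, can be done in any audio editor. As a minimal sketch, it can also be done with Python's standard `wave` module, which reads PCM WAV files only. The 10-second default, the output naming, and the function name are assumptions for illustration, not values recommended in the video.

```python
import wave
from pathlib import Path


def split_wav(src: str, out_dir: str, chunk_seconds: float = 10.0) -> list[str]:
    """Split a PCM WAV file into consecutive clips of chunk_seconds each.
    The last clip holds whatever audio remains."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = max(1, int(reader.getframerate() * chunk_seconds))
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            target = out / f"{Path(src).stem}_{index:03d}.wav"
            with wave.open(str(target), "wb") as writer:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(str(target))
            index += 1
    return written


if __name__ == "__main__":
    # Placeholder paths; replace with a real recording and voice folder.
    for clip in split_wav("long_recording.wav", "voices/me"):
        print(clip)
```

Cutting at fixed intervals can split a word in half, so splitting by hand at pauses may give cleaner clips.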
I get this error when attempting to generate something "RuntimeError: Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 2 in the list." - but how can the size be equal to 1 if the least minimum allowed is 2?
you are a gem!
up. really great bro tnx
I'm currently testing workflow with openai generated TTS to RVC. openai seems to create superior base TTS to tortoise
Google Colab web ui?
With all these voice changers....does it make a difference using a cheap mic vs a super expensive mic?
How are you getting the vocal samples though, for fictional characters? This part is the major chore from what I've noticed. What am I missing to make this easier?
I dont think anyone mentioned this, but the huggingface link for RVC is dead... Oh wait, My google wasn't showing that it was downloading. Nevermind!😁
Cool! What about emotion custom models? Where can you find those?
Hey, thanks for offering your repo. How much GPU RAM is required to run this? I'm running an NVIDIA GTX 1650 card, and it's only got 4 GB. Is that enough, and if so, will it run slowly?
i'm having this problem: "Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: torch.cat(): expected a non-empty list of Tensors"
what it could be?
Great happiness,
I want to ask whether we can make our own voice in Hindi like this. How do we do it? It would be great if you made a tutorial.
when i click start nothing happens on the cmd
Hi, how to use it for German or hindi Language? Trained my voice for hindi. But by generating Text to speech I am receiving Config Error. Thanks in advance for your support.
Hey, great tuto. I'm getting a CUDA out of memory error after I go through everything and hit generate. Ive got a 4080. Any suggestions? EDIT: I've got 16GB of dedicated GPU mem, running start.bat allocates 4.8GB and then running Generate blows past 15GB then throws up the error. 2ND EDIT: Reduced the samples to 8 and now it works...lol...no more edits I promise.
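Since reducing the samples resolved the out-of-memory error above, it can help to see how much audio a voice folder actually holds before hitting generate. The sketch below lists each clip's length with the standard `wave` module, which handles PCM WAV files only. The function name and the `voices/me` folder are placeholders.

```python
import wave
from pathlib import Path


def sample_durations(folder: str) -> dict[str, float]:
    """Map each .wav file in folder to its duration in seconds."""
    durations = {}
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as reader:
            durations[path.name] = reader.getnframes() / reader.getframerate()
    return durations


if __name__ == "__main__":
    durations = sample_durations("voices/me")
    # Longest clips first, then the total across the folder.
    for name, seconds in sorted(durations.items(), key=lambda item: -item[1]):
        print(f"{seconds:8.1f}s  {name}")
    print(f"{len(durations)} files, {sum(durations.values()):.1f}s in total")
```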
I installed rvc W-Okada and my voice doesn’t change, it’s the same. How can I fix that?
I've got a bit of a unique issue. Training doesn't seem to do anything. It'll run for a bit then seemingly pause. I gave it 12 hours to see if it was just running in the background and still nothing. The graphs don't even show up, so it's kinda hard to tell what's going on. Nothing in the terminal says there's an error, so I figured I'd bring it up here. Am I missing dependencies or something? Is it cause I installed it on an external hard drive instead of the D or C drive? There's not a lot to go on, but any help would be nice.
I carefully followed the instructions, but it failed to load the forms ☢☢☢☢
@Jarods_Journey just a question: is this compatible with the book generator?
and that is really good man.
Yes, it'll work. Just make sure to turn off RVC in tortoise as RVC is already built into the audiobook maker
@@Jarods_Journey OK, thanks a lot. By the way, some quick feedback on the book maker: it works really well. I just feel that when it finishes a full generation or a regeneration of sentences, it would be cool to get a sound indication that it's finished.
But thanks a lot for this kind of tool 😁
I noticed that the audio references are all very small wave files, is this the best way to do it? or is it just what you have? Would a single long file also be suitable reference and does it have to be in Wave format?
What if I already installed a previous tortoise model from your other tutorial. Is there a way to update or download the needed rvc extension myself
You need to redownload this package to get the RVC inference functionality
this is awesome bro!
hello how to replace a signed actor voice from a video file with another person's voice
Looks great.
Jarod you are awesome
I’m just now starting to research AI voice cloning and I’m curious as to why the GPU matters? Does the model learning use the GPU to process the AI? Sorry if I seem ignorant, as I am 😂
I'd recommend reading this: www.quora.com/Why-are-GPUs-well-suited-to-deep-learning and then timdettmers.com/2023/01/30/which-gpu-for-deep-learning/
So, you will always need a chunk of the original voice for the model trained on this very data to work. Is that correct?
It looks to be effective only for English. When I tried my native language, it could not produce a proper word.
If i use random/pre-installed voice how can i choose the gender?
Thanks Jarod, you are the best. How do I use it for a different language?
there are many rvc like mangio-rvc, applio and more to cloning voice model but which one is the best for cloning ? i do have nvidia gpu
No one size fits all. It's mostly context dependent based on my experience
Great
i just made a voice but no .index file comes out, im trying to use RVC GUI for ai covers. any fixes?
Generations very slow on a laptop 4050 gpu, any fixes?
cmd: "KeyboardInterrupt ^CTerminate batch job (Y/N)?"
Help?
Which software name is the RVC ? Where can we get it
Is there a way to prevent the output from having those "I'm really" phrases before the generated sentences?