My TOP 3 Tips for Training Better AI Voices - RVC Voice Cloning
- Published: 15 Jul 2024
- Links referenced in the video:
Tensorboard video: • Get the BEST AI Voice ...
Hardware for my PC:
Graphics Card - amzn.to/3pcREux
CPU - amzn.to/43O66Ir
Cooler - amzn.to/3p98TwX
RAM - amzn.to/3NBAsIq
SSD Storage - amzn.to/42NgMFR
Power Supply (PSU) - amzn.to/430bIhy
PC Case - amzn.to/447499T
Mother Board - amzn.to/3CziMXI
Alternative prebuilds to my PC:
Corsair Vengeance i7400 - amzn.to/3p64r22
MSI MPG Velox - amzn.to/42MnJHl
Cheapest recommended PC:
Cyberpower 3060 - amzn.to/3XjtZoP
Come join The Learning Journey!
Discord - / discord
Github - github.com/JarodMica
TikTok - / jarodsjourney
If you found anything helpful, please consider supporting me and the content I am trying to produce!
www.buymeacoffee.com/jarodsjo...
Nice setup. Better lighting and a new background. A lot of improvements there.
Thanks Luz, I've moved to a different portion of my room xD!
Good video, good info, stuff I've come across in my own trial and error. Keep up the good work man
Appreciate it!
Thanks man 💪
Solid tips all around! You're right to put the dataset first because "garbage in, garbage out" is probably the first thing you're going to learn through trial and error. Appreciate this vid is mostly geared towards AI voice changing, but if you're doing any AI music where you want to change the vocals, my tip is to not go overboard with UVR in trying to "clean up" the target voice (singer in the song you're wanting to replace the vocals for). Lots of times just a single pass through of Kim Vocal 1 sounds miles better than doing that + de-echo, dereverb etc. It's easy to end up losing some of the little qualities of the song that make it sound good if you clean it up too much.
Thank you! A little bit of trial and error and GIGO will be your new motto 😂. This is a good tip as well for the inferencing side of things, though you definitely don't want much reverb on the training side still, imo.
great tips
Thanks sir
Thanks for the tips. I'm swapping singer voices. I have good data, about 20 minutes, and trained 200 epochs. I used Harvest and rmvpe_gpu for both training and processing. The results are OK but I still hear too much of the original singer's voice. What can I adjust to make the cloned voice totally replace the original voice?
Would you know if it's already possible to train a model that lets me do text-to-audio from acapellas, while avoiding the nightmare training of Tacotron 2, and instead using an RVC v2 kind of training, nice and easy? I mean I want to train a model by adding acapellas, in an easy way like you can in RVC v2, without having to transcribe every sentence as Tacotron 2 training requires, and then when inferencing the model, type text to get audio. Is that not possible yet? Wouldn't that be great? Or did I miss something?
Hey, I'm still new to AI (especially RVC) training. How many epochs does it take for each varying duration of dataset? Like a dataset that's 1.5-5 minutes, 5-10 minutes, 10-15 minutes, 15-20 minutes, 30+ minutes, etc. I have varying datasets from very short to very long. For example, my shortest model is 1 minute 11 secs, my longest is 43 minutes 57 secs. I hope you understand how I explained it since I'm on the autism spectrum, and I love how AI is progressing. Hope you reply soon (cause I know you're a busy guy lol), thanks for reading!
Greetings. I have been struggling with this stuff for weeks. I am at a point now where I can train models with RVC, however... I am having a problem I'm not really finding ANYTHING about, anywhere. :(
I will say one thing, and the model, straight up, will say a different word.
It was based on a voice recording from an anime character for which there is not a whole lot of audio to begin with...
Is it possible to, say, make a model from scratch using myself, and just talk and talk and talk, then afterwards graft the voice tone of the character onto that? Would that solve the linguistic issues, or add new ones?
For now though, I think I'll restart the dataset from scratch using some tips here. =)
Your help is amazing.
How do I minimize the delay when streaming? Get a better GPU? Does a laptop 4060 have lower delay than a 3060 12GB?
Is it possible to start small first and then improve the created voice model with more data or more epochs? I mean so that we don't have to redo the modeling from the beginning. If that's possible, please tell us how we can do it.
could you do a tutorial connecting this to OBS?
Hello! The new RVC update makes training with CUDA faster. With my RTX 4070 Ti it takes 30 seconds per epoch.
This is awesome to hear! I'll have to check what they adjusted
Hi, glad to read that. Where did you get the "RVC update"? Thanks and regards from Argentina
@@diegolopez-xz8pg Since I was having configuration problems, I went to check the GitHub and there's an update from a day ago.
Can we retrain a model, or do we have to train it from the start?
When I try to train a voice, the preprocess section shows this error:
start preprocess
['trainset_preprocess_pipeline_print.py', 'D:\\RVC0813Nvidia\\Dataset\\Myvoice\\Myvoice.wav', '40000', '24', 'D:\\RVC0813Nvidia/logs/Myvoice', 'False']
Fail. Traceback (most recent call last):
File "D:\RVC0813Nvidia\trainset_preprocess_pipeline_print.py", line 111, in pipeline_mp_inp_dir
for idx, name in enumerate(sorted(list(os.listdir(inp_root))))
NotADirectoryError: [WinError 267] The directory name is invalid: 'D:\\RVC0813Nvidia\\Dataset\\Myvoice\\Myvoice.wav'
end preprocess
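For anyone hitting the same traceback: `os.listdir()` is being called on a single `.wav` file, but RVC's "trainset directory" field expects a folder containing your audio files. A minimal sketch of the fix, using the path from the traceback above (the file-vs-folder check is just an illustration):

```python
from pathlib import PureWindowsPath

# Path copied from the traceback: it points at a single .wav file.
dataset_input = PureWindowsPath(r"D:\RVC0813Nvidia\Dataset\Myvoice\Myvoice.wav")

# RVC's preprocess runs os.listdir() on this field, so it must be a folder.
# If the path looks like a file (it has an extension), use the parent folder.
dataset_dir = dataset_input.parent if dataset_input.suffix else dataset_input
print(dataset_dir)  # D:\RVC0813Nvidia\Dataset\Myvoice
```

In other words, enter `D:\RVC0813Nvidia\Dataset\Myvoice` in the preprocess field and keep `Myvoice.wav` inside that folder.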
I don't understand why I can't get any decent result.
No matter what I do, it sounds like robotic noise.
Great tutorial mate! I realized a lot of your videos have your voice (audio) out of sync with your visuals, i.e. it seems like your video can't catch up with your voice.
:O, my voice is not in sync I'll have to check lol
I've banged my head against this for two solid days.
I think the noticeable AI sound is a combination of things. #1 on the list is compressed source audio. #2 is leaving silent, unprocessed bits in the dataset. #3 is not enough variety in the dataset. #4 is parameters, turning knobs, etc.
I have found making convincing RVC is really, really fucking hard. You can do it with other noise in the background and no one notices, but once it's "alone in the room" it always seems to fall flat on its face.
Not 100% there; some models I've trained sound 80-90%, though under scrutiny it's possible to tell.
Data is 100% key here.
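On point #2 above (silent bits left in the dataset), here's a rough numpy-only sketch of stripping silence before training. Real pipelines typically use a tool like Audacity or librosa's trim; the 20 ms frame size and 0.01 RMS threshold here are assumed values you'd tune per recording:

```python
import numpy as np

def strip_silence(samples: np.ndarray, sr: int,
                  frame_ms: int = 20, threshold: float = 0.01) -> np.ndarray:
    """Keep only frames whose RMS energy is above `threshold`.

    `frame_ms` and `threshold` are assumed values; tune per recording.
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    kept = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        if rms > threshold:  # loud enough to keep
            kept.append(frame)
    return np.concatenate(kept) if kept else np.array([], dtype=samples.dtype)

# Demo: one second of tone followed by one second of digital silence.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
tone = 0.5 * np.sin(2 * np.pi * 220 * t)
cleaned = strip_silence(np.concatenate([tone, np.zeros(sr)]), sr)
print(len(cleaned), len(tone))  # the silent second is dropped
```

Energy-gating like this is crude (it can clip soft consonants), so listen to the result before training on it.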
Hello Jarod's Journey. I would like to know if you would train a model for me. I have set the pitch to false. You can get up to a higher batch size; I can only get up to 26. It sounds like there is an autotuner on when I have trained over 200 epochs, but it could well be that if you train with a batch size of 35 it becomes more precise. How can I send you my dataset? Thanks.
Can I use this software on Fortnite, like, live?
If you wanted to and had a powerful enough PC, yes.
How do you use these models in real time?
Is it recommended to train voice samples (talking) and singing voice samples together, or would that compromise the overall quality? Would it be better to train only singing voice samples to make an AI song cover?
I have yet to make an explicit comparison, but you can get really good models still with datasets mixing the two (I've done several this way). It might make for an interesting comparison to split that data setup and see what results in the best model 🤔
The UVR tool is effective at separating music vocals from instrumentals; however, in certain instances, there may be some static noise present in the background of the UVR Vocal output. Therefore, it is not guaranteed to work flawlessly for removing background noise in general audio recordings. To ensure clean recordings, it's advisable to use a microphone with noise cancellation capabilities in conjunction with Krisp, a noise-canceling AI app, during the recording process. Additionally, I wish I had known to "start with small datasets" earlier, as I've already set 1000 total epochs for my voice model and it is still training as of now🤣. 15 more hours is my estimated time of completion, I just hope it will turn out well🙏
🤟 appreciate the tip and hope it turns out as well too 🙏!
Eww krisp
Does it help if you clean up your device? I'm running the voice changer on CPU (with an AMD Radeon graphics card) and whatever I do, on Discord it's extremely slow. Also I can't use CUDA, because when I check if it works in the console it keeps saying "false".
That's mainly a hardware limitation, you can try using the directml version of it but CPU is slow and AMD is unstable sometimes. CUDA is Nvidia proprietary so that is why you aren't able to use it
@@Jarods_Journey If I try to use CUDA it unfortunately says false when I check if it works (in PowerShell), so that's currently my biggest hurdle.
@@Avax84 You can't use CUDA because you need an Nvidia GPU, so that's why you'd have to check out the DirectML version to see how that works.
Also, idk how to fix that
I hear myself with the voice changer, but very badly, like 10% of the quality I hear when testing in the client.
Does anyone know a good model for UVR5 that can extract acapellas from music, but without the backing vocals? I know X-minus can do this but I want to use UVR5. I just don't know what model I need to choose, thanks
Have you tried MVSEP (online) ?
This is my question too, did you find anything?
How to download RVC?
Where can I find the guitar model? How can I get the MPEG working on the Mac side? I cannot train my voices or run model inference. Should my friend and I use xformers or not?
RVC's guitar model can be found here: huggingface.co/spaces/lj1995/vocal2guitar/tree/main/weights
Unfortunately, I don't know whether or not RVC uses xformers, and I'm not sure about Mac since I don't own one.
The index file is missing, thanks.
I found the index file. I had to look in logs, not weights.
Hey man, is there an RVC AI download with a working Tensorboard?
Assuming it's a clean Windows install.
I believe with the folder that you download, it includes the package. But if not, check out this video here: ruclips.net/video/P0M7PAsG1fk/видео.html
@@Jarods_Journey I've tried bro, I can't get it working. In the runtime folder in the RVC download there's Python and the tensor stuff; I just can't get it to work.
I tried your guide also, but it breaks the latest RVC release if I install another Python.
Can you take a peek at the latest release after the beta, as it's like 2 weeks old?
I have 15+ minutes of studio-quality vocals, but I always get effed up S and T sounds and foggy vocals. I've tried a lower batch size but nothing changes... what can I do??
I have the same issue. I think when recording we must emphasize the S and T sounds, and for the foggy areas record a variety of high notes.
@@DJDJisMusic Setting the batch size all the way up seems to help a little. But it's still not perfect.
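If the harsh S/T sounds are already in the training data, one option besides re-recording is to de-ess the dataset before training. Below is a toy numpy-only illustration of the idea: attenuate the sibilance band (roughly 5-10 kHz, an assumed range) in the frequency domain. A real de-esser works on short windows with a dynamic threshold, so treat this purely as a sketch of the concept:

```python
import numpy as np

def deess(samples: np.ndarray, sr: int,
          lo_hz: float = 5000.0, hi_hz: float = 10000.0,
          reduction: float = 0.5) -> np.ndarray:
    """Crudely attenuate the sibilance band over the whole clip.

    Band edges and the 0.5 reduction factor are assumed values.
    """
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sr)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    spectrum[band] *= reduction        # tame only the sibilance band
    return np.fft.irfft(spectrum, n=len(samples))

# Demo: a 200 Hz "voice" tone plus a 7 kHz "sibilance" tone.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 7000 * t)
y = deess(x, sr)  # the 7 kHz component is halved, 200 Hz untouched
```

A dedicated de-esser plugin (or UVR's processing) will sound better than a whole-clip filter like this, but the principle is the same.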
Any tips for the real time voice changer? I can't get it to sound right :/
Depends on graphics card, but you need a good model and then you need to optimize your settings as well. Biggest thing though is the GPU.
@@Jarods_Journey Yeah, I don't know if it's my mic, but for me it sounds kinda robotic and not smooth. I have a 3070 Ti laptop which should be around desktop 3060 (Ti) level.
Any voice changer for mobile phones?
For RVC voices, I haven't run into any because it requires too much compute power. Also, realtime voice changing takes a lot of power, so I don't see it being something on phones yet.
@@Jarods_Journey OK thanks, maybe in the future you'll find a way for phones. BTW I love your videos, keep going, I'll always support you...
Great tutorial! Also, will you marry me? 🥺