RVC WebUI Voice Cloning Tips, Tricks and Experiments!

  • Published: 8 Jun 2023
  • While the basic process for using the RVC WebUI for voice-to-voice clones is pretty straightforward, there are also some other tools you may need to use such as Ultimate Vocal Remover and Audacity. In this video I share a variety of extra tips and tricks including:
    * Sharing models
    * Cleaning up old data to save disk space
    * The Mangio RVC WebUI for experiments
    * Some experimenting
    * Tensorboard graphs and logging
    * Ultimate Vocal Remover for improved audio splits
    * Audacity tips
    * Voice to Voice Example mix
    == Links ==
    * Original RVC WebUI Install & Usage - • Use AI to Clone ANY Vo...
    * Mangio RVC Fork - github.com/Mangio621/Mangio-R...
    * UVR - github.com/Anjok07/ultimatevo...
    * Installing Anaconda for MS Windows Beginners - • Anaconda - Python Inst...
    == Stable Diffusion Links! ==
    * Automatic1111 Web UI - github.com/AUTOMATIC1111/stab...
    * ControlNet Extension - github.com/Mikubill/sd-webui-...
    * How do I create an animated SD avatar? - • Create your own animat...
    == Stable Diffusion Playlists! ==
    * Everything Stable Diffusion - ruclips.net/p/PLj...
    * Dreambooth - • Stable Diffusion Dream...
    * Textual Inversion - • Stable Diffusion Textu...
    == Attributions ==
    My Favorite Regret by Josh Woodward is licensed under an Attribution License.
    freemusicarchive.org/music/Jo...

Comments • 172

  • @MiguelGomez-qx7qc · 1 year ago · +30

    I've been using RVC for the past couple of days with better than expected results. With these tips I feel like I'm gonna take my projects to the next level. Thank you for all the hard work!

  • @kusog3 · 1 year ago · +18

    RVC is damn good. Not only can it clone one voice to another, it can do so in another language.
    Heck, I've been using it to change the voices of some anime I watched to my favorite actors.

    • @NerdyRodent · 1 year ago · +3

      Fun, huh? 🍿

    • @dthSinthoras · 1 year ago · +3

      You're able to do other languages? I tried it with German and it gets my voice pretty well, but the pronunciation of words is kinda terrible. Do we need more training data for other languages? Or do you have any additional tips for me?

    • @user-ik8ew7bz8o · 11 months ago · +2

      @@dthSinthoras You need a voice in English for music in English, or a voice in German for music in German. If you take a German voice for English music, it makes an accent. I tested with several languages; you have to keep the same language for the voice and the music.

    • @dissonanceparadiddle · 11 months ago · +1

      woah that's so cool!!

    • @kusog3 · 11 months ago

      @@user-ik8ew7bz8o I think the accent is pretty cool

  • @user-iu7in7oo9t · 11 months ago

    Thank you for your hard work making this video! Looking forward to a sample cover song with your model.

  • @blackvx · 1 year ago · +2

    That's nerdy to the core, thanks!

  • @dennisliebelt3951 · 1 year ago

    I always thought crepe would give the best results. Thanks for the research and the useful tips on how to get the best results!

  • @ris_kis · 11 months ago

    Good advice about sharing; I had somehow almost managed to work that out myself, but you confirm my theory.

  • @aa-xn5hc · 1 year ago · +1

    You are the coolest geek ever....
    The final duet is gold

    • @NerdyRodent · 1 year ago · +1

      Thanks! Glad you liked the things 😀

  • @marcfruchtman9473 · 9 months ago

    Super interesting. I really like the invert audio trick!
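    For anyone wondering, the "invert audio" trick mentioned here is presumably simple phase cancellation: given the full mix and the instrumental, inverting the instrumental and summing it with the mix cancels the shared content, leaving the vocal. A minimal NumPy sketch (the sample values are made up):

    ```python
    import numpy as np

    # Hypothetical 4-sample mono signals at the same sample rate
    vocals = np.array([0.10, -0.20, 0.30, 0.00])
    instrumental = np.array([0.05, 0.10, -0.10, 0.20])
    mix = vocals + instrumental  # the released song

    # Invert the instrumental (flip its phase) and sum it with the mix:
    # the instrumental cancels out, leaving only the vocal
    recovered = mix + (-instrumental)

    print(np.allclose(recovered, vocals))  # True when the tracks align exactly
    ```

    In practice this only works when the two tracks are sample-aligned and identically mastered; otherwise UVR's learned separation is the better option.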

  • @kait3n10 · 1 year ago · +6

    Great practical advice. I've personally used the free Adobe AI voice enhancer to make the extracted voices clearer, but UVR seems very promising. Will have to try that out. Thanks!

  • @RHYTE · 1 year ago

    Hell yeah waited for this!

  • @jurandfantom · 1 year ago · +3

    thank you for your hard work NR

    • @jurandfantom · 1 year ago · +1

      On that note, what would be better at this stage: text-to-voice with a custom trained voice, or text-to-voice with a built-in Bark voice and then voice-to-voice?

  • @toph_beifong · 1 year ago · +1

    Ay this is awesome! After getting the hang of so-vits, this is looking real promising.
    Just wondering though, is there an easy way to use a model for TTS or would I need to create a separate model specifically for that?

  • @StiffPvtParts · 1 year ago · +1

    That's insane! Consider my mind blown.

  • @calvin54 · 11 months ago

    And for changing the learning time, what am I supposed to change that to? Can this be done if a model is already training, or would I need to retrain again with these changes made?

  • @denblindedjaligator5300 · 3 months ago

    Could we make a shared Dropbox? I would like to know: if you trained one of my datasets with the wrong settings at a batch size of 40, would the autotune effect go away? My dataset is 1 hour long; train it for 1000 epochs, thanks. How much can I overclock my GPU in EVGA Precision X1? My graphics card is 400

  • @H4AZ · 11 months ago

    Hey, how can I use the harvest-diffgrad model to train? I can't find it in Google Colab.

  • @JohnSundayBigChin · 1 year ago

    Hi Nerdy, thanks for the tips! Is it necessary to update the RVC that I installed a couple of weeks ago?
    I ask because I only have pm, harvest and crepe in my RVC.

    • @NerdyRodent · 1 year ago · +1

      Personally I git pull every time, but whatever version I’d best got you!

  • @fadyibrahim8649 · 11 months ago · +1

    The sound is incredibly amazing, but unfortunately it's not working for me, even though the Retrieval-based-Voice-Conversion-WebUI is working perfectly.
    I keep getting an error indicating that there is an issue with Torch, even though I am certain that it is installed and present.
    I'm using Windows 10.
    Can you help me?

  • @RolanRoyce · 7 months ago

    Why, when I download voice models from the weights site and do an inference on a vocal track from a song, does it sound like crap? It'll sound normal until a certain part comes up, and then it'll sound like somebody getting strangled.

  • @workflowinmind · 1 year ago · +3

    Impressive!

  • @denblindedjaligator5300 · 7 months ago

    Using a translator: When I train a voice in the new version of RVC, it looks as if it has chosen an algorithm even though I have chosen not to use pitch during the training. Try training a model in version RVC0813Nvidia and then version RVC1006Nvidia. I mean, when you train you can choose whether you want to use pitch or not. The sound is more natural in the old version when you don't use pitch. There is a synthesizer input module, but how can we use this?

  • @denblindedjaligator5300 · 3 months ago

    Can I send you my dataset? It is 1 hour, 7 min and 6 sec; the index file can only get up to 600.2 MB. Could you train it for 1000 epochs, at 48 kHz with pitch set to false? I would like to hear the difference between a batch size of 26 and a batch size of 40. We could make a Dropbox folder.

  • @macronomicus · 10 months ago · +1

    This app is so much fun!

  • @ratside9485 · 1 year ago · +1

    I think that's all too nerdy for me. I'm pretty happy with the normal version, but it sometimes does not find the index file, and that sucks. Right now it is much too warm in my apartment, so I'm taking a break from training new models. Thanks for the tips!

  • @dthSinthoras · 1 year ago

    Hi there, I am playing around with Tortoise for text-to-speech and RVC for speech-to-speech, but with both I have a pretty similar problem: they sound like me, but they can't properly pronounce German sentences.
    Are they just not trained for that? Are there other base models for other languages? Or is my training data of ~45 minutes of speaking still not enough?

    • @NerdyRodent · 1 year ago · +2

      You may need to ensure that you have a balanced range of phonemes in your dataset

    • @dthSinthoras · 1 year ago

      @@NerdyRodent So.. it should be possible yes? :)

    • @YumeLabs · 11 months ago · +2

      Quality, not quantity, I say. I have an English voice speaking Vietnamese, Indonesian, Japanese, French, Russian, Cantonese, Mandarin, and Dutch, so of course it can handle many known languages. It can even pronounce Middle English! I just wish there were a Middle English text-to-speech so we could hear Chaucer-style parodies of current events... But German should be very easy for it. Try singing a little maybe, or varying the enunciation of common words.

  • @choppergirl · 10 months ago

    I have 128 GB of system RAM, but only an 8 GB GPU I bought yesterday on eBay.
    Is this going to work for me with only a 3070 8 GB card?

  • @calvin54 · 11 months ago

    What does changing the log interval from 200 to 100 do or benefit? Also, if I'm using v2 models, would I need to change the 48k_v2 JSON config file?

    • @NerdyRodent · 11 months ago

      It just logs more often. Who doesn’t love extra datapoints? 😉
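      For reference, the settings being discussed live in RVC's JSON config files (e.g. configs/48k_v2.json). The exact keys vary by version, but a training fragment looks roughly like this (values here are illustrative, not recommendations):

      ```json
      {
        "train": {
          "log_interval": 100,
          "seed": 1234,
          "epochs": 20000,
          "learning_rate": 0.0001,
          "batch_size": 4
        }
      }
      ```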

  • @douglasteixeiradeabreu · 1 year ago · +2

    I didn't quite understand what this does, but I definitely felt the scale of the experiment. You gave a great demonstration; I'm eager to follow how this unfolds. More demonstrations will be welcome 😮!

  • @dthSinthoras · 11 months ago · +1

    I've played around a lot with the TensorBoard graphs now. Why are you setting the smoothing this high? Because you want to see what seems to be the overall better curve? For finding the best epoch it would be better to set it to 0, right?

    • @NerdyRodent · 11 months ago · +2

      It shows the trend, which can be difficult to tell without smoothing
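      To make the smoothing discussion concrete: TensorBoard's slider applies an exponential moving average over the logged scalar points, which is why high values reveal the trend while 0 shows the raw per-step values. A simplified sketch of the idea (TensorBoard's actual implementation also debiases the average; this version does not):

      ```python
      def smooth(values, weight):
          """Exponential moving average over logged points, in the spirit of
          TensorBoard's smoothing slider. weight=0 returns the raw values."""
          smoothed, last = [], values[0]
          for v in values:
              last = last * weight + (1 - weight) * v
              smoothed.append(last)
          return smoothed

      noisy = [1.0, 0.2, 0.9, 0.1, 0.8, 0.0]
      print(smooth(noisy, 0.9))  # heavy smoothing: the downward trend emerges
      print(smooth(noisy, 0.0))  # no smoothing: just the raw loss values
      ```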

    • @dthSinthoras · 11 months ago · +1

      @@NerdyRodent Do you have somewhere (a Discord, Google Drive, whatever) to discuss curves? I would love to deep-dive into that topic :)

  • @Seany06 · 1 year ago · +2

    Thanks for this, sir. By the way, is the voice you use a voice model? lol
    Does anyone know of any communities where people are sharing their vocal models for RVC?

  • @agiverreviga4592 · 1 year ago · +2

    The outputs seem to be the best, but the annoying quirks are that each time you want to train or retrain a model, you have to input the settings all over again. And there's no quick or convenient method to share and publish models so that others can use them.

  • @timeTegus · 10 months ago

    When I use Ultimate Vocal Remover, the VR models are very bad: every voice sounds like Mickey Mouse and the separation is very bad. Maybe I installed something wrong, idk. The MDX models sound good, but they seem to run on my CPU instead of my GPU.

  • @ris_kis · 11 months ago

    Hello, and thanks for your video. I struggle a little with getting voice data for training, and with training time (I don't have it).

  • @denblindedjaligator5300 · 1 month ago

    And what batch size should I train on?

  • @Dante02d12 · 1 year ago · +2

    This is all very interesting, but I'd like a direct answer to these questions if you don't mind:
    - Does crepe or mangio-crepe make a huge difference in voice quality compared to harvest? I mean, is it really night and day, like comparing pm to harvest? I have a version of RVC where I don't have crepe, and I'd like to know if it's worth upgrading ^^.
    - I've been using UVR5 to isolate voices, and it is indeed great... except for the quality loss. What's the current best workflow with UVR5 to not lose voice quality? Basically, which models and processing methods work best to isolate a voice from most songs?
    - My voices tend to not articulate as well. I assume I'm not doing something right when I create the model. I'm using 7 minutes of voice samples extracted from a videogame, so it's professional audio quality. The voice is very recognizable, but I do feel it sometimes doesn't articulate. Is it a current limitation of voice cloning tech?
    Thank you for your insight!

    • @Jojo2 · 11 months ago · +1

      I'm also struggling with the last one. I personally assume it's due to such a low amount of training data? But I haven't tested it. My other theory is that I'm probably not extracting the vocals of songs correctly.

  • @limpopo171 · 9 months ago

    That is impressive quality, I have to say. I wonder, does it work for text-to-speech, or is it only for singing?

    • @NerdyRodent · 9 months ago · +1

      You can do TTS, then convert that to a new voice. If it's not singing, it's probably best to keep singing out of your dataset.

  • @MistahJ100 · 10 months ago

    I really need some help. My software decided to stop working. I made a few cover songs and they turned out cool, but now for some reason when I click convert on model inference it just stops and will not process. I had another app, RVC-GUI-pkg, that converted the sounds with my models, and it just does not work anymore and gives me a blank mp3 file... Why would this happen? Is there some time limit for these apps? I see nothing on the internet about this, and I changed nothing with my directories that would make this occur. PLEASE help, I want to make some more covers and I don't know why it's not letting me.

    • @NerdyRodent · 10 months ago

      You may need to git pull again to get the latest version (assuming you did a normal install, not a zip file)

  • @lazerusmfh · 1 year ago · +1

    Freaking awesome

  • @nathanbanks2354 · 1 year ago · +1

    I haven't listened to this song since I played Steno Arcade! (Which is free to play either for download or on Steam.) Presumably you both used it because of the license; I can't remember if it was commissioned or not.

    • @NerdyRodent · 1 year ago · +1

      Creative Commons licenses are cool - especially the ones that allow derivatives 😉

  • @hungluu8404 · 11 months ago

    How do I change the optimizer?

  • @wettissue8340 · 10 months ago

    Does using a better GPU with UVR give you better results?

    • @NerdyRodent · 10 months ago

      I think faster is better, so yes!

  • @H4zuZazu · 10 months ago

    Is there a TTS tool that can use the self-trained models?

    • @NerdyRodent · 10 months ago

      As this is voice-to-voice, and TTS generates a voice, pretty much any TTS-generated voice will do!

  • @denblindedjaligator5300 · 1 month ago

    If I make a model that doesn't respect pitch, how can I get it to change pitch when I transpose it? I have a Daft Punk vocoder model; when I transpose, will the model work even though it has no pitch?

    • @NerdyRodent · 1 month ago

      Try including more pitch variety in your training data

    • @denblindedjaligator5300 · 1 month ago

      @@NerdyRodent Can I send you the Daft Punk vocoder model?

  • @ArtificallyIntelligentAi · 11 months ago

    Wow!

  • @shabadouwilou · 11 months ago

    Is there an AI to change the lyrics of a song? Like, in goes a singing voice and out comes the same voice with the lyrics changed.

    • @NerdyRodent · 11 months ago · +1

      Probably best done by humans at the moment!

  • @QHawk7 · 8 months ago

    I need to translate an srt file (subtitle) from English to Arabic. Please, any good solution with the best quality, human-like translation? Not Google Translate or anything like that.

  • @coldbeyond · 11 months ago

    Can I do text-to-speech with this? How can I do that? I really need TTS :)

  • @denblindedjaligator5300 · 7 months ago

    If I choose that a model should have no tone and I train it in the new version of RVC, I can still choose which tone algorithm to use. This means that it still uses RMVPE, i.e. the new version, and the quality is not particularly good either. Hope it gets fixed. Try choosing false in the old and in the new version.

  • @denblindedjaligator5300 · 7 months ago

    What equipment do you have? I mean, how much have you paid for your computer? Can you use a 12 GB graphics card, or do you have to have a 16 GB one to be on the safe side?

    • @NerdyRodent · 7 months ago

      The more VRAM the better, though it all depends what one does with one’s computer

    • @denblindedjaligator5300 · 7 months ago

      So I can train the most on 16 GB?

  • @denblindedjaligator5300 · 5 days ago

    How can people train without v1 or v2? I have a model where it says none.

  • @frattuncbas · 10 months ago

    How can I continue my pre-trained model later? I want to train it for 20000 epochs for the most realistic quality, but I need to run my same pre-trained model day by day. Is it possible on the EasyGUI Colab? I have Colab Pro.

    • @NerdyRodent · 10 months ago

      Sure, you can do that on colab though 20,000 epochs is a bit high…

    • @frattuncbas · 10 months ago

      @@NerdyRodent How can I?

  • @flonixcorn · 1 year ago

    Great video!

    • @NerdyRodent · 1 year ago

      Thanks! Glad you liked it

    • @flonixcorn · 1 year ago

      @@NerdyRodent I'm getting a pretty robotic voice, with a lot of random high-pitch "artifacts". Any quick tips you can give me?

  • @exidion54 · 11 months ago

    Sorry if this is a stupid question, but how can I resume the training from a checkpoint, if possible?

    • @NerdyRodent · 11 months ago

      Yes, you can just resume

    • @exidion54 · 11 months ago

      @@NerdyRodent Should I just change the pretrained models in step 3 with the checkpoint model to do that?

    • @NerdyRodent · 11 months ago

      @@exidion54 simply increase the number of epochs and press train again

  • @MarkSKristensen · 11 months ago

    Does anyone know if it's possible to use trained voices with text-to-speech somehow?

    • @NerdyRodent · 11 months ago

      Yup! TTS-> voice -> voice

  • @TerminatorSAW2k · 11 months ago · +1

    I need an OBS Studio live tutorial with this voice cloning tool, please :)

  • @denblindedjaligator5300 · 7 months ago

    I'm just trying to explain something more clearly via Google Translate: during training you have to choose whether the model has pitch guidance (required for singing, optional for speech): true or false. When you choose false and then do a Model Inference, it has still chosen one of those RMVPE options. That is, if I load Ai_Hoshino_TTS.pth it doesn't ignore RMVPE, but the old version does.

  • @toasteroven6761 · 1 year ago · +1

    What does the X-axis on the graphs represent?

    • @NerdyRodent · 1 year ago · +1

      Steps

    • @toasteroven6761 · 11 months ago · +1

      @@NerdyRodent Thanks, one more question, does the file format used for the training (wav, mp3, flac) affect the resulting model's output accuracy and quality in a noticeable way or not? If so, what export setting do you recommend in audacity?

    • @NerdyRodent · 11 months ago

      @@toasteroven6761 go for wav

    • @toasteroven6761 · 11 months ago

      @@NerdyRodent Welp, looks like I'm going to have to wait over 4 hours just to upload my 50-minute wav file to Colab (rural internet)...
      So mp3 must really tank the quality then...

  • @elarcadenoah9000 · 11 months ago

    Teach us how to install and use Bark!

  • @patricksweetman3285 · 1 year ago

    Wow

  • @denblindedjaligator5300 · 6 months ago

    Can I easily train on the 4080, or do I have to have a 4090?

    • @NerdyRodent · 6 months ago

      Any modern Nvidia GPU will do!

    • @denblindedjaligator5300 · 6 months ago

      @@NerdyRodent I mean the performance of the two, i.e. the 4080 or the 4090.

    • @NerdyRodent · 6 months ago

      4090 is better

  • @ericanderson5139 · 7 months ago

    Can it change a voice live, for video making or calls?

    • @NerdyRodent · 7 months ago

      Yup, it’s just about fast enough for real time

    • @ericanderson5139 · 7 months ago

      @@NerdyRodent How do you get the real-time version up? I can't see it, please.

  • @hosniabouzahra · 11 months ago

    Hi Nerdy, thanks for the tips. Would you please share a zip file of the Mangio-RVC-Fork, same as the RVC-beta 7z, so I can run it directly on my Windows machine? Thank you.

  • @camspider9887 · 9 months ago

    Can you tell us where to change that seed?

    • @NerdyRodent · 9 months ago

      It’s in the config file for your training, as shown

  • @denblindedjaligator5300 · 3 months ago

    What batch size do you train on?

    • @NerdyRodent · 3 months ago

      40

    • @denblindedjaligator5300 · 3 months ago

      @@NerdyRodent So if I can only train with 24, then I'm not getting good quality?

    • @NerdyRodent · 3 months ago

      @@denblindedjaligator5300 24 is fine 👍🏽

  • @banzai316 · 1 year ago · +1

    Tips, Tricks and Bookmarks

  • @dontmindmejustwatching · 1 year ago

    best

  • @mahmood392 · 9 months ago

    Was wondering, what batch size did you use for this?

    • @NerdyRodent · 9 months ago

      As much as your GPU can handle 😉

    • @mahmood392 · 9 months ago

      @@NerdyRodent I have a 24 GB 3090, but idk why it's training so slow. I set it to a batch of 40 and left it for many, many hours; it just hit 40 epochs. Very weird: on epoch 34 it took 1 hour and 44 min to do one epoch. I wasn't home to notice when that happened. Now I'm on epoch 44, which took 2 hours and 45 min to finish, and another one took 58 min. I am very confused.

    • @NerdyRodent · 9 months ago

      @@mahmood392 it could be that you have a two or three hour long dataset. The bigger your dataset, the longer it will take. Usually 30 minutes is absolutely fine.

    • @mahmood392 · 9 months ago

      @@NerdyRodent It was a 10 min dataset. Something was broken, because some epochs took 3 hours and then the epoch before or after it was just 2 min... Anyways, I ran it again with a batch of 28 and it finished in 16 min, 5 seconds per epoch.

  • @b1ll1on_ai · 11 months ago

    Hello Nerdy MASTER! Why index first, training second? Thanks

    • @NerdyRodent · 11 months ago

      Index trains fast 😉

  • @dthSinthoras · 11 months ago

    I tried to reproduce your optimizations, but I get this error: AttributeError: module 'torch.optim' has no attribute 'DiffGrad'
    How do I fix that?

    • @dthSinthoras · 11 months ago

      AdamW and RAdam are working; I just don't get DiffGrad to run as the optimizer.

    • @dthSinthoras · 11 months ago

      The Output with RAdam is just silence for me :O

    • @NerdyRodent · 11 months ago · +1

      DiffGrad isn’t a default optimiser, it’s part of pytorch optimisers

    • @dthSinthoras · 11 months ago

      @@NerdyRodent I have questions regarding this, but they always disappear, so this comment just exists to see whether YouTube doesn't like my questions, or whether everything disappears...

    • @dthSinthoras · 11 months ago

      @@NerdyRodent OK, my test answer seems to stay, so YouTube probably doesn't like code snippets here? I tried to show what I tried, because I still get the same error. I will try to describe it without using anything that looks like code...
      I did a pip install torch_optimizer, and I replaced AdamW with DiffGrad.
      What else is needed?
      Oh, and thank you for the other answer, that seemed to be at least a good hint!

  • @denblindedjaligator5300 · 7 months ago

    Which Nvidia cards are supported?

    • @NerdyRodent · 7 months ago

      You’ll get the best performance with something like a 4090. Nice and fast with plenty of VRAM!

    • @denblindedjaligator5300 · 7 months ago

      @@NerdyRodent What about the 4080, is it good? And 16 GB of RAM?

    • @NerdyRodent · 7 months ago

      @@denblindedjaligator5300 16gb VRAM can do lots of things for sure

    • @denblindedjaligator5300 · 7 months ago

      @@NerdyRodent What is the learning rate and the seed? Could you make a video about Piper TTS?

  • @009badboy · 10 months ago

    Hi Nerdy Rodent, do you have a Discord community?

  • @--JYM-Rescuing-SS-Minnow · 1 year ago

    Who on earth is knocking at my door? Come on, don't come here anymore. Can't you see it's late at night? I'm so tired and I don't feel well. All I want is to be left alone. Don't come near, don't break into my house. Best if you just hang around outside. Don't come in; I'll just run and hide. Who could I become now? Who could I become now? Who could I become now? Who could I become now?

  • @timeTegus · 1 year ago

    very nerdy :)

  • @GATUK1773R · 11 months ago

    Damn, why not make a native Windows app with a proper screen and buttons: Start, Copy, Change, OK, Convert?

  • @PurpleRhymesWithOrange · 1 year ago · +1

    Pretty impressive that it can make a rat talk with a British accent!

  • @RedDragonGecko · 1 year ago

    "Download the weights" Weights? What weights? Where?

    • @NerdyRodent · 1 year ago · +1

      From the link to the weights on the github page ;)

  • @QHawk7 · 8 months ago

    Open source & free: Tenacity, instead of Audacity.

  • @Ravisidharthan · 2 months ago

    Why does my trained voice not sound as intended? What could go wrong?

    • @NerdyRodent · 2 months ago · +1

      One thing that could go wrong is to have a very poor quality data set

  • @imagesas · 11 months ago

    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 4.00 GiB total capacity; 3.03 GiB already allocated; 0 bytes free; 3.49 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

    • @NerdyRodent · 11 months ago · +1

      You’re better off using colab if you’ve got 4GB VRAM. Even games today will struggle on just 8GB!
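      Before moving to Colab, the allocator hint in the error message itself is worth a try. As the PyTorch docs describe, the PYTORCH_CUDA_ALLOC_CONF environment variable can reduce fragmentation on small cards; the 128 value here is just an example, not a tuned recommendation:

      ```shell
      # Set before launching the RVC WebUI: tells PyTorch's caching allocator
      # to split cached blocks above 128 MB, which can reduce fragmentation.
      # It won't help if the model genuinely needs more than 4 GB of VRAM.
      export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
      ```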

  • @user-ik8ew7bz8o · 11 months ago

    Is it possible to use the files without having the index? Because many people have shared models without this file, and it seems to work?

    • @NerdyRodent · 11 months ago · +1

      Yup, it just sounds a lot better with the index file too in my experience

    • @user-ik8ew7bz8o · 11 months ago

      @@NerdyRodent OK, thanks. I tried several times to copy voices, but without the index it never worked. I also tried RVC-beta, which launches quickly and has a simplified interface, but you have to use zip files with the "index" file inside. Is there a way to create this "index" file?

    • @NerdyRodent · 11 months ago

      @@user-ik8ew7bz8o not without the original dataset