So much respect to everyone who is making this work... the amount of problems I'm running into is insane, haha. I hardly know where to start after hours of being at this.
For those of you looking for the "weights" folder in the main RVC directory, as of RVC1006, it's inside the "assets" folder.
Nothing is placed here after training a model though. Do I manually copy the D_*.pth or G_*.pth over from logs, or something?
If I try that and click Refresh Voice List and Index Path, the new model appears in the Inferencing Voice list, but when I select it I just see a red 'Error' all over the UI: i.imgur.com/QNQUpmq.png
@@pingusmcdingus5 In my case, the .pth file is placed there automatically if training finished successfully without any errors. If that's not the case for you, there might be something wrong in the middle of the process. You might want to try retracing the steps or redoing it from scratch.
The one thing I did differently from this video is that my training audio is not split up into multiple short .wav files; I combined everything into a single 20-minute file. I've compared both the cut and uncut audio, and the result is much better with the uncut 20-minute file.
Thx for advice
thanks! :)
@@raykrislianggi Please, how did you combine your wav files into one? Thank you
@8:03 If your process fails when you try to process the input data, one possible explanation is that the path to your folder includes a space. That is what hung up my first couple of attempts. Make sure your file path doesn't include any spaces for easiest handling.
thank you! I took the spaces out of the song name and it worked for me!
Thank you m8
Still not working for me. It says :(
['trainset preprocess_pipeline_print.py', 'C:\\RVC-beta-0528\\RVC- beta0717\\voice\\me', '40000', '12', 'C:\\RVC-beta-0528\\RVC-
beta0717/logs/me', 'False']
C:\RVC-beta-0528\RVC-beta0717\voice\me/myself.m4a->Suc.
end preprocess C:\RVC-beta-0528\RVC-beta0717\voice\me/myself.m4a->Suc. end preprocess
Finally a tutorial that even I can understand. It's so stupid that most of the tutorials are made as if everyone were that tech savvy. Thank you so much.
Appreciate it 🤟🤟
I wish I could say the same. I'm just a singer. I want a program that installs, I hit the .exe file, it opens, I put the source files in and voilà, new voice. I don't know why that should be so hard.
@@Jarods_Journey I can't stress enough how important it is to absolutely tell people that the training process will take a long time. I thought my progress was just stuck but no, it's just taking a long while!
@@linuxtuxvolds5917 I will wait as long as it takes if it means I get to sound like a voice I really enjoy!
@@Jarods_Journey it says "No supported Nvidia GPU found, use CPU instead" but it still opened
12:42
1. Open file explorer to the folder that has a file whose path you want
2. Press Alt+D
3. Press End
4. Type a backslash \
5. Start typing the name of the file, look for the autocomplete with the correct name, press down arrow until the correct file is highlighted
6. Press Ctrl+C
Or you could Shift-Rightclick to unhide "Copy As Path" option
@@Optimus97 I prefer to use the mouse as little as possible
@@RobertJene I can kinda see what you're saying, especially with the delay of the context menu in Windows 10/11.
@@fluffsquirrel any keyboard sequence you do will save time not reaching for the mouse
@@RobertJene I think this is generally true, although the less sequences the better, if possible.
For those who would like to know about the harmony bit at 5:11:
Harmony is when there's more than one note being sung at the same time.
It's kinda like chords, but for vocals.
HP5 helps with separating harmony, but the voice will be less clear compared to HP2.
The newer RVC2 also has dereverb & deecho which I also highly recommend using to make the vocal separation even more clear for songs where the voice has a lot of reverb / echo.
I'd say just mess around with it a bit and choose to your liking depending on the song.
Anyways, have a nice day :D
Thanks for sharing ! Small tip: using cut/paste instead of copy/paste allows moving the folder instantaneously ;)
And saves HD space.
Thank you bro, this was the best tutorial so far on how to train the voices. So many other tutorials were not clear on how to set everything up and also get the index file, which was something I had trouble with for the longest time.
Quick tip: at 13:03 you can Shift + right-click, and another menu pops up where you can click on "Copy as path".
I'm going crazy with Jarod's channel 😂
I'm so far off the cliff with it that I've started rewatching old videos 😂
11:23 The .pth file doesn't appear for me; I have no clue what could be wrong.
@13:28 You pick a v2 index file; my drop-down box only has 3 different v1 files to choose from? It doesn't seem to create an index file when I train my voice.
Nice one, keep up the good work. Your instructions are very clear and helpful compared to others. 👍 ✨
Thank you sooo much, all the other tutorials were so confusing and this was simple and fast. I encountered some problems while running the RVC command prompt since I don't have a GPU, but I installed CUDA and Python and that fixed it. It's like you need to know programming and stuff nowadays, but this tutorial was easy, fast and simple. Keep up the good work.
Hello, thanks for the video. It piqued my curiosity and now I want to try RVC myself. Unfortunately, I'm running an AMD GPU (6800 XT), but upon checking the releases, an option for AMD users is present in updated0814v2. My problem now is that when I try to follow your steps, RVC does not detect my GPU. For example, at step 2b as in 8:21, the options to select a GPU are not present. The option to input a GPU index is there and I've tried putting in "0", "1", "2" and "0-1-2", but when pressing one-click training it says: "NO GPU DETECTED: falling back to CPU - this may take a while". Do you know a way for it to detect my GPU?
I'm not too sure unfortunately, you might have to check their GitHub issues area to see if anyone else is running into it.
Hi, did you find any solution for that?
8:11 It will not process data. It starts but then stops and there is an orange line around the output. Any reason why?
9:33 When I train embeddings for Stable Diffusion (image generation), I have it save an embedding file every 50 steps so I can check their loss and strength with scripts and test a few.
I've been finding with these speech models that the intermediate saves don't really perform better than the final model, so I just save the last one to save space. I haven't found one yet that has been overtrained.
Damn, as a complete beginner, coming to this channel to have it explained like this was really... interesting...
I have some questions. When I download other people's voice models, there is a file called something like traint.index that you have to use. The same goes for total_fea. I have also seen that there are .pth files in the log folder itself.
These should go into the logs folder underneath the "experiment" or "speaker" name that you want to use. So if the name is john, john.pth goes into weights, and the index goes into logs/, where you have to create a john directory and place the index inside.
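To make that layout concrete, here is a minimal sketch (not from the video) that copies a downloaded model into place. The paths and the speaker name are made up, and it assumes the default RVC folder layout; on newer builds the weights folder lives under assets\weights instead.

```python
from pathlib import Path
import shutil

rvc_root = Path(r"C:\RVC-beta")               # assumption: wherever RVC was extracted
speaker = "john"                               # the experiment/speaker name
downloads = Path(r"C:\Downloads\john_model")   # hypothetical folder holding john.pth + *.index

# The .pth model file goes into the weights folder
weights_dir = rvc_root / "weights"             # newer builds: rvc_root / "assets" / "weights"
weights_dir.mkdir(parents=True, exist_ok=True)
shutil.copy(downloads / f"{speaker}.pth", weights_dir)

# The .index file(s) go into logs/<speaker>/, creating the folder if needed
log_dir = rvc_root / "logs" / speaker
log_dir.mkdir(parents=True, exist_ok=True)
for idx in downloads.glob("*.index"):
    shutil.copy(idx, log_dir)
```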
@@Jarods_Journey But I mean the traint.index. And why is there a model in the log folder, and a detail file?
RVC is best for me thanks Jarod take care
Well convinced after the preview. Hope you can make a video on text-to-speech with your own voice soon. 😊
I have successfully trained a voice, but there is no index file in the voice name folder; the .pth file is in the weights folder. What to do... nice video.
There is a little mistake in this video which I want to point out: after you finish preprocessing and feature extraction, you say to click on One-Click Training. This is unnecessary because that button will do the preprocessing and feature extraction AGAIN, which you already did before. So when that's done, click on "Train Model" instead.
Thanks for the great tutorial! I found a couple of things that might be helpful to others. For extracting the archive I use the official 7-Zip software; it's free and open source and will save you some hassle. The next thing is regarding the batch size. I have a 3090 Ti, which has 24GB of VRAM, and I find a value of 32 uses 21.7GB of the VRAM and leaves a bit for OS-related stuff. You don't want to go overboard with a batch size of 40, or the GPU will start swapping to system RAM and significantly slow down training. Even if you have fast RAM, it's still an I/O cycle between the GPU and system RAM that you can avoid. I recommend looking at Task Manager or using a tool like nvidia-smi to check the GPU VRAM use, and experimenting with the batch size to find the best value for your card in order to get much faster training.
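Not from the video, but if you want to check headroom the way this comment suggests without opening Task Manager, a small PyTorch sketch can report free VRAM before you pick a batch size (assumes a CUDA build of PyTorch, which the RVC install already ships with):

```python
import torch

if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)   # query GPU 0
    gib = 1024 ** 3
    print(f"VRAM: {free_bytes / gib:.1f} GiB free of {total_bytes / gib:.1f} GiB total")
    # Per the comment above: pick the largest batch size that still leaves
    # some headroom (roughly 1-2 GiB) so training never spills into system RAM.
else:
    print("No CUDA GPU visible to PyTorch")
```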
Finally, a tutorial that doesn't fly 5 miles over my head
If you can read this:
.pth files go in the folder "weights"
.index and others go to "logs" under the voice name ex: Logs\EthanWinters
If you want to move the folder faster, just rename the top folder, then cut and paste the lower one into the top level. When you cut and paste the contents, Explorer knows to only MOVE the folder, so there's no copy wait.
Can someone plz fix this error? Jarod, plz help. Error: ValueError: invalid literal for int() with base 10: 'voice'
I get this error when I do Process Data.
It's a step 2a error: it happens when I put my local URL into the path folder.
9:14 So after trying this by myself, I found out that if you select "No" at "Whether to save only the latest ckpt file", your disk may fill up after a while if you don't have much space and train many models.
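If you did save every checkpoint, a throwaway sketch like this can show how much disk the intermediate G_/D_ files in one experiment's logs folder are using (the path is hypothetical, based on the default RVC layout):

```python
from pathlib import Path

log_dir = Path(r"C:\RVC-beta\logs\my_model")        # hypothetical experiment folder
ckpts = sorted(log_dir.glob("[GD]_*.pth"))           # intermediate generator/discriminator saves
total_bytes = sum(p.stat().st_size for p in ckpts)
print(f"{len(ckpts)} checkpoint files, {total_bytes / 1024**3:.2f} GiB total")
for p in ckpts:
    print(f"  {p.name}: {p.stat().st_size / 1024**2:.0f} MiB")
```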
Hi, thanks for your video. Are there already some pre-trained models for RVC? Also, is there a reason you prefer to train locally rather than on Colab?
I'm not sure about fully pre-trained models, you'll have to take a look around the internet to see. Colab is a nightmare to work with for debugging, etc., and unless you wrote the code, trying to debug it isn't that fun. If I can work locally, I much prefer it, and my hardware allows for it.
If I don't have any problems but I want to keep training my model, I just do the same thing that you said at 10:50 but increase the epochs, right?
Correct :)!
What is that song playing @2:15? Sounds like it was created on a 32-bit SNES synth.
The artist is しゃろう (sharou) with the song here: ruclips.net/video/JAC2KCbbvmc/видео.html
@@Jarods_Journey thank you
I'm going to try that voice cloning though, sounds interesting. Especially if I can modify the voice a bit and not make it exactly like the artist's.
@@guytisdale Yeah, if you pitch-shift it or do your own filtering, you could get a completely new voice.
@@Jarods_Journey awesome
Great tutorial! Unfortunately, I seem to be having a problem with step 2a: My attempt to process the data was unsuccessful, and the output message came up blank! What am I doing wrong?
This happened to me too but it worked when I removed spaces from my experiment name. If it's not that then idk
I received this:
"Unfortunately, there is no compatible GPU available to support your training."
How do I solve this problem?
For those who are having trouble choosing where the download goes: you can right-click it and choose "Save link as".
I'm having a lot of trouble... opening the go-web file doesn't show the language option, then it prints lots of stuff and at the end says to press any button to continue. After I do that it closes, and when I go to localhost:7897 it says I can't reach the page.
Thanks mate, all of the other tutorials I looked up were too complicated. A month ago I watched your so-vits-svc fork tutorial too. You are one of the best teachers in the world; I can understand your videos perfectly and my native language isn't even English!
Since I'm relatively new to this, how would you use RVC for just cloning a voice? Do I just leave out the parts in model inference about the pitch and music-related things?
That was very nice to follow along, thanks!
Any interest in showcasing Bark AI? I think it's a pretty interesting way of doing TTS, but I don't think it's very well explained in many places, or a lot is left out, which kinda confused me, especially when it comes to getting decent results. I do think the prompting idea is really intriguing though.
My quick experience with Bark is that it's still in very early stages, excited to see where it goes though! I might have to do a more thorough test of it, but Tortoise TTS is by far the most promising and easiest to use.
@@Jarods_Journey That's definitely true. Tortoise is incredible! Really hope Bark will update or get some cool successors with a similar but more stable approach. Making it generate laughs, sighs, etc. is spooky and very fun.
@@krysidian I'm definitely interested in the laughing part. That's one additional touch that is lacking in AI voices, and when that gets fleshed out, things are gonna get interesting xD!
Am I missing something? Did he go over how to add the newly converted vocals back to the instrumental?
Great job! I have a question for you... How much audio do you recommend for generating the model, and is it a problem if the audio has some background sound?
10 minutes or more of high-quality audio. You need to split the background from the audio samples; you can check my latest video on that.
This absolute legend amongst men
Thanks for your very good tutorial Jarod.
I still have a question.
What do the values "loss_disc", "loss_gen", "loss_fm", "loss_mel" and "loss_kl" mean when training? Which values are indicating a good trained model? Are lower values better?
A downward slope on the graph is better, i.e. lower values. You wanna look at total loss and preferably train until that's as low as possible.
10:23 After completing the training process, I received an error message saying that the specified file or directory could not be found. Specifically, the error stated that the file named "trained" (or a similar file) could not be located. What did I do wrong?
Edit: It's ok now, I figured it out.
what did you do?
@@jofejofeson9932 I reinstalled the whole file again and repeated the process, and then it was fine.
Thanks a lot for the video! One question: 40kHz is a pretty unusual sample rate, so I want to use 48kHz (which now also seems to work with v2). Also, I slice up the training vocals manually with a DAW (Cubase) into up-to-10-second snippets. Do I have to export the snippets at 48 kHz already from the DAW, or would the usual 44.1 kHz be alright and only the output (the resulting file) would be in 48 kHz?
It'll be fine, I believe RVC already resamples your audio to the correct SR using ffmpeg. I actually haven't verified this, but since it handles my datasets when using either 40k or 48k, it doesn't really matter :)
@@Jarods_Journey Thanks for your fast reply! So there's a tiny bit of hope that if you feed it 48kHz already, it might skip the resampling, which could probably result in higher quality output 🙂.
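If you'd rather not rely on RVC's internal resampling, here is a minimal sketch that pre-converts DAW exports to 48 kHz with ffmpeg (it assumes ffmpeg is on your PATH; the folder names are made up):

```python
import subprocess
from pathlib import Path

src_dir = Path(r"C:\datasets\vocals_44k1")   # hypothetical 44.1 kHz exports from the DAW
dst_dir = Path(r"C:\datasets\vocals_48k")
dst_dir.mkdir(parents=True, exist_ok=True)

for wav in src_dir.glob("*.wav"):
    out = dst_dir / wav.name
    # -ar sets the output sample rate; -y overwrites an existing output file
    subprocess.run(["ffmpeg", "-y", "-i", str(wav), "-ar", "48000", str(out)], check=True)
```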
While making a dataset, if I am taking vocals from a singer, do I need to keep the key of the vocals the same? Or can I add multiple audios from different songs to train the model of a particular singer?
As long as it's the same singer, you can add as many songs from them as you like.
@@Jarods_Journey thank you for clarifying 🙏🏻
Hi, thanks for the tutorial. I got stuck at the training process. I received a message saying this:
RuntimeError: The expanded size of the tensor (12800) must match the existing size (4040) at non-singleton dimension 1. Target sizes: [1, 12800]. Tensor sizes: [4040]
Before I got this message, I was getting the "CUDA out of memory" error, even though I have 32GB of RAM. I cut the audio samples into smaller bits under 10 seconds, and now I have the expanded size of the tensor error. What did I do wrong?
same issue
It means that if it finishes it's going to take up too much space, so just turn the batch size down to fix it.
CUDA memory is VRAM. It's different from regular RAM.
I don't see a pitch extraction algorithm; everything else is there. Any solutions?
I'm having problems at the 10:01 part, it says:
"FileNotFoundError: [WinError 3] The Sytem cannot find the specified path C://Users//User//Downloads/RVC-beta-v2-0528/logs/Character/3_feature256"
It's frustrating, honestly.
This means there was an issue with the feature extraction step and it didn't finish before trying to train. This step can take anywhere up to an hour to complete depending on system specs and sample size.
Write 0 in the GPU indexes.
@@denblindedjaligator5300 Yeah, I did this some time after I made this comment; it just stops as if nothing happened.
To be honest, I'm just training the models in Colab and making the audios with the local version; I'm fine with that.
Have you tried the realtime voice changing? I've been trying to get that working but had some issues. I think it's an svc fork though.
Have not gotten to try that yet on either repo unfortunately :/
Thank you very much! My Inferencing voice list is empty. Where do I put the downloaded voice models?
And epochs: is it worth using 1000 epochs instead of 200 to increase the quality?
I believe downloaded voice models should go into the weights folder, as long as they're from RVC. As for epochs, if you get good results at 200, I don't see much reason to go to 1k. If you have enough voice samples, 200 should be relatively good. I would listen to them per 100 epochs and see what you think is best (as it's always dependent on your data and how much of it you have)
Thank you for this video!!
When I try to train I get this error:
sr = int(sys.argv[2])
ValueError: invalid literal for int() with base 10: 'Yona\\Desktop\\RVC-beta\\RVC-beta-v2-0528\\voice\\Me'
Do you know what I'm doing wrong?
RVC: Invalid Literal or File Not Found error
For the voice/me folder, is it just audio recordings of my own voice? If so, how many do I need to include and what length? Thanks in advance, you're a massive help dude.
Yup, as shown, make sure the folder contains all of the audio files without subfolders. Then just use that folder's path and you should be fine.
Followed the guide but ended up with completely different files, and I have no clue how to install the damn program.
At the 02:00 mark I couldn't follow along anymore.
Thanks for your video. There are a few other videos on the subject and I find that yours is better explained; nevertheless, I still have to deal with several errors. First I had "Cuda out of memory", so I lowered the batch size to the minimum; now I have another error, which is: "RuntimeError: GET was unable to find an engine to execute this computation". My audio samples are a bit long (a few minutes) and they are in 32-bit float at 44.1kHz, but I only have 4 samples...
Should I divide them into several parts? Thanks in advance.
Edit v1: I tried many times, also cutting into different parts and reducing the size, and I still get the RuntimeError even with 2 small samples (16-bit, 44.1kHz) of less than 10 seconds… I don't understand.
Edit v2: Also, I wonder if you know how to do text-to-speech with this tool?
You might have to reinstall, or make sure the CUDA version being installed is compatible with your GPU.
I'm having the same issue.
At 13:39, when I need to hit convert, it just says error. Could that be because I have a 4gb gpu? If so, is there anything I can do to make it work?
Ah, depends. If it says cuda out of memory then yeah, you'll need smaller batch sizes, shorter data, or a larger GPU. I actually don't remember if I responded to you somewhere else though lol
Thx for this good tutorial.
Unfortunately I had an error after 2 s and I don't understand what I did wrong.
if data.dtype in [np.float64, np.float32, np.float16]:
AttributeError: 'NoneType' object has no attribute 'dtype'
Another commenter had this issue but I haven't encountered it yet and haven't found a way to reproduce it. You might be able to find others who are looking to get this issue resolved here:
github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues?q=is%3Aissue+AttributeError%3A+%27NoneType%27+object+has+no+attribute+%27dtype%27+is%3Aopen
Could be related to the training process, trying to find files, etc
@@Jarods_Journey I applied the advice from your good short "File Not Found: feature_768", and I managed to avoid this error now. Thx a lot. I started following your channel in the last few days and your subjects are very interesting. Great job.
3:52 My MacBook M2 cannot open it with the Terminal app. Can you help me check this? Thank you so much.
Why the "Unfortunately, there is no compatible GPU available to support your training." appeared in "GPU Information"?
Thank you for taking the time to make this tutorial! It was so easy to follow. :) Could I ask you to make a comment or tutorial on how to re-train a previously trained voice? I can't find that information anywhere.
Let me know if this was what you were thinking about: ruclips.net/user/shortseO0gvi_RXTc?feature=share
@@Jarods_Journey That's exactly what I was looking for, tysm!
4:13 I can't seem to get into the localhost page. Also, is localhost necessary to make the custom vocal models? I haven't really gone through the whole video, more or less skimmed it just to see how to get the custom voice models. -_-
-_- To get the localhost page, you'll need to start it via the Python script.
It worked. Thanks so much!
When I click Process Data it doesn't fill in anything, and cmd comes back with "ValueError: invalid literal for int() with base 10: 'Changer'". What do I do?
Remove spaces from your path or folder name; this is what causes the issue.
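To see why a space causes the "invalid literal for int()" error: the preprocessing script receives its settings as positional command-line arguments, so an unquoted space in the dataset path shifts everything one slot and a path fragment lands where the sample rate should be. A toy illustration of the failure mode (not RVC's actual code):

```python
# The script is effectively launched like:
#   python preprocess.py C:\RVC\my voice 40000 12
# The space splits "my voice" into two arguments, so argv looks like this:
argv = ["preprocess.py", r"C:\RVC\my", "voice", "40000", "12"]

try:
    sr = int(argv[2])                 # the script expects the sample rate here
except ValueError as err:
    print(err)                        # invalid literal for int() with base 10: 'voice'
```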
I got something like this:
result = torch._C._nn.leaky_relu(input, negative_slope)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 10.00 MiB (GPU 0; 6.00 GiB total capacity; 5.27 GiB already allocated; 0 bytes free; 5.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
What does it mean? How can I fix it?
Man, same, I'm facing this same problem and I have no idea how to fix it.
same
It means that if it finishes it's going to take up too much space, so just turn the batch size down to fix it.
@@gabrielmorgan3369 I turned it down to literally 4, bro, and it still ain't working.
Same, help please!
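For what it's worth, the environment variable that the error message itself mentions can be set before RVC launches; combined with a lower batch size and shorter clips it sometimes helps with fragmentation, though it is not a guaranteed fix. A small sketch:

```python
import os

# Must be set before torch is imported (e.g. at the very top of the launch
# script), or as a system-wide environment variable before running go-web.bat.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# The other levers mentioned in this thread: lower the training batch size
# and keep each dataset clip short, so less audio sits in VRAM at once.
```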
What does 3:01, saying "then what I did was just move all of these from the zip folder into the folders that I installed with VS Code", mean? Where can I find the folder?
That was just an insert for people who want to install it manually. I don't really recommend you do it this way, just download the .7zip file that is shown in the vid in the releases area and that includes everything you need.
Here is another comment I left for another user who wanted to do this:
"So when you do git clone etc., the raw repo doesn't have any pytorch files inside of pretrained, pretrained_v2, uvr5_weights, or hubert_base. So what you have to do is move those models into those folders (doesn't really matter where you get them from, but I just took them from the zip folder because it was easier than downloading each individually.)
For hubert_base, it literally just sits in the parent directory of the cloned repo. The other ones go into their respective folders."
@@Jarods_Journey Thanks for your video and answering.
I failed to get the "myself" .pth file in the weights folder after one-click training. Should I restart the process at 7:19, and if so, do I need to delete certain files?
Correct me if I'm wrong but I think this is the error? I'm not familiar with coding.
RuntimeError: Calculated padded input size per channel: (2). Kernel size: (3). Kernel size can't be greater than actual input size
98_1.wav-contains nan
9_2.wav-contains nan
all-feature-done
I would rerun the preprocess again for all of your data and then try again, but check this out here: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/484
I have a weaker GPU (GTX 1660 Ti) and it's taking about half an hour for each epoch. I put the settings to match the recommended starting settings (at 9:13). Is this normal? Thanks
I can't find the .pth file...
13:40 I'm getting an error when I try to convert my audio, does anyone have an idea why? I have the vocals in a wave file and I have the correct location, as well as my inferencing voice.
3:00 Sorry, I don't know what you did there. If you extracted the archive, aren't the folders already where they are supposed to be?
If you want to install it with python, you run those lines and then you'll still need to download all of the pretrains from hugging face. Because I don't like downloading things 1-by-1 as I couldn't find a download entire folder option on hugging face (I'm a hugging face noob), I just downloaded the zip and moved them over lol.
@@Jarods_Journey Yeah, same here when getting the updated ControlNet models for Stable Diffusion; there's over a dozen, all of them gigs of data.
I can't figure out how to batch download them.
But that one part of the tutorial is lost on me....
Tell me, do I just unzip the contents of the ZIP:
RVC-beta.7z
is that it? Or do I have to move its folders around?
So when you do git clone etc., the raw repo doesn't have any pytorch files inside of pretrained, pretrained_v2, uvr5_weights, or hubert_base. So what you have to do is move those models into those folders (doesn't really matter where you get them from, but I just took them from the zip folder because it was easier than downloading each individually.)
For hubert_base, it literally just sits in the parent directory of the cloned repo. The other ones go into their respective folders.
@@Jarods_Journey thanks, I copied this to my notes
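A quick, hedged way to sanity-check a manual (git clone) install: the sketch below just confirms the pretrain folders are populated and that hubert_base is sitting in the repo root. The repo path is an example.

```python
from pathlib import Path

repo = Path(r"C:\Retrieval-based-Voice-Conversion-WebUI")   # the cloned repo root

# Folders that should contain the model files copied over from the zip / Hugging Face
for folder in ("pretrained", "pretrained_v2", "uvr5_weights"):
    count = len(list((repo / folder).glob("*.pth")))
    print(f"{folder}: {count} .pth file(s)")

# hubert_base sits directly in the repo root, not inside a subfolder
print("hubert_base present:", any(repo.glob("hubert_base*")))
```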
How can I change the localhost site to English? (It opens in Turkish for me.)
Why the "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 4: invalid start byte" error appeared in cmd.exe after I clicked "One-click training"?
Can you do a video on how to update RVC to newer versions when applicable?
Is 1000 epochs overkill? Will it have diminishing returns compared to just keeping it up to 300? I really don't see a standard recommended epoch total anywhere, the answer varies. I usually use 500, but I honestly don't know if that's fine since I just use RVC for SillyTavern and haven't tried it just on itself yet, hence I don't know how to evaluate if the results are better or not .___.
Hey, I am getting "cuda out of memory. tried to allocate 20.00 mib (gpu 0; 4.00 gib total capacity; 2.88 gib already allocated; 0 bytes free; 2.90 gib reserved in total by pytorch) if reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. see documentation for memory management and pytorch_cuda_alloc_conf" as an error. Can you help me navigate through this?
Same error, did you find any fixes?
When I click on the .bat file it says it was not possible to find the specified path, so when I try to open localhost in the browser it gives an error...
7:10 Do these audio files that you train with have to be 10 seconds or shorter, like SVC and Tortoise need?
I haven't tested it, but I'm assuming that if not, you'll run out of VRAM, as it'll have to load the entire audio file into memory.
When I convert, I get an error.
I don't see any model.pth in weights folder...
For some reason it doesn't always create a .pth file, so just in case, click on Train Feature Index for the model you're creating and it might solve your issue.
Great one! Finally works!
"The expanded size of the tensor (12800) must match the existing size (0) at non-singleton dimension 1."
Anyone know a fix to this problem?
Yeah bro, I recently had this problem. It happens when you try to train the AI with data that isn't the data it originally started training on. To solve it, create a new AI by renaming the experiment, then load the new data you want to use at step 2a and process the data. Then go to step 2b and extract the features normally. Then go to the drive where you keep your RVC files, open the logs folder and look for the folder of the old AI, copy all the files that start with D and G, and paste them into the new folder for your new AI. Once that's done you can train as normal and everything should work.
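A rough sketch of the checkpoint-copying step described above, using made-up experiment names and the default logs location; steps 2a/2b for the new experiment still have to be run in the UI first:

```python
import shutil
from pathlib import Path

logs = Path(r"C:\RVC-beta\logs")      # assumption: default logs folder
old_exp = logs / "my_voice_v1"        # the experiment you already trained
new_exp = logs / "my_voice_v2"        # the new experiment, after Process data + Feature extraction

# Copy the generator (G_*) and discriminator (D_*) checkpoints so the new
# training run resumes from the old weights instead of starting over.
for ckpt in list(old_exp.glob("G_*.pth")) + list(old_exp.glob("D_*.pth")):
    shutil.copy(ckpt, new_exp)
    print("copied", ckpt.name)
```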
Thanks. I just have to make voice samples. I guess I am supposed to sing something, is that correct? The UVR software works great; I was able to stem Suno AI. I'm also looking at Jen Music AI and Lalal.ai, which uses celebrities. This was more intense than what I expected. I see that Mac has a download app. I just found an app on the Google app store. I will look through your other videos for more lessons. Thanks.
At Model inference when I try to convert, I get AttributeError: 'NoneType' object has no attribute 'dtype'
How do I fix this?
Check out this issue here: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/529
As well, you might be able to find others that have had this issue on the main issues tab here: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues?q=is%3Aissue+is%3Aopen+nonetype
@@Jarods_Journey thanks so much!!!!
@@Rainbowgunsh were you able to fix this issue?
@@Jurian0 no :(
@@Jurian0 I fixed the issue. You need to add ".wav" to the end of your file path (if it's a wav). EXAMPLE: \instrument_Balling song.mp3_10.wav
Jarods Journey Why can't we just upload, for example, an existing split song file from inside the folder that is just the singing voice with no music? Also, why copy and paste the whole address? Please answer, because I don't usually get a response when I ask a simple question.
What are your thoughts on Applios RVC Fork? Ever consider making a video with it?
Hello, I would really appreciate if you help me with this issue I'm having. Every time I try to convert the music file into the vocals and instrumental (the process you start here 5:55), I always get this error message at the end. Can you please help me resolve this issue?
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Hmm, your graphics card might not be compatible with the version of PyTorch running; it might be too old. What GPU do you have?
@@Jarods_Journey I have an NVIDIA GeForce GTX 650
@@sigh7731 Ah gotcha, well the possible fix for this is a little too involved and I'm not even sure if it would work for RVC, so you may be out of luck. The only options you have are to run on CPU, upgrade your GPU, or run RVC via Google Colab.
Here is the article that references the out of date GPU issue: discuss.pytorch.org/t/solved-pytorch-no-longer-supports-this-gpu-because-it-is-too-old/15444
@@Jarods_Journey Ok, thank you!
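If you suspect the "GPU too old" case, here is a quick check of what PyTorch actually sees; the exact compute-capability cutoff depends on the PyTorch build, so treat the printed numbers as something to compare against the thread linked above:

```python
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print("GPU:", torch.cuda.get_device_name(0))
    print(f"Compute capability: {major}.{minor}")
    # Recent prebuilt PyTorch wheels drop support for very old compute
    # capabilities; a card like a GTX 650 falls below that cutoff.
```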
Hello again! Do you know what the issue could be when the preprocess stops going in the middle? No error message or anything shows up, after successfully processing many of the vocal samples it just stops going.
Check the logs folder; it should have 0 and 1 folders, and if there are contents in there, it finished. I'm not sure it can stop in the middle; it would output an error.
@@Jarods_Journey Ty for taking the time to reply again! The logs folder did have 0 and 1 in it and the feature extraction worked, so I tried to train the model but it never progressed past step 1 (ie it never reached the epoch count). I've trained several models before with no problem, so I'm not sure what the issue is. I'll try to experiment a bit and see if I can figure out what the issue is, and if I figure it out I'll report back.
Hey man, thanks for the tutorial. I was wondering how to match the key of the instrumental to the output voice? I converted a male song to a female cover, but I don't know how to change the instrumental pitch to match with the female voice.
Thanks for the videos, they are fascinating.
Hello Jarod, after I put in the directory for the voice for voice training and clicked "process data", I got an error in cmd.exe: ValueError: invalid literal for int() with base 10:
I have a 3070 Ti.
same man
Delete any spaces in your path; this error usually comes from that.
@@Jarods_Journey What do you mean by path? I'm really a noob at this.
thanks for the reply
@@viking6985 check this out: ruclips.net/user/shortsUbPMhzZuE9I?feature=share
Thanks bro, generous sharing! One quick question: when we restore the previous model, how can we continue the training? Do we need to go through all of steps 1 to 3? Should we update the "Load pre-trained base model G path"?
Check this short to see if it answers your question!
ruclips.net/user/shortseO0gvi_RXTc?feature=share
@@Jarods_Journey thank you so much!!
6:27 Did I miss something? I use my own recorded voice (Harvard sentences) but I cannot train it in Gradio. {{Do I need to use the voice that I trained in so-vits-svc-fork?}} If yes, I need to download files from my folder "me", yeah? Or just use my recorded voice?
In the video, you have to put in the path to where all of your audio files are, in this case, your recorded voices. In the folder you point the path to, all your files need to be in there.
Hey Jarod, any instructions on how I could get this installed on my Mac?
You'll have to install it manually by cloning the GitHub repository and downloading the weights yourself. A guide on how you might go about it is here: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/README.en.md but I recommend you don't go with Poetry and just stick with conda or venv.
Dude, you are amazing! Thanks for your great work!
Thank you for the video, it's really informative, but I have an issue: when training the voice it doesn't generate ".pth" files in the weights folder. Any way to fix that?
The checkpoints are under logs\[YourModelName]; however, if you copy them to assets\weights it won't load them properly, so ¯\_(ツ)_/¯.
For me the first epoch stays at 0% and does not continue. Why could this be? It gives messages like "max value is tensor(1.0717)".
Thanks for sharing. But now I've run into a problem: could I just use pretrained models instead of training models myself? On the RVC WebUI, I couldn't figure out how.
Stuck at 8:10; it's blank when I click Process Data. All my data files are .wav and short.
UPDATE: After converting the files (even though they were already wave), then putting in the path and clicking Process Data, it worked.
Darn, I was trying to train the voice but it seemed stuck on "Reducer buckets have been rebuilt in this iteration".
Hmm, I've never come across this, if you waited, did it ever finish through an epoch?
@@Jarods_Journey I left it an hour but nothing changed. Not sure what settings I should have tried. I have a Ryzen 7 5800X 8-core processor, a ROG Strix X570-F Gaming motherboard, and an RTX 3070 GPU?
@@Hestia3332 Ah, this should be able to run it. Try cutting your dataset down so that it only totals 10 minutes or less. You might also wanna check the GitHub issues tab to see if there are any open issues.
@@Jarods_Journey I've only just downloaded this software, so I'm not too sure what you mean by cutting the dataset?
Hi, I'm not getting the .pth file in the "weights" folder, so my "Inferencing voice" list remains empty with nothing listed. I don't know what could be wrong, as I've followed all the steps you listed in the video...
This means the model never finished. Either something happened during training or it never began, check your console to see if there were any errors.
@@Jarods_Journey It gives me no errors; I even give the program enough time, and by the end of every process it says that all is done.