Complete Guide: AI Voice Training with So-Vits-SVC - Part 1: Google Collab
- Published: Aug 23, 2024
- Voice Recorder - github.com/Jar...
Audiosplitter - github.com/Jar...
so-vits-svc-fork - github.com/voi...
UVR - github.com/Anj...
Download Python - • How to Install Python,...
Come join The Learning Journey!
Discord - / discord
Github - github.com/Jar...
TikTok - / jarodsjourney
If you found anything helpful, please consider supporting me and the content I am trying to produce!
www.buymeacoff...
Hardware for my PC:
Graphics Card - amzn.to/3pcREux
CPU - amzn.to/43O66Ir
Cooler - amzn.to/3p98TwX
RAM - amzn.to/3NBAsIq
SSD Storage - amzn.to/42NgMFR
Power Supply (PSU) - amzn.to/3NBAsIq
PC Case - amzn.to/447499T
Motherboard - amzn.to/3CziMXI
Alternative prebuilds:
Corsair Vengeance i7400 - amzn.to/3p64r22
MSI MPG Velox - amzn.to/42MnJHl
Cheapest and minimum specs recommended:
Cyberpower 3060 - amzn.to/3XjtZoP
There's currently an issue with the software as of 07/18/23:
github.com/voicepaw/so-vits-svc-fork/issues/837
Building wheel for pyworld (pyproject.toml) ... error
ERROR: Failed building wheel for pyworld
Building wheel for tqdm-joblib (setup.py) ... done
Created wheel for tqdm-joblib: filename=tqdm_joblib-0.0.3-py3-none-any.whl size=1631 sha256=c1e4112b1cc3303c89765fc01f30d8fa5c916c8ad143972ab87e9640c7a758ee
Stored in directory: /root/.cache/pip/wheels/68/bc/33/47a70346aa7f8953ca185e3485dc2b5b5dc5f233b4d6c6e8f1
Successfully built tqdm-joblib
Failed to build pyworld
ERROR: Could not build wheels for pyworld, which is required to install pyproject.toml-based projects
Is it fixed yet? Mine is not working on any version.
Your guide has been the best of what I've seen so far. What I'd like to see in a video:
1) a little more detail about how you manage stopping/continuing training and managing Colab (given there seem to be limits on how much time you're allotted, idle time, etc.);
2) what the files are (why is there a G and a D?) and the directory structure, relative to the dialogue in the GitHub script;
3) how to migrate trained files to another Jupyter service, and what to save/download to back up trained data;
4) *** how to set up locally *** in methodical detail; what files you need from GitHub and where to put them, for a complete Python noob;
5) what to look for in a graphics card; does it have to be NVIDIA; what are the minimums; does CPU/motherboard memory matter; does CUDA/NVIDIA branding matter;
6) prospects for so-vits on Mac M1
You may or may not already know these things, but I'm guessing you may be motivated to research them for video purposes, so I'm maybe outsourcing researching to you. 🙂
Local training is covered in part 2 if you're looking to run things locally, but those are all good points! I'll be creating (hopefully soon) a Q&A of the most frequently asked questions for so-vits, so I appreciate this 🤟
@@Jarods_Journey If each of the Harvard Sentences are individually recorded to a separate .wav, will silences at the beginning and end of a recording significantly impact the training? Also, if the Harvard Sentences are recorded in blocks of ten and all of them are saved in a single folder, will the audiosegmenter process every recording in the folder?
Love the way you explain complicated things with simple language in depth, thank you very much!
I'm literally stuck at the last step ("use trained model"); it says NameError: "vocals is not defined".
same, don't know what to do about it.
You really covered all the bases here, This was a great video! I subbed.
Appreciate it 🤟🙏!
Bro this is just lit man !
Love the content ❤!
This guide was truly such a lifesaver, you explained each and every step so clearly and I really appreciate that you gave me a better idea of how everything works, too. Thank you so much. ⸜(。˃ ᵕ ˂ )⸝
How do I fix "/bin/bash: svc: command not found" when I try to run the automatic preprocessing?
Thx for this Jarod, really appreciate the detail. I've got two "issues" that occurred, but I'm still running the training, so we'll see how it turns out. 1. Step 6 preprocessing gave me this: "Preprocessing: 75% 73/97 [00:37
Interesting... The first one is just a warning; I'm not actually sure what it indicates. If the code is training, then it's working 😅!
The permissions part... is odd, because it should be your Google account. You can try copying the Colab notebook to your own Google Drive and running it that way to see if that resolves it.
Lmk!
18:48 OK I see what log_interval does now, thanks.
I watched the 2nd video first because I'm not interested in Google Colab, but in the 2nd video you also change eval_interval. What does eval_interval do?
28:09 eval_interval divided by the number of steps gets your evaluation model? What? And where do I find the number of steps?
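For reference, eval_interval counts training steps rather than epochs: a G_/D_ checkpoint pair is written roughly every eval_interval steps, and steps accumulate at about (number of clips ÷ batch size) per epoch. A rough sketch of that arithmetic (the clip count, batch size, and interval below are made-up examples, not values from the video):

```python
import math

def checkpoint_steps(num_clips, batch_size, epochs, eval_interval):
    """Estimate the global steps at which so-vits-svc writes G_/D_ checkpoints.

    Assumes one optimizer step per batch; real counts can differ slightly,
    so treat this as a ballpark, not an exact schedule.
    """
    steps_per_epoch = math.ceil(num_clips / batch_size)
    total_steps = steps_per_epoch * epochs
    return list(range(eval_interval, total_steps + 1, eval_interval))

# e.g. 97 clips, batch size 16, 100 epochs, eval_interval 200
print(checkpoint_steps(97, 16, 100, 200))  # [200, 400, 600]
```

The checkpoint file numbers (G_200.pth, G_400.pth, ...) correspond to those step counts, which is why they don't line up with epoch numbers.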
i cringed at myself after i tried replacing my voice to some popular song 😭😂
I feel ya 😅. It's actually a thing, though: we tend not to like the sound of our own voice.
Error at "Automatic preprocessing": I get the error "/bin/bash: svc: command not found"!
When I press train, it loads for like 6 seconds and then stops; I don't think it's training at all...
Thanks! As a musician with exactly zero experience coding, this was extremely helpful.
I saved a copy of the Colab notebook, and I also discovered that when I stopped training and started again the next day, it seemed to resume the epoch counting. Does that mean it's adding on and will result in better sound?
It should, but you'll wanna pay attention to the loss values on the graph. You might wanna check my video where I go over getting the best possible trained voice models
@@Jarods_Journey Thanks. I got about 3000 epochs (?) in before I ran out of the free Google Colab limits, and it still turned out pretty good (at least for what I'm going for), so thanks again.
Great video man, but I didn't get how to make voice samples for someone else's voice, for example a streamer. Can you answer please?
How do I work with more than one speaker or more than one model? I'm trying to keep the model I've already trained but start a new model or speaker and be able to select between the two and maybe add more later. Do I need to clear out any files from the first training before I begin the new one, etc.? I tried to add a second speaker but it seems that the "training" went way too fast to have actually worked and the vocals.out.wav is the voice from the first speaker.
Hey, great stuff! What would be, in your opinion, the best training models for cloning metal vocals? How can I contact you?
30:30 I got so close but this just wouldn't work for me. I just kept getting "NameError: name 'vocals' is not defined" no matter what I renamed the file to.
Hi. Thanks for the cool guide) You are a nice teacher!
Appreciate it 🙏
So when you are at ruclips.net/video/xgvT7UnUTng/видео.html (29:42) and you have your audio file folder showing, mine never produced an audio file folder. Do you have any way I can fix this?
"train": {
"log_interval": 100,
"eval_interval": 200,
"seed": 1234,
"epochs": 10000,
Are my settings, thank you.
Do you know of, or could you imagine, a possible way to have AI create an original vocal performance given an instrumental? It would just be more data to get trained on: say, 50 instrumentals, 50 a cappellas from said instrumentals before separation, and finally the voice you want to be singing, like what everyone is doing now. Does this sound remotely possible?
30:45 I didn't understand this part about continuing to train the models, because the instructions talk about .wav files, while all the files I have from the saved checkpoints are .pth files in the logs folder.
Also, am I missing something? How can I use my trained voice model to cover any song I want? Please help me with this.
Thanks for the tutorial, it's really helpful! Whenever I select audiosegment, it keeps creating an empty folder. Am I doing something wrong?
At how many epochs will I get the best outcome/quality for my AI voice? I have no idea about this and I don't wanna "over-train" it, as you mentioned.
It's impossible to tell for any given dataset; you just have to watch as the loss goes down on the tensorboard graph. The lower it gets, the better a time to stop. It just requires a lot of testing around.
Hey I'm a newbie to all this stuff. My question is do you need to have 10GBs of VRAM to start and get in to this whole thing? I have a GTX 1070 Ti (8GB VRAM). Can I do all things needed to develop a fully compatible singing model of me?
8 GB will work, just make sure your audio files are cut and split. I would recommend you use RVC instead of so-vits-svc though; it's a little easier to get going.
Can someone help? It worked perfectly before, but now when I try to use the audiosplitter, I press the first or second option, then I name the folder, and it just pops up for one second before the command prompt closes. Please help.
Can I continue training my model after I stopped training it? I'm really bad at this stuff; will you enlighten me on how to solve this problem of mine?
Yes, you have to set epochs higher and rerun all of the previous cells to continue training.
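To illustrate what "set epochs higher" means in practice, here's a minimal sketch that raises the train.epochs value in config.json before resuming; the path and the new epoch count in the example are assumptions you'd adjust to your own setup:

```python
import json

def bump_epochs(cfg_path, new_epochs):
    """Raise train.epochs in a so-vits-svc config.json so a resumed run
    can continue past the previously configured limit."""
    with open(cfg_path) as f:
        cfg = json.load(f)
    cfg["train"]["epochs"] = new_epochs
    with open(cfg_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg["train"]["epochs"]

# Example (hypothetical colab path; adjust to where your config lives):
# bump_epochs("drive/MyDrive/so-vits-svc-fork/configs/44k/config.json", 20000)
```

You can also just edit the number by hand in the colab file browser; the helper only saves you from retyping the JSON.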
Hey, I must say it's a very good, informative video! I still have a question though: when starting the training I get a 403 error, but it still continues the training. The actual question is how many epochs? Should I wait till it hits 9999?
There's no clear answer for epochs; try as many as time permits for your data, and hear how the model sounds after that point in time.
As soon as I press enter after naming the output folder in audiosplitter, I get an error message. What am I doing wrong?
Thank you so much for creating this content. I had a question -- do you know if the results are generally good if the target is not a song but speech? So I could change the voice of someone's speech to the voice that I've trained on?
Maybe you have already tried it by now, but it should work just fine, since singing isn't a lot different from talking; think of it as if you were singing with a monotonic voice.
Thank you for the good tutorial. It worked.
This was AMAZING. Thank you.
Appreciate it Dylan!
How long do I train it for? I don't want to overtrain it, but apparently 134 epochs is undertraining it.
Can you tell me please why I get this message when I run the training: "No dashboards are active for the current data set"? Probable causes:
You haven't written any data to your event files.
TensorBoard can't find your event files.
I've only seen this on initial start of the training. There's no data for the tensorboard to read until later. Once training gets up and going, if you refresh it, it should populate.
Other than that, I'm not too sure
I played around with tortoise-tts; I also did 10-second or shorter samples.
Instead of having one big file that I split, I just made multiple 10-second samples manually.
What do you think of Tortoise? I'm looking at Coqui TTS to create VITS models ATM, as I've heard Tortoise is slow.
@@Jarods_Journey it _is_ freaking slow, but what it outputs is amazing. I'm going to use it for some characters in an upcoming video
Huh, for some reason the voice recorder is recording in slow motion on my mic. Is there any way to fix that?
Hello. Please tell me how to train the trained model further. I trained the model for 134 epochs and got the files D_134.pth and G_134.pth. Where should they be placed? I know that when retraining I need to enter data again in the "Copy your dataset" cell, but if I enter D_134.pth and G_134.pth there, I get an error when running the "!svc pre-config" cell: "ValueError: too few files in dataset/44k/Me". Can you please tell me where and what should be placed during re-training?
If you followed all of the steps, you shouldn't be moving the files around, as that may cause issues. The cells should be run in order once again to continue training. Other than that, unfortunately it's beyond me to debug Colab any further :/
Love ur bgm btw. Sharou bgms are bangers
Oh absolutely, I love the stuff he's produced and I got introduced to them by watching hololive streamers lol
The audio splitter application doesn't work for me. It launches, I type in my option and select the audio file, and then it just closes immediately.
It doesn't process anything other than .wav files, unfortunately, as anything else would require installing ffmpeg. I might change the script to handle this, but for the time being you have to feed it .wav files.
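Since the splitter silently produces nothing when fed non-wav audio, a quick pre-flight check like this can catch the problem before you run it (a sketch assuming your clips sit in one flat folder):

```python
import os

def non_wav_files(folder):
    """List files the audio splitter will skip because they aren't .wav.
    Convert anything listed here (e.g. with ffmpeg) before splitting."""
    return sorted(f for f in os.listdir(folder)
                  if not f.lower().endswith(".wav"))

# Example: print(non_wav_files("my_recordings"))
```

If the list is non-empty, that's a likely reason the output folder came back empty.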
Thank you for the amazing guide! One question. Can I try the inference part without stopping to train the model? Or, if I stop it, can I continue from where I left, after testing?
I actually haven't tried this yet, surprisingly 😅 (I probably should), but I don't think there should be a problem. If you have it set to save checkpoints, it'll stop at the latest checkpoint as determined by the save frequency.
Hey, thank you so much for the tutorial, I wanted to know how do you add more epoch to an existing model using google colab (like if you wanted more than 134 epochs on your own model)
You should be able to run all of the cells again and just increase epochs in the config file.
@@Jarods_Journey Oh ok , thank you so much for answering so fast !!
Please make a tutorial for doing all the tasks on a local machine with a GPU, from the initial steps. I don't want to use Google Colab since it asks for money for compute.
I have the video, just need to finish editing 🤟
Hi, thanks for the awesome tutorial, the only problem I have is when launching training, i get a Google page with error 403 instead of displaying TensorBoard. How may I fix this ?
Hmm, not sure. It could be a Wi-Fi issue, and if you're running a VPN or on a server, that could be causing the forbidden access. You might just have to restart it.
@@Jarods_Journey I couldn't resolve the problem, but it's still operating normally despite the page not loading, so it's not a big deal!
Hey, when I try to segment it, it finishes 100%, but then the files don't show up in the folder that was made.
same
What if I want to train another model? Do I need to delete all the files and configure everything again, or can I just add a new folder in the dataset folder and keep the old stuff?
For Colab, delete all of the old stuff or move it somewhere else.
I did everything you said to the letter, and when I got here:
#@title Train
%load_ext tensorboard
%tensorboard --logdir drive/MyDrive/so-vits-svc-fork/logs/44k
!svc train --model-path drive/MyDrive/so-vits-svc-fork/logs/44k
it shows:
The tensorboard extension is already loaded. To reload it, use:
%reload_ext tensorboard
Reusing TensorBoard on port 6006 (pid 7336), started 0:06:09 ago. (Use '!kill 7336' to kill it.)
[10:32:08] INFO [10:32:08] Created a temporary directory
and then it stops and doesn't do any training at all. What did I do wrong???
This most likely means that something went wrong during the pre-hubert step and it didn't finish. You can try adding -n 1 to the end of the line in the cell that runs the pre-hubert stage.
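In practice, that means editing the colab cell that runs the pre-hubert step so the command ends with -n 1, which limits it to a single worker. This is only a sketch of what that cell might look like; your exact line may carry other flags, which should stay in place:

```
#@title Automatic preprocessing (pre-hubert)
!svc pre-hubert -n 1
```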
ValueError: rate must be specified when data is a numpy array or list of audio samples
How do I fix this? Someone help.
did you ever find a solution?
When I try to check the GPU, I get the following error (I am running Windows 10 64-bit with an RTX 2080 Ti):
/bin/bash: nvidia-smi: command not found
Are you running locally or with Colab? This is the Colab version of the video, so that error means your Colab runtime isn't connected to a GPU. You have to change that in the settings.
Hi Jarod, I think you forgot to mention how you created the "config" folder, which doesn't exist in my so-vits-svc, and also why your "Me" folder is not in your Drive, as previously mentioned, but in the "dataset_raw" folder?
Hey Yusaf, the config folder is automatically created by the Colab notebook, so you don't need to create it manually.
The Me folder is inside "dataset" because Google Colab expects a certain file path. Once it's copied into the Colab environment the names change; however, you don't have to worry about how those ones are named.
Hello, first thanks for the tutorial, no one explains like you. 2nd do you have a working link for so-vits-svc that works with Mac?
Appreciate it! As for Mac, I'm not too sure, unfortunately...
Hello! I have a question, when I hit training button and the Tensorboard shows up, it says "No dashboards are active for the current data set". Even I refresh it, it still shows the same thing. And I had one time when it led me to a screen like yours, the graph was empty. I think the training is not working at all and do you know how to fix it? Also thank you for this tutorial!!!
+It says "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires ipython==7.34.0, but you have ipython 8.14.0 which is incompatible."
on install dependencies
This may be an issue with Google Colab; check the issues tab on the GitHub page to see if anyone else is getting these errors.
Hi! I've been making a lot of progress using your video as a guide. However, the vocals are way out of tune when I'm done training. I think I need to adjust autopitch. Do you have any insights for how I can do this (or if there's a better way to fix out of tune vocals)? Thank you for any help!
When you run it, make sure the autotune part is off by following what I did at the end of the video. The GUI version has a way to adjust the pitch value, but I'm not sure how to do it in the Colab version.
THIS, THIS RIGHT HERE. This is what saved me like 100+ extra hours, I made like several voice models but none of them came out with proper pitch, this was a super simple fix. You're a big help
awesome brother thanks
The audiosegmenter is not working, only the audio splitter :(
Wonderful video!! I am training my own model now! Meanwhile, I'm wondering why the audiosplitter doesn't work on my computer; the segmenter generated no audio files after execution. Did anyone else face the same issue?
The audiosplitter that I made only works with .wav files, so you'd need to convert your audio to that format. That's my oversight; I'll try to fix it soon so it can take other file types as well.
Same issue, doesn't work even with wav files. Were you able to find a fix?
@@MajickRs Same issue I encountered. You can use Audacity for audio splitting: there's an option under Analyze > Silence Finder; set the threshold to 5 dB and click OK. Finally, in the File tab, use Export Multiple.
Woah, the number of likes for the quality of this video is criminal. New sub for the amazing effort you put into helping us 🫡🫡🫡
How do I retrain a voice model I already have? How do I make it better? Train new words or something, please? ^^
Hi, I want to ask about continuing training the AI, cuz mine stopped at around 4000 epochs and then just disconnected while I slept. Do I have to repeat all those steps to continue, or is there another way? Love your video btw, it's easy to listen to.
Appreciate it! Yes, to continue training, you have to run all of the cells again in order 🤟; make sure the config is set up again as well!
@@Jarods_Journey So, I've repeated all your steps, but when I started training, it just gave a checkmark, although my current epoch is 4000 while the goal is 10000. Did I miss something or make a mistake?
@@senomichaelsantosa4500 Unfortunately, this seems to be a common error and I don't know exactly where it occurs. I believe it generally has to do with the pre-hubert stage, where you can add -n 1 to the end of the line and see if pre-hubert works; the other option is rerunning the copy-configs cell a few times.
I believe it could also be a load factor depending on how loaded Google's servers are, as it seems to work just fine at one time and not at another :/
@@Jarods_Journey Okay, I'll give it a try. Thanks for your help.
I have an issue with audiosplitter.exe: when I enter the output folder name, the command window closes (force closes). Can you please tell me what to do?
Make sure all files are .wav files; it cannot accept anything else.
Hi! I came from the "Realtime AI Voice Changer Using RVC (Retrieval-based Voice Conversion w./ w-okada)" video on a journey of baiting my friends. Coming from that video, I wanted to clarify: the marine.pth file used at 5:20 of that video, is it the same as any of the files shown at 29:13? If not, how can I convert this trained model into a single file like marine.pth?
If I'm not mistaken, the G file with the highest number is the most-trained .pth file. If that's the file I'm looking for, the only question I have left is... how can I make it convert audio smoothly without a high-end GPU, haha. I'm currently on a laptop 1660 Ti.
If you're using so-vits-svc, I believe you have to select it instead of RVC when you go into the GUI and then use the G file. I completely recommend RVC over so-vits though, and the tutorial you're following here is so-vits-svc.
@@Jarods_Journey Oh, I didn't read the title properly... LOL
@@Jarods_Journey Thank you! I'll give it another shot.
Does this software support the NVIDIA A4500? Please let me know, thanks.
Thanks! Super helpful, best vid I've found. So, if I want to use a trained model over and over, how do I keep my models separate and point to each model as I need it?
Appreciate it! I personally rename my models and configs and change which folder they're in. Then you just have to select those in the GUI that opens up.
How do I use the programs you showed with someone else's voice, for example a K-pop singer?
Hi, I tried using your audio splitter but it's not working at all; the file is created but it is empty.
Great video, but I have a problem.
Automatic preprocessing:
/bin/bash: svc: command not found
What does that mean?
Since you're running Colab, it means there was an issue with the installation of the repo. Make sure all of the cells ran without failure and it should work!
@@Jarods_Journey So I'm having problems with this too. On restarting the runtime environment, I came across:
error: subprocess-exited-with-error
× Building wheel for pyworld (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for pyworld (pyproject.toml) ... error
ERROR: Failed building wheel for pyworld
Failed to build pyworld
ERROR: Could not build wheels for pyworld, which is required to install pyproject.toml-based projects
This, and
Automatic preprocessing:
/bin/bash: svc: command not found
still persist. I'm going to keep working and see if I find a solution.
So the errors above probably have nothing to do with the "svc: command not found" issue.
I've restarted it 3 times, doing it from scratch. Not sure what the issue was; wish I was techy enough to solve it.
I see I might have the wrong version of IPython. I have version 8.14.0, which is apparently incompatible with the required 7.34.0. I'm going to uninstall, reinstall, and see if that fixes it.
Hello! I had a problem with this application: how do I reuse it? It took me a while to get it to work, and the results were mixed (I think it had to do with vocal layering, but I'll try 10-second increments), but when I reopened it later, I couldn't get it to work. I tried to reinstall it (there's no way I have to do that every time?), but it just confused the system. I'd appreciate any help. I tried it on a Mac but may use a PC for my next go-around. Thanks!
Google Colab is a separate PC on Google's servers, meaning it doesn't matter what OS you use. You will have to run everything each time. As for any Colab issues, check the issues tab on GitHub to see if anyone else is having them.
How do I continue training my model after Google Colab reaches its time limit? Rerun all the cells you mention in the video the next day?
Yep, correct!
It took 30 minutes for only 5 epochs. I tried Paperspace; not a big difference.
How many epochs until you felt it was studio quality? I'm training locally right now with over 200 high-quality samples (about 45 minutes' worth). I'm at 2800 epochs and it's getting better, but it still seems a long way off.
Hmm, I haven't actually gotten my voice to the level of being studio quality yet, but I also need to do more testing. People say 25k steps is generally a good starting point, but I've found it varies widely based on your training data. In your case, if you're doing something like a batch size of 20, you'd already be there. You might want to vary your learning rate to see if that does anything, or run it for more epochs.
One other thing: it seems to do a much better job of inferring the voice if the base vocals are closer to the pitch of the trained model. I'm not saying it can't do other pitches, as you can take a female voice and infer it with a male model and get a good result; it just seems a bit more accurate.
@@RobertJene So, I wasn't aware of the first two before, but after a little investigation: you can definitely adjust gradient accumulation in the configs file (the default is 1) and adjust the training speed by lowering the learning rate or adjusting other variables like batch size and eval interval.
The vectors-per-token setting I'm not too sure about. There are some mel-spec variables that can be changed, but I'm not sure of their effects on the final output.
@@Jarods_Journey cool
svc pre-config gives me a warning and 0%. Any help?
Hey, thanks a lot for the video! Also, will it still work if my voice samples are not in English?
Yes, it will work.
Yup, so-vits-svc doesn't care what language it is :)
Great! Do I need to install python in my PC? Thank you!
Not if you're doing the Colab version, but if you're running locally on supported hardware, you do.
Bro, where did you find a copy of the Harvard recommended practice for speech quality measurements doc? I spent over an hour today trying to track it down using Google cache and every other trick I know, but I always hit a paywall. Can you help me out?
www.cs.columbia.edu/~hgs/audio/harvard.html
@@Jarods_Journey hell yeah bro - thank you!.. 🙇♂
Thanks for the video
You bet :D!
Is it possible to train a model and create a finished song on a Mac? It looks like all the vids I've seen (okay, two, lol) use a PC instead. Just trying to figure that out before I possibly waste effort and time if it's not possible.
It works universally for Colab users; as for local... it technically is possible if your specs match an equivalent Windows PC. However, I know there are issues with AMD, as most of these tools rely on NVIDIA GPUs, but I can't verify this 😅
@@Jarods_Journey Is there any audiosplitter for Mac?
I seem to be having issues with the GPU suddenly disconnecting whenever I try to use the model on audio. Everything else runs smoothly, but it suddenly disconnects a minute or so in and the GPU refreshes.
Not sure; it might be a Colab-specific issue and I'm not sure how that works. You might have to run it on a different day.
"ValueError: rate must be specified when data is a numpy array or list of audio samples" when inferring :(
Great video btw
When I try to train the model, it doesn't start a graph like yours. Even with the .wav file in, it doesn't start the little purple bars! Everything worked like in the video, but maybe it's because I have too few samples?? I'm not sure how to make it train.
Are there any outputs? I know many users have had issues with Google Colab, and it's usually because the pre-hubert step never finished. You can try appending -n 1 to the end of the line in that cell to see if it helps.
Bro, please help me fix an error... it's showing "NameError: name 'vocals' is not defined". Please help me; I did every other step like you did. Guide me please.
I believe you have to edit the name correctly according to your file name.
I received an error that reads as follows: " Building wheel for pyworld (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for pyworld (pyproject.toml) ... error
ERROR: Failed building wheel for pyworld
Failed to build pyworld
ERROR: Could not build wheels for pyworld, which is required to install pyproject.toml-based projects"
I fixed it by running the cell normally again, then editing the code in the cell to the below (basically just specifying an older version of pip, as the new one seems bugged):
#@title Install dependencies
#@markdown pip may fail to resolve dependencies and raise ERROR, but it can be ignored.
%pip install pip==21.1.3
%pip install -U ipython
#@markdown Branch (for development)
BRANCH = "none" #@param {"type": "string"}
if BRANCH == "none":
    %pip install -U so-vits-svc-fork
else:
    %pip install -U git+https://github.com/34j/so-vits-svc-fork.git@{BRANCH}
Hi! I have a GTX 1660 Ti Mobile GPU and an i5 processor. Is that enough for somewhat efficient training?
If it's 2 GB, no; if 4 GB, maybe, just barely. Might wanna use the Colab version.
Compared with Tortoise, which one has better quality in terms of voice cloning?
They're completely different architectures: Tortoise is text-to-speech and SVC is speech-to-speech. Quality-wise, they're both pretty impressive.
I encountered an error in the trained model that stated: "ValueError: rate must be specified when data is a numpy array or list of audio samples." Can someone help me
Is this when you're trying to generate a response with the trained model? What type of file are you using?
@Jarods_Journey Yes, this is when I'm trying to generate a response from the model. I am using a .wav file; it also shows that line 5 is the cause of the error.
@@kiran_akamatsu Hmm, well, I'm not too versed in the technical coding side of so-vits, but if you want to hop into my Discord, it might be easier to send and share screenshots and see if I can help out there.
I would say try other .wav files first, to see whether it's an issue with only that one or with all of them.
Since Colab can only run for a certain amount of time, is it possible to resume the last training?
Yes, for SVC you just run everything again.
How do I delete folders so I can do other voices? I can't delete folders.
good job dude ;)
I keep getting a NameError at the "use trained model" section even though it is named correctly; not sure how to fix it.
this
Is it still in the curly { } brackets? Make sure those are deleted.
Hi, question: if I wanna continue training but I'm at D_100 and G_100, for example, do I delete the D_0 and G_0 files, or is there no need to? And can I use the same method as the first time, or not? Thanks!
Leave all of the "D" and "G" files if you wanna continue training. For Colab, you can just run all of the cells before Train as you did when you initially set it up, and it should begin training from your most recent checkpoint (that's if you made it to a checkpoint, per the log intervals).
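If you want to confirm which checkpoint a resumed run will pick up, a small helper like this finds the newest G_*.pth in the logs folder (a sketch; the path in the example is the colab default and may differ locally). The matching D_*.pth must be kept alongside it:

```python
import os
import re

def latest_checkpoint(logdir):
    """Return the path of the highest-numbered G_*.pth in the logs folder,
    i.e. the generator checkpoint training resumes from, or None if absent."""
    steps = [int(m.group(1))
             for f in os.listdir(logdir)
             if (m := re.fullmatch(r"G_(\d+)\.pth", f))]
    if not steps:
        return None
    return os.path.join(logdir, f"G_{max(steps)}.pth")

# Example (hypothetical colab path):
# latest_checkpoint("drive/MyDrive/so-vits-svc-fork/logs/44k")
```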
@@Jarods_Journey Thanks !
Hi Jarod!
Where can I download the Liam Gallagher AI voice model?
I know there are Discord groups and stuff out there, but I don't know of much, unfortunately. You'll have to do a deep dive on Google to see if people have pre-trained SVC models, and I know there are some YouTube videos that cover this as well.
Thank you!
You bet!
10000 epochs (the default in config.json) is too much. Right now the process runs about 2 min per epoch, which means the estimated training time is about ~330 hours...
Adjust based on sample size; that sounds like more than an hour of audio, while you should be fine with about 10-20 minutes.
pre-hubert never finishes, even with -n 1 or -n 2. What do I do?
You might just have to try again, unfortunately; Colab is finicky here :(
I have a problem: when I start the training, after 20 seconds it ends and nothing else happens.
Other users have had this issue; it may be your pre-hubert stage or something else. Assuming it's the pre-hubert stage, add "-n 1" to the end of the line (making sure there's a space before it) and try rerunning that stage. One user even reported that rerunning the config cell helped.
Other than that, I believe this is an issue with Colab and GPU availability. I haven't had time to look into it more, unfortunately.
Do we need to isolate the vocals for singers before training the model too?
Yep! The way the model trains is by comparing its output to the audio you feed it, so if the audio has instrumentals in it, it's gonna try to match all of that.
When I get to using a trained AI, I get the error:
ValueError: rate must be specified when data is a numpy array or list of audio samples. I have no idea how to fix this. Can you help?
There might be a corrupt file or one that is too small; make sure no file is smaller than 200 KB.
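To act on that tip, you could scan your dataset for suspiciously small clips before training or inference. This is a quick sketch: the 200 KB threshold comes from the comment above, and the folder path in the example is a placeholder for your own:

```python
import os

def undersized_wavs(folder, min_bytes=200_000):
    """Flag .wav clips below ~200 KB, which (per the tip above) may be
    corrupt or too short and can trigger the 'rate must be specified' error."""
    return sorted(f for f in os.listdir(folder)
                  if f.lower().endswith(".wav")
                  and os.path.getsize(os.path.join(folder, f)) < min_bytes)

# Example: print(undersized_wavs("dataset_raw/Me"))
```

Deleting or re-recording anything it flags is a cheap first step before digging into the traceback.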
@@Jarods_Journey I figured it out and got it working. Thank you
same error, how did you fix it?
Do you think you'd get equal or better results if the sample set includes singing instead of talking / read sentences?
I've had users report that singing samples produce better output if you wanna mimic singing. I haven't tried it, but I don't think it's harmful!
@@Jarods_Journey Ok, sounds good. Another question: if I get more samples that I want to feed it, do I need to start over, or can I just add more samples to the same folder and continue the training I already started?
Hey! I was running it, but after about 6 hours it stopped the process. I got the error "Cannot connect to GPU backend.
You cannot currently connect to a GPU due to usage limits in Colab." I'm on a free Colab account. Any tips on how to overcome this? I understand there's a 12-hour limit, but the thing is, to train the model it looks like I'll need much more time; it took 5.5 hours to reach 1000 epochs, and now it doesn't let me connect to a GPU. Thank you!
Due to Google's policy, you'll have to wait till the limit is up! Unfortunately, those are the limitations on a free account 😅. You could technically move all your data over to another Google account and begin training there, but that's up to you (you'd just move the entire so-vits-svc-fork folder).
@@Jarods_Journey Thanks! But it looks like I'll need about 2 days of running to train the model. If there's a 12-hour limit, how can I get there? Also, how do I continue training from the same point?
@@Shacharzadik It'll just take longer for you to finish due to the limits. To continue training each time, just run all the cells before Train again, and make sure your config file stays the same.
@@Jarods_Journey Cool! I've finished the training, but when I try to use it, I get an error saying "NameError: name 'vocals' is not defined", and that's after I already changed the file name. Do you maybe know why? Thanks!
@@Shacharzadik If you're running on a different day, make sure to re-mount your drive so that it has access. If you have, double-check that everything is named correctly, including the .wav extension and everything!
This doesn't work at all on Mac, and when I try it on Windows, the audiosplitter just closes when it's supposed to be removing silence and creates an empty folder.
The splitter doesn't work on files that aren't .wav, unfortunately; I'll have to build that in and include ffmpeg, as it currently just uses a built-in Python library. The Google Colab part shouldn't be limited between platforms, though, as it all runs on Google's servers.
@@Jarods_Journey I tried the audio splitter with .wav files and it didn't work for me either.
I did everything you did and ended up with a G_240.pth and a D_240.pth. When I stick these 500 MB files into my realtime voice changer, they do nothing, while all the other .pth files I get elsewhere work. Why?
Use RVC, not so-vits; I believe there are incompatibility issues.