RVC Tutorial - Speak in any voice! - Retrieval-based Voice Conversion - Easy AI Voice Tutorial

Ai Voice Tutor

87K views


Published: 15 Jul 2024
#aivoices #aivoice #ai #aitutorial #rvc #rvcproject #rvcgui, RVC WebUI, RVC AI Tutorial, RVC GUI Tutorial, RVC Project Tutorial, AI Voice Tutorial, RVC V2, rvc voice changer
In this video you’ll be learning how to speak in any voice using nothing but your PC and a microphone. Everything will be running locally on your machine. First, we’ll prepare an audio file that will serve as an input to train an AI model. We then train the model using RVC-Project (Retrieval-based Voice Conversion) before using the model in a different and much simpler User Interface (RVC-GUI).
Once you have everything set up, you’ll be able to convert a recording of your voice into an AI voice within seconds.
Notes:
- Even in early 2024, this is still the best method and tool to clone any voice with your own voice locally on your PC
- This works with any language; you just need to train the model on the same language that you want to clone. (If the training language differs from the language of your input recording, the result will have an accent that is close to a real accent.)
My other videos about AI voice-cloning:
- Real-time method for Discord (or Zoom, Skype, etc) to make your own voice sound like any voice: • Discord Voice Changer ...
- Use Text-To-Speech with any voice: • Free AI Text-To-Speech...
If you run into memory issues, try the following:
- Lower the batch size to "1".
- Cut the audio in clips shorter than 10 seconds.
- Reduce the size of the dataset.
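The "clips shorter than 10 seconds" tip can be scripted instead of cutting by hand. The following is a minimal sketch, assuming the training audio is an uncompressed WAV file; the function name `split_wav`, the output file naming, and the 9-second default are illustrative choices, not part of RVC.

```python
import wave
from pathlib import Path


def split_wav(src: str, out_dir: str, max_seconds: float = 9.0) -> list[str]:
    """Split a WAV file into consecutive clips of at most max_seconds each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        # Number of audio frames that fit into one clip.
        frames_per_clip = int(reader.getframerate() * max_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_clip)
            if not frames:
                break
            clip_path = out / f"clip_{index:04d}.wav"
            with wave.open(str(clip_path), "wb") as writer:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(str(clip_path))
            index += 1
    return written
```

This cuts at fixed intervals, so a clip can end mid-word; cutting at pauses in an audio editor will usually give cleaner training data.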
For any other issues, make sure the folder path of your input voice and of RVC Beta does not contain any spaces or special characters!
/UPDATE 1/
Download Pretrained Voices
/ discord
huggingface.co/QuickWick/Musi...
rvc-models.com
docs.google.com/spreadsheets/...
/UPDATE 2/
Text To Speech Tutorial (The .pth models are not compatible with the app in this tutorial): • Free AI Text-To-Speech...
/UPDATE 3/
I have now trained the voice with a dataset of 30 minutes and used 600 epochs. The resulting voice sounds better but still is not perfect. Maybe I should go even higher on the epochs.
/UPDATE 4/
When you create the zip file with the .pth model, also include the .index file that starts with "added..." which you can find in the /logs/lecturer/ folder
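Bundling the model and its index file can be sketched as follows. This is only an illustration: the function name `bundle_model` and all paths passed to it are hypothetical, and the only detail taken from the note above is that the index file name starts with "added" and ends in ".index".

```python
import zipfile
from pathlib import Path


def bundle_model(pth_file: str, logs_dir: str, zip_path: str) -> list[str]:
    """Zip a .pth model together with the 'added*.index' file(s) from logs_dir."""
    index_files = sorted(Path(logs_dir).glob("added*.index"))
    if not index_files:
        raise FileNotFoundError(f"no 'added*.index' file found in {logs_dir}")
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Store files flat (no folder structure) inside the archive.
        for file in [Path(pth_file), *index_files]:
            archive.write(file, arcname=file.name)
            names.append(file.name)
    return names
```

Raising an error when no index file is found makes a missing "Train Feature Index" step visible before the zip is shared.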
/UPDATE 5/
I have now trained a dataset of 40 minutes with 300 epochs and this seems to give me the best overall results so far
/UPDATE 6/
I have now trained my original 10 minute sample with the RMVPE model (instead of Harvest) and this seems to have reduced some of the robotic noises I was getting. RMVPE is available in this version of RVC: huggingface.co/lj1995/VoiceCo...
Using "Harvest" in RVC-GUI works great with an RMVPE-trained model.
What you’ll need
- NVIDIA GPU with CUDA support (at least 8GB VRAM needed for the training)
- About 30GB of free disk space
- About 10 minutes of the voice you want to train the AI model on
- A recording of your own voice that will then be converted into AI voice
Download Links
1. Prepare input voice with Audacity
www.audacityteam.org/download/
2. Train Model with RVC-Project (RVC-Beta)
www.huggingface.co/lj1995/Voi...
Note: "RVC-Beta.7z" always includes the latest version of the tool. If you want to use the exact same version as in the video, download this one: huggingface.co/lj1995/VoiceCo...
3. Use Model with RVC-GUI
github.com/Tiger14n/RVC-GUI/r...
Optional
If you want to dive deeper into RVC-Project, check out the documentation on Github:
github.com/RVC-Project/Retrie...
Thank you to everyone who has contributed to RVC-Project and RVC-GUI!
If you appreciate my videos, you can buy me a Coffee: www.buymeacoffee.com/aivoicet...
My PC Components:
(Disclosure: As an Amazon Associate, I earn from qualifying purchases. Clicking on and purchasing products through these links won't cost you any extra. They help support this channel and allow me to continue providing valuable content)
My GPU: ZOTAC Gaming GeForce RTX 4090 AMP Extreme amzn.to/3PZHNlm (Affiliate Link)
(Alternative: Zotac NVIDIA GeForce RTX 4090 Trinity amzn.to/3s2GERN (Affiliate Link))
My CPU: INTEL CORE I9-13900KF amzn.to/3MudSRp (Affiliate Link)
My SSD: WD_BLACK SN850X NVMe SSD 2TB amzn.to/46WFMgG (Affiliate Link)
My RAM: G.Skill 64GB 2x32GB DDR5 6400MHz amzn.to/3tAMn1M (Affiliate Link)
My Microphone: Razer Seiren V2 X USB Microphone amzn.to/46PkAtn (Affiliate Link)
Chapters:
00:00 Introduction
01:44 Step 1 - Prepare Input Voice
03:11 Step 2 - Train Voice Model in RVC V2
07:55 Step 3 - Use Voice Model in RVC GUI
11:00 Final Result

Comments • 439

@fadysameh7656 1 year ago ⁺²⁸
So is that for english voice only you can i do it for any language like train it a german and speak german?
@AiVOICETUTOR 1 year ago ⁺¹⁴
Any language should work. So if you train a German voice and then use another native german voice as input to clone it, then it should sound perfect. If you use a different language input, compared to whatever was used to train the voice, then you should end up with an accent (similar to real accents).
@anyonecancode8329 9 months ago
@@AiVOICETUTOR Hi man, will this guide work for any language? I want to clone my voice in Vietnamese. If it wont work, pls help me by link to any other guides!
@AiVOICETUTOR 9 months ago
Sorry I missed your reply. Please post exactly what didn't work for you here so me and others can help you
@AiVOICETUTOR 9 months ago ⁺¹
@anyonecancode8329 Train the voice with a Vietnamese one and it'll work
@bannedname5036 7 months ago
@@AiVOICETUTOR I'm trying to do voices, but keep ending up with an Australian accent to anything I do. Is there a way to sound less Australian?
@grabani 1 year ago ⁺¹²
Very good detailed and clear tutorial. I understand you have just started your YouTube journey, but I encourage you to keep going I see you growing your channel because the content quality is excellent.
@AiVOICETUTOR 1 year ago ⁺²
Thank you so much for your kind words! Much appreciated!
@magenta6 1 year ago ⁺¹
Good tutorial, clear, concise and well presented.
@AiVOICETUTOR 1 year ago ⁺¹
Thank you very much! Glad you like it
@gisfdlc9210 6 months ago ⁺¹
You're fantastic; thank you so much 👍
@AiVOICETUTOR 6 months ago
You're welcome and thanks for taking the time to comment 🙏
@bwheldale 1 year ago ⁺³
This "first video tutorial" got me subscribed. I'm waiting to try this but I'm waiting for another speech training program I'm running to complete. I'm interested in program/tutorial that utilizes Speaker Diarization to identify and separate speakers in an audio file and segments each speaker's audio into separate audio files. E.g., create a training dataset from video/movie of favourite characters and clone them (responsibly of course!).
@AiVOICETUTOR 1 year ago ⁺²
Thanks for your feedback! What you described would be really really useful and I'm sure it won't be too long before we'll be able to do this somehow
@imrankhan-ko4op 11 days ago
the quick gui is very helpful, i spend 3 days in installl rvc with different command as i not programmer, diffult for me,! after everything works, main issue was to train model, when i start training my gpu temps wento 83+ within 5 minute, i live in hot area, I tried pretrain models by community to to my work but rvc never detects it. luckily this video helps me.
thank you man!,.
@NSG25 8 months ago ⁺¹
awesome
@AiVOICETUTOR 8 months ago
thanks
@nghiatong-pu1si 1 year ago
thanks!
@AiVOICETUTOR 1 year ago
Thanks for watching!
@georgelaskosofficial 2 months ago ⁺¹
Hey mate. First of all *GREAT JOB*. Everything works great. I have a question. What about to give an audio in English and convert it to other language? Any idea?
@PurpleWind64 7 months ago
Thank you so much for this. I was stuck for hours on some other tutorial only because the boob who made it left out the instruction on the Train Feature Index button. I wish this was the tutorial I came across first.
@AiVOICETUTOR 7 months ago
Awesome! I'm glad you found the tutorial and that it was helpful to you
@ProgrammerPenguin 1 year ago
oh snap i am definataly gonna animate using this!
@AiVOICETUTOR 1 year ago
Awesome!
@ProgrammerPenguin 1 year ago
@@AiVOICETUTOR i'll make phineas and ferb parodys
@AiVOICETUTOR 1 year ago
Would love to see them! If you want to, drop me a link once you’re done
@ProgrammerPenguin 1 year ago
@@AiVOICETUTOR* o h i w i l l *
@PotatoKaboom 1 year ago
it all worked flawlessly! thank you very much, i will look out for more content from this channel :)
one thing: can you give some references / papers to cite on the technology that is used by the repo?
@AiVOICETUTOR 1 year ago
Glad it worked for you without issues! I couldn’t find any papers (maybe because the authors of the tool are Chinese) but the keyword is “retrieval based voice conversion“. Maybe you‘ll be able to find something
@PotatoKaboom 1 year ago
@@AiVOICETUTOR Hey, the repo references HIFI-Gan which is a paper that is already a few years old. Just not used to have this "old" tech giving results like that all of a sudden because the community delivers in such a way. Audio is just a totally different world compared to text processing and LLMs where new papers are thrown out on a daily basis. Thank you for your help though!
@kollias_music 4 months ago
thanx! Any chance you do a tutorial for singing voice cloning?
@CoolDudeClem 10 months ago ⁺¹
Am i doing something wrong? When i do "proccess data", i get this in the command prompt window:
runtime\python.exe trainset_preprocess_pipeline_print.py E:\AL Speech Stuff\RVC-beta\RVC-beta0717\sheapk 40000 11 E:\AL Speech Stuff\RVC-beta\RVC-beta0717/logs/just a test False
Traceback (most recent call last):
File "E:\AL Speech Stuff\RVC-beta\RVC-beta0717\trainset_preprocess_pipeline_print.py", line 8, in
sr = int(sys.argv[2])
ValueError: invalid literal for int() with base 10: 'Speech'
IS this right or is something wrong? I cant make head nor tail of this robot languadge but i dont think this is right.
@AiVOICETUTOR 10 months ago ⁺¹
Remove the spaces in your folder path and it should work.
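Why spaces break the preprocessing step can be reproduced in a few lines. The sketch below imitates how an unquoted command line is cut into arguments at every space; the failing command is the one from the comment above, while the "fixed" path `E:\RVC-beta\sheapk` and both helper names are made up for illustration. Only the `int(sys.argv[2])` behaviour is taken from the traceback.

```python
BAD_COMMAND = (
    r"runtime\python.exe trainset_preprocess_pipeline_print.py "
    r"E:\AL Speech Stuff\RVC-beta\RVC-beta0717\sheapk 40000 11"
)
# Hypothetical folder without spaces, used only to show the working case.
GOOD_COMMAND = (
    r"runtime\python.exe trainset_preprocess_pipeline_print.py "
    r"E:\RVC-beta\sheapk 40000 11"
)


def script_argv(command_line: str) -> list[str]:
    """Split on whitespace and drop the interpreter, like sys.argv in the script."""
    return command_line.split()[1:]


def read_sample_rate(argv: list[str]) -> int:
    """Mirror of the failing line in the traceback: sr = int(sys.argv[2])."""
    return int(argv[2])
```

With `BAD_COMMAND`, the second argument is the word `Speech` from the folder name, so `int()` raises the `ValueError` shown in the traceback; with `GOOD_COMMAND` it is `40000` as intended.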
@polinazaitseva2016 2 months ago
still cant figure this out
@vithujan 11 months ago
thx for tuto
@AiVOICETUTOR 11 months ago
You're welcome and thanks for watching
@Estudiojoseph6102 3 months ago
Very good
@techgenius614 11 months ago
thank you very much ım try to find this program
@AbelFelixStudio 1 year ago ⁺¹⁶
Thanks man. Now I can do animation without hiring someone else to record voices.
@AiVOICETUTOR 1 year ago ⁺¹
Glad it was useful to you!
@kingsofthering 4 months ago
I did not expect step 2 two take a literal 24 hours lol, but hopefully this all works in the end.
@AiVOICETUTOR 4 months ago
Sorry it took so long for you but I hope you got a good result in the end
@Mehdi0montahw 1 year ago
good
@AiVOICETUTOR 1 year ago
🙏
@petrcejpek3317 1 month ago
Thanks for the tutorial. 🙂Unfortunately, I have a problem in Czech language. I have trained my model for 500 epochs on cca 26 minutes of training dataset. When I finally apply the model, the voice color is perfect, but the pronunciation is sometimes quite spoiled. The original and the target language are the same (Czech). Do you have any advice?
@brownjonny2230 9 months ago
Thanks it's very easy follow. Btw what microphone are you using?
@AiVOICETUTOR 9 months ago
Thanks. For this video I was only using my iPhone (can't really remember if I was using a headset microphone connected to the iPhone though). Since then I have upgraded to a Razer seiren v2 x
@brownjonny2230 9 months ago
@@AiVOICETUTOR Cool thanks.
@sNaikoo 11 months ago
Hmm i got error when trying input audio file? RuntimeError: Failed to load sound: [WinError 2] The specified file could not be found .. what i do wrong? Help me ASAP.
@AiVOICETUTOR 11 months ago
Make sure you don’t have any spaces or special characters in the name of the folder path
@sinterm 10 months ago ⁺²
Hi, first of all I would like to thank you for such a clear guide. Secondly I would like to ask one question. I made my voice AI model, but since I have a weak video card, I've only been able to make 25 epochs so far (each one takes me 20-25 minutes), so here's a question: if I use the same voice file, then specify my existing model in the "Load pre-trained base model (G and D) path" tab and make 20 epochs this time, will they add up? So in total my model will have 45?
@AiVOICETUTOR 10 months ago ⁺²
Hi, thanks I'm glad you like the video. You can easily resume by going back to the tool and clicking "Train Model". But when you start training, you should set the total number of epochs you want to train and then you can interrupt it after the pth has been saved (after whatever you set in the "saving frequency"). More info about resuming here: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/953
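The relationship between the total epoch count, the saving frequency and the point where training was interrupted comes down to simple arithmetic. The sketch below assumes that training resumes from the most recent saved checkpoint, which is how the reply above describes it; the function names and example numbers are illustrative and not part of RVC.

```python
def last_saved_epoch(stopped_at: int, save_every: int) -> int:
    """Most recent epoch at which a checkpoint would have been written."""
    return (stopped_at // save_every) * save_every


def epochs_left(total_epochs: int, stopped_at: int, save_every: int) -> int:
    """Epochs still to run if training restarts from the last checkpoint."""
    return total_epochs - last_saved_epoch(stopped_at, save_every)
```

For example, with a total of 45 epochs, a saving frequency of 5 and an interruption during epoch 27, the last checkpoint is from epoch 25 and 20 epochs remain.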
@sinterm 10 months ago
Thank you !@@AiVOICETUTOR
@ER-ec4uq 6 months ago
Thanks for the clear tutorial. Do you have any tips for getting a more realistic end result? I guess the quality depends a lot on the input source. But is it possible to do extra training, increasing epochs maybe? Or would using more than ten minutes help too?
@AiVOICETUTOR 6 months ago
Thanks for your comment. You could try what my latest edit in the video description suggests and use RMVPE to train the model. Or you could try it with more or less epochs. I also found that it works better with some input voices than with others and some input voices need more epochs than others etc. It's really a science in itself
@ER-ec4uq 6 months ago
Thanks, I'll experiment a bit. It's pretty cool stuff though, when it works it's very impressive. Although the end result really depends on the audio you use to convert because it retains all the mannerisms and even some of the accent@@AiVOICETUTOR
@AiVOICETUTOR 6 months ago
Yes totally agree. So many factors that are influencing the end result. The good thing is that the tech will only get better from here :)
@fernanda3161 11 months ago
Hi!
When deciding the number of epochs... What would be a rough scale to follow corresponding to the amount of voice lines we have?
@AiVOICETUTOR 11 months ago ⁺¹
Hi, this is a tough one as it might depend a lot on the input voice. 40 minutes with 300 epochs gave me the best results overall so far but I'd love to hear from others.
@silverkey1733 1 year ago
Please keep us updated if you get TTS working with this!
@AiVOICETUTOR 1 year ago ⁺¹
Absolutely will do! Its on top of the list of things I want to figure out
@AiVOICETUTOR 1 year ago ⁺²
Text-To-Speech Tutorial: ruclips.net/video/P1HIOvKg5Ko/видео.html
@orhangorek 9 months ago
Can I change my current voice model online in the way you showed, with my computer having 4 GB of RAM?
@AiVOICETUTOR 9 months ago ⁺¹
It's recommended to use a card with more than 4GB of ram but it might just work ( see: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/FAQ-(Frequently-Asked-Questions)#q8cuda-errorcuda-out-of-memory)
@cristobalmuller 3 months ago ⁺²
Is harvest the best quality? Or Crepe? Do you have a setting for the highest quality? Thanks!
@thebluecreeper2574 1 month ago ⁺¹
I think harvest is the best quality.
@smaiderman2 7 months ago
I can confirm that this works in Spanish.
Also, is there a way to improve the sound quality? Im using your suggested values, and I get a pretty good output, but you can tell is fake because its slightly "robotic". Would it get better results changind any parameter, even if it takes more time to proccess?
Thank you for the tutorial!
@AiVOICETUTOR 6 months ago ⁺¹
Hi, you could try what my latest edit in the video description suggests and use RMVPE to train the model. Or you could try it with more or less epochs. I also found that it works better with some input voices than with others.
@SylvaTutoriels 6 months ago
Thanks!
Isn't there an option to use text to generate audio with these AI models?
@AiVOICETUTOR 5 months ago
Sadly not. check this video for text to speech: ruclips.net/video/P1HIOvKg5Ko/видео.html
@crackinfo8888 1 year ago ⁺⁶
I think I'm gonna make pororo ambatukam
@AiVOICETUTOR 1 year ago
Had to look it up. Go for it!
@Overneed-Belkan-Witch 1 year ago
Yes
@baraobuu 9 months ago
Training with 01:35 is not good right? I cleaned the audio of the only video I have from my deceased grandmother, train with 300 epoch took the all day training, I'm using Nvidea Gforce GTX 1060 6GB, in GPU I used 10 value, the result was not good and I did find out the file size did not increased after, the training could not improve any further, I guess 01:35 is insuficient data.
@AiVOICETUTOR 9 months ago
Yeah, I'm afraid the sample size is too small for this tool. You could try two tools (XTTS and Bark Voice Cloning) via Pinokio (ruclips.net/video/ln1qEglnpMo/видео.html) that require only a few seconds of input data. However the quality of the results is not comparable to RVC. Maybe the next version of RVC will work better with shorter input audio samples
@efecetinkaya4802 1 year ago
When I just started training to voice model, during the 5th epoch, I received a warning that the disk space of the computer was full, and I immediately deleted my movies and games and freed up 40 GB of space, but until I deleted it (I do not know whether the epoch progressed or not), my disk space was full for a short time during the training (5- 10 minutes) will it harm the whole process? There is a longer time and it will go up to 300 and I don't want to start the training all over :(. By the way, I made space on my computer and the training is ongoing. One last question, what exactly does "Epoch" mean in the artificial intelligence voice model? it would be great for both me and your followers who may have the same problem if you reply, I am waiting for your answers in advance and thank you very much.
@AiVOICETUTOR 1 year ago ⁺¹
Hey, if it ran out of space, it should have stopped the training process. So I hope you ended up with a good voice model in the end. Good question about the epochs. I didn't have a clear answer in my head so here's what Bard says:
An epoch is one complete pass through the training dataset. This means that the model will see each audio file in the dataset once, and then it will start over at the beginning of the dataset and see all of the audio files again. The number of epochs that you specify when you train a model will determine how long the training process will take.
For example, if you have a training dataset of 100 audio files, and you specify 20 epochs, then the model will make 20 passes through the dataset and process 2000 audio files in total. This means that the model will see each audio file 20 times during the training process.
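The epoch arithmetic can be written out explicitly: 100 audio files trained for 20 epochs means 20 passes over the dataset, with 2000 file presentations overall. The sketch below also adds the batch size, which determines how many optimizer steps each epoch takes; the function name is hypothetical and it assumes a final, smaller batch is still processed.

```python
import math


def training_workload(num_files: int, epochs: int, batch_size: int) -> dict:
    """Count passes, file presentations and optimizer steps for a training run."""
    # Assumption: an incomplete last batch still counts as one step.
    steps_per_epoch = math.ceil(num_files / batch_size)
    return {
        "passes_over_dataset": epochs,
        "files_seen_in_total": num_files * epochs,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
    }
```

Lowering the batch size does not change how often each file is seen; it only increases the number of steps per epoch while reducing memory use per step.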
@jazzkaur3581 1 year ago ⁺¹
hi please answer me , i am trying to train a model but i know that i cant do it in one sitting like the pc will restart. is there a way to continue training from where i left off , lets say i am at 120 epoch and pc shuts down , i dont want to continue again from 1. i know we can back up every few epoch but can we continue training from lets say 50th epoch or 100th if i am backing up every 50 epoch?
@AiVOICETUTOR 1 year ago ⁺¹
Hi. Yes this seems to be possible but it's not very straight forward. Check this thread on Git for more info: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/606. Hope it helps
@shirifshirif9506 1 year ago
Everything works perfectly only one issue. The epoch are really slow despite the powerful PC. What could be the reasons?
@AiVOICETUTOR 1 year ago ⁺¹
Glad it works for you. If you’re not getting any errors, I think it’s normal that it’s slow. Even on powerful PCs
@khajask8113 9 months ago
🌹👍
@AiVOICETUTOR 9 months ago
😊🙏
@user-if7vj2mj8n 5 months ago ⁺¹
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
How do I fix this error? Do you have any recommended videos?
@AiVOICETUTOR 5 months ago
You can try the following things. 1.Lower the batch size to "1". 2.Cut the audio in clips shorter than 10 seconds. 3. Reduce the size of the dataset.
@NadirWadie 1 year ago
hello, thank you for this tutorial, if you can help me i got an error
RuntimeError: Failed to load audio: ffmpeg error (see stderr output for detail)
@AiVOICETUTOR 1 year ago
Hi, it sounds like you might have an issue with the ffmpeg installation. Could you check if it is installed properly by typing “ffmpeg“ in a command window? If not, try reinstalling ffmpeg and see if that helps
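The same ffmpeg check can be done from Python. This is a minimal sketch using only the standard library; the function name `ffmpeg_status` and the wording of the messages are illustrative. It looks for an `ffmpeg` executable on the system PATH and, if one is found, asks it for its version.

```python
import shutil
import subprocess


def ffmpeg_status() -> str:
    """Report whether an ffmpeg executable is reachable via PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return "ffmpeg not found on PATH"
    # 'ffmpeg -version' prints version details and exits without processing files.
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    first_line = lines[0] if lines else "version unknown"
    return f"found at {exe}: {first_line}"
```

Note that this only covers an ffmpeg that is installed system-wide; a copy that sits next to the tool in its own folder would not be detected by a PATH lookup.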
@ApriadiYadi 7 months ago
Sir, after I trained the models voice, I can't find the index file in the logs or weights. What should I do?
@AiVOICETUTOR 7 months ago
Make sure you don’t have any special characters or spaces in your folder paths or that its not a cloud or network drive
@magellanthecat 21 days ago
So, my samples aren't showing up in the "weights" folder, nor are there any new .pth files anywhere on my system. So... now what? Is there an update to this tutorial, because I am not familiar with this enough to troubleshoot on my own--that's why I'm using a tutorial in the first place.
@HarrisonBorbarrison 7 months ago
I guess I can't do it on Mac since MacOS can't run .bat files.
@AiVOICETUTOR 7 months ago
You can install Pinokio (ruclips.net/video/ln1qEglnpMo/видео.html) on Mac and install RVC easily. Hope it works for you
@gabriel-wc6lg 1 year ago
😀
@pixelismo 6 months ago
Hello friend.. sorry for my english... you download the RVC-beta.7z file . Is it for macos too? is it another? or i can`t use it on mac?
Thank you!
@AiVOICETUTOR 6 months ago
Hey, yeah this works on Mac too. I haven't tried it myself though but you can find some info on how to run it on Mac here: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/en/README.en.md
@waqarahmad-gm2rc 11 months ago
JUST WHAT I WAS LOOKING FOR! i like you more than my crush
@AiVOICETUTOR 11 months ago ⁺¹
Hahaha thanks but I hope your crush won't read this
@waqarahmad-gm2rc 11 months ago
@@AiVOICETUTOR haha she wouldnt be here even if i paid her, btw do you know of any mac m1 friendly rvc models?
@AiVOICETUTOR 11 months ago ⁺¹
Hmm not sure what you mean by Mac friendly. AFAIK any pth model should sound the same no matter which platform you run RVC GUI on
@waqarahmad-gm2rc 11 months ago
@@AiVOICETUTOR i was able to download pytorch and set it up to run mps accelerate, since mac dosent support nvidia gpu, i downloaded this rvc but it seems to only run on windows
@AiVOICETUTOR 11 months ago ⁺¹
Oh gotcha! Yeah the Git for RVC Beta and RVC GUI mentions Mac support but I can't find any detailed info on it
@ravkhangurra7522 10 months ago
Great video, I have noticed when my gradio page comes up, it doesn't have the same options as yours. It might need to be updated, can you please advise how I can do this.
Also will you be creating a colab version of this too
@AiVOICETUTOR 10 months ago
Thanks and sorry for the delayed reply. If you want to use the exact same version as I did, download this zip file: huggingface.co/lj1995/VoiceConversionWebUI/blob/main/RVC-beta-v2-0528.7z. I still haven't been able to look into colab yet but it's on my ToDo list.
@Markiz93 9 months ago
@@AiVOICETUTOR I downloaded this version and I have in GPU information No supported GPU is found. Training may be slow or unavailable.
@AiVOICETUTOR 9 months ago
Which GPU do you have?
@Markiz93 9 months ago
I think it's most likely a problem with Pytorch, but I don't know, how to update it...
@AiVOICETUTOR 9 months ago
Could be far fetched but maybe have a look here: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/350
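When the tool reports that no supported GPU was found, it can help to ask PyTorch directly which device it sees. The sketch below is a generic PyTorch check, not RVC's own detection logic; the function name `detect_device` and its return strings are made up for this example. It should be run with the same Python environment that the tool uses.

```python
def detect_device() -> str:
    """Return 'cuda', 'mps' or 'cpu', or 'torch-missing' if PyTorch is absent."""
    try:
        import torch
    except ImportError:
        return "torch-missing"
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU with a working CUDA build of PyTorch
    # The MPS backend (Apple Silicon) only exists in newer PyTorch versions.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

A result of `cpu` on a machine with an NVIDIA card usually points to a CPU-only PyTorch build or a driver problem rather than to the card itself.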
@toopanation 3 months ago
what should be the batch file for gpu ??
i have a 6g rtx 3060
@eskhatos 2 months ago
I have followed every step in this to a tee, but when I try to save the .pth file, nothing shows up in weights, but every other file is saved. I cannot find the pth through a search either. I'm not sure why this is.
@DrawTakenShorts 7 months ago
i have a question
is this like... real time voice cloning or like you record audio and then you change it after and it comes out as a file?
@AiVOICETUTOR 7 months ago
Yes this tutorial is for cloning prerecorded voices. If you want to clone your voice in real time, you can watch that video: ruclips.net/video/vFKm-G-dxHo/видео.html
@DrawTakenShorts 7 months ago
@@AiVOICETUTOR thank you man i appreciate it
@bigguy7558 1 year ago ⁺¹
Hi, for the Epoch step, is it necessary for it to have 300? It takes a very long time for it to process all of them especially if you don't have 12GB of VRAM for your GPU
@AiVOICETUTOR 1 year ago
Hi, it’s not necessary to use 300 epochs but from what I know, with the 10 minute samples, 300 works the best. You could try it with a lower value and see how well the voice performs. For testing purposes maybe start with 20 or 50 epochs
@bigguy7558 1 year ago
@@AiVOICETUTOR Hi, after testing I found that 300 epochs sounds best and less robotic than lower values. What would happen if the epoch is raised more than 300? Would it sound more accurate or is it a waste of time?
@AiVOICETUTOR 1 year ago
There is a risk of “overtraining“ a model but I’d say you can definitely try going higher than 300. Some people went as high as 1000 but it also depends on how long your input data is
@bigguy7558 1 year ago
@@AiVOICETUTOR How much input data would you need if it required 400 epoch? Also, is it recommended to put in audio clips of different tones of the voice (yelling, whisper, etc.)? Thanks
@AiVOICETUTOR 1 year ago
I wouldn’t go much higher than maybe 15-20 minutes for 400 epochs. And yes, I think ideally the samples should contain different emotions (whisper only if it’s clearly audible). It’s still on my ToDo list though so I can’t tell how well it’ll work
@jaanbrosmusicofficial 1 year ago
How much time this whole process need to train a model.....?
Because in my case this software is hang and stuck on step 2B what to do....?
Plzz help and guide me
@AiVOICETUTOR 1 year ago
Step 2b, the feature extraction, should be very quick unless maybe you have many audio files and are using a regular hard disk (and not an SSD)
@billerpjc 8 months ago
hi . i just seen your channel and the video was easy to understand .
will it work on my victus hp . nividia geforce rtx 3050 4gb graphics card, core i5 . 11th gen .
please le me know
@AiVOICETUTOR 8 months ago
Sorry for the late reply but I just noticed that your comment had been held back by YouTube and I had to approve it. Yes the tool will definitely work on your machine! Training will be slower than in the tutorial but it will work
@rydarit1948 7 months ago
it dosent work
@@AiVOICETUTOR
@Bigjuergo 7 months ago
can you pls make a tutorial how to clone a singing voice?
@AiVOICETUTOR 6 months ago
Yep its on my todo list
@Asdedix 11 months ago ⁺²
I have this error
ValueError: invalid literal for int() with base 10: 'Voice\\xxx
@AiVOICETUTOR 11 months ago
Make sure you have no spaces or special characters in the folder path
@thedagothexperience 1 year ago
Would it be possible to use this process to change your voice live through a microphone?
@AiVOICETUTOR 1 year ago
Yes and I am working on a tutorial for that at the moment
@ari-jp8bb 4 months ago
I don't have "one click training button" any idea why?
@yem4679 6 months ago
where does that "Lecturer" folder came from?
@AiVOICETUTOR 6 months ago
I downloaded a video lecture off the internet and put it in a folder called "lecturer". Hope that helps
@dedsechacker7929 4 months ago
is there any ai tool where I can enter text it give me voice but before that I can train it to particular voice model and then start making aurdios through texts not actucally to record my voice to clone it
@AiVOICETUTOR 3 months ago
Yes, check out this tool for text-to-speech: ruclips.net/video/P1HIOvKg5Ko/видео.html
@szymonbalinski9766 1 year ago ⁺²
Hi!
Is it possible to use voices trained with this method in real-time? For example in Discord app?
@AiVOICETUTOR 1 year ago ⁺¹
Hey! Technically, it should already be possible since the voice conversions are running faster than real time on some hardware. I haven’t seen any implementations so far but it must be only a matter of time.
@didierdunca 1 year ago
@@AiVOICETUTOR make a tuto?
@drygdryg2 11 months ago
It's possible. Tha author made a video about this: ruclips.net/video/vFKm-G-dxHo/видео.html
@hrishikeshandurlekar2178 10 months ago
What f0 method should i use in the RVC Gui if i use the RMVPE model to train?
@AiVOICETUTOR 10 months ago
That's a great question. I haven't tried it in RVC GUI yet since when I last checked, the latest version didn't have RMVPE. What I did so far is use the "Model Inference" tab in the RVC-WebUI to clone the voice, since that lets you select RMVPE. Over the weekend I'm gonna have a look at RVC GUI and see what f0 sounds best with RMVPE trained models. Guess "Harvest" should work well with it.
@AiVOICETUTOR 10 months ago
I have tried the RMVPE trained voice in RVC-GUI and it works great when using "Harvest" as f0 method.
@hrishikeshandurlekar2178 10 months ago
Thank you. The model interface tab also works very well. Initially I assumed that it may be very complicated for a layman and RVCGUI offered a easier workflow. But now I feel the interface tab is pretty good too.
@AiVOICETUTOR 9 months ago
Same here. I think we got used to the UI by using the training tab a couple of times :)
@patrickdilla 1 year ago
Does it matter if the training source file is mp3 and not wav or it's important to be in wav format?
@AiVOICETUTOR 1 year ago ⁺¹
MP3 should work too. At the time when I made the tutorial, I read that some were having an issue with MP3 and since I had to convert it from a video anyway, I used .wav
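If an MP3 causes trouble, converting it to WAV first is a simple workaround. The sketch below builds an ffmpeg command for that and assumes ffmpeg is installed and on the PATH; the helper names are hypothetical, and the mono output and 40000 Hz sample rate are assumptions chosen for this example, not requirements stated by the tool.

```python
import subprocess


def build_wav_command(src: str, dst: str, sample_rate: int = 40000) -> list[str]:
    """Build an ffmpeg command that converts src to a mono WAV file."""
    return [
        "ffmpeg",
        "-y",              # overwrite dst if it already exists
        "-i", src,         # input file (MP3, video, ...)
        "-ac", "1",        # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz
        dst,
    ]


def convert_to_wav(src: str, dst: str, sample_rate: int = 40000) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_wav_command(src, dst, sample_rate), check=True)
```

Keeping the command construction separate from the call makes it easy to print and inspect the command before running it.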
@patrickdilla 1 year ago
@@AiVOICETUTOR I'm currently processing data for training, does it really take this long? It's counting 13000/1.0
@AiVOICETUTOR 1 year ago
Sorry for the late reply. Hope you got it to work. Training took about an hour for me
@DungTranOfficial 11 months ago ⁺¹
I encountered the “CUDA out of memory” error while performing step 3. Does it mean my GPU is to weak? I’m using RTX 2060 graphics card. Please guide me on how to solve this issue, thank you so much.
@AiVOICETUTOR 11 months ago
Sorry it didn't work for you. This seems to be one of the most common issues/bugs. You can try the following things. 1.Lower the batch size to "1". 2.Cut the audio in clips shorter than 10 seconds. 3. Reduce the size of the dataset.
@DungTranOfficial 11 months ago
@@AiVOICETUTOR I reduced the batch size to 1, restarted the computer and it's now functioning. Thank you for your help!
@AiVOICETUTOR 11 months ago
Awesome! Glad it worked
@seslikitapevreni 1 year ago
I'm doing all the steps. There is no problem, but the latest pth file does not appear. There are only index files.
@AiVOICETUTOR 1 year ago
Does the output of the command window look the same as in the video? Like does it say "Saving model and optimizer state at epoch 300 to ..." ?
@AiVOICETUTOR 10 months ago
@@r59456 Sorry I missed your reply. Yes you need a graphics card to run this tool
@user-ij1pw6eg9n 8 months ago ⁺¹
can i using for singing
@AiVOICETUTOR 8 months ago ⁺¹
Yes if you train it with a voice extracted from songs (use ultimatevocalremover.com to extract the actual voice) and then clone a voice that's singing, it should work fine.
@yashin1122 9 months ago
Is this process compatible for nvidia GTX 1650 4GB?
And can you please tell me the Best Setting for 4GB VRAM
@AiVOICETUTOR 9 месяцев назад
4GB is critical for training a voice (8GB recommended) but if you're lucky it might work. You can try the following things if you're having memory issues. 1.Lower the batch size to "1". 2.Cut the audio in clips shorter than 10 seconds. 3. Reduce the size of the input audio.
@TheRealCoooookie 4 месяца назад
It said move model to cuda what dose that mean
@CyberAtlasX1 11 months ago
One click training isn't loading the epochs, can you help me?
It ends at:
all-feature-65
all-feature-done
@AiVOICETUTOR 11 months ago
Hmm, that's a strange issue. Make sure that none of your folder names contain spaces or special characters, and maybe start over from step 1.
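A quick way to check a folder path for spaces or special characters before starting over is sketched below. This is only an illustration: the function name and the exact set of characters treated as "safe" (ASCII letters, digits, underscore, hyphen, dot, path separators, and the drive colon) are assumptions, not a rule documented by RVC.

```python
import re

# Assumption: anything outside this set counts as a "special character".
_UNSAFE = re.compile(r"[^A-Za-z0-9_\-./\\:]")


def unsafe_path_characters(path):
    """Return the sorted list of distinct characters that may cause trouble."""
    return sorted(set(_UNSAFE.findall(path)))


# An empty list means the path contains only "safe" characters.
print(unsafe_path_characters(r"D:\RVC-beta\weights"))
# A space (or an accented letter, symbol, etc.) is reported back.
print(unsafe_path_characters(r"C:\Users\Laura\My Music\RVC-beta"))
```

The second path is hypothetical; it is there only to show a space being flagged.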
@sopix7761 1 year ago
Any particular reason why you set epochs to 300 and not 1000? I want to try 1000, but I'm kind of scared I'll break my PC or something.
@AiVOICETUTOR 1 year ago
Yeah, because it seems there’s a risk of overtraining a v2 model. 300 epochs worked well for me, so I didn’t feel the need to go higher. If you've got the patience, give 1000 a try and compare. Your PC won’t have to work any harder, just longer than it would for 300 epochs.
@sopix7761 1 year ago
@@AiVOICETUTOR OK, makes sense, thank you.
@switchpp1266 1 year ago
Hi, I wonder whether I should use GPU or CPU processing mode?
@AiVOICETUTOR 1 year ago ⁺¹
Hi, you should use GPU. CPU should also work but will be slower.
@yanik5480 28 days ago
@@AiVOICETUTOR How slow is the CPU method? GPU is greyed out, so I can't really use it, and it would be nice to know how long this could take.
@khajask8113 9 months ago
Hey, can rvc-pkg be installed locally and used for free forever?
@AiVOICETUTOR 9 months ago
Hey. Yes, you can use this locally for free, forever.
@nadj630 10 months ago ⁺¹
I'm doing all the steps. There is no problem, but the latest pth file does not appear. There are only index files.
['extract_feature
e', 'v2']
D:\RVC-beta\RVC-b
load model(s) fro
move model to cpu
all-feature-17
all-feature-done
@AiVOICETUTOR 10 months ago
Make sure that your GPU is detected by the tool. If it is detected, then you could try lowering the batch size. More here: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/816
@remcee2 2 months ago
@@AiVOICETUTOR I'm having the same problem, and the GitHub issue page you sent didn't have a solution. Do you know more about this problem?
@shekmohamed3217 24 days ago
The .pth file is not created. What is the issue?
@kevinhuerterscousin 7 months ago
When I open it and download Python, it gives me:
Running with the system Python.
Traceback (most recent call last):
  File "C:\Users\Laura\Downloads\RVC-GUI-main\RVC-GUI-main\rvcgui.py", line 4, in
    import soundfile as sf
ModuleNotFoundError: No module named 'soundfile'
Press any key to continue . . .
I can't figure out how to install soundfile to make it work. Any tips?
@AiVOICETUTOR 7 months ago
Check out this thread: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/523 Hope this solves it for you
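A `ModuleNotFoundError` like the one above usually means the package is not installed in the Python interpreter that is running the script. The sketch below, which is not part of RVC GUI, lists which modules that interpreter cannot find and prints a suggested install command. The helper name is hypothetical, and the assumption that the pip package has the same name as the module holds for `soundfile` but not for every package.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(["soundfile"]):
        # Using "python -m pip" installs into the same interpreter that runs this script.
        print(f"Missing: {name}. Try: python -m pip install {name}")
```

Since the log says "Running with the system Python", the check should be run with that same system Python for the result to be meaningful.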
@abdul-rahimmustafanafa3306 9 months ago ⁺¹
How do I use this live in a Zoom video call, please?
@AiVOICETUTOR 9 months ago
It works the same way for Zoom as for Discord: ruclips.net/video/vFKm-G-dxHo/видео.html
@beboelfr3on 1 year ago
5:25 What do I do about this message?
ValueError: invalid literal for int() with base 10
@klipfisch 1 year ago ⁺¹
I think you had a space in the folder path.
@macdoctorsg 11 months ago
Anything for Mac users?
@AiVOICETUTOR 11 months ago
I think it’s possible but I haven’t done it myself and can’t find a lot of good info about it. Maybe keep an eye on this: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/575
@Olmuooo 1 year ago
After training the voice, there is no .pth file in the weights folder. What could be the reason?
@AiVOICETUTOR 1 year ago
Are you getting the message in the command window saying "Saving model and optimizer state at epoch 300 to ..."?
@sexsyalen 1 year ago
Same problem, I've wasted a lot of time on this. Did you solve it?
@angelmolinalopez9997 7 months ago
I don't have Nvidia but AMD. When I get to the training, it tells me that it finished successfully, but it has not run any epochs. Is it because I don't have the right graphics card?
@AiVOICETUTOR 7 months ago
Yeah, AFAIK RVC doesn't work with AMD GPUs on Windows yet.
@eventfakt 11 months ago
Hello brother, I have a problem in model inference: 'NoneType' object has no attribute 'dtype'
@AiVOICETUTOR 11 months ago
Hey, check this out: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/1020. Hope that fixes it!
@Raytoplayvideo 11 months ago
Is it normal that one-click training takes a lot longer for me? In the video it takes about 40 minutes to process, whereas for me it's been 4 hours and I'm still at epoch 176 (so just over half). I have an Nvidia 3080 and I put 8GB.
@AiVOICETUTOR 11 months ago
Was your input data longer than 10 minutes, maybe? I heard that NVIDIA released some driver updates at some point that slowed things down on certain hardware, but I'm not sure it’s related to this.
@Raytoplayvideo 11 months ago
@@AiVOICETUTOR I don't know. I noticed after launching that Nvidia was offering me an update. I'll wait until this run is done. But is the processing of the epochs done on my disk or on the web? I ask because my disks have been acting up a little lately, and I don't really know why. Otherwise I don't understand why it takes so long. I used 30 minutes of WAV audio, and it bugged out after 10 hours and 275 epochs. I don't know why.
@AiVOICETUTOR 11 months ago
Yeah, it is running locally on your PC, so if you have issues with your disk, that might explain it. If you can, try running it off another disk (ideally an SSD).
@Raytoplayvideo 11 months ago
@@AiVOICETUTOR I'm running it on my NVMe, dude 😂 But I don't know why, sometimes my disks take a little while to start up. Sometimes my files take a long time to load and open. I don't know why 🫤
@AiVOICETUTOR 11 months ago
Yeah, your NVMe shouldn't be the bottleneck then. Sometimes Windows does strange things when it comes to managing disk load. Not sure if that is to blame for your issue, though.
@didierdunca 1 year ago ⁺¹
5:20 In the cmd window, the last line says: ValueError: invalid literal for int() with base 10: 'Music\\RVC-beta-v2-0528\\Lecturer'
My audio is 9 minutes long.
@didierdunca 1 year ago ⁺¹
Very stupid program, but my directory was too long... I had to change it from this: Music\\RVC-beta-v2-0528\\Lecturer
to this: M\\RVC-beta-v2-0528\\Lecturer (make it short)
@AiVOICETUTOR 11 months ago
Interesting. Glad you figured it out, and thanks for sharing the solution!
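One plausible reading of this error, consistent with the advice elsewhere in the comments to remove spaces from folder paths, is that an unquoted path containing a space gets split into two arguments, so the script tries to read the second half of the path as a number. The sketch below only imitates that failure mode. It is not RVC's actual code, and the function name, the two-argument layout, and the example paths are all assumptions.

```python
def parse_preprocess_args(command_line):
    """Mimic a script that expects two arguments: <dataset_folder> <sample_rate>."""
    # Naive split on spaces: roughly what happens to an unquoted path.
    argv = command_line.split(" ")
    dataset_folder = argv[0]
    # Raises ValueError if argv[1] is a leftover path fragment instead of a number.
    sample_rate = int(argv[1])
    return dataset_folder, sample_rate


# Works: no space in the folder path.
print(parse_preprocess_args(r"D:\RVC-beta\Lecturer 40000"))

# Fails: the space in the (hypothetical) "My Music" folder shifts the arguments.
try:
    parse_preprocess_args(r"C:\Users\me\My Music\RVC-beta-v2-0528\Lecturer 40000")
except ValueError as error:
    print(error)
```

Under this assumption, the text after the colon in the error message is whatever follows the first space in the path, which is why removing the space (or moving the folder) makes the error go away.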
@sexsyalen 1 year ago
I did everything like in the video, but still nothing happens. It writes "index for writing: No such file or directory".
@AiVOICETUTOR 1 year ago
Do you have any symbols or spaces in the folder path? If so, try removing them.
@nationsrelations8267 1 year ago
Does this work on Mac?
@AiVOICETUTOR 1 year ago
Not that I know of, but it must be only a matter of time.
@khajask8113 9 months ago
What PC specs do I need to run RVC? Are 8GB RAM and a 2GB GPU OK?
@AiVOICETUTOR 9 months ago
A 2GB GPU is not enough for training a voice (4GB is borderline, 8GB recommended), but you can still clone a voice using pretrained models in RVC GUI with the CPU alone, though it will be much slower.
@sharoleslam 10 months ago
Hi, I'm training a model with 500 epochs on a 17-minute audio file, but I think it will take 3 full days to complete. Is that normal?
I have a strong PC with an RTX 3080 (12GB VRAM) and I run it from my SSD.
I set my batch size to 35
(every epoch takes about 8 minutes!!)
@DrunkenKnight71 10 months ago ⁺¹
I saw in another video that anything over 300 epochs is a waste of time, but how true that is in practice I don't know. Anyway, do whatever you feel you need to do.
@DrunkenKnight71 10 months ago ⁺¹
Also, they said to set the batch size lower than the graphics card's memory.
@sharoleslam 10 months ago ⁺¹
@@DrunkenKnight71 I have 12GB of dedicated GPU memory and 28GB of GPU memory, so which number should I use?
@AiVOICETUTOR 10 months ago ⁺¹
Hi, it should not take that long with that GPU. As @DrunkenKnight71 suggested, you should lower the batch size to a value below the memory (in GB) of your RTX 3080.
@AiVOICETUTOR 10 months ago
And definitely experiment with the number of epochs. It seems to depend on so many factors.
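The figures in this thread can be checked with simple arithmetic: at about 8 minutes per epoch, 500 epochs come to roughly 67 hours, or just under 3 days, which matches the estimate above. A tiny helper (the name is hypothetical, and it assumes every epoch takes the same time) makes it easy to try other numbers:

```python
def training_time_hours(epochs, minutes_per_epoch):
    """Estimate total training time in hours, assuming a constant time per epoch."""
    return epochs * minutes_per_epoch / 60


hours = training_time_hours(500, 8)
print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
# prints "66.7 hours, about 2.8 days"
```

By the same arithmetic, 300 epochs at 8 minutes each would still take 40 hours, so lowering the time per epoch matters far more here than lowering the epoch count.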
@Redcolor567 7 months ago
RVC GUI is telling me "Please select a model and input audio file" when I click Convert, even though I definitely have both selected. I even tried changing the models and checking that the audio file was not corrupted.
@AiVOICETUTOR 6 months ago
Sorry it's not working for you. I couldn't find any info on it, but I hope it will be fixed in a future version of RVC.
@beardedbhais4637 1 year ago
You said this isn't the most efficient method. Can you guide me to the most efficient and highest-quality method? I would like to generate a realistic model with fast inference.
@didierdunca 1 year ago
Me too
@AiVOICETUTOR 11 months ago
What I meant by “not the most efficient method” was that I’m using a separate tool (RVC GUI) instead of doing the voice conversion in RVC Beta, and therefore using twice the disk space. Although it’s not perfect, to my knowledge, training a model in RVC is currently the highest-quality free method available.
@tommytomickey 1 year ago
Can I use a Mac to follow these steps?
@AiVOICETUTOR 1 year ago ⁺¹
I think it’s possible but I haven’t done it myself and can’t find a lot of good info about it. Maybe keep an eye on this: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/575
@ahmedeidgaber 1 year ago ⁺¹
I have a problem: when I click on process data, it opens the command window and shows this error:
ValueError: invalid literal for int() with base 10: 'folder\\RVC-beta\\sharmota'
@AiVOICETUTOR 1 year ago ⁺¹
Is there a space in your folder path? If so, you need to remove it.
@dannybee05 11 months ago
@@AiVOICETUTOR Freaking genius!!!
@mooster2095 11 months ago
4:11 It doesn't open :( The .bat file doesn't work for me. How can I use it with the CPU? It says "No supported Nvidia GPU, use CPU instead".
@AiVOICETUTOR 11 months ago
As far as I can tell, you need a GPU to run RVC. If you have a GPU and still run into the issue, check this out: github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/issues/806
@didierdunca 1 year ago
Process data takes a long time. Is this normal? I have 50 minutes of audio.
@AiVOICETUTOR 11 months ago
Yeah, depending on your hardware it can take a while, but it shouldn’t take too long.
@user-it7ob6vy2s 8 months ago
Is there any way to do this with an AMD graphics card?
@AiVOICETUTOR 8 months ago
From what I can tell, you need to be on Linux to use AMD cards, and not all cards are supported yet.
@Salvat-pz9lz 6 months ago ⁺¹
It says "No supported Nvidia cards, using CPU for inference". Any solution, please?
@AiVOICETUTOR 5 months ago ⁺¹
You need to use an NVIDIA GPU.
@thebluecreeper2574 1 month ago ⁺¹
Why does it say "Please select a model and input audio file"?
@CreeperGuy1189 4 months ago
I was wondering why your training was going so fast, then I realized you had a 4090.
@AiVOICETUTOR 4 months ago
Yeah, hopefully the next-gen GPUs will offer much better performance for AI at lower prices.
@envision6331 11 months ago
My friend, what is the retrieval rate? What does it do?
@AiVOICETUTOR 11 months ago
Hey, I couldn't find a lot of information on it, but here's what Bard has to say:
The retrieval rate in RVC beta is not publicly available information. However, the developers of RVC have stated that they are working on improving the retrieval rate, and they believe that it will be significantly improved in the future.
In the meantime, you can use the following tips to improve the retrieval rate in RVC beta:
- Use a high-quality audio sample as the query.
- Speak clearly and slowly into the microphone.
- Avoid background noise.
If you are having trouble getting RVC to recognize your voice, you can try adjusting the settings in the RVC WebUI.
Hope this helps.
@langstonreese7077 8 months ago
Is this possible on Mac?
@AiVOICETUTOR 8 months ago
Yes, check out Pinokio (ruclips.net/video/ln1qEglnpMo/видео.html), which lets you one-click-install RVC on a Mac. Hope it works for you!
@fux666 11 months ago
Why can't I change CPU to GPU in the RVC GUI?
@AiVOICETUTOR 11 months ago
Do you have an NVIDIA GPU that supports CUDA?
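One way to answer that question from the command line is to ask PyTorch whether it can see a CUDA device. This is a hedged sketch: it assumes the tool relies on PyTorch for GPU access and that you run it with the same Python environment the tool uses (a bundled package may ship its own interpreter, in which case the system Python can give a different answer). The function name is hypothetical.

```python
def cuda_status():
    """Describe whether PyTorch can see a CUDA-capable GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this Python environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed but no CUDA GPU was detected"
    # Index 0 is the first CUDA device PyTorch found.
    return f"CUDA GPU detected: {torch.cuda.get_device_name(0)}"


print(cuda_status())
```

If no CUDA GPU is detected on a machine that does have an NVIDIA card, the driver or the installed PyTorch build (CPU-only versus CUDA) is a reasonable next thing to check.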
@magellanthecat 22 days ago
Ugh. One note says you need 30 minutes. Then it's only 10 minutes. Could we please make the information clearer up front?
What is the actual minimum amount of audio needed? 2 minutes? 30 seconds?
@ProdByOWI 9 months ago
Do you know why the epoch part takes so long? I have an RTX 4070 and I think it's a good GPU. Any solution?
@AiVOICETUTOR 9 months ago
How long does one epoch take for you? Besides the GPU, it also depends on the length of the input data.
@ProdByOWI 9 months ago
@@AiVOICETUTOR And can I change the input data? How can I do it?
It takes 5 minutes per epoch, and that's for 300 epochs...
@AiVOICETUTOR 9 months ago
You could reduce the size of the input voice (make it shorter). But I don't think it should take that long on a 4070. By any chance, have you changed the batch size?
@2054E 9 months ago
I have a 4050 and one epoch takes 8 minutes for me :D
