My TOP 3 Tips for Training Better AI Voices - RVC Voice Cloning

  • Published: Jul 15, 2024
  • Links referenced in the video:
    Tensorboard video: • Get the BEST AI Voice ...
    Hardware for my PC:
    Graphics Card - amzn.to/3pcREux
    CPU - amzn.to/43O66Ir
    Cooler - amzn.to/3p98TwX
    RAM - amzn.to/3NBAsIq
    SSD Storage - amzn.to/42NgMFR
    Power Supply (PSU) - amzn.to/430bIhy
    PC Case - amzn.to/447499T
    Mother Board - amzn.to/3CziMXI
    Alternative prebuilds to my PC:
    Corsair Vengeance i7400 - amzn.to/3p64r22
    MSI MPG Velox - amzn.to/42MnJHl
    Cheapest recommended PC:
    Cyberpower 3060 - amzn.to/3XjtZoP
    Come join The Learning Journey!
    Discord - / discord
    Github - github.com/JarodMica
    TikTok - / jarodsjourney
    If you found anything helpful, please consider supporting me and the content I am trying to produce!
    www.buymeacoffee.com/jarodsjo...
  • Science

Comments • 66

  • @M4rt1nX
    @M4rt1nX 10 months ago +4

    Nice setup. Better lighting and new background. A lot of improvements there.

    • @Jarods_Journey
      @Jarods_Journey  10 months ago +1

      Thanks Luz, I've moved to a different portion of my room xD!

  • @SparkysTechCorner
    @SparkysTechCorner 10 months ago +1

    Good video, good info, stuff I've come across in my own trial and error. Keep up the good work man

  • @jaimeleau
    @jaimeleau 10 months ago

    Thanks man 💪

  • @greenockscatman
    @greenockscatman 10 months ago +7

    Solid tips all around! You're right to put the dataset first because "garbage in, garbage out" is probably the first thing you're going to learn through trial and error. Appreciate this vid is mostly geared towards AI voice changing, but if you're doing any AI music where you want to change the vocals, my tip is to not go overboard with UVR in trying to "clean up" the target voice (singer in the song you're wanting to replace the vocals for). Lots of times just a single pass through of Kim Vocal 1 sounds miles better than doing that + de-echo, dereverb etc. It's easy to end up losing some of the little qualities of the song that make it sound good if you clean it up too much.

    • @Jarods_Journey
      @Jarods_Journey  10 months ago +3

      Thank you! A little bit of trial and error and GIGO will be your new motto 😂. This is a good tip as well for the inferencing side of things, definitely though you don't want reverb much on the training side still imo.

  • @MaisnerProductions
    @MaisnerProductions 10 months ago

    great tips

  • @Antonsetiady
    @Antonsetiady 10 months ago

    Thanks sir

  • @reedmoon3630
    @reedmoon3630 6 months ago

    Thanks for the tips. I'm swapping singer voices. I have good data of about 20 minutes. 200 epochs. I used Harvest and RMVP_gpu for both training and processing. The results are OK but I still hear too much of the original singer's voice. What can I adjust to make the cloned voice totally replace the original voice?

  • @klaurcschwackerberg1880
    @klaurcschwackerberg1880 10 months ago

    Would you know if it is already possible to do a training that gives me text-to-audio from acapellas? I want to avoid the nightmare training of Tacotron 2 and use an RVC v2 kind of training, nice and easy. I mean I want to train a model by adding acapellas to it in an easy way, like you can in RVC v2, without having to transcribe every sentence as Tacotron 2 training requires, and then, when inferencing the model, type text to get audio! Is that not possible yet? Wouldn't that be great? Or did I miss something?

  • @SaveTheGregoryHorrorShow
    @SaveTheGregoryHorrorShow 5 months ago

    Hey I'm still new to AI (especially RVC) training, how many epochs does it take for each varying duration of datasets? Like a dataset that's either 1 and a half to 5 minutes, 5-10 minutes, 10-15 minutes, 15-20 minutes, 30+ minutes, etc. I have varying datasets that are very short to very long. For example, my shortest model is 1 minute 11 secs, my longest one is 43 minutes 57 secs. I hope you understand how I explained it since I'm on the autism spectrum and I love how AI is progressing. Hope you reply soon (cause I know you're a busy guy lol), thanks for reading!

  • @Molandria
    @Molandria 3 months ago

    Greetings. I have been struggling with this stuff for weeks. I am at a point now where I can train models with RVC, however... I am having a problem I'm not really finding ANYTHING about, anywhere. :(
    I will say one thing, and the model, straight up, will say a different word.
    It was based off a voice recording from an anime character of which there is not a whole lot of audio to begin with...
    Is it possible to, say, make a model from scratch using myself, and just talk and talk and talk, then afterwards graft the voice tone of the character onto that? Would that solve the linguistic issues, or add new ones?
    For now though, i think i'll restart the data set from scratch using some tips here. =)
    Your help is amazing.

  • @hdhdhvdjgdjjdbjdb5541
    @hdhdhvdjgdjjdbjdb5541 10 months ago

    How do I minimize the delay when streaming? Get a better graphics card? Does a 4060 laptop have lower delay than a 3060 12GB?

  • @azadi9999
    @azadi9999 5 days ago

    Is it possible to start small first and then improve the created sound model with more datasets or more epochs, I mean that we don't have to do the modeling from the beginning again. If it's possible please tell us how we can do that?

  • @heyheybackup
    @heyheybackup 10 months ago

    could you do a tutorial connecting this to OBS?

  • @moriakiinamine1372
    @moriakiinamine1372 10 months ago +6

    Hello! The new RVC update makes training with CUDA faster. With my RTX4070ti it takes 30 seconds per epoch

    • @Jarods_Journey
      @Jarods_Journey  10 months ago +1

      This is awesome to hear! I'll have to check what they adjusted

    • @diegolopez-xz8pg
      @diegolopez-xz8pg 10 months ago

      Hi, glad to read that. Where did you get the "RVC update"? Thanks and regards from Argentina

    • @moriakiinamine1372
      @moriakiinamine1372 10 months ago

      @@diegolopez-xz8pg Since I was having configuration problems, I went to check the GitHub, and there is an update from a day ago.
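
The per-epoch figure reported in this thread can be turned into a rough total training time with simple arithmetic. The sketch below assumes the time per epoch stays constant for the whole run, which real training may not honor; `estimate_training_hours` is a hypothetical helper, not part of RVC, and the epoch counts are only illustrative.

```python
def estimate_training_hours(seconds_per_epoch, total_epochs):
    """Estimate wall-clock training time in hours.

    Assumption: every epoch takes the same time, which is only roughly
    true in practice (dataset size, batch size, and I/O all affect it).
    """
    return seconds_per_epoch * total_epochs / 3600


# 30 s per epoch is the figure reported above; the epoch counts are examples.
for epochs in (200, 1000):
    hours = estimate_training_hours(30, epochs)
    print(f"{epochs} epochs at 30 s each: about {hours:.2f} h")
```

At 30 seconds per epoch, 200 epochs come to roughly 1.67 hours and 1000 epochs to roughly 8.33 hours.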

  • @EvanTunes
    @EvanTunes 8 days ago

    Can we retrain a model, or do we have to train it from the start?

  • @Nishartist
    @Nishartist 10 months ago

    When I try to train a voice, the preprocess section shows this error:
    start preprocess
    ['trainset_preprocess_pipeline_print.py', 'D:\\RVC0813Nvidia\\Dataset\\Myvoice\\Myvoice.wav', '40000', '24', 'D:\\RVC0813Nvidia/logs/Myvoice', 'False']
    Fail. Traceback (most recent call last):
    File "D:\RVC0813Nvidia\trainset_preprocess_pipeline_print.py", line 111, in pipeline_mp_inp_dir
    for idx, name in enumerate(sorted(list(os.listdir(inp_root))))
    NotADirectoryError: [WinError 267] The directory name is invalid: 'D:\\RVC0813Nvidia\\Dataset\\Myvoice\\Myvoice.wav'
    end preprocess
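
The traceback above shows `os.listdir` being called on a path that ends in `Myvoice.wav`, and `os.listdir` only accepts a directory, so it raises `NotADirectoryError` when handed a file. A likely fix is to give the preprocess step the folder that contains the audio rather than the file itself. The sketch below illustrates the idea with a hypothetical helper, `dataset_dir`, which is not part of RVC.

```python
import os


def dataset_dir(path):
    """Return a folder that os.listdir can read.

    Hypothetical helper: if `path` is already a directory it is returned
    unchanged; otherwise the parent folder of the file is returned.
    """
    if os.path.isdir(path):
        return path
    return os.path.dirname(path)


# Usage sketch (path taken from the error message above):
#   dataset_dir(r"D:\RVC0813Nvidia\Dataset\Myvoice\Myvoice.wav")
# would yield the "Myvoice" folder on that machine.
```

In the training UI, that corresponds to entering the dataset folder path instead of the path to a single `.wav` file.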

  • @northwestrepair
    @northwestrepair 2 months ago

    I don't understand why I can't get any decent result.
    No matter what I do, it sounds like robotic noise.

  • @macdoctorsg
    @macdoctorsg 10 months ago +1

    great tutorial mate! I realized a lot of your videos have your voice (audio) outta sync with your visual, i.e. seems like your video couldn't catch-up with your voice.

    • @Jarods_Journey
      @Jarods_Journey  10 months ago

      :O, my voice is not in sync I'll have to check lol

  • @miinyoo
    @miinyoo 7 months ago

    I've banged my head against this for two solid days.
    I think the noticeable AI sound is a combination of things. #1 on the list is compressed source audio. #2 is leaving silent bits in the pre-processed dataset. #3 is not enough variety in a dataset. #4 is parameters, turning knobs, etc.
    I have found making convincing RVC is really really fucking hard. You can do it with other noise in the background and no one notices, but once it's "alone in the room" it always seems to fall on its own face.
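
One of the causes listed in this comment, silent stretches left in the dataset, can be checked for with a simple energy measure. The sketch below flags fixed-size windows whose RMS level falls under a threshold. It is only an illustration: `silent_windows`, the window size, and the threshold are hypothetical choices, samples are assumed to be floats in the range -1.0 to 1.0, and this is not how RVC's own preprocessing works.

```python
import math


def silent_windows(samples, window=4, threshold=0.01):
    """Return the indices of windows whose RMS level is below `threshold`.

    `samples` is a sequence of floats in [-1.0, 1.0]. Trailing samples
    that do not fill a whole window are ignored.
    """
    flagged = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        if rms < threshold:
            flagged.append(start // window)
    return flagged
```

With real audio the window would cover tens of milliseconds of samples rather than four, and the flagged windows would be candidates for trimming before training.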

    • @Jarods_Journey
      @Jarods_Journey  7 months ago

      Not 100% there, some models I've trained though sound 80-90%, though on scrutiny, it's possible to tell.
      Data is 100% key here.

  • @denblindedjaligator5300
    @denblindedjaligator5300 6 months ago

    Hello Jarod's Journey. I would like to know if you would train a model for me with pitch set to false. You can get up to a higher batch size; I can only get up to 26. It sounds like there is an autotuner on when I have trained over 200 epochs, but it could well be that training with a batch size of 35 would make it more precise. How can I send you my dataset? Thanks.

  • @motokorcle
    @motokorcle 10 months ago

    can I use this software on fortnite like live?

    • @Jarods_Journey
      @Jarods_Journey  10 months ago

      If you wanted to and had a powerful enough PC, yes.

  • @LarsEsDoch
    @LarsEsDoch 2 months ago

    How do you use these models in real time?

  • @Joe-hp6jz
    @Joe-hp6jz 10 months ago +4

    Is it recommended to train voice samples (talking) and singing voice samples together, or would that compromise the overall quality? Would it be better to train only singing voice samples to make an AI song cover?

    • @Jarods_Journey
      @Jarods_Journey  10 months ago +2

      I have yet to make an explicit comparison, but you can get really good models still with datasets mixing the two (I've done several this way). It might make for an interesting comparison to split that data setup and see what results in the best model 🤔

  • @jlobstertv
    @jlobstertv 10 months ago +5

    The UVR tool is effective at separating music vocals from instrumentals; however, in certain instances, there may be some static noise present in the background of the UVR Vocal output. Therefore, it is not guaranteed to work flawlessly for removing background noise in general audio recordings. To ensure clean recordings, it's advisable to use a microphone with noise cancellation capabilities in conjunction with Krisp, a noise-canceling AI app, during the recording process. Additionally, I wish I had known to "start with small datasets" earlier, as I've already set 1000 total epochs for my voice model and it is still training as of now🤣. 15 more hours is my estimated time of completion, I just hope it will turn out well🙏

  • @Avax84
    @Avax84 10 months ago +1

    Does it help if you clean up your device? I'm running the voice changer on CPU (with an AMD Radeon graphics card) and whatever I do, on Discord it's extremely slow. Also I can't use CUDA, because when I check if it works in the console it keeps saying "false"

    • @Jarods_Journey
      @Jarods_Journey  10 months ago

      That's mainly a hardware limitation, you can try using the directml version of it but CPU is slow and AMD is unstable sometimes. CUDA is Nvidia proprietary so that is why you aren't able to use it

    • @Avax84
      @Avax84 10 months ago

      @@Jarods_Journey If I try to use CUDA it unfortunately says false when I try to check if it works (by checking in PowerShell), so that's currently my biggest hurdle

    • @Jarods_Journey
      @Jarods_Journey  10 months ago

      @@Avax84 You can't use CUDA because you need an Nvidia GPU, so that's why you'd have to check out the directml version to see how that works

    • @Avax84
      @Avax84 10 months ago

      Also, idk how to fix that

    • @Avax84
      @Avax84 10 months ago

      I hear myself with voice changer but very badly, like 10% quality of what I hear when testing in client
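
The "false" mentioned in this thread is what a CUDA availability check reports on a machine without a supported Nvidia GPU. The thread does not say which command was run, so the sketch below assumes the check was PyTorch's `torch.cuda.is_available()`; `cuda_available` is a hypothetical wrapper that also copes with PyTorch not being installed.

```python
def cuda_available():
    """Return True only if PyTorch is installed and reports a usable CUDA device.

    Assumption: the environment uses PyTorch. On a system with only a CPU
    or an AMD GPU this is expected to return False.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so CUDA cannot be used through it.
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("CUDA available:", cuda_available())
```

A result of False on an AMD card is consistent with the reply above: CUDA needs an Nvidia GPU, so the directml version is the path to try instead.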

  • @klaurcschwackerberg1880
    @klaurcschwackerberg1880 10 months ago

    Does anyone know a good model for UVR5 that can extract acapellas from music, but without the backing vocals? I know X-minus can do this but I want to use UVR5. I just don't know what model I need to choose, thanks

    • @Poney01234
      @Poney01234 10 months ago

      Have you tried MVSEP (online) ?

    • @amiraskari4055
      @amiraskari4055 7 months ago

      This is my question too, did you find anything?

  • @LindaSummer27
    @LindaSummer27 10 months ago

    How to download RVC?

  • @denblindedjaligator5300
    @denblindedjaligator5300 10 months ago

    Where can I find the guitar model? How can I get the mpeg working on the Mac side? I cannot train my voices or run my model inference. Should my friend and I use Xformers or not?

    • @Jarods_Journey
      @Jarods_Journey  10 months ago

      RVC's guitar model can be found here: huggingface.co/spaces/lj1995/vocal2guitar/tree/main/weights
      Unfortunately, I don't know whether RVC uses xformers, and I'm not sure about Mac since I don't own one.

    • @denblindedjaligator5300
      @denblindedjaligator5300 10 months ago

      The index file is missing thanks

    • @denblindedjaligator5300
      @denblindedjaligator5300 10 months ago

      I found the index file. I have to retype logs not weights

  • @synthmaster4959
    @synthmaster4959 10 months ago

    Hey man, is there an RVC AI download with working TensorBoard?
    Assuming it's a clean Windows install

    • @Jarods_Journey
      @Jarods_Journey  10 months ago

      I believe with the folder that you download, it includes the package. But if not, check out this video here: ruclips.net/video/P0M7PAsG1fk/видео.html

    • @synthmaster4959
      @synthmaster4959 10 months ago

      @@Jarods_Journey I've tried bro, I can't get it working. In the runtime folder in the RVC download there's Python and the tensor stuff; I just can't get it to work.
      I tried your guide also, but it breaks the latest RVC release if I install another Python.
      Can you take a peek at the latest release after the beta, as it's like 2 weeks old?

  • @sotiris6116
    @sotiris6116 2 months ago +1

    i have 15+ mins of studio quality vocals, but I always get effed up S and T sounds and foggy vocals. I've tried lower batch size but nothing changes.....what can I do??

    • @DJDJisMusic
      @DJDJisMusic 16 days ago

      I have the same issue. I think when recording we must emphasise the S and T sounds, and for the foggy areas record a variety of high notes

    • @sotiris6116
      @sotiris6116 16 days ago

      @@DJDJisMusic Setting the batch size all the way up seems to help a little. But still it is not perfect.

  • @AdvancedGamingYT
    @AdvancedGamingYT 10 months ago

    Any tips for the real time voice changer? I can't get it to sound right :/

    • @Jarods_Journey
      @Jarods_Journey  10 months ago +1

      Depends on graphics card, but you need a good model and then you need to optimize your settings as well. Biggest thing though is the GPU.

    • @AdvancedGamingYT
      @AdvancedGamingYT 10 months ago

      @@Jarods_Journey Yeah, I don't know if it's my mic, but for me it sounds kinda robotic and not smooth. I have a 3070ti laptop, which should be like 3060(ti) desktop-ish level.

  • @nickysingha39
    @nickysingha39 10 months ago

    Any voice changer for mobile phone

    • @Jarods_Journey
      @Jarods_Journey  10 months ago +2

      For RVC voices, I haven't run into any because it requires too much compute power. As well, realtime voice changing takes a lot of power so I don't see it being something on phones yet.

    • @nickysingha39
      @nickysingha39 10 months ago

      @@Jarods_Journey ok thanks maybe in future you find a way for phones btw I love your video keep going I'll always support you...

  • @victorhugodasilva7285
    @victorhugodasilva7285 8 months ago

    Great tutorial! Also, will you marry me? 🥺