A BETTER RVC Training Method for AI Voices?

  • Published: 23 Aug 2024
  • Links referenced in the video:
    RVC Voice Training - • RVC (Retrieval-based V...
    Hardware for my PC:
    Graphics Card - amzn.to/3pcREux
    CPU - amzn.to/43O66Ir
    Cooler - amzn.to/3p98TwX
    RAM - amzn.to/3NBAsIq
    SSD Storage - amzn.to/42NgMFR
    Power Supply (PSU) - amzn.to/430bIhy
    PC Case - amzn.to/447499T
    Motherboard - amzn.to/3CziMXI
    Alternative prebuilds to my PC:
    Corsair Vengeance i7400 - amzn.to/3p64r22
    MSI MPG Velox - amzn.to/42MnJHl
    Cheapest recommended PC:
    Cyberpower 3060 - amzn.to/3XjtZoP
    Come join The Learning Journey!
    Discord - / discord
    Github - github.com/Jar...
    TikTok - / jarodsjourney
    If you found anything helpful, please consider supporting me and the content I am trying to produce!
    www.buymeacoff...

Comments • 64

  • @prateekkumarsingh9529 8 months ago +1

    Hey Jarods, please make a video on how to fine-tune the model. Hoping for a positive response from your side.

  • @mirage_zoe 1 year ago +11

    I was thinking this is the default way to train 😂. I didn't really feed it multiple snippets or things like that and just used 1/2/3 giant files as datasets, and everything works well. Also I have a small question: if u were to buy a GPU to train & use, which one would be decent and also affordable (relative mid to upper entry)? I would like to buy a new GPU and idk which one to get yet. I've seen that having a lot of VRAM seems to make it great, but I've also seen some video comparisons and sometimes it depends. Another question would be: how does an AMD Radeon 7800 XT do when training?

    • @Jarods_Journey 1 year ago +4

      Probably to some, but my previous vids showed it with split datasets lol 😂.
      Speaking for "AI" stuff only, an Nvidia card is #1 if you want compatibility with every new tool that comes out. AMD solutions are coming, but you may run into issues where it doesn't always work as you'd expect because a majority of AI development has been done on Nvidia.
      I would honestly recommend a *used* 3060 12 GB. I just bought one myself and it does great on a lot of things. I think it's the best bang for your buck ATM but you just have to be patient. But if you can get a used 3090 at $600-700, that would also be a good choice due to the 24 GB of VRAM.
      I would stay away from new 40 series cards ATM as they're generally just way more expensive for performance compared to their used counterparts.
      I can't comment on the 7800 XT as I don't have AMD. I would hold off on trying to buy AMD stuff for AI atm.

    • @mirage_zoe 11 months ago +1

      Thank you very much for the answer. ❤

  • @AlexanderKuznetsovAKASergei 1 year ago +4

    Unfortunately not sure what's being discussed here, cuz i just manually divide/export selected tracks of the dataset in Audacity to anything smaller than 10 seconds and avoid selecting silent parts.

    • @Jarods_Journey 1 year ago

      I'll be talking more about it in the follow up video, but the gist is that you can just truncate silences and feed that file into RVC and be fine. No need to split it

    • @AlexanderKuznetsovAKASergei 1 year ago

      @Jarods_Journey Tldr: feed your file to a program to remove the silences, then feed the silence-free file to RVC for it to automatically split it on its own.
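
The "remove the silences, then feed one file to RVC" step described in this thread can be sketched in a few lines of Python. This is only an illustration of the idea, not what Audacity or RVC do internally: the function name `truncate_silence`, the amplitude `threshold`, and the `max_silence_s` default are all assumptions chosen for the example, and the input is assumed to be mono 16-bit PCM samples.

```python
from array import array


def truncate_silence(samples, sample_rate, threshold=500, max_silence_s=0.3):
    """Shorten every silent run to at most max_silence_s seconds.

    samples: sequence of signed 16-bit PCM values (mono, assumed).
    threshold: absolute amplitude below which a sample counts as silent
               (an assumed value; tune it for your recordings).
    """
    max_run = int(sample_rate * max_silence_s)
    out = array("h")
    run = 0  # length of the current silent run, in samples
    for s in samples:
        if abs(s) < threshold:
            run += 1
            if run > max_run:
                continue  # drop silence beyond the allowed run length
        else:
            run = 0  # speech resumed, reset the counter
        out.append(s)
    return out
```

Samples for a real file could be loaded with the standard-library `wave` module (`readframes` into an `array("h")`), and the shortened result written back out as a single WAV for the dataset folder.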

  • @Skeetawn2ndacc 11 months ago +2

    I always used only 2-3 long files instead because i was lazy to split them lmao

  • @HR-zg9ci 11 months ago +6

    What if I want an AI voice that is capable of speaking calmly, like reading a book out loud, and that can also scream, like a football coach? Should I feed both calm and screaming footage for training in one session?

    • @amiraskari4055 8 months ago +1

      This is exactly my question, did you find an answer?

    • @PriyanshuSingh-sd2dc 7 months ago

      Train two models on the different variations of audio, then blend them. Works great.

    • @vixxcelacea2778 5 months ago

      @PriyanshuSingh-sd2dc How do you blend two models?
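
The thread does not say how the blending was done. One common way to blend two networks that share the same architecture is to interpolate their weights parameter by parameter, and the sketch below shows only that idea. It is an assumption, not a description of the commenter's workflow or of any RVC feature: `blend_weights` is a hypothetical helper, and the models are represented as plain dictionaries of float lists rather than real checkpoints.

```python
def blend_weights(model_a, model_b, alpha=0.5):
    """Return alpha * model_a + (1 - alpha) * model_b for every parameter.

    model_a, model_b: dicts mapping a parameter name to a list of floats
    (a stand-in for real checkpoint tensors; hypothetical layout).
    alpha: share of model_a in the result, between 0 and 1.
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("models must have identical parameter names")
    blended = {}
    for name, values_a in model_a.items():
        values_b = model_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise linear interpolation between the two models.
        blended[name] = [alpha * a + (1 - alpha) * b
                         for a, b in zip(values_a, values_b)]
    return blended
```

With `alpha=0.5` both voices contribute equally; moving `alpha` toward 1 leans the result toward the first model. Whether a blend of a calm model and a shouting model sounds good would still have to be checked by ear.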

  • @TheNoobyworld 1 year ago +2

    Do you mean that for training you provided a dataset directory with just one large audio file for the process data step in RVC?
    Or did you have multiple audio files, each from a YouTube video etc., that you fed into the process data step?

    • @Jarods_Journey 1 year ago +4

      Just one large audio file for the training. I'll be detailing it all out in the follow up video so stick around till that one

    • @TheNoobyworld 1 year ago

      @Jarods_Journey
      Thanks!

  • @denblindedjaligator5300 2 months ago

    I have found a model that is trained with the setting False. When I change the pitch, it will affect the model. how did you do it, because if I make a model and set it to false, I can't do the same. imagine an autotuner here.

  • @Sebax 9 months ago +1

    Thanks for your videos 🙏🏻 quick question, is it possible to train a model using say, 2 files of ~1 hour each (voice only)?
    is the workflow the same?

  • @denblindedjaligator5300 1 year ago +1

    How can I auto-detect the index file? I am using a screen reader and I have to browse the combo box. What is the singer and speaker ID?

  • @denblindedjaligator5300 2 months ago

    Can you explain how to start TensorBoard from within RVC?

  • @oleksandr5700 11 months ago +1

    Hi, I would like to ask: could you share the structure of the Whisper script, i.e. how to cut speech only where there is no interruption? I would like to try it for training. Thank you!

  • @brazerlazer3834 1 year ago

    Can you do a video about ckpt Processing in the Mangio-RVC-Fork? There are no real guides to it anywhere and I would really like to understand how it works.

  • @nirupamdhar3449 6 months ago

    Friend, I am from India.
    I have a request:
    How do I clone a voice on a low-end computer?

  • @denblindedjaligator5300 6 months ago

    And what is the advantage of having more tensor cores? If I have 384 on my 4080, does that affect my batch size? And if I could upgrade my VRAM, would I also need more tensor cores to be able to reach a higher batch size?

  • @aji9666 11 months ago

    Hello my friend, I need your help. I have a model that has been trained and I have 500 voice clips that I want to convert all at once. How can that be done? The duration of each clip does not exceed 10 seconds.

  • @denblindedjaligator5300 4 months ago

    If I make a model that doesn't respect pitch, how can I get it to change pitch when I transpose it? I have a Daft Punk vocoder model, and when I transpose, it works even though the model has no pitch. Also, what batch size should I train on?

  • @denblindedjaligator5300 2 months ago

    Hi Mike. If I make a model that doesn't respect pitch, how can I get it to change pitch when I transpose it? I have a Daft Punk vocoder model, and when I transpose, it works even though the model has no pitch. I can make a recording of it if you like. Do you have Dropbox?

  • @Matchstickn 1 year ago +2

    What happens if you clump all the small splits into one big audio clip and then train the model from there?

    • @DeL2022 1 year ago

      It will split it again and then train. It showed that this is a built-in RVC function.

    • @Jarods_Journey 1 year ago +2

      What was said ^^, with them already split, it may be better to just train with the split instead of remerging. I'm still testing this though.

  • @macdoctorsg 1 year ago

    Hey Jarods, do you have a good recommendation for a proper Google Colab link? I still haven't been able to install RVC locally on my Mac, and no one over at Discord seems to be able to offer any help, so I guess I'll have to go with the online thingy instead. There are quite a few different links out there, and many of them don't really do a good job; they either hang or time out halfway, and error messages keep showing up every now and then. Just can't train a model properly!

    • @Jarods_Journey 1 year ago +3

      The only official link that I know of is on their github page, which can be found by following this tutorial here: ruclips.net/video/9wu6LSue_dU/видео.html&pp=gAQBiAQB
      By now, they are a bit older, but should still have all of the stuff you need to get going. The fact that they hang or time out halfway is unfortunately most likely due to Colab itself and not the repo. Sorry to hear about the Mac installation, but as I don't have a Mac, it's hard for me to help you guys out on that side :(!

    • @macdoctorsg 1 year ago

      @Jarods_Journey thanx, will try that again..

  • @JonathanSantosDeveloper 11 months ago

    Hi! Thank you for sharing your knowledge. Do you know which scientific paper supports RVC? I'm working on a paper where I need to cite RVC, but I can't find the correct paper to cite. Thank you!

    • @Jarods_Journey 11 months ago

      To my knowledge, there's no paper behind RVC, but you can probably cite the GitHub repository. You might also want to look into citing the other papers that RVC used, which are listed at the bottom of the repo.

  • @denblindedjaligator5300 6 months ago

    Can I upgrade my VRAM on a 400? I am blind, so I have to get sighted help.

  • @KuletXCore 1 year ago

    What settings do you recommend for a GTX1060?
    I don't have the money to get a brand new card so i'm going to use my current one

    • @Jarods_Journey 1 year ago +1

      You can try training with a lower batch size, like 1-2, to see how that performs on your card. I would actually recommend you do a little bit of training via Google Colab as that would be faster.

  • @magickey8 11 months ago

    How do I work with more than one speaker or more than one model? I'm trying to keep the model I've already trained but start a new model or speaker and be able to select between the two and maybe add more later. Do I need to clear out any files from the first training before I begin the new one, etc.? I tried to add a second speaker but it seems that the "training" went way too fast to have actually worked and the vocals.out.wav is the voice from the first speaker.

  • @Mr_Thoror 11 months ago

    How can I convert the project to run it locally?

  • @eventfakt 11 months ago

    Hello, when I use Colab, the connection suddenly drops during the most time-consuming step, and when I try to connect again, it does not work at all. This has been happening to me for several days. Please give me a solution so that I can use it again.

    • @vixxcelacea2778 5 months ago

      If you're using the free version, this is just what it does. Pro also has limitations. When it disconnects, you are out of time to use their GPU on that account. You need to either switch accounts or wait. Free is limited. Pro is also limited, but less so. Usually it disconnects after an hour or two in my experience.

  • @moulciber 11 months ago

    What is the best audio format for RVC training? Everyone is used to using WAV, but why, when FLAC gives the same result at a high bitrate?

    • @Nerthexx 11 months ago

      Is FLAC truly lossless? And does it not introduce artifacts (even from basic floating-point math)?
      Just talking from a signal processing perspective: if the data is invisible (you can't see or hear it), that doesn't mean it's not there, and it does make a difference. A very big one.
      Just a basic example: if you open up Photoshop and apply a Gaussian filter to an image, you probably think it's not reversible. But it is (using FFT and inverse kernels), as long as you keep the spatial information in place and don't resize the image, apply further processing, or quantize the data in any way. If you change anything at all, the data is gone and you cannot reconstruct it.

    • @PriyanshuSingh-sd2dc 7 months ago

      In short, there are two types of audio formats: lossy (MP3) and lossless (WAV, FLAC). It depends on the compression technique how well the high, mid, and low frequencies are retained. So always go for WAV.
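
On the "is it truly lossless?" question in this thread: FLAC compresses integer PCM samples and decodes them bit for bit, while WAV normally stores the PCM samples uncompressed, so neither discards audio data. The difference between lossless and lossy handling can be illustrated with the standard library alone. In this sketch `zlib` is only a stand-in for a lossless codec (it is not FLAC), and dropping the low 8 bits is a crude stand-in for lossy coding (it is not how MP3 works); both function names are made up for the example.

```python
import zlib
from array import array


def lossless_roundtrip(samples):
    """Compress and decompress 16-bit samples; every bit comes back."""
    raw = array("h", samples).tobytes()
    restored = array("h")
    restored.frombytes(zlib.decompress(zlib.compress(raw)))
    return list(restored)


def lossy_roundtrip(samples):
    """Throw away the low 8 bits of each sample (crude lossy stand-in)."""
    return [(s >> 8) << 8 for s in samples]


pcm = [0, 1200, -3400, 32767, -32768, 257]
print(lossless_roundtrip(pcm) == pcm)  # identical after the round trip
print(lossy_roundtrip(pcm) == pcm)     # detail was discarded
```

The first comparison holds because nothing was thrown away; the second fails because the discarded low bits cannot be reconstructed, which is the same point the Gaussian-filter example above makes about quantization.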

  • @salat 1 year ago

    Question from a non native speaker: You pronounce RVC like OVC - is this intended? You otherwise pronounce "r" quite distinctively - noticed this in the last 5 videos or so.. :)

    • @Jarods_Journey 1 year ago +1

      Eh, just speaking it as a native lol, no intention behind it. I don't think I'm saying OVC but closer to "arr-vee-see"

    • @jerrythefeared 1 year ago

      @Jarods_Journey Sounds like (a)RVC to me, which is the normal pronunciation.

  • @denblindedjaligator5300 6 months ago

    What batch size do you train on?

  • @HappyHostages 1 year ago

    How much space should I expect to use if I’m new to all this? I’m getting a new computer but want to make sure I get the right amount of space on it… if you don’t mind me asking

    • @DeL2022 1 year ago

      The folder with the program and all my models takes 594 GB. Don't forget that you will need space not only for RVC.

    • @Jarods_Journey 1 year ago +1

      I recommend you get a 2 TB drive; those can be found relatively affordably nowadays, around $60. If not, you could always get a 1 TB one and add/upgrade later.
      I don't have a video on this, but managing what files you have is really key to making sure you don't go through storage space too quickly. For example, I'll often train through 100 GB of data in one session, and then delete up to 99 GB of it.
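
To see where the space goes before deleting anything, a small script can total up each subfolder of a working directory. This is a minimal sketch under assumptions: `folder_sizes` is a hypothetical helper, and the directory you point it at (for example the folder that holds your training runs) depends on your own setup.

```python
from pathlib import Path


def folder_sizes(root):
    """Return {subfolder name: total bytes} for each direct subfolder of root."""
    sizes = {}
    for sub in Path(root).iterdir():
        if sub.is_dir():
            # Sum every regular file below this subfolder, at any depth.
            sizes[sub.name] = sum(
                f.stat().st_size for f in sub.rglob("*") if f.is_file()
            )
    return sizes


def largest_first(sizes):
    """Sort the result so the biggest folders come first."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)
```

Calling `largest_first(folder_sizes("path/to/your/rvc/folder"))` (the path is a placeholder) lists the heaviest subfolders first, which makes it easier to spot old training runs that can be removed.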

  • @j0x1m3r 11 months ago

    Hi, are there any alternatives to Google Colab? Google Colab doesn't let you train a voice model for more than 4-9 epochs.

    • @Jarods_Journey 11 months ago +2

      There are other cloud services; a popular one is Paperspace, but they are paid services. I think Colab is starting to cut down on the amount of free usage for projects like these, unfortunately, but if other alternatives get found, I might make videos on those.

  • @GgGg-qu2jp 11 months ago

    Which is best in the RVC program: crepe, tiny crepe, or full, and why, if possible?

    • @Jarods_Journey 11 months ago

      Tiny - fast, worse audio
      Crepe - normal, good audio
      Full - slower, best audio
      The different sizes use modified versions of the same algorithm to estimate pitch (changing layers and widths). Try them out to gauge which one you like the best

  • @deadvesu 10 months ago

    Hey I am struggling to make the voice sound clear there's always artifacting or whatever its called ( distortions in vocals, not for every word but some words ) .. I am using usually 5-10 minute files in my training folder .. about 3 files. the audio is kinda clear theres not much background noise , maybe the mic quality isn't perfect but it's still really scuffed.. any tips to make it sound clearer? or some videos? Should i separate it into smaller audio files? should i do more audio file processing before I start training? should i train for longer than 400 epochs? I'm lost :c

    • @PriyanshuSingh-sd2dc 10 months ago +1

      Just increase the dataset and try the de-echo and reverb removal in RVC's UVR5 tab on every dataset. Make sure your dataset is dense, meaning less silence.

    • @deadvesu 10 months ago

      @PriyanshuSingh-sd2dc Okay thanks, I will try that

    • @amiraskari4055 8 months ago

      I have the same problem. My 10-minute dataset is clear, but when I want it to cover something like opera it comes out with a lot of artifacts and distortions. Did you fix the issue?
      I don't know if the problem is the high pitch of the opera sound or an undertrained AI voice model.

  • @MCDTHEKID 10 months ago

    idk if you're able to help, but whenever I try to train a model locally my whole system crashes. Any ideas as to why that would happen?

    • @retrolinkx 8 months ago

      Could be a memory issue. When I train stuff I usually just close everything else and go do something else for a few hours. The first time I tried it, it wouldn't work due to not having enough memory, so now I make sure nothing is open but the command line and my browser.

  • @aldinsx 11 months ago

    Can Google Colab Pro no longer train RVC?

    • @Jarods_Journey 11 months ago

      Might be the case, Google is cutting down on free usage of its platform as these projects grow larger