How to Make the PERFECT Dataset for RVC AI Voice Training

  • Published: 15 Jul 2024
  • Links referenced in the video:
    Ultimate Vocal Remover - github.com/Anjok07/ultimatevo...
    ffmpeg - www.gyan.dev/ffmpeg/builds/
    Audio Splitter - github.com/JarodMica/audiospl...
    Python - www.python.org/downloads/
    whisperx - github.com/m-bain/whisperX
    Install git, vscode & python - • How to Install Python,...
    Hardware for my PC:
    Graphics Card - amzn.to/3pcREux
    CPU - amzn.to/43O66Ir
    Cooler - amzn.to/3p98TwX
    RAM - amzn.to/3NBAsIq
    SSD Storage - amzn.to/42NgMFR
    Power Supply (PSU) - amzn.to/430bIhy
    PC Case - amzn.to/447499T
    Mother Board - amzn.to/3CziMXI
    Alternative prebuilds to my PC:
    Corsair Vengeance i7400 - amzn.to/3p64r22
    MSI MPG Velox - amzn.to/42MnJHl
    Cheapest and PC recommended:
    Cyberpower 3060 - amzn.to/3XjtZoP
    Come join The Learning Journey!
    Discord - / discord
    Github - github.com/JarodMica
    TikTok - / jarodsjourney
    If you found anything helpful, please consider supporting me and the content I am trying to produce!
    www.buymeacoffee.com/jarodsjo...
  • Science

Comments • 355

  • @Jarods_Journey
    @Jarods_Journey  1 year ago +52

    The end of the video got cut off -_-. I only had like 10 seconds left, so when I get the chance, I'm just going to link a Short so that you guys can see the rest of the video lol

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +7

      Finishing the Data Curation Video...

    • @rytraccount4553
      @rytraccount4553 1 year ago +3

      @@Jarods_Journey Your audiosplitter code exports 44.1 kHz audio. How do I make it export 48 kHz? I'm losing quality with this code!
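
One workaround, if the splitter's output rate doesn't match what you need, is to resample the exported files afterwards with ffmpeg (linked in the description). A minimal sketch that only builds the commands; the file paths and the `_48k` suffix are illustrative assumptions:

```python
from pathlib import Path

def ffmpeg_resample_cmd(src: str, dst: str, rate: int = 48000) -> list[str]:
    """Build an ffmpeg command that resamples `src` to `rate` Hz.

    -ar sets the output sample rate; the input file is left untouched.
    """
    return ["ffmpeg", "-y", "-i", src, "-ar", str(rate), dst]

def commands_for_folder(folder: str, rate: int = 48000) -> list[list[str]]:
    # One resample command per wav the splitter produced (paths hypothetical).
    return [
        ffmpeg_resample_cmd(str(p), str(p.with_name(p.stem + "_48k.wav")), rate)
        for p in sorted(Path(folder).glob("*.wav"))
    ]
```

Each command list can then be run with `subprocess.run`.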

  • @joshuashepherd7189
    @joshuashepherd7189 1 year ago +4

    OMG Jarod! Your video tutorials are getting better and better. I love seeing a new release from you! Thanks for all your hard work!

  • @OthiOthi
    @OthiOthi 11 months ago +2

    Jarod managed to help me figure out a strange problem that I couldn't solve at all. He's got my sub. Thanking you kindly!

  • @ohheyvoid
    @ohheyvoid 1 year ago +27

    Just found your channel last night, and your workflows are so clear and to the point. Quickly becoming my go-to for voice2voice workflows. Thank you for your work.

  • @keisaboru1155
    @keisaboru1155 1 year ago +39

    How do you combine voices to create a totally unique one?

  • @IIStaffyII
    @IIStaffyII 11 months ago +11

    Wow, I am amazed by this channel. A few weeks ago I was searching for diarization of voices but had no luck finding a good fit.
    Not only do you have a very good tutorial, you also seem knowledgeable and up to date with everything (as up to date as one can be when things are moving this quickly).

    • @Jarods_Journey
      @Jarods_Journey  11 months ago +3

      Too many things, too fast. Appreciate it :D, this is the realm of open source.

    • @brianlink391
      @brianlink391 5 months ago

      @Jarods_Journey Love you, bro! Thanks a ton. I didn't even know this existed!

  • @ShiinoAndra
    @ShiinoAndra 5 months ago +1

    Just found your channel, and I have to say I'm so deep into the rabbit hole that I instantly recognized all the voices you use for conversion at the start 😂

  • @temporallabsol9531
    @temporallabsol9531 5 months ago

    Bro. This channel is amazing. I've been around and you are needed by many. Welcome.

  • @cubicstorm81
    @cubicstorm81 16 days ago

    For those receiving an error about the "split_audio" script not creating the .srt file as per the above tutorial: run it in an Anaconda or Python prompt, let it download the required dependencies, and it will work as you need.
    Thank you for a great tutorial!

  • @M4rt1nX
    @M4rt1nX 1 year ago +24

    Thank you Jarod.
    If people don't want to use Git, they can just download the zip and unpack it at the preferred location. 😉

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +2

      Solid tip, thanks Luz! Totally skipped my mind.

  • @ZitronenChan
    @ZitronenChan 1 year ago +8

    Your channel and the AI Hub have helped me a lot in getting started. I just trained a model with 2 hours of audio from Fauna's last stream in RVC v2 for 1000 epochs, and it came out very well.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +2

      Haha awesome, glad to hear!

    • @paradym777
      @paradym777 1 year ago +1

      Is there a way I can get a copy of it? (>

    • @VexHood
      @VexHood 5 months ago

      How much better is that than 300? Does that prevent static sounds if you don't use pretrained generators?

  • @ControllerCommand
    @ControllerCommand 11 months ago

    Your channel is amazing. I was looking for this for a long time.

  • @matthewpaquette
    @matthewpaquette 1 year ago

    Great tutorial!!

  • @MaorStudio
    @MaorStudio 8 months ago

    Thank you so much. King!

  • @sukhpalsukh3511
    @sukhpalsukh3511 1 year ago

    Great, thank you for this video!

  • @VegascoinVegas
    @VegascoinVegas 3 months ago

    Exactly
    what
    I
    needed
    to
    know

  • @smokey4049
    @smokey4049 1 year ago +4

    Hey, thanks for your awesome series of tutorials! As someone who is pretty new to this, it really helps a ton. Would it be possible for you to make a tutorial on how to train an RVC v2 voice with the dataset I just created? Thanks again and keep up the great work!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Appreciate it! Respective tutorials already exist, so I'd go check those out! ruclips.net/p/PLknlHTKYxuNshtQQQ0uyfulwfWYRA6TGn

  • @fuuka69420
    @fuuka69420 1 year ago

    Hey, another banger video mate!
    Do you reckon it's wise to keep the sound of breaths, such as when they inhale or exhale? Or do I only need the parts where the source voice talks or sings? Let me know your thoughts and keep up the cool vids!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Whatever is included in the split audio should be fine. It may cut out some of the breathing at the end or beginning of a sentence, but everything else in between is fine to keep :)!

  • @whimblaster
    @whimblaster 10 months ago +1

    Do I need to sing in the audio for the dataset, or is talking enough (like reading something from the web)? Thanks; apart from that, great tutorial. ^^

  • @PowerRedBullTypology
    @PowerRedBullTypology 8 months ago

    Jarod, do you know if there is software or a website that lets you make a new voice out of other voices? Like blending them into a new voice? Especially RVC-type voices (since I know those best), but I'd be curious about others too.

  • @Dante02d12
    @Dante02d12 1 year ago +3

    Hey there! Thank you for all these videos! I hadn't realized UVR5 had advanced options, lol.
    Hey, I have a question that may look silly but is serious: is it really required to train for _hundreds_ of epochs? I have had absolutely great results with only 50 epochs. What do more epochs bring exactly?
    Meanwhile, the issues I have also happen with models trained for hundreds or thousands of epochs, because most of my problems come from the way I clean the audio I want to clone.
    I also noticed my feminine voices tend to break at growls. Is it required to have growling audio in the dataset used for training? Or is there a secret sauce to make any voice growl?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +2

      Appreciate it! A finished epoch indicates that the model has seen every sample once; increasing epochs just repeats this process X times. It's all data dependent, as you don't always need more epochs for a good model.
      As for growls: in general they seem to be harder for the models to infer, and my anecdotal experience is that all models kinda struggle with them. I have yet to try training with growls, but I want to try a similar experiment with laughing, because laughing often just sounds weird 😂
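
The epoch/step arithmetic in the reply above can be made concrete with a small sketch (the sample counts and batch size below are purely illustrative):

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """One epoch = every sample seen once, so an epoch takes
    ceil(num_samples / batch_size) optimizer steps."""
    return math.ceil(num_samples / batch_size)

def total_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    # Raising the epoch count just multiplies the same pass over the data.
    return steps_per_epoch(num_samples, batch_size) * epochs
```

So "more epochs" means more repeated passes, not more data — which is why a small, clean dataset can converge in far fewer epochs.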

  • @pilpinpin322
    @pilpinpin322 1 year ago +1

    Thank you so much! It's a clear video and we can see that you know what you are doing! I have a small question regarding the .wav files of the dataset: is it better to encode them in stereo or in mono? Or does it make no difference to the program?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      I don't think it makes a difference, but I read somewhere that it should be done in stereo. I believe it flattens them during processing anyway, so it doesn't really matter afterwards.
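
The "flattening" described here is typically just a per-frame channel average. A stdlib-only sketch for 16-bit PCM, assuming interleaved channels:

```python
import array

def downmix_to_mono(frames: bytes, sampwidth: int = 2, channels: int = 2) -> bytes:
    """Average interleaved 16-bit PCM channels into one mono channel.

    Sketch of the stereo-to-mono flattening a trainer would do internally,
    which is why stereo vs mono input rarely matters in practice.
    """
    if sampwidth != 2:
        raise ValueError("this sketch handles 16-bit PCM only")
    samples = array.array("h", frames)  # interleaved L, R, L, R, ...
    mono = array.array("h", (
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples), channels)
    ))
    return mono.tobytes()
```

Frames read via the `wave` module's `readframes` can be passed straight in.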

    • @pilpinpin322
      @pilpinpin322 1 year ago

      @@Jarods_Journey Thank you very much! One last question: is it better to segment the sounds into files of 10 seconds each, or to cut into complete sentences (and therefore have files of very variable duration)? Thanks for your work!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      @@pilpinpin322 :), complete sentences work best so you don't get weird clippings, but if you run out of VRAM, you'll need to split into smaller segments.

    • @pilpinpin322
      @pilpinpin322 1 year ago

      @@Jarods_Journey Thanks for the fast response! Even if there are very short sentences of 1 second, like "Yes, I agree!"?
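
Since the splitter derives segments from whisperx's .srt cue timings, you can sanity-check how variable your sentence-level segments are before training. A standard-library sketch; the 10-second cap is an assumed VRAM-driven limit from the thread above, not a fixed rule:

```python
import re

TIME = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
CUE = re.compile(TIME + r" --> " + TIME)

def srt_durations(srt_text: str) -> list[float]:
    """Duration in seconds of each subtitle cue in an SRT document."""
    durations = []
    for m in CUE.finditer(srt_text):
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
        start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
        end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
        durations.append(end - start)
    return durations

def needs_split(durations: list[float], max_len: float = 10.0) -> list[bool]:
    # True where a cue is long enough that low-VRAM setups may need to split it.
    return [d > max_len for d in durations]
```

Very short cues (like a one-second "Yes, I agree!") show up as small durations; whether to keep or merge them is then an explicit choice rather than a surprise.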

  • @ShelfxYT
    @ShelfxYT 1 year ago

    Do you have any voice changers like the ones in the video that run in real time? To use in Discord, for example, like Voicemod/Clownfish?

  • @dookiepost
    @dookiepost 2 months ago

    If you get an error when running whisperx, make sure you have version 12 of the NVIDIA CUDA Toolkit installed.

  • @JobzenPadayattil
    @JobzenPadayattil 1 year ago

    Hey bruh, I'm getting some errors while converting trained data to output: ffmpeg error + dtype/type error... (ffmpeg is already installed.)

  • @matrixxman187
    @matrixxman187 1 year ago +1

    I have 3 minutes of studio-quality lossless vocals I would like to use for training. Is that sufficient?
    Additionally, there are some interviews on YouTube of the same artist speaking at length, but I was concerned whether the lower-quality mp3 material should be avoided for these purposes. Thanks for your video! Very informative.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +2

      Muffled audio should be excluded, but if the voice sounds good enough you can include it. 3 minutes may be okay, but idk, you just gotta try it out mate 🤟.
      10 minutes or more is recommended, but you can sometimes use less and it'll be fine.

  • @foxey461
    @foxey461 1 year ago

    🔥🔥

  • @alphaxeu
    @alphaxeu 11 months ago

    Ultimate Vocal Remover is struggling with some tracks; I can still hear the instrumental in the background with Kim Vocal 1. Is there a model where the vocals come out perfect? Great vid!

    • @Jarods_Journey
      @Jarods_Journey  11 months ago

      The vocal removers are really good, but they're not 100%, unfortunately. That's very hard to achieve, and I'm sure there are brilliant minds working towards it, but it doesn't exist ATM. You may be able to get better results with ensemble mode, but you'll have to research the best combos a bit: github.com/Anjok07/ultimatevocalremovergui/issues/344

  • @Grom76300
    @Grom76300 10 months ago

    I thought this included both the separation and the training, but all those GB of programs are only for isolating the voice, damn!

  • @shampun2281
    @shampun2281 9 months ago +1

    They have been updated and now it is not possible to sort files by speakers. Can you look at the new version and tell me what can be done? Is it possible to use the old version somehow?

  • @moddest7123
    @moddest7123 1 year ago

    Hey Jarod. Slight issue when cloning audiosplitter_whisper: I don't get the .git folder at the top, just the rest of the files. How do I fix that?

  • @edwincloudusa
    @edwincloudusa 9 months ago

    Can you make a video on how to keep the emotions from the original source voice? I have everything beautifully working for a clean and perfect voice clone, but my source audio has some strongly acted emotions (anger/fear/happiness etc.) that are not represented in the cloned audio. Thanks.

  • @LosantoBeats
    @LosantoBeats 11 months ago

    Does it matter if my source audio is chopped up? For example incomplete words/sentences etc..

  • @chaunguyenthanh6664
    @chaunguyenthanh6664 2 months ago

    Hi Jarod, can I use the large-v3 model instead of large-v2?

  • @enoticlive9103
    @enoticlive9103 1 year ago +2

    Hi! I'm from another country and I don't really understand English, but this topic is very interesting! How can I teach a model to speak my language better?

  • @MFSCraft
    @MFSCraft 1 year ago +1

    Is there some kind of Vocaloid-like interface so that I have some control over how certain words sound? It would be cool to have a TTS that could run the trained RVC voices.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      ATM, I don't know of any that use RVC voices, though I'm bound to see it happen someday.

  • @bruhby6276
    @bruhby6276 11 months ago

    Thanks for your content! Why would I use WhisperX though? Is it just for data management, or does it actually help RVC train?

    • @Jarods_Journey
      @Jarods_Journey  11 months ago

      For curating better data: by using subtitle timing, there's less chance of audio samples being empty noise.

  • @youngtrapgod6375
    @youngtrapgod6375 6 months ago

    Can this be done for so-vits? Because RVC loses the human element in my voice when I try making cover songs.

  • @Metalovania
    @Metalovania 1 month ago

    Hi! I followed your tutorial and managed to set everything up and run the script without getting any errors, but the problem is that I didn't get the expected amount of segments. I tried the script with three different audios. The first one, about 4 minutes, got me an output of 35 seconds' worth of segments; the second one, also about 4 minutes, got an output of 1 min 36 sec total; and the third, a bit over 2 minutes, got 55 seconds. Do you know what the issue could be? Also, I tested speaker diarization with another audio, but it didn't go very well. It had 4 different speakers, which it separated into only 2, and all 4 speakers were in both folders.

  • @KrazyGen
    @KrazyGen 1 year ago +1

    I'm trying to do my own voice and got some decent results, but it can't handle higher pitches. Should I add more samples with my voice in a higher pitch, or give it more samples with my normal voice and train it for longer? I have it trained using the Harvard Sentences from a previous video and I did 300 epochs.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      You can try adding samples of higher pitch. It's mainly going to be good at speaking in the pitch and timbre of the voice you train it with, so if your voice is naturally deeper, it's not going to know how to handle it if you try to speak high all of a sudden.

  • @lockdot2
    @lockdot2 11 months ago +1

    I am still working on it, I have decided to do this on the worst quad core CPU there is, the 1.3 GHz, with no turbo, 4 core, 4 thread AMD Sempron 3850. I spent a bit over a week getting clean audio to save on the Ultimate Vocal Remover. I am using 12 hours of talking.

    • @Jarods_Journey
      @Jarods_Journey  11 months ago +2

      There is probably a way to do this on Colab, but ATM Colab is a hassle I don't wanna have to deal with :(. Good luck on it 🫡

    • @lockdot2
      @lockdot2 11 months ago +1

      @@Jarods_Journey Thanks! It's going somewhat smoothly, got 5 errors in the CPU part of Visual Studio Code, but I am just going to pretend they don't exist, and move on with it. Lol.

  • @denblindedjaligator5300
    @denblindedjaligator5300 5 months ago

    Just a question: how high is your batch size when you train? Is it something where, if you set it too high, you get an imprecise model? If I have a dataset of one hour, what should my batch size be?

  • @21f.a.c.e.s
    @21f.a.c.e.s 4 months ago

    Unfortunately, I don't see any CUDA setup file in the cloned directory. Any help?

  • @hungluu8404
    @hungluu8404 1 year ago

    Is there a way to use an RVC model in text-to-speech, such as using it in Tortoise TTS?

  • @RoxWinted
    @RoxWinted 9 months ago

    Hello, I'm asking anyone right now because I got a bit lost. I'm trying to make the AI voice not glitch out whenever I'm doing long vowels, so it doesn't grab all of them at once and sound like a mess. So far I thought you have to train them to sound better, but I think that's not the case. Can someone explain what I have to do to achieve this?

  • @asdanimatezstuff848
    @asdanimatezstuff848 5 months ago

    How can I return PowerShell to normal after finishing training the model?

  • @m0nkeyb0i666
    @m0nkeyb0i666 2 months ago +3

    Copied from the issues section; worked for me.
    Running split_audio.py threw this error:
    Exception has occurred: FileNotFoundError
    [Errno 2] No such file or directory: 'D:\ai\programs\audiosplitter_whisper\data\output\1.srt'
    File "D:\ai\programs\audiosplitter_whisper\split_audio.py", line 96, in extract_audio_with_srt
    subs = pysrt.open(srt_file)
    File "D:\ai\programs\audiosplitter_whisper\split_audio.py", line 150, in process_audio_files
    extract_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)
    File "D:\ai\programs\audiosplitter_whisper\split_audio.py", line 180, in main
    process_audio_files(input_folder, settings)
    File "D:\ai\programs\audiosplitter_whisper\split_audio.py", line 183, in
    main()
    FileNotFoundError: [Errno 2] No such file or directory: 'D:\ai\programs\audiosplitter_whisper\data\output\1.srt'
    Additionally, the terminal was saying something about not having or not finding cublas64_12 (I can't remember exactly what it said)
    The error is thrown because the program can't find the srt file, because it can't make the srt file, and this is caused by a mismatch of CUDA versions. Torch (or something) has CUDA 11, but the script (or whatever) needs CUDA 12. I'm not a programmer, I don't know exactly what is what. All I know is that I fixed it.
    To fix this, do the following.
    Download and install CUDA 12 developer.nvidia.com/cuda-12-0-0-download-archive
    Navigate to "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin"
    Copy cublas64_12.dll, cublasLt64_12.dll, cudart64_12.dll
    Navigate to "...\audiosplitter_whisper\venv\Lib\site-packages\torch\lib"
    Paste the dlls into this folder
    Now when you run split_audio.py, it will be able to create the srt file, fixing the issue with not being able to find said file.
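
The manual DLL check in the fix above can be automated. A sketch; the folder layout and the exact DLL names are taken from the comment, so verify them against your own install:

```python
from pathlib import Path

# The three CUDA 12 runtime DLLs the fix above copies into torch's lib folder.
CUDA12_DLLS = ("cublas64_12.dll", "cublasLt64_12.dll", "cudart64_12.dll")

def missing_cuda_dlls(torch_lib_dir: str) -> list[str]:
    """Return which of the expected CUDA 12 DLLs are absent from
    the given torch lib directory (the copy destination in the fix)."""
    lib = Path(torch_lib_dir)
    return [name for name in CUDA12_DLLS if not (lib / name).exists()]
```

Running it against `...\audiosplitter_whisper\venv\Lib\site-packages\torch\lib` before and after copying tells you whether the fix actually landed.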

  • @GaypataponALT
    @GaypataponALT 7 months ago

    I have 954 audio files in my training folder. Is that a bit too much for RVC to train on?

  • @kaant21
    @kaant21 11 months ago +1

    Don't forget to change the execution policy back to default when you are done with this.

  • @aboodghanem1679
    @aboodghanem1679 7 months ago

    Hello dear, I would like your help regarding voice reproduction via Google Colab. Should the data be uploaded as mono or stereo WAV, and 16-bit or 24-bit?

  • @fountainbird
    @fountainbird 1 year ago

    Thanks for the vid, although I'm confused. I understand the UVR step to isolate vocals; I would generally then use that as the dataset. What is the benefit of the next step of splitting the file up? Is that all it does? What else is happening that I don't know about? I've generally just used longer clean audio files for training. Thanks for enlightening me :)

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      By splitting it, we solve the biggest issue of CUDA running out of memory, as I don't believe RVC splits larger audio files into more digestible chunks. Splitting allows us to control this issue and, additionally, get rid of any silence in the audio samples. Then there's also the fact that you can easily remove any bad data from the audio file that you may not want in the training set.
      If you're running it just fine with UVR without the out-of-memory issue, you should be good to go, but splitting gives you a bit more freedom with the data.
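
The silence removal mentioned here usually comes down to an energy threshold. A standard-library sketch for 16-bit mono PCM; the threshold value is an arbitrary assumption you would tune per recording:

```python
import math
import struct

def rms(frames: bytes) -> float:
    """Root-mean-square level of 16-bit little-endian mono PCM frames."""
    n = len(frames) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frames)
    return math.sqrt(sum(s * s for s in samples) / n)

def is_silence(frames: bytes, threshold: float = 100.0) -> bool:
    # threshold is in raw 16-bit sample units (max 32767); tune per source.
    return rms(frames) < threshold
```

Chunks read with the `wave` module can be tested this way and dropped when they fall below the threshold, which is roughly what stripping silence from split segments amounts to.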

  • @Skurios18
    @Skurios18 4 months ago

    Just a maybe random question: I was having issues installing the audio splitter and thought it was because I hadn't installed NVIDIA's CUDA Toolkit, so I ended up installing it, but it turned out something else was causing the error. So my question is: should I uninstall this CUDA Toolkit? I don't know what it does exactly, or whether it could harm my configuration or GPU in the future.

  • @nexgen91
    @nexgen91 4 months ago

    I have audiosplitter_whisper installed and VS Code opened. Trying to run debugging as per 12:00 in the video, I get the following error: "configuration 'python:file' is missing in 'launch.json'". Any idea what might be going on? BTW: it appears to work if I run "python split_audio.py" in PowerShell.

  • @kazuviking
    @kazuviking 11 months ago

    One thing with UVR5 is that it leaves trash behind after every processing run. You will have to reinstall it after every 100 or so files processed.

  • @michaelcasado
    @michaelcasado 3 months ago

    All of this points to using Windows. Or am I missing something? I am on macOS, and everything is .bat and .exe files or Google Colab sandboxes. Is there no UI to date that also runs on macOS? Have I perhaps missed it?

  • @user-gi8le3cv1u
    @user-gi8le3cv1u 11 months ago

    Hello @Jarods_Journey, I got this error while it's creating the output and vocal audio sets:
    CUDA is available. Running on GPU.
    The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
    The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
    Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.6. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file C:\Users\kit\.cache\torch\whisperx-vad-segmentation.bin`
    Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
    Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu118. Bad things might happen unless you revert torch to 1.x.
    >>Performing transcription...
    Traceback (most recent call last):
    File "C:\Users\kit\Desktop\rvc\audiosplitter_whisper\venv\Scripts\whisperx-script.py", line 33, in <module>
    sys.exit(load_entry_point('whisperx==3.1.1', 'console_scripts', 'whisperx')())
    File "C:\Users\kit\Desktop\rvc\audiosplitter_whisper\venv\lib\site-packages\whisperx\transcribe.py", line 159, in cli
    result = model.transcribe(audio, batch_size=batch_size)
    File "C:\Users\kit\Desktop\rvc\audiosplitter_whisper\venv\lib\site-packages\whisperx\asr.py", line 288, in transcribe
    for idx, out in enumerate(self.__call__(data(audio, vad_segments), batch_size=batch_size, num_workers=num_workers)):
    File "C:\Users\kit\Desktop\rvc\audiosplitter_whisper\venv\lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
    File "C:\Users\kit\Desktop\rvc\audiosplitter_whisper\venv\lib\site-packages\transformers\pipelines\pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
    File "C:\Users\kit\Desktop\rvc\audiosplitter_whisper\venv\lib\site-packages\transformers\pipelines\base.py", line 1028, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
    File "C:\Users\kit\Desktop\rvc\audiosplitter_whisper\venv\lib\site-packages\whisperx\asr.py", line 228, in _forward
    outputs = self.model.generate_segment_batched(model_inputs['inputs'], self.tokenizer, self.options)
    File "C:\Users\kit\Desktop\rvc\audiosplitter_whisper\venv\lib\site-packages\whisperx\asr.py", line 138, in generate_segment_batched
    result = self.model.generate(
    RuntimeError: CUDA failed with error out of memory

  • @victoroam
    @victoroam 4 months ago

    11:01 I don't know why, but I keep getting the same error (No module named 'pysrt') even though 'pysrt' is already installed.
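
"No module named 'pysrt'" after a successful install usually means the script is running under a different interpreter than the one pip installed into (for example, outside the venv). A quick standard-library check:

```python
import importlib.util
import sys

def module_visible(name: str) -> bool:
    """Check whether `name` is importable by THIS interpreter.

    If this returns False while `pip show pysrt` succeeds, pip and the
    script are almost certainly using two different Pythons.
    """
    return importlib.util.find_spec(name) is not None

# Printing sys.executable shows which interpreter is actually running;
# compare it with the venv path you activated during setup.
```

Activating the venv first (or selecting it as the interpreter in VS Code) and rerunning typically resolves this.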

  • @davidmaldonado9254
    @davidmaldonado9254 1 year ago +1

    Thank you for your amazing videos, they really help me understand how everything works. Just one question: I'm having some problems when running the "split_audio" script. It seems it isn't creating the .srt file for the audio, and when it tries to open the file it runs into an error. Do you know what it could be?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Whisperx may not have been installed correctly. I would try rerunning the setup file to get this going. One other thing you can do is type and enter whisperx into the console after activating the venv to see if it got installed.

    • @davidmaldonado9254
      @davidmaldonado9254 1 year ago +1

      @@Jarods_Journey Thanks! I'll try uninstalling everything and installing again, because now the setup shows an error when previously it didn't.

    • @nadaup6023
      @nadaup6023 1 year ago +1

      @@davidmaldonado9254 Did you manage to solve it? I have the same problem.

    • @Zielloss
      @Zielloss 1 year ago +1

      Run VS code as admin.

    • @el-bicente
      @el-bicente 9 months ago

      I think I had the same problem using the CUDA installation. If your debugger tells you that it can't find the .srt file when running the split_audio script, check your terminal logs. If you have an error like this:
      "ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation."
      then it means that your GPU does not support FP16 execution.
      To fix it, go to line 26 in the split_audio script, which should be: return 'cuda', "float16", and replace "float16" with "float32" or "int8".
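
The float16/float32 fallback described in this fix can be factored into a small helper. This is an illustrative sketch of the selection logic, not the actual split_audio code; the function and parameter names are made up:

```python
def pick_compute_type(device: str, supports_fp16: bool) -> tuple[str, str]:
    """Choose (device, compute_type) for transcription.

    Request float16 on CUDA only when the GPU supports it; otherwise
    fall back to float32. On CPU, int8 keeps memory use down.
    """
    if device == "cuda":
        return "cuda", "float16" if supports_fp16 else "float32"
    return "cpu", "int8"
```

With a helper like this, the "Requested float16 compute type" error becomes a one-flag fallback instead of a hand-edit of line 26.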

  • @MadFakto
    @MadFakto 26 days ago

    Which Video Player do you use?

  • @pennyS_
    @pennyS_ 1 year ago

    Can't find a tutorial for using the created dataset. Can someone link me, please?

  • @wugglie
    @wugglie 1 year ago +2

    For some reason I keep getting an error where it cannot open the vocals.srt file. Did I miss a step? There is no vocals.srt file generated in the output folder for audiosplitter.

    • @battletopia
      @battletopia 11 months ago

      I'm having the same problem. Did you manage to sort this out?

  • @williameneni2923
    @williameneni2923 8 months ago

    Hi, any updates regarding the missing part?

  • @handsomebanana4060
    @handsomebanana4060 1 year ago +1

    What if my voice doesn't speak any of the default languages? I have found a phoneme-based ASR model that suits me, but how do I use it in your code? Anyway, great tutorial!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Ah... I haven't dabbled in that area yet and don't know how it works for non-supported languages. I would test it as a command-line script first to see if you can get it working that way. I believe the --align_model argument would need to be used.

  • @LosantoBeats
    @LosantoBeats 11 months ago

    Can I use talking + singing audio to create my model, or should it be split into two separate models, one for the singing voice and one for the talking voice? I am having trouble finding clean singing audio for my model and am considering using talking audio from interviews etc.

    • @Jarods_Journey
      @Jarods_Journey  11 months ago

      You can use both. As long as it's the same voice, it should be fine

  • @supersonicunitedsupersonic8531
    @supersonicunitedsupersonic8531 7 months ago

    I have a source track with background noise, and of course I can solve that using UVR5 or other voice-isolation VSTs, but there are also segments with a lot of voice reverb, and when I reduce that reverb it cuts the low-mid frequencies from the voice. What should I do in such a situation? Maybe I need to find a reference with good EQ and try to improve the target data using EQ matching?

    • @Jarods_Journey
      @Jarods_Journey  7 months ago

      In this case you're in a tough spot, because if you can't clean the data, it may have some murkiness in the final output. As much as you can, you want to get your audio as clean as possible before training.

  • @grasshoffers
    @grasshoffers 5 months ago

    I do not think I have CUDA... just a CPU, but I got the error: ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
    ERROR: No matching distribution found for torch
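
On a CPU-only machine, the usual route is PyTorch's CPU wheel index; "no matching distribution" also often means an unsupported or 32-bit Python, so check `python --version` first. A sketch that just builds the pip invocation (run it yourself rather than trusting it blindly):

```python
import sys

def pip_cpu_torch_cmd() -> list[str]:
    """pip invocation for the CPU-only torch wheel.

    Uses the current interpreter via `-m pip` so the wheel lands in the
    same environment the script will run in. The index URL is PyTorch's
    CPU wheel index.
    """
    return [sys.executable, "-m", "pip", "install", "torch",
            "--index-url", "https://download.pytorch.org/whl/cpu"]
```

If the error persists with a supported 64-bit Python, the interpreter version is likely newer than the newest torch wheel supports.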

  • @oxanaivanova8007
    @oxanaivanova8007 3 months ago

    ModuleNotFoundError: No module named 'yaml'. How do I fix it??

  • @miyrrecs3024
    @miyrrecs3024 1 year ago

    I got it... but only one segment came out, and it has no transformation.

  • @nazersonic6938
    @nazersonic6938 11 months ago

    Thanks for the helpful video. I have a GTX 1660 Ti with 6 GB VRAM, and CUDA says I am out of memory. Is there a low-VRAM option like in Stable Diffusion, or am I stuck with using the CPU?

    • @Jarods_Journey
      @Jarods_Journey  11 months ago

      There are some low-VRAM options built into whisperx that have to be passed; you would have to modify the script to do that. I'll get around to adding it when I get the chance.
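
For reference, whisperx's CLI exposes memory-saving options. A sketch that builds such an invocation; verify the flag names against `whisperx --help` for your installed version, as they may differ across releases:

```python
def whisperx_low_vram_cmd(audio_path: str, batch_size: int = 4,
                          compute_type: str = "int8") -> list[str]:
    """Build a whisperx invocation with reduced memory use.

    A smaller --batch_size and an int8 --compute_type are the usual
    low-VRAM levers; both trade some speed/accuracy for memory.
    """
    return ["whisperx", audio_path,
            "--batch_size", str(batch_size),
            "--compute_type", compute_type]
```

On a 6 GB card like the GTX 1660 Ti mentioned above, starting with these reduced settings is a reasonable first attempt before falling back to CPU.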

  • @gokulkrish3839
    @gokulkrish3839 9 months ago

    Do we need a high-level GPU spec to do the things you showed in the video?

    • @Jarods_Journey
      @Jarods_Journey  9 months ago +1

      Anything that is an Nvidia 3060 12 GB or above should be fine; even 20-series cards still work too. Anything that is not Nvidia often has issues, so I don't recommend those.

  • @SK-hj1xh
    @SK-hj1xh 8 months ago

    Hi. How do I train above 1000 total training epochs? If I put in a bigger number, it caps at 1000 epochs.

  • @tetragrammaton3
    @tetragrammaton3 1 year ago

    lol, that cliffhanger.

  • @zafkieldarknesAnimation
    @zafkieldarknesAnimation 1 year ago +1

    Hello, please help me with this error:
    (Requested float16 compute type, but the target device or backend do not support efficient float16 computation.)

    • @battletopia
      @battletopia 11 months ago

      I am having similar issues, did you ever figure it out?

  • @denblindedjaligator5300
    @denblindedjaligator5300 5 months ago

    Hi, I have a question about RVC. I am trying to train a model where I have chosen no pitch, and it sounds autotuned. How can I fix it? How does learning rate work? What is batch size?

    • @Jarods_Journey
      @Jarods_Journey  5 months ago

      Not too sure about this unfortunately

  • @Malkovitz_
    @Malkovitz_ 11 months ago

    Thanks for the tutorial. Could you please explain how to replace the whisper model with one that was trained on my native language?

    • @Malkovitz_
      @Malkovitz_ 11 months ago

      BTW, I already found the model, but it's still a mystery how to use it with your script.

    • @Jarods_Journey
      @Jarods_Journey  11 months ago

      Sorry mate, I haven't looked into this area and don't know quite how to do it either. You have to tell whisperx the location of the alignment model you're using, but that's as far as I know.

  • @hamdmashhouri410
    @hamdmashhouri410 7 months ago

    Please make a tutorial video with an AMD GPU on Windows!

  • @TheChipMcDonald
    @TheChipMcDonald 1 year ago +1

    1) What/how can I change this to have multiple data directories (if I want to tweak/add on a later retry, and as a way of keeping things organized)? I presume I can make a subdirectory like the "vocal" ones for each unique dataset?
    2) can I bypass the audio split step if I've exported my dataset in

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      1. Each file you put in the data folder will be exported to its own segmented folder in the output folder. Once finished here, I recommend moving the finished files somewhere else on your PC.
      2. Yes, no need.
      3. The exported files (segmented pieces) are handled by my code and organized to export to the folder you chose at the start. That means unlimited freedom if you want to modify the code.
      4. It sort of is a batch process; what additional feature are you looking for? From the question, I'm assuming you just want to choose an input and an output folder, right? Since it makes a folder per file name, I can see this being a bit cumbersome if you have to manually move them into one directory, but this is for sorting reasons.
      A 3060 is good as it can utilize CUDA. IMO, a 3060 gives more flexibility due to its 12GB of VRAM, so this would be the cheaper option to go with compared to, say, a 3070 or 3060 Ti.
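
The per-file layout described in point 1 (one segmented subfolder per input file) could be sketched roughly like this in Python; the helper name and folder names are hypothetical, not the actual script's:

```python
# Hypothetical sketch of the layout described above: each input audio
# file gets its own segmented-output subfolder under the output root.
from pathlib import Path


def output_dir_for(input_file: Path, output_root: Path) -> Path:
    """Create and return output_root/<input file stem>/ for this file."""
    out_dir = output_root / input_file.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


if __name__ == "__main__":
    import tempfile

    root = Path(tempfile.mkdtemp())
    segmented = output_dir_for(root / "data" / "speaker.wav", root / "output")
    print(segmented.name)  # speaker
```

Collecting all the per-file folders into one flat dataset afterwards is then a simple loop over the subfolders of the output root.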

    • @TheChipMcDonald
      @TheChipMcDonald 1 year ago

      @@Jarods_Journey 1) OK 2) OK 3) Ah; following along without actually doing it makes it easy to discount where you started at, ahrgh, sorry. 4) By batch, I mean effectively automating starting Visual Studio and getting to the point where the training UI begins... in essence, an actual app a la UVC that does the environment setup and Python behind the scenes. I want to copy my dataset over, then jump to a UI to start training... and ideally the same UI to manage models and inference. Installing Python, Visual Studio, etc. are one-time things I don't mind, and I'm thankful you've done these tutorials, but the steps, steps, steps, steps, steps just to get to starting training seem automatable?
      My interest is in music and singing replacement, and in tweaking the dataset until I get what I hear in my head. Which I want badly enough to jump through hoops (and buy a new PC I previously didn't need, lol) but... gahhh... it's like being a kid again, configuring AUTOEXEC.BAT and CONFIG.SYS for hours, only to be burned out by the time you get Wolfenstein to run in SVGA with a hand-me-down SoundBlaster 16 card...

    • @TheChipMcDonald
      @TheChipMcDonald 1 year ago

      @@Jarods_Journey Thanks

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      @@TheChipMcDonald Gotcha! The RVC web UI is actually pretty close; it's literally just missing the data curation side of things, and it comes in a downloadable release too.
      A few more quality-of-life things later, like file browsers instead of paths, etc., and I think we're looking at a very robust and easy-to-follow workflow. I'll definitely keep the channel updated WHEN someone comes out with something that has all of the puzzle pieces put together. 🙏

  • @lisaree-tn8dm
    @lisaree-tn8dm 10 months ago

    Hey, is it bad if there are low sounds of people slamming doors or making pop-like noises in the background? (They get loud on purpose every time I sing.)
    I can't get rid of those, or of the plosives from breathing, but you can still hear my voice :/

    • @PeteJohnson1471
      @PeteJohnson1471 10 months ago +2

      Make space cakes and give them out, then start recording an hour later. You should be good for a few hours whilst they are all monging on the sofa ;-)
      I feel for your situation, that the people around you can't be reasonable with you for ten or so minutes.
      Maybe show them some videos of what you are looking to do, and offer to make them a voice, on the proviso that they just shut up for 10 minutes whilst you do yours?
      Good luck

  • @olaitanluvsojewale
    @olaitanluvsojewale 1 year ago

    Hello, I've got a few questions.
    So I have access to 6-channel audio with the voice I want to clone, and I'm extracting it all manually using Adobe Audition.
    1. Using UVR helps remove any lingering background noise, but sometimes a little noise remains. It is not that noticeable, so is it okay to have a little noise, or will it affect the model?
    2. I know to remove long silences, but what about the small gaps while the character is actually speaking? Should I remove those too, so it is just a continuous stream of talking without even 0.5-second breaks? And what about the sounds when a character isn't actually speaking, e.g. growls or hums, or breathy sounds like laughing, that naturally have some silence in them?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      My observation is that a little bit of noise is OK; it shouldn't be that noticeable. In one case, though, I have a model where I can hear the leftover background noise in the output. It's hard to get it perfect, though.
      2. The little gaps are fine. As for growls and whatnot, I'd say cut those out, but I haven't actually tried it, so I can't say for certain.

    • @olaitanluvsojewale
      @olaitanluvsojewale 1 year ago

      @@Jarods_Journey Thank You!

  • @chranman1855
    @chranman1855 11 months ago +2

    I'm getting a FileNotFoundError in Visual Studio Code, where it cannot find srt_file. I followed your tutorial step by step, but I'm sure I did something wrong, since I don't get the same results when I run the program. Since I have no Python experience, I'm not sure what I did wrong here.

    • @Jarods_Journey
      @Jarods_Journey  11 months ago +1

      Some people have reported that it'll work if you try running VS Code in admin mode.

    • @chranman1855
      @chranman1855 11 months ago

      @@Jarods_Journey Thank you for responding! I will try that.

    • @colinosoft
      @colinosoft 6 months ago

      Maybe it's too late, but I solved it with "pip install -r requirements-cuda.txt". In my case I have an Nvidia graphics card; if you use CPU, then replace it with "requirements-cpu.txt". For some reason there is a missing package that is not installed when running "setup-cuda.py". Always run the command within the virtual environment created previously with "venv".

  • @martin_taavet
    @martin_taavet 1 year ago

    Hey, will this app work in multiplayer games, or is it client-side only?
    I tried using it in Discord; there seems to be no effect.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      It works; you just need to connect via VB-Audio Cable (tutorial on the channel).

  • @DeL2022
    @DeL2022 1 year ago

    I'm getting speech cuts in the audio. What padding values should I use to fix this?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Start with 0.1 and play around with it from there.

  • @Englishnamee
    @Englishnamee 1 year ago +1

    Hii! I've been following your videos and it has been totally awesome. However, since my setup is outside my room, I often encounter a lot of background noise that I can't really escape (family, vehicles passing, fan noise, etc.). I've been looking around the app, but I can't seem to find a way to add noise cancellation to my AI voice. Does anyone know a fix for this? (And no, I can't move my setup :c)

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      That's a tough situation. You'll either need a better mic that is less sensitive, or try to find other audio-processing software that can run before the audio is fed into the AI voice.

  • @denblindedjaligator5300
    @denblindedjaligator5300 11 months ago

    There are two new versions of RVC, and I can now train on my AMD graphics card!

    • @Jarods_Journey
      @Jarods_Journey  11 months ago

      This is awesome! I saw that they recently added AMD support with DirectML!

  • @AmanKhan-bj7im
    @AmanKhan-bj7im 11 months ago

    Hi, if my model is undertrained, do I have to train a new model with more voice samples, or can I do something with the current one?

    • @Jarods_Journey
      @Jarods_Journey  11 months ago

      If you've reached a flat spot and your model still sounds bad, you'll need to add more data.

  • @caleb8857
    @caleb8857 11 months ago

    When running it like at 13:00, it says `failed to align segment ("!!!!!!!!!!"): no characters in this segment found in model dictionary, resorting to original...` multiple times, and once it finished, the folder had no segmented audio and was just empty. How do I fix this?

    • @Jarods_Journey
      @Jarods_Journey  11 months ago

      I think this is a language issue. If your audio files have multiple languages in them, that causes issues with whisperX, as does an unsupported language. Beyond that, please reference the whisperX GitHub issues page for more details, as I'm not sure what else causes this.

  • @RayplayzFN
    @RayplayzFN 4 months ago +4

    This is an error I got: RuntimeError: Library cublas64_12.dll is not found or cannot be loaded

    • @MrAcapella
      @MrAcapella 4 months ago +1

      SAME! :(

    • @Timiny118
      @Timiny118 3 months ago

      I had this same error, but I happened to have the file from a previous installation of alltalk_tts; I'm sure you could find it elsewhere, though. I placed it in "audiosplitter_whisper\venv\Lib\site-packages\torch\lib" and everything worked as it did in his video.

  • @xerotivi
    @xerotivi 11 months ago

    Just found your channel and wanted to ask if you know any way to follow these steps on a Mac. As a student, the only computer I have is my MacBook Air M1. I watched your video showing how to use RVC on Colab, and I want to learn how to create my own dataset and remove vocals from songs.

    • @Jarods_Journey
      @Jarods_Journey  11 months ago +1

      You can run this on CPU using setup-cpu, though I haven't tried it myself since I don't have a Mac. You could technically do all of this in Colab as well, but you'll have to set that up yourself.

    • @xerotivi
      @xerotivi 11 months ago

      @@Jarods_Journey I will spend some time on it, and if I find a way, I will post it here for others.

  • @kratoos0.0
    @kratoos0.0 8 months ago +2

    When I run the script, this is my error: No module named 'yaml'

    • @maxikittikat
      @maxikittikat 7 months ago

      I had to manually go through the pain of finding this out. Basically, make sure you're not in the virtual environment (type "deactivate" to be sure). Then, for anything that isn't installed or says the module name isn't found, look up the install command online and add "--use-pep517" to it. So for yaml, try "pip install PyYAML --use-pep517".

  • @chansophal5357
    @chansophal5357 11 months ago

    Does this AI not support the Thai language, sir?

  • @nobodywakeup
    @nobodywakeup 1 year ago +2

    Dude, can this AI voice be used in OBS or Discord? Please make a tutorial.

    • @tetragrammaton3
      @tetragrammaton3 1 year ago +1

      Watch his other videos; he explains this already.

  • @kamariboardley7592
    @kamariboardley7592 8 months ago

    I have a folder that a friend sent for a Playboi Carti AI; it contains a .pth file and an .index file. Where do I go from there? This is my first time using it, and I need help. I record into Audacity, but I don't know how to work with the files.

    • @McRobin06
      @McRobin06 6 months ago

      Follow an RVC tutorial.

  • @user-cm7ip4qr8c
    @user-cm7ip4qr8c 1 year ago

    In my recording, the script starts using phrases from the recording instead of SPEAKER_00 and SPEAKER_01. What can cause that problem?

    • @user-cm7ip4qr8c
      @user-cm7ip4qr8c 1 year ago

      Nvm, it seems like some phrases don't have a speaker, so I just modified the script a little bit.

  • @Random_person_07
    @Random_person_07 9 months ago

    Just a question: does it remove the background voice of another speaker if someone is speaking behind the target speaker?

    • @Jarods_Journey
      @Jarods_Journey  9 months ago

      Unfortunately it does not; overlapping-speech disentanglement is still a research-in-progress field.

    • @Random_person_07
      @Random_person_07 9 months ago

      @@Jarods_Journey One last question: what does "Speaker diarize" do? Like cut out each speaker? Nvm, you explained it in the video.

  • @habibahmad654
    @habibahmad654 1 year ago

    Can you make an updated tutorial for the Okada voice changer, please?

  • @Overneed-Belkan-Witch
    @Overneed-Belkan-Witch 1 year ago

    Hi Jarod, I'm currently working on an audiobook project using a cloned voice, where I will be the voice.
    How good will the training be if I have an i5 and a GTX 1060 6GB? Is this enough?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      That GPU might be rough... You might wanna train on Google Colab. The training quality should be the same; just the training time will be different.

    • @Overneed-Belkan-Witch
      @Overneed-Belkan-Witch 1 year ago

      @@Jarods_Journey Thanks for the tips

  • @olaitanluvsojewale
    @olaitanluvsojewale 1 year ago

    One more question... for now... if that's okay?
    Say I wanted to be excessive and get the cleanest, most accurate, almost perfect result possible on the first training run, and I had 1.5 or even 2 hours of audio data at most, and my PC could probably handle it (for context, I have an NVIDIA GeForce RTX 3060 graphics card and 32GB of RAM). What is the maximum number of epochs you would recommend I train for?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Dunno, the big answer is "it depends". Just try training for 10 epochs and hear how it sounds. Train for other epoch counts and try those as well. You're looking for the lowest epoch # that sounds good.

    • @olaitanluvsojewale
      @olaitanluvsojewale 1 year ago

      @@Jarods_Journey Oh, okay then 🤔 Thank you a lot! I really appreciate you taking the time to answer.

  • @jeremybauchet6845
    @jeremybauchet6845 11 months ago

    Hello! I've followed the tutorial closely three times, but I keep getting one error at line 101: "Exception has occurred: FileNotFoundError". It seems to be looking for an .srt file? Also, the terminal says "Requested float16 compute type, but the target device or backend do not support efficient float16 computation."

    • @Jarods_Journey
      @Jarods_Journey  11 months ago +1

      That means no .srt file was generated by whisperX. Try redownloading with setup-cpu.py, as your GPU probably doesn't support float16. That, or in the code you can change float16 to int8 wherever it appears. I'll need to work on a fix for this.
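
A minimal sketch of the int8 fallback described here; "float16" and "int8" are compute types whisperX accepts, while the helper function and the device check are made up for illustration:

```python
# Hypothetical helper: pick a compute type, falling back to int8 on
# devices that can't do efficient float16 (the error in the comment above).
def pick_compute_type(device: str) -> str:
    """Prefer float16 on CUDA GPUs; int8 is the safe CPU/older-GPU fallback."""
    return "float16" if device == "cuda" else "int8"


# With whisperx this would be used roughly as:
# model = whisperx.load_model("large-v2", device, compute_type=pick_compute_type(device))

print(pick_compute_type("cpu"))  # int8
```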

    • @jeremybauchet6845
      @jeremybauchet6845 11 months ago

      @@Jarods_Journey Thank you! I'll try that.

  • @jacobhobson7490
    @jacobhobson7490 1 year ago

    How come when I use the voice changer, it works initially, but then after a little while when I go to open it, the Python script doesn't open the voice changer? Please help if you can.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Dunno; you might wanna reinstall it, so delete the old folder and rerun.

  • @junofall
    @junofall 1 year ago

    How come we have to split the audio data into smaller parts? I just threw a 30 minute audio file at RVC and it handled it no problem.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Splitting the audio into smaller parts does two things: it lets you train on the dataset without getting out-of-memory issues due to VRAM, and it cleans the silences out of an audio file.
      What type of file did you feed it? Was it a wav file?

    • @fountainbird
      @fountainbird 1 year ago

      @@Jarods_Journey This was my question as well. I've just always used full-length, clean audio files. I usually do a few things in Audacity before training: I'll truncate silence, convert to mono, and de-noise if needed. I'm on a 4070 and haven't run out of memory with hour-long wav files. As cool as this setup is, I don't think it does much for me personally aside from the UVR process. Am I missing something? Is it just best practice to split up files for use as datasets? Thanks for everything!
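
To illustrate what the splitting step buys beyond VRAM headroom, the silence-trimming idea can be sketched as a toy over raw sample values. The real script instead cuts on the timestamps whisperX writes to the .srt file, so treat this only as a simplified illustration:

```python
# Toy illustration (not the actual audiosplitter logic): split a mono
# sample list into voiced chunks wherever the signal stays below a
# threshold for at least `min_silence` consecutive samples.
def split_on_silence(samples, threshold=0.01, min_silence=3):
    chunks, current, run = [], [], 0
    for s in samples:
        if abs(s) < threshold:
            run += 1
            current.append(s)
            if run == min_silence:      # silence is long enough: close chunk
                voiced = current[:-min_silence]
                if voiced:
                    chunks.append(voiced)
                current = []
        else:
            if run >= min_silence:      # a long silence just ended
                current = []
            current.append(s)
            run = 0
    if current and run < min_silence:   # keep any trailing voiced audio
        chunks.append(current)
    return chunks


print(split_on_silence([0.5, 0.5, 0.0, 0.0, 0.0, 0.6, 0.6]))
# [[0.5, 0.5], [0.6, 0.6]]
```

On real audio you would apply the same idea per frame of RMS energy rather than per sample, but the result is the same: short pauses stay inside a chunk, long silences become cut points, and pure silence never reaches the training set.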