How to Clone Most Languages Using Tortoise TTS - AI Voice Cloning

  • Published: 19 Sep 2024
  • Links referenced in the video:
    Git - git-scm.com/
    Python 3.11 - www.python.org...
    Github Repo - github.com/Jar...
    Other Videos mentioned:
    I think I figured how to clone (almost) any language in Tortoise TTS - • I think I figured how ...
    How I Train Tortoise in Other Languages - • How I Train Tortoise i...
    Hardware for my PC:
    Graphics Card - amzn.to/3pcREux
    CPU - amzn.to/43O66Ir
    Cooler - amzn.to/3p98TwX
    RAM - amzn.to/3NBAsIq
    SSD Storage - amzn.to/42NgMFR
    Power Supply (PSU) - amzn.to/430bIhy
    PC Case - amzn.to/447499T
    Motherboard - amzn.to/3CziMXI
    Alternative prebuilds to my PC:
    Corsair Vengeance i7400 - amzn.to/3p64r22
    MSI MPG Velox - amzn.to/42MnJHl
    Cheapest recommended PC:
    Cyberpower 3060 - amzn.to/3XjtZoP
    Come join The Learning Journey!
    Discord - / discord
    Github - github.com/Jar...
    TikTok - / jarodsjourney
    If you found anything helpful, please consider supporting me and the content I am trying to produce!
    www.buymeacoff...

Comments • 145

  • @Jarods_Journey
    @Jarods_Journey  4 months ago +6

    Zero-code package is here if you're running into difficulties installing: huggingface.co/Jmica/ai-voice-cloning/blob/main/ai-voice-cloning-3.0.7z
    Make sure you have the latest 7zip and when you unzip it, run start.bat.

    • @IDOLSKPOP68
      @IDOLSKPOP68 4 months ago

      I got an error when using it: I created a folder in the voice section, but when I refreshed the voice list it didn't appear. And some other errors.

    • @macanhhuydn
      @macanhhuydn 4 months ago

      Do you get errors?
      Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
      Model was trained with torch 1.10.0+cu102, yours is 2.2.2+cu121. Bad things might happen unless you revert torch to 1.x.

    • @LJames-ez9lr
      @LJames-ez9lr 3 months ago

      Guys, remember to update to the latest version of 7-Zip. Mr. Fancy Pants used some modern compression. V3 works great after I updated 7-Zip and extracted it. Thank you, sir!

    • @LJames-ez9lr
      @LJames-ez9lr 3 months ago +1

      Does this only work with float16 compute for training, or can I edit a config file to change the compute type? I have a GTX 1080 and float16 doesn't work for me.

    • @Schawum
      @Schawum 1 month ago

      @@LJames-ez9lr Same problem here. I also have the 1080 Ti and float16 doesn't run.

  • @احمدصبيح-خ7و
    @احمدصبيح-خ7و 5 months ago +17

    I trained the program on several hours of Arabic audio. In the end, it started speaking Japanese instead of Arabic.

    • @tipu-j6e
      @tipu-j6e 4 days ago

      I have heard that it needs at least 10k hours of good-quality audio.

  • @anikethhebbar6438
    @anikethhebbar6438 5 months ago +10

    00:01 Install the latest version of the AI voice cloning repository and set it up with Python and Git.
    02:21 Install and update CUDA and drivers for smooth operation.
    06:29 Recommend disabling Whisper X alignment
    08:46 Configuring tokenizer and training settings for AI voice cloning
    13:08 Confirming and optimizing model usage
    15:25 Organizing audio files for voice cloning
    19:32 Use low samples and 30-50 iterations for TTS generation, and recompute voice latent if necessary.
    21:42 Tortoise TTS is robust for speech-heavy audio.
    25:55 Saving frequency and continuing training in Tortoise TTS
    28:01 Explanation of training and modifying the state for longer training

  • @SAnsAN091190
    @SAnsAN091190 5 months ago +1

    Jarod, thank you for your hard work!
    I haven't tried the current changes yet, but I will definitely try!
    At the moment, I have independently converted the code for training non-English languages. I've been training for Cyrillic for 400+ hours of source audio. In the settings, I selected 35 epochs and Batch Size = 1024 (in order to maximize the use of video memory) and Gradient Accumulation Size = 16. The rest of the settings are similar to yours.
    When generating short phrases (up to 11 seconds) I get silence and strange sounds at the end of the audio.
    As you mentioned at 19:00, I increased the 'Length Penalty' and 'Repetition Penalty'. This relieved me of the silence, but there are still artifacts (strange sounds and repetitions). I think this is also largely the fault of poor audio splitting using Whisper for Cyrillic (many phrases are cut off in mid-word).
    Maybe you can tell me how I can try to fix this? And if you have alternative communication channels where you can communicate with like-minded people, I would be happy to join them =) (I saw a mention of Discord somewhere)

  • @hackpop
    @hackpop 4 months ago

    I ran into various issues, but in the end I realized that curl was missing from my system. After curl was properly installed, everything went smoothly. Thank you, Jarod, for your contribution, this project is awesome!!!

    • @LJames-ez9lr
      @LJames-ez9lr 3 months ago +1

      @hackpop Hi, what kind of issues were you having, and what is curl?

  • @bomar920
    @bomar920 5 months ago +1

    Thank you for contributing to the open source community. Your channel deserves more subs. Your content is high quality.

  • @Artholos
    @Artholos 5 months ago +5

    Yeah baby! Jarod you’re the hero once again! 🎉 Thank you so much for your hard work!

  • @blakusp
    @blakusp 5 months ago +2

    Wonderful tutorial! Is there any possibility of sharing the Spanish (or other languages') base models you've trained so far, for people (including myself) who don't have the resources to train from scratch? :( haha, thanks!
    PS: I completely understand if you don't want to share it.

  • @giovannif2567
    @giovannif2567 5 months ago

    You're so talented, man! And you make everything look so easy! Happy to be a supporter, and I will continue to be! 🚀

  • @alexisgomes1740
    @alexisgomes1740 5 months ago +3

    Hello, I have CUDA Toolkit 12.4, Windows 11, Git, and Python installed. When running the setup-cuda bat I get an error while extracting rvc.zip (error opening archive: failed to open 'rvc.zip') (ERROR: Could not open requirements file: Errno 2 No such file or directory). Then the panel shuts down. What can I do?

  • @gmfPimp
    @gmfPimp 4 months ago

    Thanks for your effort. FYI, you are not using MP3s, your file extension is MP4. MP3s have better audio quality than MP4.

  • @szymonnawrocki890
    @szymonnawrocki890 3 months ago

    Really great videos and content. Thanks to you I'm getting into voice modeling myself

  • @heyyanito
    @heyyanito 4 months ago +1

    Hi Jarod, thanks so much for the release and for walking through this process. It's wonderful. Do you have API examples which include the RVC pipeline? I'm not sure the ones listed in Gradio on the most recent release include the flags for adding the RVC inference to the request, although I could just be misunderstanding, as programming is not something I am very good at :)

  • @zenkidpress2271
    @zenkidpress2271 5 months ago +2

    Hello Jarod, it would be nice if you trained voices in other languages for the community and then shared everything (even charging a fee, since you obviously spent time training the other languages). I would gladly pay 🙂

    • @tempertephra
      @tempertephra 4 months ago

      Agreed, it may be good to share other language files with the community. Please consider it.

    • @ph0enixph0enix65
      @ph0enixph0enix65 3 months ago

      In case you're willing to do so, I would need a female German voice model. I would also gladly pay for it.

  • @LJames-ez9lr
    @LJames-ez9lr 3 months ago +2

    I got this error when I did the test generation:
    Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 2 in the list.
    Even after I clicked "(Re)Compute Voice Latents" I still got the same error.

    • @Vxk_yt
      @Vxk_yt 3 months ago

      Did you find a fix? I have the same issue. Tried checking on GitHub... there are a few people with the same issue.

    • @manfredice391
      @manfredice391 2 months ago

      Try setting "Candidates" to 1.

  • @sirjared21
    @sirjared21 20 days ago

    Have run into issues using this. For one voice, I keep getting a 'CUDA out of memory' error where it tries to set aside something like a terabyte of RAM, lol - it didn't happen before using the same voice and settings, but just randomly happened. For all voices, if you use the en_tokenizer when generating after training, the outputted voice sounds utterly insane/incomprehensible. Switching back to '/modules/tortoise-tts/tortoise/data/tokenizer.json' fixes it. There really needs to be a guide for all the sliders and what they mean. There needs to be some kind of "recommended" settings for training epochs, etc. I've done 80 for a voice and it took like 8 hours. For another voice it took only 2, so I guess it depends on how big of a data set you have. This is my first foray into ai voice cloning - doing it for a mod project, and so far it's been frustrating. After all's said and done, I've yet to create a realistic copy of a voice.

  • @dani0001
    @dani0001 1 month ago

    I downloaded the Huggingface version and started training a Hungarian language voice model with it. However, for some reason, I can't reach the stage of text generation. (RE)Compute Voice Latents runs indefinitely, then I get a CUDA Out Of Memory error. Additionally, it also tries to generate the text forever. I am using everything according to the settings shown in the video. What could be the problem? Is my NVIDIA GeForce GTX 1660 6GB video card and the 32GB RAM in my computer insufficient for this? Thank you very much in advance for your response!

  • @dthSinthoras
    @dthSinthoras 5 months ago +1

    While "Transcribe and Process" I get
    UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: or set allow_custom_value=True.
    warnings.warn(
    Incorrect BOM value
    Error reading comment frame, skipped
    Incorrect BOM value
    Error reading comment frame, skipped
    Incorrect BOM value
    Error reading comment frame, skipped
    Incorrect BOM value
    Error reading comment frame, skipped
    Incorrect BOM value
    Error reading comment frame, skipped
    Incorrect BOM value
    ...
    What does it mean? Its running anyways, but I have put in hundrets of hours of audio, so would be great to know, if I should aboard the run...

  • @augustinolarian
    @augustinolarian 4 months ago +1

    Hi, is there any way to import already-trained models?
    Is there a way we can download already-trained voices?
    I am unable to clone voices in Romanian. I get indecipherable audio every time.

  • @احمدصبيح-خ7و
    @احمدصبيح-خ7و 5 months ago

    Thank you for your wonderful explanation, which many people learn from. I want to tell you that I have been using text-to-speech programs for a long time, but they are weak for the Arabic language. I finally applied the explanation in this video, and at the end a message appears stating that the CUDA memory is full. Perhaps the reason is my lack of sufficient knowledge in applying this explanation. I hope you can apply this explanation to an Arabic audio sample so that I can follow it, and explain the numbers that are entered and why they are divided by two, because I did not understand their exact meaning.

    • @Jarods_Journey
      @Jarods_Journey  5 months ago +1

      CUDA memory being full means your GPU VRAM is too small. I recommend that you start at batch size = 1 and gradient accumulation = 1. Then, if training starts with these settings, you can restart (close the browser window), increase batch size by 1, save the configuration, and keep doing this until you run out of memory again. With this, you'll know the largest batch size you can use.
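
      A minimal sketch (not from the video), assuming PyTorch is installed in the project's venv, for checking how much VRAM is actually free before stepping the batch size up:

      import torch

      if torch.cuda.is_available():
          free, total = torch.cuda.mem_get_info()  # bytes free / total on the current GPU
          print(f"GPU: {torch.cuda.get_device_name(0)}")
          print(f"VRAM free: {free / 1e9:.2f} GB of {total / 1e9:.2f} GB")
      else:
          print("No CUDA device visible to PyTorch")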

  • @jeyraxel
    @jeyraxel 1 month ago

    When I train a voice, the terminal always says "\ai-voice-cloning-3.0>pause" at 98.7% and doesn't move anymore; it happens on every try. Any solution?

  • @rubenrodenascebrian3855
    @rubenrodenascebrian3855 3 months ago +1

    Great video and great repository, thank you very much for your work. I AM HAVING A PROBLEM... I train the model in Spanish and set "ES" for Whisper to recognize the Spanish language, but when the training finishes, it doesn't just have an English accent, it speaks totally in English. Why is this happening? Thank you very much!!!

  • @alexlazareibanez1047
    @alexlazareibanez1047 3 months ago +1

    Hello, I wanted to know whether Tortoise v3 supports the Spanish language and accent. When I do the voice training I set the language to ES instead of EN, but the Spanish comes out with an English accent. I am working with RVC (Retrieval) and XTTS2, but I heard that Tortoise is better. Thank you.

  • @miyrrecs3024
    @miyrrecs3024 2 months ago

    I did all the steps according to the video for the Spanish language with fluent input, but what I get is a messy voice like 'dysphasia'.

  • @francsharma7276
    @francsharma7276 2 months ago

    I tried Hindi with 2 hours of voice and 300 epochs; it said 4 days, and after 6 hours it just stopped loading. My graphics card is a 3070 Ti.

  • @VASTimages
    @VASTimages 1 month ago

    This is a very useful tool, thank you.
    Is there a way to speed up training? Maybe increase settings so you can use fewer dataset audio files?

  • @DM-dy6vn
    @DM-dy6vn 5 months ago

    14:05 The same tokenizer which was used during training has to be selected as well.

  • @chiyanchandru5914
    @chiyanchandru5914 27 days ago

    How can I run it if I already have transcription data with audio?

  • @Physengineer
    @Physengineer 3 months ago +1

    Question: I am trying to use the audiobook 3.0 program, which requires an RVC voice, but I can't see how to use the voice cloning 3.0 software to make the RVC voice and index file required by the audiobook program.

    • @Physengineer
      @Physengineer 3 months ago

      Well, I guess it can't. But I installed Applio. Once I figured out how to use it, it worked great for making voices. I am using the first voice together with the voice cloning and audiobook 3.0 app. They work great, I could not be happier.

  • @AlexisGomes-n4r
    @AlexisGomes-n4r 5 months ago +2

    When running start.bat I am getting the error: No module named 'psutil'

    • @oosixcosoo-yt5548
      @oosixcosoo-yt5548 5 months ago

      I think it's because you don't have Python 3.11 installed. If you want it to run normally, just reinstall Python at version 3.11 and rewatch the video.

  • @myang12003
    @myang12003 3 months ago

    11:19 I can't validate the training configuration because it just gives me an error saying "empty data set".

  • @StefanHackbarth-xz7ee
    @StefanHackbarth-xz7ee 3 months ago

    Help please... when I run training I get this error message: 'utf-8' codec can't decode byte 0x81 in position 2: invalid start byte
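
    A minimal diagnostic sketch (not from the video), assuming the transcribed dataset lives under ./training, to locate any text file that is not valid UTF-8 and is therefore likely tripping this error:

    from pathlib import Path

    # Scan the training folder for text-like files that fail UTF-8 decoding.
    for path in Path("./training").rglob("*"):
        if path.is_file() and path.suffix.lower() in {".txt", ".json", ".srt", ".yaml"}:
            try:
                path.read_text(encoding="utf-8")
            except UnicodeDecodeError as err:
                print(f"{path}: {err}")  # re-save this file as UTF-8 (without BOM)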

  • @CyberPhonkMusic
    @CyberPhonkMusic 5 months ago

    How much GPU do you need to train a new language like Brazilian Portuguese? Do you need 25 hours of audio? Does it always have to be the same person speaking?

  • @pupattolino75
    @pupattolino75 5 months ago

    I followed the installation but I received this error: from rvc_pipe.rvc_infer import rvc_convert
    ModuleNotFoundError: No module named 'rvc_pipe'

  • @farsi_vibes_edit
    @farsi_vibes_edit 5 months ago +1

    Please help, I get this error: G:\tortiois\ai-voice-cloning>call .\venv\Scripts\activate.bat
    Traceback (most recent call last):
    File "G:\tortiois\ai-voice-cloning\src\main.py", line 23, in
    from utils import *
    File "G:\tortiois\ai-voice-cloning\src\utils.py", line 41, in
    from tortoise.api import TextToSpeech as TorToise_TTS, MODELS, get_model_path, pad_or_truncate
    ModuleNotFoundError: No module named 'tortoise'

    • @farsi_vibes_edit
      @farsi_vibes_edit 5 months ago

      I get this error when I click on start.bat.

  • @Vaultcitizen
    @Vaultcitizen 5 months ago

    I installed 2 versions in different folders. How should I uninstall the older one? Simply delete the folder, or is there a better way (through cmd)?
    Thanks for your work and for making it easy to test :)

  • @francsharma7276
    @francsharma7276 2 months ago

    Can we train the model in parts? If yes, please make a video. For 10 epochs and 2 hours of audio it took 4 hours.

  • @francisgoeltner5569
    @francisgoeltner5569 2 months ago

    Hello Jarod!
    First of all: Awesome video and a great channel you have there. Really helpful stuff!
    I experienced a bit of a problem though with the training continuation process as you described it. I had exactly the outlined problem with a crashed console and tried to resume from the .state file of the previous run. Configuration import and setting of the old state as resume state path worked nicely, but when I try to run the training I get this message:
    PermissionError: [WinError 5] Access is denied: './training\\Voice2Train\\finetune' -> './training\\Voice2Train\\finetune_archived_240624-081331'
    The path it names indeed does not exist.
    Did I miss something here?
    Any help would be greatly appreciated!

  • @AlexisGomes-n4r
    @AlexisGomes-n4r 5 months ago +1

    I don't have rvc.zip after downloading.

  • @BorygoTomka
    @BorygoTomka 5 months ago +1

    Hey, I have a little trouble.
    At the step at 7:12, when I clicked Transcribe and Process, I got an error:
    ValueError: Requested float16 compute type but the target device or backend do not support efficient float16 computation.
    What do I need to do to make it work?

    • @BorygoTomka
      @BorygoTomka 5 months ago

      No such file or directory: 'training\\Ja\\processed\run\\dataset\\wav_splits\\file___2\\file___2.srt'

    • @BorygoTomka
      @BorygoTomka 5 months ago

      I found the problem but still can't fix it. When running the CUDA setup program, an error pops up with dependency conflicts:
      hydra-core 1.3.2 requires antlr4-python3-runtime==4.9.*, but you have antlr4-python3-runtime 4.8 which is incompatible.
      hydra-core 1.3.2 requires omegaconf>=2.2, but you have omegaconf 2.1.0 which is incompatible.
      What to do?

  • @farsi_vibes_edit
    @farsi_vibes_edit 5 months ago

    Thank you. I really needed this software and your training. I am installing it. I hope I won't have any problems.

  • @AIUnveil
    @AIUnveil 5 months ago

    Awesome, bro! I was checking your channel almost every day for this video. You are pretty much the only one doing this stuff. Great work. ❤❤

  • @emmanueltoussaint2466
    @emmanueltoussaint2466 5 months ago

    Thank you so much for that one. But I keep getting that error: RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
    And I got this warning just before.
    !!!! WARNING !!!! No GPU available in PyTorch. You may need to reinstall PyTorch.
    Loading TorToiSe... (AR: None, diffusion: None, vocoder: bigvgan_24khz_100band)
    No hardware acceleration is available, falling back to CPU...
    What can I do to solve this please?

  • @StringerBell
    @StringerBell 5 months ago +5

    Followed every single step and ended up with a gibberish mess in Bulgarian. Trained on 3 hours of studio-quality voiceovers for 500 epochs (saving every 20 epochs).

    • @Jarods_Journey
      @Jarods_Journey  5 months ago +6

      The issue with training another language is that the model needs to generalize, and 3 hours isn't sufficient for that. I'd say you want to start with at least 25 hours of ANY Bulgarian speech data; for this, you could probably scrape audiobooks. Even 25 hours may produce a rough model, so the more the better.
      After this, we'll call it a Bulgarian base model.
      Now that you have a base model, you can then "finetune" it for the voice style you want. Though it's a lot to cover in a comment, so I'll think about making a follow-up video.

    • @StringerBell
      @StringerBell 5 months ago +1

      @@Jarods_Journey Is there a way to train a language and then change the style to a super EXCITED, over-the-top performance? Also, can I use random voices, male and female, for the training?

    • @StringerBell
      @StringerBell 5 months ago +3

      @@Jarods_Journey How to finetune a base model is a super interesting topic, adding emotion or style to it. Please do make a video when you can, it will be an immensely helpful follow-up to this tutorial!

    • @StringerBell
      @StringerBell 5 months ago

      I just trained for 29 hours on my RTX 4090 on 48 hours of studio quality audiobooks in Bulgarian. Let's say the result is underwhelming.

    • @satyajitroutray282
      @satyajitroutray282 1 month ago

      @@StringerBell Did you successfully train your model in Bulgarian with that amount of data?

  • @himelhs
    @himelhs 3 months ago

    My laptop is Intel, can I still use it?

  • @Bichos28-bg4nm
    @Bichos28-bg4nm 3 months ago

    Hi, as a French-speaking person, I would like to hear a Portuguese text spoken. Is this possible with ai-voice-cloning 3.0?

  • @ToukoWhite
    @ToukoWhite 2 months ago

    After I click Train I get this error:
    " RuntimeError: CUDA error: device-side assert triggered
    [Training] [2024-06-23T23:29:29.443537] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    [Training] [2024-06-23T23:29:29.443537] For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    [Training] [2024-06-23T23:29:29.443537] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions." Any way to fix it?
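
    A generic PyTorch debugging note (not specific to this repo): device-side asserts are reported asynchronously, so setting CUDA_LAUNCH_BLOCKING=1 before anything touches the GPU makes the traceback point at the real failing call. One common trigger is a token id outside the model's embedding range, for example when the tokenizer and the model disagree on vocabulary size. A minimal sketch reproducing that class of error:

    import os
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before the first CUDA call

    import torch

    emb = torch.nn.Embedding(num_embeddings=256, embedding_dim=8).cuda()
    ids = torch.tensor([0, 5, 300], device="cuda")  # 300 is out of range for a 256-entry table
    emb(ids)  # raises the same "CUDA error: device-side assert triggered"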

  • @stevecato
    @stevecato 5 months ago +1

    Can RVC be cloned from the same dataset? If using it, how much effort should go into training Tortoise vs RVC? Thanks.

    • @Jarods_Journey
      @Jarods_Journey  5 months ago +1

      Yep, generally it can be. But RVC only requires 10-60 minutes of audio.
      Can't really say, but RVC is generally easier to get matching. Tortoise is more important, though, for getting the style of how a character speaks, etc.

  • @nottobemessed4628
    @nottobemessed4628 4 months ago

    You have set the large-v3 model as the default; how do we change that to a lower one, like medium or small?

  • @iQOmni
    @iQOmni 4 months ago

    You are amazing, thanks for all that you do.

  • @frh1700
    @frh1700 2 months ago

    When I try to train the model I get this error: Missing dataset: ./training/test//whisper.json

  • @mosambielal6700
    @mosambielal6700 2 months ago

    Can you please guide me on how you added the emotions tab? And how can we add other emotions here?

  • @RA-ss5fe
    @RA-ss5fe 4 months ago

    1. Will it work for the Urdu/Hindi languages?
    2. Will it work with any type of NVIDIA GPU, i.e. a low-end GPU?
    3. How much hard drive space does it require?

  • @swedishcat7448
    @swedishcat7448 5 months ago +1

    Awesome tutorial, and fantastic job on making this for people to use. I do have a weird bug or something I need some assistance with, if you may. I'm training a model with English speech and have set the language to en. But when I generate a prompt, the voice is in Japanese (or something like that). I don't quite understand where that comes from. Are there any other settings I can change? Thanks.

    • @Jarods_Journey
      @Jarods_Journey  5 months ago

      I also don't know where that's coming from. Make sure the tokenizer is English

    • @swedishcat7448
      @swedishcat7448 5 months ago

      @@Jarods_Journey Thanks for answering. Yeah, the tokenizer is the "en_tokenizer" as in the video, but I still get that. Is there any other fix? Can I provide any logs or something for you to look at? I really like Tortoise TTS.

    • @bigadz87
      @bigadz87 5 months ago

      @@Jarods_Journey I am getting the same issue, followed your video exactly

    • @JanPeter56
      @JanPeter56 4 months ago +1

      Same here lmao, had it training on 12 minutes of clean af audio of Christopher Lee talking; the first time all I could get was whale noises, and the second time it sounded like a Japanese whale.
      Edit: It must be the tokenizer, I think. When I select the default "./modules/tortoise-tts/tortoise/data/tokenizer.json" instead of "./models/tokenizers/en_tokenizer.json", the model suddenly produces clear English audio.

    • @bwheldale
      @bwheldale 3 months ago

      @@JanPeter56 I tried this and mine went from gibberish to English, which was hopeful, but the accent was too 'English' instead of Australian. An audio language detection site said it was 65% English, though it sounded more like German.
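
      One way to sanity-check the tokenizer switch described above: a minimal sketch (not from the video), assuming both files follow the Hugging Face tokenizers JSON layout with a model.vocab mapping (as Tortoise's tokenizer.json does) and using a hypothetical dataset path ./training/my_voice/train.txt, to list which characters in your data the tokenizer cannot represent:

      import json
      from pathlib import Path

      def vocab_chars(tokenizer_path):
          """Characters that appear anywhere in the tokenizer's vocabulary."""
          vocab = json.loads(Path(tokenizer_path).read_text(encoding="utf-8"))["model"]["vocab"]
          return {ch for token in vocab for ch in token}

      text = Path("./training/my_voice/train.txt").read_text(encoding="utf-8")  # hypothetical path
      covered = vocab_chars("./models/tokenizers/en_tokenizer.json")
      missing = sorted({ch for ch in text if ch.isalpha() and ch.lower() not in covered})
      print("Characters the tokenizer cannot represent:", missing)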

  • @edgarl.mardal8256
    @edgarl.mardal8256 3 months ago

    Hi, are you Pinoy? I was wondering if I could ask for help creating an AI cold-sales agent with a Norwegian LLM and training a TTS to speak fluent Norwegian?

  • @mydreams3437
    @mydreams3437 5 months ago

    How much duration of input voice files is needed? Can I use an MP3 file?

  • @threepe0
    @threepe0 3 months ago

    Stuck on "No module named 'vc_infer_pipeline'".

    • @CINECOMBO
      @CINECOMBO 2 months ago

      Same problem.

  • @WonderWhat1000
    @WonderWhat1000 5 months ago

    Hello Jarod, I have a question about cloning English voices in AI voice cloning. There is a voice I want to clone with an hour of data; how many epochs are needed to clone a good voice? Can you please elaborate on this part? Thank you.

    • @Jarods_Journey
      @Jarods_Journey  5 months ago +2

      If you're doing English with an hour of data, I'd train to about 50 epochs first and see how that sounds. You can always train longer if you find you need to.

  • @kernsanders3973
    @kernsanders3973 5 months ago +1

    Thank you!

  • @Vlad-hm7cj
    @Vlad-hm7cj 5 months ago

    Does this work on Linux? My Windows machine has an AMD GPU instead of an NVIDIA one... T-T

  • @SyamsQbattar
    @SyamsQbattar 26 days ago

    Does it support the Indonesian language?

  • @Djamel__LD
    @Djamel__LD 5 months ago

    Can I use it with an Intel Iris(R) Xe 16 GB GPU?

  • @fdgfdgdfgdfgfdgdf
    @fdgfdgdfgdfgfdgdf 5 months ago

    When I press Train: ModuleNotFoundError: No module named 'axial_positional_embedding'

  • @IDOLSKPOP68
    @IDOLSKPOP68 4 months ago

    Is there any way to install on Linux?

  • @MuratAtasoy
    @MuratAtasoy 16 days ago

    I trained for the Turkish language with a 5-minute voice recording; the results are nonsense :) like a new language lol. Is this only for English?

    • @MuratAtasoy
      @MuratAtasoy 16 days ago

      A day's work gone to the trash.

  • @stickmanland
    @stickmanland 4 months ago

    How much time would it take to train?

  • @craigcarter1572
    @craigcarter1572 3 months ago

    setup-cuda says Python 3.11 is not installed; however, when I run >python --version it reports version 3.11.

    • @Vxk_yt
      @Vxk_yt 3 months ago

      Try uninstalling and reinstalling; if you have other versions, uninstall all of them. Also check "Add to PATH" when installing.

    • @craigcarter1572
      @craigcarter1572 3 months ago +1

      @@Vxk_yt I found the issue: when installing Python, even though I checked the box for PATH setup, Windows 11 did not update the environment variables. I added the paths manually and it fixed everything. Reminds me of the old MS-DOS days. Thanks much for your reply.

  • @ohyesucan
    @ohyesucan 4 months ago

    How about the Cantonese language?

  • @ywueeee
    @ywueeee 4 months ago

    Can this run on a Mac?

  • @RexVergstrong
    @RexVergstrong 1 month ago

    I'm getting this error when I validate the training config.
    [Errno 2] No such file or directory: './training//train.txt'

    • @RexVergstrong
      @RexVergstrong 1 month ago

      There's a train.json in that folder, but in my model folder there was a train.txt. I copied it into the training folder directly and it seems to work for now.

  • @gorizon9802
    @gorizon9802 5 months ago

    Does this already support Brazilian Portuguese?

  • @adamrastrand9409
    @adamrastrand9409 5 months ago

    But why doesn't the voice sound like me when I trained it? I only trained on two minutes of data. Why are fine-tuned voices good, then, if you just use the autoregressive model with a voice sample? And when you train a new language with many hours of data, how do I fine-tune it afterwards? When I train the new language, does that count as a voice or as a new language? I don't really get it.

  • @farsi_vibes_edit
    @farsi_vibes_edit 5 months ago

    ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    unknown package:
    Expected sha256 f9ef0a648310435511e76905f9b89612e45ef2c8b023bee294f5e6f7e73a3e7c
    Got 887e84fc28f6772ed033ff6d269a01179021bf974277d1b1859c9654541781ba

    • @farsi_vibes_edit
      @farsi_vibes_edit 5 months ago

      I get this error but the download still continues. Is this ok?

  • @MaorStudio
    @MaorStudio 5 months ago

    You are awesome.

  • @deadwarrior9866
    @deadwarrior9866 3 months ago +1

    Doesn't work.

  • @MadeEasyTube
    @MadeEasyTube 5 months ago

    Thank you

  • @tylerchambliss8379
    @tylerchambliss8379 5 months ago

    I don't understand why your models aren't skipping. I still can't make my books bro. What are you doing? How are you making these models not skip and glitch?

    • @Jarods_Journey
      @Jarods_Journey  5 months ago

      My models do have some skipping, but it's not every generation. Unfortunately, the only thing I can say is that my datasets are generally clean, and even my mass-transcribed datasets for other languages are dirty datasets.
      I'm not doing anything particularly special with my models.

  • @WorldYuteChronicles
    @WorldYuteChronicles 5 months ago

    big up!

  • @aachannel2843
    @aachannel2843 2 days ago

    Can Arabic voices be reproduced?

  • @ללמד_טבעי
    @ללמד_טבעי 3 months ago

    It's so complicated, and the result doesn't sound good either; in short, it's a waste of time. We would be happy with a short, simple method that gives results that sound human.

  • @adamrastrand9409
    @adamrastrand9409 5 months ago

    Hello. After I trained my Tortoise model on my voice (a short dataset of five minutes, 200 epochs): when I select the autoregressive model for my voice and select None as the voice, it doesn't sound like me at all, or rather it says that None is not accepted as an argument. However, when I select Random or my voice from the voices folder with the autoregressive model, it sounds like the latents were computed for another voice that merely has my timbre. How do I fix this so it's completely my own voice? Should I delete everything from the folder, delete the computed latents file and keep just the audio files, or keep only the shortest audio file? Also, when preparing the tokenizer for another language, is it necessary to have something like 70 hours of audiobooks? Do I need a large amount of audio just to prepare the tokenizer, or is that only for training the new language? One more question: say you trained a new language on many audiobooks, let's say 50 hours. How do I then train a new voice, for example a new Spanish voice? Do I reuse the previous dataset, or how do I use the new audio with the new tokenizer or a new dataset? I don't really know.

    • @Jarods_Journey
      @Jarods_Journey  5 months ago +1

      You've got a lot of good questions on training afterwards; I won't be able to respond to them all in this comment. In general, after training a language, you can run "finetunes" of that language to get specific voices.
      As for your initial question, I'm not entirely sure what is happening either. Sometimes the voice won't sound like you. This is where I have RVC come into play, as it helps rematch whatever voice you want to get close to.

    • @adamrastrand9409
      @adamrastrand9409 5 months ago +1

      @@Jarods_Journey But why doesn't the voice sound like me when I trained it? I only trained on two minutes of data. Why are fine-tuned voices good, then, if you just use the autoregressive model with a voice sample? And when you train a new language with many hours of data, how do I fine-tune it afterwards? Does that count as a voice or as a new language? I don't really get it. And how will it sound if I have too little data for a new language, say Norwegian, Swedish, or any other language? Will it still sound like that language? And how much training data do I need to prepare the tokenizer?

  • @v3ucn
    @v3ucn 4 months ago

    Does it support Chinese?

  • @Airbender131090
    @Airbender131090 5 months ago

    Does it work with Russian?

    • @Jarods_Journey
      @Jarods_Journey  5 months ago

      It should work; having trained 4 languages so far to a good degree of accuracy, I don't see why not. Just make sure you have enough data and it should be fine to run :)

    • @Test-ep7gg
      @Test-ep7gg 2 months ago

      And will Bulgarian work? There are good results on some sites; I prefer to use the resources of such projects, but no matter how many projects I install, there is still no Bulgarian language.
      This one is currently giving me an error and I don't know if I should bother continuing:
      ImportError: cannot import name 'RootModel' from 'pydantic'

    • @yasenkey3779
      @yasenkey3779 1 month ago

      @@Test-ep7gg Did it work for Bulgarian?

  • @lazar4426
    @lazar4426 3 months ago

    8:20

  • @kushalvirulkar
    @kushalvirulkar 5 months ago

    Please clone the Hindi language.

  • @stickmanland
    @stickmanland 4 months ago

    21:12

  • @stepantrekhleb3271
    @stepantrekhleb3271 3 months ago +1

    this shit does not work at all

  • @peterimade003
    @peterimade003 3 months ago

    Do you have a Discord channel? It would be nice to have a community researching this tool together.

  • @peterimade003
    @peterimade003 3 months ago

    How does one get good trained models?

  • @ЗлодейПо
    @ЗлодейПо 5 months ago

    I tried to install the required packages, but then the new ones were not compatible with something else; maybe I was doing something wrong? I just cloned the git repo and then launched setup-cuda.
    In general, I get these errors in the console:
    DEPRECATION: omegaconf 2.1.0 has a non-standard dependency specifier PyYAML>=5.1.*. in pip 24.1, this behavior change will be enforced. A possible replacement is to upgrade to a newer version of omegaconf or contact the author with a proposal to release a version with the appropriate dependency specifiers.

    ERROR: The pip dependency recognition program currently does not take into account all installed packages. This behavior is the source of the following dependency conflicts.
    onnxruntime 1.17.1 requires numpy>=1.24.2, but you have numpy 1.23.5, which is incompatible.
    onnxruntime-gpu 1.17.1 requires numpy>=1.24.2, but you have numpy 1.23.5, which is incompatible.
    torchcrepe 0.0.20 requires librosa==0.9.1, but you have librosa 0.8.1, which is incompatible.
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    gradio 4.22.0 requires pydantic>=2.0, but you have pydantic 1.10.15 which is incompatible.
    ModuleNotFoundError: No module named 'fairseq'
    UPD: I have reinstalled everything here 100 times. As I understand it, the main problem is omegaconf: its versions do not match what the other packages need. When installing fairseq, omegaconf version 2.0.6 is installed, but then an error appears (pyannote-audio 3.1.1 requires omegaconf>=2.1, but you have omegaconf 2.0.6 which is incompatible). If you install omegaconf 2.1 then the error (fairseq 0.12.2 requires omegaconf

    • @Jarods_Journey
      @Jarods_Journey  5 months ago +1

      The dependency conflicts get resolved by reinstalling the requirements.txt file at the end of the installation; the biggest concern for me, though, is the ModuleNotFound one. The fairseq installation is a wheel file that I uploaded to Hugging Face; that's where the install comes from. It's possible your device is failing to download it from Hugging Face, hence why the script isn't installing it.

    • @ЗлодейПо
      @ЗлодейПо 5 months ago

      @@Jarods_Journey It seems that this is exactly the problem, you're right. It's a pity that I can't fix it, because I don't even roughly understand what to do; the latest version of the turtle worked perfectly.
      ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'E:\\Tortoise TTS\\ai-voice-cloning\\fairseq-0.12.4-cp311-cp311-win_amd64.whl'
      ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'E:\\Tortoise TTS\\ai-voice-cloning\\deepspeed-0.14.0-cp311-cp311-win_amd64.whl'
      ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'E:\\Tortoise TTS\\ai-voice-cloning\\pyfastmp3decoder-0.0.1-cp311-cp311-win_amd64.whl'
      I reinstalled Python because I forgot to add it to the PATH, and opened the console (setup-cuda) with administrator rights; it didn't help. It's a pity. The last option I have left is to look at the problem on Hugging Face, but I doubt there will be anything worthwhile there. Sorry for all these errors; you are doing really amazing things, thank you for that :)
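
      Since several of the failures in this thread boil down to modules that never got installed, here is a minimal check (not part of the repo), assuming it is run from inside the project's venv; the module list is a guess based on the errors quoted above:

      import importlib

      # Packages the setup scripts are expected to provide; adjust the list as needed.
      for name in ("torch", "fairseq", "omegaconf", "pyannote.audio", "rvc_pipe", "deepspeed"):
          try:
              module = importlib.import_module(name)
              print(f"{name:15s} OK  version={getattr(module, '__version__', 'unknown')}")
          except Exception as exc:  # ImportError, or a broken install raising something else
              print(f"{name:15s} FAILED: {exc}")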

  • @dthSinthoras
    @dthSinthoras 5 months ago

    How seriously should I take this warning that I get while training?
    UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
    torchaudio.set_audio_backend("soundfile")
    Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint C:\KI-Stuff\__Sound\TorToiseAnyLanguage\models\torch\whisperx-vad-segmentation.bin`
    Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
    Model was trained with torch 1.10.0+cu102, yours is 2.2.2+cu121. Bad things might happen unless you revert torch to 1.x.

    • @Jarods_Journey
      @Jarods_Journey  5 months ago +1

      Not an issue, should be good to go

  • @AlexisGomes-n4r
    @AlexisGomes-n4r 5 months ago +2

    Hello, I have CUDA Toolkit 12.4, Windows 11, Git, and Python installed. When running the setup-cuda bat I get an error while extracting rvc.zip (error opening archive: failed to open 'rvc.zip') (ERROR: Could not open requirements file: Errno 2 No such file or directory). Then the panel shuts down. What should I do?