FREE Voice Cloning in Microsoft Windows with Coqui TTS

  • Published: 21 Nov 2024

Comments • 257

  • @guilherme1556
    @guilherme1556 Год назад +15

    It's great that you brought this tutorial to the Windows community. I personally use Linux to train my models, but it's awesome that you are making an effort to make the Windows open voice community stronger.

    • @ThorstenMueller
      @ThorstenMueller  Год назад +4

      Yes, personally i use Linux for training, too. But model training on Windows has been requested quite often.

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      ​@@user-wc2jy4jr7r Not sure if i got you right. Do you mean "SAPI" in context of Windows integrated TTS voices?

  • @luke_foxy5170
    @luke_foxy5170 22 дня назад +1

    Thanks for the video. It finally works! A true legend 😀

  • @Vito_0912
    @Vito_0912 Год назад +5

    Thank you for this tutorial and your entire audio series. I originally started with Tortoise, which was too slow for me. Then I found Coqui and your public voice model, which is really good and understandable, and with a speed factor of 0.41 it's also super fast for me. For my use case, however, the pronunciation of proper names was still too odd. Through this video I could finally create my own voice model that is completely adapted to the requirements of telling stories.
    It still sounds a bit shaky here and there and has just 100k steps (with increasing audio material), but it is already on the way to improvement.
    Due to the recording conditions and my unfortunately not-so-great narrator voice, I even end up with a loss of 26-36%, so there is still room for proper readjustment.
    For all who are interested in the stats, in case they also want to do something like this:
    Specs: RTX 2070, I7-10900k, Samsung Evo 970
    Steptime: 0.5-0.6
    Batchsize (you can go higher): 20
    Checkpoint_steps: 1000 (just because i am lazy and train it in the middle of some idle periods, school work etc., so i don't have to wait for 10000)
    Audio dataset:
    Specs: HyperX idk (the RGB one) with pop filter, relatively big room
    I can't give a single blanket figure here, and if you start with the total amount right away you will get faster results. I trained in stages with an increasing number of audio files:
    0-5k: about 230 files ~ 0.4h
    5-10k: about 350 files ~ 0.6h
    10-30k: about 500 files ~ 1h
    30-60k: about 800 files ~ 1.6h
    60k-100k: about 1200 files ~ 2h
    Current total: 1200 ~ 2h
    Milestones:
    from 10k: first hints of speech instead of pure noise (still not understandable)
    from 20k: first words recognizable without knowing the text
    from 30-40k: understandable text (but nowhere near natural speech)
    from 80k: It's okay :)
    * Please note, however, that I used as input books and book excerpts with many proper names and denglish (German with some English words in books). This makes the training process slower in any case and generally worse (but in the trained areas, proper names, very good).
    Recording:
    For the recordings I wrote a Python script that automatically splits the text of a text file into sentences (ignoring sentences below 5 words) and displays them. Recording then started automatically and stopped as soon as the sound stayed below 50 dB for one second. The audio was then trimmed so that everything below 50 dB at the front and back is dropped (to guarantee speech starts immediately), padded with 50 ms of silence, then normalized and saved in LJSpeech format.
    Delete function included
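
    The script itself isn't shared here, but a minimal sketch of the post-processing it describes (trim leading/trailing low-level audio, pad with 50 ms of silence, normalize, append an LJSpeech-style metadata line) could look like this. The -50 dBFS threshold, the pydub dependency and all paths are assumptions for illustration, not the commenter's actual code:

    from pathlib import Path
    from pydub import AudioSegment
    from pydub.silence import detect_nonsilent

    def postprocess(wav_path: str, text: str, dataset_dir: str, utt_id: str,
                    silence_thresh_db: float = -50.0) -> None:
        audio = AudioSegment.from_wav(wav_path)

        # Keep only the span from the first to the last non-silent region.
        spans = detect_nonsilent(audio, min_silence_len=100, silence_thresh=silence_thresh_db)
        if spans:
            audio = audio[spans[0][0]:spans[-1][1]]

        # Pad with 50 ms of silence on both sides and peak-normalize to roughly -1 dBFS.
        pad = AudioSegment.silent(duration=50, frame_rate=audio.frame_rate)
        audio = pad + audio.apply_gain(-1.0 - audio.max_dBFS) + pad

        # Write the wav and one "id|text|normalized_text" line in LJSpeech format.
        out_dir = Path(dataset_dir)
        (out_dir / "wavs").mkdir(parents=True, exist_ok=True)
        audio.export(str(out_dir / "wavs" / f"{utt_id}.wav"), format="wav")
        with open(out_dir / "metadata.csv", "a", encoding="utf-8") as f:
            f.write(f"{utt_id}|{text}|{text}\n")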

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thanks for sharing your great setup and training step times 👏😊. This will help other users for sure. I agree that pronouncing foreign words is still a challenge.

    • @josebo8780
      @josebo8780 2 месяца назад

      How many iterations did you get with this setup? I am only getting ~80 iterations per HOUR with an RTX 3090, an AMD Ryzen 9 5950X 16-core, and 64 GB RAM at 3200 MHz, so I think something is wrong with my installation or training setup.
      I am using a batch_size=64

  • @davidtindell950
    @davidtindell950 2 месяца назад +2

    Since most of my friends and clients use MS Win 10 or 11, I must support Windows ! A new vid on MacOS would also be great !

    • @ThorstenMueller
      @ThorstenMueller  Месяц назад +1

      As Coqui shut down at the beginning of 2024, I am not sure if someone will adapt the code for newer operating systems.

  • @anthonyschilling7132
    @anthonyschilling7132 Год назад +4

    I spent ages trying to get this to work and finally ended up installing WSL, which made the setup work. You should make a video on how to create your own dataset for training!
    Best regards from the USA!

    • @ThorstenMueller
      @ThorstenMueller  Год назад +2

      So now you have another way to train a TTS model in addition to wsl. Hope you enjoyed this video 😊.
      I've created a tutorial on recording and creating a voice dataset here:
      ruclips.net/video/4YT8WZT_x48/видео.html

    • @anthonyschilling7132
      @anthonyschilling7132 Год назад

      @@ThorstenMueller Ah very cool, I'll have to give that a shot. I've been using OpenAI's Whisper to transcribe audio I downloaded from YouTube videos and podcasts and it's getting close. But I think I need to do a better job cleaning up and organizing the audio I download. Any suggestions for how large the dataset should be when using VITS? I've been using about 1-3 hours of clips and it's starting to sound OK... but I'm guessing I just need more and cleaner data. Thanks again!

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      @@anthonyschilling7132 My voice datasets are way longer - at least 10k recordings, meaning > 10 hours of pure audio. But more important might be a good phoneme coverage.

  • @VitiliKo
    @VitiliKo 23 дня назад

    Very good video. Had I known that you were doing the installation on Windows here, I would have saved myself 2 days of work :D

    • @ThorstenMueller
      @ThorstenMueller  21 день назад +1

      I'm really glad you liked the video, and I hope you don't miss the 2 lost days too much 😉.

    • @VitiliKo
      @VitiliKo 20 дней назад

      @ No, I learned a lot along the way. But in the end I switched to Ubuntu, since it doesn't work that well under Windows :/

  • @connordissident6881
    @connordissident6881 Год назад +1

    Thanks for listening to us and making this video!

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      You're welcome. I'm always happy for feedback and suggestions from my community and try to make right content for you 😊.

  • @christopherwoods3339
    @christopherwoods3339 Год назад +2

    Thank you very much for your videos. I almost never subscribe, but I was so thankful for these that I've been liking every one and I did subscribe. :)

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Wow, that's probably some of the best feedback I've received for my work on these videos 🤩.

  • @hangtime79
    @hangtime79 Год назад

    Came here looking some information on Coqui as I'm looking to do a voice clone for voice over work. Fantastic job.

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      Great feedback like yours always keeps me motivated - thank you 😊.

  • @scndsky
    @scndsky Год назад

    Great help for figuring out all these little details you just have to know somehow. Tnx!

  • @john_blues
    @john_blues Год назад

    Yay! I've been waiting on this one. Thank you so much.

  • @toykotokyoto
    @toykotokyoto Год назад +3

    nice! giving Windows some love :D

  • @davidtindell950
    @davidtindell950 2 месяца назад +1

    Thank You from a new subscriber !

    • @ThorstenMueller
      @ThorstenMueller  Месяц назад

      Thanks for joining and welcome 😊.

    • @davidtindell950
      @davidtindell950 Месяц назад

      @@ThorstenMueller P.S. Since Coqui is "dead", what local TTS model with personal voice cloning can we employ?

    • @ThorstenMueller
      @ThorstenMueller  Месяц назад

      @@davidtindell950 I'd go with Piper TTS for now. ruclips.net/video/b_we_jma220/видео.htmlsi=aFZ-Z5nNpiQxa0Zo

  • @CezarPopescu
    @CezarPopescu Год назад

    Thanks for sharing, Thorsten! Got yourself a new subscriber (y)

  • @devinhedge
    @devinhedge Год назад +1

    I love this if for no other reason it helps me learn German dialects.

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      So, i'm your reference for a german dialect? 😆👍

  • @jonnypawan4650
    @jonnypawan4650 Год назад

    Great and Unique Videos Always, Thank you for your time and efforts.

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thank you so much. Feedback like yours always keeps me motivated ☺️.

  • @manuelherrerahipnotista8586
    @manuelherrerahipnotista8586 Год назад +1

    Really good video man. Well explained and researched. Thanks a lot

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thanks for your nice feedback. I'm happy that you liked it 😊.

  • @ŁukaszMadajczyk
    @ŁukaszMadajczyk 23 дня назад

    hi Thorsten,
    may the next "how to" would be training coqui-TTS model based on Glow-TTS and HiFiGAN vocoder?

  • @der-putz
    @der-putz Год назад

    Another great video. Is there an ATI equivalent?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thank you for the nice compliment 😊. I have no experience with ATI graphics cards in this context; CUDA is primarily designed for NVIDIA cards. There is (or was) apparently an old project called "gpuocelot" that wanted to help in this area, but I can't really help you further there.

  • @MrArdo-branch-main
    @MrArdo-branch-main Год назад

    This is explained very well. Thank you, Thorsten-Voice, this video helps me continue my hobby and research.

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      Thank you. Nice feedback like yours always keeps me motivated to continue this journey ☺️.

  • @phen-themoogle7651
    @phen-themoogle7651 Год назад +1

    I subscribed, although I could only watch for a few minutes because of some health problems I'm having these days. If possible, I would like a tutorial or explanation of ways to do this without downloading anything new to my computer or going through a long process. Maybe it's possible to do this 100% online; that would be awesome! Since technology is improving so fast these days, I'm sure there are sites where we can do this online, right?

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      First of all, i hope you get well soon 😊. Thanks for subscribing and i agree, right now the process is not a simple 1-2-3 process, but voice cloning is getting better and for english voices it might be possible (in near future) to clone your voice easier. Not sure how perfect the cloned voice will be with a simple process, but we'll see.

    • @phen-themoogle7651
      @phen-themoogle7651 Год назад

      @@ThorstenMueller Thanks! I'm fluent in Japanese, and looking forward to doing this in Japanese sometime too.

  • @pocketsfullofdynamite
    @pocketsfullofdynamite Год назад

    Which graphic card do you use pls. Thanks for the info.

    • @ThorstenMueller
      @ThorstenMueller  Год назад +2

      In this video i've used an NVIDIA GTX 1050 Ti. But for my other models training i use an NVIDIA Jetson Xavier AGX.

  • @kostas9849
    @kostas9849 Год назад +1

    Hello, I just subscribed to your channel and I have one question: does this work with foreign languages or only English?

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      Thank you for joining my channel 😊. This will work in other languages as well. I've created an earlier video (not Windows specific) with some more detail if that's helpful for you. ruclips.net/video/4YT8WZT_x48/видео.html

    • @kostas9849
      @kostas9849 Год назад

      @@ThorstenMueller Thank you so much,you are the best!

  • @prakharpaw-de7vh
    @prakharpaw-de7vh Год назад

    Thank you so much for this video, really helpful!

  • @amaarboss2115
    @amaarboss2115 Год назад +1

    Hello, Mister @Thorsten, I wanted to ask: I do the training a thousand times and yet the sound does not come out clear, but when I use your voice through the tts-server the sound is very clear... How did you train your voice (the one on the server)? Thank you for this great effort.

    • @ThorstenMueller
      @ThorstenMueller  Год назад +6

      Thanks for your feedback. The training in this video is just for the demo; with 3,000 steps there cannot be a clear voice. My publicly released models used with tts-server have been trained for over 2 months with around 600,000 steps. Does this explanation help you?

    • @amaarboss2115
      @amaarboss2115 Год назад +1

      @@ThorstenMueller Thank you for this useful information. The picture is now clearer

  • @seansean995
    @seansean995 Год назад

    I subscribed on the first video, great teacher!!!

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thanks a lot for your very nice feedback - and welcome 😊

  • @impishsquirrel
    @impishsquirrel Месяц назад

    ChatGPT provided me step by step with all the code needed to run Coqui TTS

  • @loiclacaille8683
    @loiclacaille8683 Год назад

    Your content is amazing, really useful. Thx.

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thanks a lot for your nice feedback 😊. I'm always happy to hear if people find my content helpful.

  • @TheCeratius
    @TheCeratius Год назад

    Hi Thorsten, thanks for this awesome tutorial which worked perfectly on my machine. However, I trained my model and it's great but not perfect. Is there an option to continue training with this model instead of training a new one (which would take ages just to get to the point where i am now)? I am relatively new to python, so I am not sure if I just have to modify the training script a little or if there is a command somewhere which does this, or if it's just not possible. If you could give me a pointer that would be great!

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thanks for your nice feedback 😊.
      You're looking for restore_path and/or continue_path. I've made a special video tutorial on continuing a TTS model training from a previous step checkpoint.
      ruclips.net/video/O6KxJR95WpE/видео.html
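
      For reference, a rough sketch of where restore_path/continue_path go inside a recipe; config, model, output_path, train_samples and eval_samples are assumed to be built exactly as in the rest of the recipe, and the folder/checkpoint names are placeholders:

      from trainer import Trainer, TrainerArgs

      args = TrainerArgs(
          continue_path="output/vits_run-April-26-2023_05+12PM-0000000",  # resume this run in place
          # restore_path="output/old_run/checkpoint_30000.pth",           # or warm-start a new run from one checkpoint
      )
      trainer = Trainer(args, config, output_path,
                        model=model, train_samples=train_samples, eval_samples=eval_samples)
      trainer.fit()

      # The recipe scripts also accept the same options on the command line, e.g.:
      #   python train_vits_win.py --continue_path <run_folder>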

    • @TheCeratius
      @TheCeratius Год назад

      @@ThorstenMueller wow, i didn't see that. Sorry about that and thanks a lot for the quick reply and help!

  • @AdityaGupta-k3q
    @AdityaGupta-k3q Год назад

    Thanks for the tutorial, it's really helpful. Can you also make a tutorial on how we can use Coqui TTS to fine-tune YourTTS for a low-resource language with better quality? That would be really helpful. Thanks and keep inspiring :)

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thanks for your nice feedback. So you mean a model that is fast enough for e.g. a Raspberry Pi but with a high quality?

    • @AdityaGupta-k3q
      @AdityaGupta-k3q Год назад

      @@ThorstenMueller With low resource language I mean Hindi, Korean, Arabic etc

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      @@AdityaGupta-k3q Okay, sorry, I got that wrong 🤦‍♂. I'm not sure about that. Maybe you can get a good answer by asking this good/important question in the Coqui TTS community.

  • @Supratim-jc9kz
    @Supratim-jc9kz Год назад

    Thanks for the video. Also can you make a video on how to run tortoise tts locally on your computer.

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thanks for your comment 🙂. I've TorToiSe TTS already on my TODO list.

    • @Supratim-jc9kz
      @Supratim-jc9kz Год назад

      @@ThorstenMueller tyvm

  • @vadzimyesman7693
    @vadzimyesman7693 Год назад

    Great tutorial! Thank you for all the details! I have a question though about the training process and dataset. I used 102 samples for my dataset. To record them I used Audacity with default recording settings (mono, 44100 Hz, 32-bit float). For the recipe file, I used the one you show in your video (named something like a "youtube recipe"). After 1000 epochs I checked the results by synthesizing some words and sentences using tts-server. It sounded very slow, not normal. While checking the config.json file I found out that the sample rate was set to 22050. After I changed it to 44100 and restarted the tts-server, the voice sounded closer to mine, but the quality is still really bad. Could the fact that all the samples were recorded at 44100 Hz affect the whole training, since the default sample_rate in that config.json file is 22050, or is it irrelevant and I just need to train more? Or do I need to start over with samples recorded at 22050 Hz?

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      Thanks for your nice feedback on the details in my tutorial 😊. I guess you might not get great results with just 102 recordings. Did the training process run even though the samplerate did not match? I'd have thought this would abort the training process. However, just changing the value after the training, only for synthesis, will not work. The samplerate in the config and the wave SR must match before starting the training process; no matter if it's 22 or 44 kHz, at least the config should match reality 🙃
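
      If the wavs and the config disagree, one option is to resample the whole dataset before training. A small sketch, assuming librosa and soundfile are installed; the target rate and folder path are placeholders:

      from pathlib import Path
      import librosa
      import soundfile as sf

      target_sr = 22050  # must equal sample_rate in the training config
      for wav in Path("MyVoiceDataset/wavs").glob("*.wav"):
          audio, sr = librosa.load(wav, sr=None)  # keep the original rate
          if sr != target_sr:
              audio = librosa.resample(audio, orig_sr=sr, target_sr=target_sr)
              sf.write(wav, audio, target_sr)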

    • @vadzimyesman7693
      @vadzimyesman7693 Год назад

      @@ThorstenMueller The training process did run even though the samplerate did not match, 1000 epochs.

  • @anaveragegoogleaccountname
    @anaveragegoogleaccountname Год назад

    I would have appreciated you breaking down how the audio samples should be formatted, maybe a bit more explanation of the code, and also torchaudio does not install along with torch either.

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thanks for your suggestion. I thought diving too deep into the code might be hard to follow, but I'll think about a more detailed video, which will be longer though.

  • @mementomori-l2l
    @mementomori-l2l Год назад

    Hello, thanks so much for the video. I'm in the process of training a custom VITS TTS model using a dataset that I've created. Around the 200,000-step mark, the average loss on my trainEpochstats/avg_loss_1 is creeping up . My dataset is fairly small, approximately 1 hour in length, but it does have good coverage of phonemes. When I tested the audio, it had the correct voice quality but the speech was nonsensical. Should I halt the training to expand my dataset, or is it typical for models to require more training steps to produce meaningful audio?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      You're welcome 😊. If your dataset is nicely phonetically balanced it should produce usable results. My VITS model has been trained (I guess) for 600k steps, so there might be room for more training. But maybe you can ask this in the Coqui TTS GitHub discussions, where there are real machine learning pros. If available, add some Tensorboard screenshots for analysis.

  • @Hellfreezer
    @Hellfreezer Год назад

    Is there a way to stop and resume training? The continue path command does begin the process but it then fails when generating sample sentences.

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      It's been some time since I last continued/restored a training. I guess you know my video on exactly this topic? ruclips.net/video/O6KxJR95WpE/видео.html
      This isn't working? Maybe it's a bug or a changed use case in Coqui TTS then.

    • @Hellfreezer
      @Hellfreezer Год назад

      @@ThorstenMueller Yes, that's the video I found the method in. I'm not sure if anyone else is having the same trouble, but I haven't been able to find a solution at present.

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      @@Hellfreezer Is there any specific error message when running continue and while generating sample sentences?

    • @Hellfreezer
      @Hellfreezer Год назад

      @@ThorstenMueller I tried to post the full info but it seems to have been hidden. Basically the traceback ends in TypeError: expected string or bytes-like object

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      @@Hellfreezer There's a closed issue on that. Maybe this is helpful for you.
      github.com/coqui-ai/TTS/issues/2070

  • @techterry5299
    @techterry5299 10 месяцев назад

    5:36 is not very clear. Where did that come from?

    • @ThorstenMueller
      @ThorstenMueller  9 месяцев назад

      You mean the voice dataset in this LJSpeech file and directory structure?

  • @RichardCastuera-d8l
    @RichardCastuera-d8l Год назад +1

    Thank you so much!

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thank you for this really nice feedback. Feedback like yours keeps me motivated 😊.

  • @feixym
    @feixym Год назад

    thank you for your video , it's great worker

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      You're very welcome. Happy it's helpful for you 😊.

  • @MrAngryWh1te
    @MrAngryWh1te Год назад

    Hello! Thanks for the tutorial! I just finished training. My bot can't string letters into words at all. I would like to ask you what size the dataset should be, and whether it is possible to speed up the training with Google Colab?

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      You are welcome 🙂. Not sure what you mean by "letters into words"? Do you mean, as example, "TTS" vs. "T T S"? pronunciation? Google colab provides simple GPU power which is far better than CPU, but it disconnects sessions regularly (in the free edition).

    • @MrAngryWh1te
      @MrAngryWh1te Год назад

      @@ThorstenMueller First, thanks for the reply! I mean my bot can't say a word, it's more like a monster roar (like grr). But at the same time, he can change the tone of speech, using, for example, an exclamation mark.
      I asked about the dataset in my first comment because I think it's my problem and the quality of my dataset is not high enough.

  • @belalgaber555
    @belalgaber555 Год назад

    I love your knowledge man

  • @zsoltvastagh7023
    @zsoltvastagh7023 Год назад

    Awesome tutorial, thank you... Unfortunately, it keeps getting interrupted by a multiprocessing error before the last step, and I'm looking for a solution to the error. Others have succeeded, and I see in the video that it works for you, so maybe it will work for me too. :)
    Could there be a difference between Windows versions that could cause this error?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thanks for your nice feedback 😊. Different Windows version might be a reason. Which version do you use? Is there an error message shown?

  • @boogeyman8099
    @boogeyman8099 Год назад

    How do I fix the freeze issue? I can't find anything about it other than the resource you provided (a bug report) that was closed with the author's comment "we don't support Windows", when you've clearly done it on Windows! I've spent a lot of time on this and would like to figure it out; any help would be appreciated.

  • @qodeninja
    @qodeninja Год назад

    cool video, can you do this with a docker setup, sans windows?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thanks for your feedback 🙂. Do you mean training a TTS model using Coqui TTS inside a Docker container?

    • @qodeninja
      @qodeninja Год назад

      @@ThorstenMueller yes, exactly. is that even possible or do you need GPU? I want to be able to use my local NAS for something more than a filestore so I was wondering if this was possible

    • @qodeninja
      @qodeninja Год назад

      yes please@@ThorstenMueller

  • @justelesnews
    @justelesnews Год назад

    Hi, nice video ! Could you tell me what you think of the new arduino for speech recognition ? -> nicla voice

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Personally i've no experience with arduino. You think it's worth to check this topic?

    • @justelesnews
      @justelesnews Год назад

      @@ThorstenMueller I don't know. Arduino says this is the first time that we can recognize voice commands with neural decision processor, ultra low power consumption and very good recognition. I don't know if it's true or not. It's expensive but I think I'll give it a try

  • @RobinLorenczat
    @RobinLorenczat Год назад

    Is it possible to combine two voices? And what sample rate should I use for the dataset?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      What do you mean with "combining two voices"? I've trained my TTS models with 22kHz samplerate.

  • @EzmiTV
    @EzmiTV Год назад

    Hi! Everything works fine, thanx! Except that it refuses to handle accented Hungarian characters (éáűőúöüóí). Does it need to be converted somewhere to handle these letters as well? For sentences without an accented character, it is perfect.

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      Do you mean you have problems on training the model with these chars or did training run good and you're having problems synthesizing? Have you trained using phonemes or characters? Maybe you can run this script on your dataset and add any specials chars to your config.
      github.com/coqui-ai/TTS/blob/dev/TTS/bin/find_unique_chars.py

    • @EzmiTV
      @EzmiTV Год назад

      @@ThorstenMueller Yes, "abcdefgh..." is fine, but "éáőúöüó..." is omitted from the speech. A new config.json is created in a new folder at every start. Where can I add the returned values to the configuration?
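
      The config.json in the output folder is generated from the training recipe, so the characters are usually declared there rather than edited afterwards. A sketch of how that can look for character-based training; field names may differ slightly between Coqui TTS releases, and the character list below is only an example:

      from TTS.tts.configs.shared_configs import CharactersConfig
      from TTS.tts.configs.vits_config import VitsConfig

      characters = CharactersConfig(
          # include every letter reported by find_unique_chars.py, accented ones too
          characters="abcdefghijklmnopqrstuvwxyzáéíóöőúüű",
          punctuations="!'(),-.:;? ",
          pad="<PAD>", eos="<EOS>", bos="<BOS>", blank="<BLNK>",
      )

      config = VitsConfig(
          characters=characters,
          use_phonemes=False,  # character-based training, so the listed letters are used directly
          text_cleaner="multilingual_cleaners",
      )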

  • @jaylee6488
    @jaylee6488 5 месяцев назад

    Hello Thorsten: I tried to figure it out by myself following the steps, but somehow it doesn't work. Can I make an appointment with you for about half an hour, so that you can give me some guidance?

    • @ThorstenMueller
      @ThorstenMueller  5 месяцев назад

      You can contact me by using my contact form here, but it might take some time until i can respond. www.thorsten-voice.de/en/contact/

  • @cmyk8964
    @cmyk8964 Год назад

    I started training the model, and after 8 hours, only 2 epochs were completed. Is this normal and do I need to complete all 1000?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      What do you mean by "completed"? Normally the training process runs until you stop it manually. Did training end automatically?

  • @mukhamejantalap4526
    @mukhamejantalap4526 8 месяцев назад

    Hey, I am trying to train a model for my language (Kazakh) following your tutorial. It's been over 1 day of training, but I am only getting some weird speaker noises. I didn't see you change or add any symbols, so neither did I. Do I need to add the alphabet of my language?

    • @ThorstenMueller
      @ThorstenMueller  8 месяцев назад

      In general one day is not much time for training a tts model. Do you use phoneme or character based training?

    • @mukhamejantalap4526
      @mukhamejantalap4526 8 месяцев назад

      @@ThorstenMueller I've used phoneme-based training. Well, I was thinking I would at least get something. The data contained over 12k audio samples from a lot of speakers, each speaker having 250 samples. Maybe because of that the features didn't match.

  • @omarharbah6972
    @omarharbah6972 11 месяцев назад

    A lot of thanks man !

  • @mi16chap
    @mi16chap Год назад

    Hi Thorsten, thanks for putting the video together. When I try to run my version of your train_vits_.py script, I get an error saying ModuleNotFoundError: No module named 'TTS.tts.configs.shared_configs' - any pointers? (I tried to add the project path to my system environment variable, but no luck.)

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Hi, are you in your Python venv? Does "pip list" shows a TTS package?

  • @nestboxcam-Surabaya
    @nestboxcam-Surabaya Год назад

    Thank you for this

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      You're welcome 😊. I hope it's been helpful for you.

  • @ThugLife-is1yo
    @ThugLife-is1yo Год назад

    Confused: where exactly did you put your voice files for training?

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      You're looking for the parameter "dataset_config" in the training recipe file. There you can write the file location to your voice files (in LJSpeech format) for training.
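
      A sketch of what that part of a recipe can look like; the folder name is a placeholder for your own LJSpeech-style dataset (a metadata.csv plus a wavs/ subfolder), and field names may differ slightly between Coqui TTS releases:

      import os
      from TTS.tts.configs.shared_configs import BaseDatasetConfig
      from TTS.tts.configs.vits_config import VitsConfig

      output_path = os.path.dirname(os.path.abspath(__file__))

      dataset_config = BaseDatasetConfig(
          formatter="ljspeech",                   # parses metadata.csv lines like "id|text|normalized_text"
          meta_file_train="metadata.csv",
          path="C:/TTS-Training/MyVoiceDataset",  # folder that contains metadata.csv and wavs/
      )

      config = VitsConfig(
          datasets=[dataset_config],
          output_path=output_path,
          run_name="my_voice_vits",
      )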

  • @masamiakita993
    @masamiakita993 Год назад

    Thanks a lot!!

  • @andiratze9591
    @andiratze9591 Год назад

    Hey Thorsten. Can Coqui be installed with all the models and functions like on the website, so that you no longer have to type any commands and can use it completely offline via the user interface?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Hi Andi, I assume you mean Coqui Studio.
      As far as I know, that is not part of their open-source release, so I'd say it's not possible. Only the "tts-server" command provides a locally running web frontend, which of course can't be compared to Coqui Studio.

    • @andiratze9591
      @andiratze9591 Год назад

      Is there other software that can be used offline once everything is set up, or at least Coqui with a few pretrained models?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      @@andiratze9591 You can use all Coqui TTS models offline, just not via an interface as comfortable as Coqui Studio. Do you know this video of mine? I show it there. ruclips.net/video/alpI-DnVlO0/видео.html

    • @andiratze9591
      @andiratze9591 Год назад

      Ah thanks, I thought that was just a video with terminal commands, without an existing user interface. I'm reinstalling my Windows later and will give it a try. 🙂

    • @andiratze9591
      @andiratze9591 Год назад

      I'll try to learn Python later; maybe I can program my own TTS/voice cloning tool. It's impossible to find free software in this area that is easy to use. In every other area I find something (photo, video, etc.), but TTS is really bad 🥴

  • @youngphlo
    @youngphlo Год назад

    I follow every step up until 08:33 but when I run `pip install TTS` it tries to install every version of transformers. I would share a screenshot if I could. Never seen a `pip install` go through all the different versions of a package

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Maybe Coqui TTS dependencies have changed in newer releases? Could you download/clone the version i've used in the video just to check if this works.

    • @shivam5648
      @shivam5648 10 дней назад +1

      So any solution to that problem?

    • @youngphlo
      @youngphlo 10 дней назад

      @@shivam5648 are you running into the problem i described when you try to install now? This only happened for the old release back then as I understand it. The OG Coqui is pretty much deprecated now but this error shouldnt happen anymore.

    • @shivam5648
      @shivam5648 10 дней назад

      @@youngphlo It's just not installing and takes hours; after installing for hours there is this error. It's so frustrating.

  • @RogueMandoGaming
    @RogueMandoGaming 10 месяцев назад

    So I'm getting as far as running the "pip install -e ." command before erroring out with status code 1, something about wheel.

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      Try running "pip install setuptools wheel -U" before, maybe this helps.

  • @kostas9849
    @kostas9849 Год назад

    I need help! Inside the TTS-Training folder there are some files, as you show in the video. How did those files get there? How do I get exactly the same into the TTS-Training folder I made? And when I change directory into the TTS-Training folder and type the python command, nothing happens. Please, could you help me with that? :(

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      I'm not sure if i understand your question right. So training process starts and the "output_folder" is created and filled with files. Are you already trying to synthesize voice while training? Are audio samples in Tensorboard available?

    • @kostas9849
      @kostas9849 Год назад

      @@ThorstenMueller I don't know how the output folder was created and filled with files in your video. I followed your steps one by one: I installed Python, eSpeak-ng and the Microsoft Build Tools, and where you open the command prompt I really got stuck. I created the directory as you did, but my directory doesn't contain the files that you show in the video. I typed the python commands but nothing happened. What did I do wrong? :(

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      @@kostas9849 Strange, the output directory with the training run name and a timestamp of the training start should be created automatically. Did cloning the Coqui TTS repo and adjusting the recipe work?

  • @michaelb1099
    @michaelb1099 Год назад

    Great tutorial, but I am trying to replace my Microsoft voices with my cloned voice. Is this doable?

    • @ThorstenMueller
      @ThorstenMueller  Год назад +2

      Thanks for your nice feedback 😊 and great question. I tried this some time ago too, but didn't find an easy solution for this. But if this is interesting in general i might give it a closer look. Most voices seems to come out of their Microsoft Azure cloud services.

  • @josebo8780
    @josebo8780 2 месяца назад

    I am getting only around 80 iterations per hour on a setup with an RTX 3090. That is way too slow, right?

    • @ThorstenMueller
      @ThorstenMueller  2 месяца назад +1

      Good question. But it is way faster than my NVIDIA Jetson Xavier AGX 😉

  • @pink_kniteu
    @pink_kniteu Год назад

    I would like to train a new TTS model for a new language. Is it done the same way? Can you give me some advice on it please? It would really help me.

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      You're right. It's working the same way. Maybe you can watch this tutorial showing how to create a voice dataset for your new language model.
      ruclips.net/video/4YT8WZT_x48/видео.html

  • @MatyssMatyss
    @MatyssMatyss 9 месяцев назад

    Hello! I just wanted to know how many audio files I need to clone a voice. I recorded about 50 wav files, but when I start the trainer the script fails with "there is no sample left".

    • @ThorstenMueller
      @ThorstenMueller  9 месяцев назад

      I guess 50 is way too few. I recorded over 10k wave files for my German "Thorsten-Voice" voice clone. Maybe give it a try with 1,000 recordings.

  • @psyk0l0ge
    @psyk0l0ge Год назад

    It tells me that I might need to install a third-party phonemizer for the language "de"... Where do you get the extra files that you have installed and cd into at about 10:37?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Did you install espeak-ng as shown here?
      ruclips.net/video/bJjzSo_fOS8/видео.html

    • @captainlavenderVHS
      @captainlavenderVHS 11 месяцев назад

      I had this problem too... A reboot seemed to fix it, but I also did a "pip install phonemizer" before, which may not have actually been necessary.
      In case anyone else is wondering, got this running on Win 11, using Anaconda 2.5.1 (Python 3.11.5), CUDA 12.3.5.1, and Coqui TTS 0.21.2

  • @JamesBond-ix8rn
    @JamesBond-ix8rn Год назад

    How long do I need to train until it sounds good?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      It depends on what you mean by "good" 😉. By step 30k you should be able to hear a voice with lots of background noise. Starting at around step 100k the voice should be clearer. Beyond that it's up to your personal expectations.

    • @JamesBond-ix8rn
      @JamesBond-ix8rn Год назад

      @@ThorstenMueller thanks for the prompt response. how long does this take in hours/days/months and how much input data would approximately need?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      ​@@JamesBond-ix8rn It's hard to call specific values as it depends on the hardware you have available for training. Might be some hours to weeks/month training time. Ensure a good phonetic balance and add more recordings by time if you're not satisfied with the result.

  • @deeber35
    @deeber35 Год назад

    Can you change the tone of the voice reading text {e.g. excited, sad, etc}?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Emotions aren't supported on Coqui TTS models (as far i know). Maybe SSML in Mimic 3 might be at least a little bit helpful in that context.

  • @MistakingManx
    @MistakingManx 7 месяцев назад

    Right, how should I go about creating the dataset though?

    • @ThorstenMueller
      @ThorstenMueller  7 месяцев назад

      Hi, do you know my tutorial on Piper-Recording-Studio for doing so? ruclips.net/video/Z1pptxLT_3I/видео.html

    • @MistakingManx
      @MistakingManx 7 месяцев назад

      @@ThorstenMueller I started following your mimic recording studio and it's instructions, so I could make my own Coqui LJSpeech model, but it isn't working for some reason.
      Some files don't exist anymore, and it seems mad about numpy.

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад

      @@MistakingManx Hmm, as Mimic-Recording-Studio is not actively maintained this might stop working due newer package versions (like numpy). I'd use Piper-Recording-Studio as it will generate an LJSpeech like dataset too.

    • @MistakingManx
      @MistakingManx 6 месяцев назад

      @@ThorstenMueller I already used mimic-recording-studio, it's what the tutorials used, and it seemingly worked fine, minus the part I had to fix.
      Your script that makes the dataset was useful, I just can't get the training stuff to work at all.
      I wanted to use windows since I have a 4090ti on it.
      Would it be possible to talk on a platform like discord?

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад

      ​@@MistakingManx You can send me an email using my contact form here: www.thorsten-voice.de/en/contact/
      But it might take some time to respond for me so please be a little bit patient 🙂.

  • @kaymat2368
    @kaymat2368 Год назад

    11:09 Help please, I'm stuck on this step because it gave this error: "OSError: [WinError 126] The specified module could not be found. Error loading "cudart64_110.dll" or one of its dependencies."

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Seems like your CUDA installation is broken. Are you sure CUDA is installed correctly?
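
      A quick sanity check (assuming torch is installed in the same venv) to see whether the installed PyTorch build can actually use CUDA before starting a training:

      import torch

      print("torch version :", torch.__version__)
      print("built for CUDA:", torch.version.cuda)        # None for CPU-only builds
      print("CUDA available:", torch.cuda.is_available())
      if torch.cuda.is_available():
          print("GPU           :", torch.cuda.get_device_name(0))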

    • @kaymat2368
      @kaymat2368 Год назад

      @@ThorstenMueller Im not sure, i followed your steps clearly

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      @@kaymat2368 Hard to say, what might cause this issue. Maybe try installing a newer CUDA version.

    • @kaymat2368
      @kaymat2368 Год назад

      @@ThorstenMueller Ok, thanks for replying, btw, my GPU is nvidia GeForce GT 520, Os Win 7

  • @Live_draw_today
    @Live_draw_today Год назад

    Sir, while running the last line an error occurs: charmap codec can't decode bytes.
    Please help

  • @pink_kniteu
    @pink_kniteu Год назад

    Nice thank youu

  • @peethaer
    @peethaer Год назад

    You are my hero.

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      I wouldn't go that far 😉. But I'm very happy about this more than kind feedback 😊.

  • @-.nocturna.-
    @-.nocturna.- Год назад

    How long does it take to train a model? Best regards

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      Hello 👋. For my Thorsten-Voice models, training took around 3 months of 24/7 compute time. But this depends on the hardware you have available for training.

    • @-.nocturna.-
      @-.nocturna.- Год назад

      @@ThorstenMueller Woah, did you train it yourself? What GPU did you use? Thats insanely long in this trying times of energy prices. :/

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      @@-.nocturna.- Absolutely. This is the usual trade-off between graphics performance and duration. I used an NVIDIA Jetson Xavier AGX, which has a relatively low power consumption.

    • @-.nocturna.-
      @-.nocturna.- Год назад

      @@ThorstenMueller Thats a nice one. 30w vs the 320w of my 4080 :| i think i will do it if my other projects fail :P Have a nice night :>

  • @recrieprodutora
    @recrieprodutora Год назад

    The process returns the error: "PermissionError: [WinError 32] The process cannot access the file because it is being used by another process..." I used your code.

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      I've seen this error before, but I'm not absolutely sure about the reason. Does training run anyway, or does it not start at all? Does running the command line prompt as admin change the behavior?

    • @recrieprodutora
      @recrieprodutora Год назад

      @@ThorstenMueller The training starts, but the error occurs right afterwards. I don't know how to fix it.

    • @recrieprodutora
      @recrieprodutora Год назад

      @@ThorstenMueller I tried modifying the root of the folder and the permissions of the prompt, but the error keeps returning.
      Have you ever seen anything like it? Even using your "train..." script, which already contains "if __name__ == '__main__':", returns an error in training. Can you imagine which way I should go? 😪😥

    • @shadaaan
      @shadaaan Год назад

      I am getting the same error. Has any solution been found for this?
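
      For reference, the Windows-safe entry point the comment above refers to generally looks like the sketch below, with main() standing in for the recipe's setup and trainer.fit() call; whether this alone resolves the PermissionError in every setup is not guaranteed:

      # On Windows the training code should only run inside the __main__ guard,
      # otherwise spawned DataLoader worker processes re-import and re-run the script.
      from multiprocessing import freeze_support

      def main():
          # build dataset_config, config, model and the Trainer here as in the recipe,
          # then start the training with trainer.fit()
          pass

      if __name__ == "__main__":
          freeze_support()  # needed for spawned processes on Windows
          main()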

  • @dayteimyasuki
    @dayteimyasuki Год назад

    i can't get the pip command to work, help!!

  • @muhammadalfahrezi1745
    @muhammadalfahrezi1745 Год назад

    I want to make a new model for the Indonesian language, but espeak-ng doesn't support that language. Is it still possible to make a new model?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thanks for your good question. Yes, that's possible. You can set "use_phonemes" to "false" and then it will use character based training.
      Maybe this helps a bit. tts.readthedocs.io/en/latest/tutorial_for_nervous_beginners.html?highlight=use_phonemes
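
      A minimal sketch of that switch in a VITS recipe (the cleaner name is just an example; other settings stay as in the normal recipe):

      from TTS.tts.configs.vits_config import VitsConfig

      config = VitsConfig(
          use_phonemes=False,             # skip espeak-ng phonemization entirely
          text_cleaner="basic_cleaners",  # simple lowercasing/whitespace cleanup
      )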

    • @muhammadalfahrezi1745
      @muhammadalfahrezi1745 Год назад

      @@ThorstenMueller still using espeak or not? the alphabet is the same as in English, but only the spelling is different. sorry I ask a lot

  • @IngridUterus
    @IngridUterus 10 месяцев назад

    I have Python 3.11 installed. Do I have to uninstall it and install 3.8? That would really suck.

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      According to the readme, Python 3.11 should work (python >= 3.9, < 3.12).

  • @mungamurisairamiiitdharwad7451

    How many samples do we need for the training?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      As always - it depends 😉. With fewer than 100 the training process will not start. I recorded > 10,000 phrases for my German "Thorsten-Voice" TTS models. But phonetic coverage might be more important than the pure number of recordings.
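
      A rough way to get a feeling for the phoneme coverage of an LJSpeech-style dataset; this sketch assumes the phonemizer package plus espeak-ng are installed, and the language code and metadata path are placeholders:

      from collections import Counter
      from phonemizer import phonemize

      counts = Counter()
      with open("metadata.csv", encoding="utf-8") as f:
          for line in f:
              text = line.strip().split("|")[1]
              phones = phonemize(text, language="de", backend="espeak", strip=True)
              counts.update(ch for ch in phones if not ch.isspace())

      # Rarely covered phonemes show up at the bottom of this list.
      for phoneme, n in counts.most_common():
          print(f"{phoneme}\t{n}")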

  • @thebluefacedbeastyangzhi
    @thebluefacedbeastyangzhi Год назад

    Is there a non CUDA version?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Coqui has a command line parameter called "use_cuda" which can be set to "false", but i guess training will take waaay longer than with CUDA.

    • @thebluefacedbeastyangzhi
      @thebluefacedbeastyangzhi Год назад

      @@ThorstenMueller Thank you doe the reply. I have AMD and not Nvidia. So should I give up this method?

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      @@thebluefacedbeastyangzhi Hard to say, but maybe you try a Google colab notebook with GPU that supports CUDA. Might be a more easy way for you if you don't have access to a local NVIDIA GPU card.

    • @thebluefacedbeastyangzhi
      @thebluefacedbeastyangzhi Год назад

      @@ThorstenMueller thank you again for this information

  • @shazams461
    @shazams461 Год назад

    Okay 👍🏻👍🏻

  • @BaDHamisteR
    @BaDHamisteR Год назад

    is it possible to train the model to speak in Portuguese?

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Sure, if you have a Portuguese voice dataset ready for training.

    • @BaDHamisteR
      @BaDHamisteR Год назад

      @@ThorstenMueller well.. i have my own voice 🤣. i wanna try that.

  • @azer0013
    @azer0013 Год назад

    Where is TTS-training??

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      It is an empty folder in which you start working. I created a new folder "TTS-Training" but you can name it whatever you want.

  • @Hinterfrage
    @Hinterfrage Год назад +1

    Oh, the gentleman only posts scam clips, interesting, there is a lot to report there ...

  • @magenta6
    @magenta6 Год назад

    Thanks Thorsten for your endless efforts at communicating a complex subject with enthusiasm and passion to people who don't know much about python. I see that you have linked another video about preparing recordings ruclips.net/video/4YT8WZT_x48/видео.html

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      You're very welcome 😊. And yes, i'm really passionate about this topic.

  • @tesitest378
    @tesitest378 Год назад

    Coquí (Eleutherodactylus) is a frog from Puerto Rico 🇵🇷

  • @nobudy_left
    @nobudy_left 2 месяца назад

    The shirt 😂 crappy encoding, I feel that

    • @ThorstenMueller
      @ThorstenMueller  2 месяца назад +1

      Thank you very much 😊 - it's also one of my favorite shirts.

  • @JoeLinux2000
    @JoeLinux2000 Год назад

    Waiting for Linux to get proper HQ text-to-speech.

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      With Coqui TTS or Piper TTS there are some pretrained and really nice sounding TTS models available for Linux in multiple languages 😊. Do you know these?

  • @tarekhassan6958
    @tarekhassan6958 Год назад

    It looks like mining issues

  • @KominoStyle
    @KominoStyle Год назад

    Well something on my end is not working -.-!

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Do you get any specific error message?

    • @KominoStyle
      @KominoStyle Год назад

      @@ThorstenMueller Sorry for the late response. I tried many different ways to install and use TTS, but one big problem I had was that I couldn't install Python 3.8 for all users (every other version I can),
      and I'm not sure if that's the big problem.

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      @@KominoStyle Which Python version are you using then?

  • @a.tevetoglu3366
    @a.tevetoglu3366 Год назад

    Ei gude wie?! ;) (Hessian dialect: "Hey, how's it going?!")

    • @ThorstenMueller
      @ThorstenMueller  Год назад +1

      Ei subba - un selbst? ;) (Hessian: "Great - and you?")

    • @a.tevetoglu3366
      @a.tevetoglu3366 Год назад

      @@ThorstenMueller Getting by, as usual. By the way, many thanks for your content. I bought two RTX A5000s and am wondering what I can do with them, since I'm no gamer, architect or programmer (the original plan to build a rendering workstation became obsolete for various reasons), and your vids inspire quite interesting experiments. I was interested in running my own AI projects, and it seems you offer the know-how for that. Best regards from Turkey, from a Rhineland exile.

  • @OurSouthernLife
    @OurSouthernLife 10 месяцев назад

    Thank you, this video has helped me get to this point. Can you help with this error? I am stuck here and can't seem to find a solution. I followed your video, but when I go to run the trainer I get the following error:
    (TTS) C:\Users\7danny\Documents\CoquiTTS\TTS>python .\train_vits_win.py
    Traceback (most recent call last):
    File ".\train_vits_win.py", line 6, in
    from TTS.tts.configs.vits_config import VitsConfig
    File "C:\Users\7danny\Documents\CoquiTTS\TTS\TTS\tts\configs\vits_config.py", line 5, in
    from TTS.tts.models.vits import VitsArgs, VitsAudioConfig
    File "C:\Users\7danny\Documents\CoquiTTS\TTS\TTS\tts\models\vits.py", line 38, in
    from TTS.vocoder.models.hifigan_generator import HifiganGenerator
    File "C:\Users\7danny\Documents\CoquiTTS\TTS\TTS\vocoder\models\hifigan_generator.py", line 6, in
    from torch.nn.utils.parametrizations import weight_norm
    ImportError: cannot import name 'weight_norm' from 'torch.nn.utils.parametrizations' (C:\Users\7danny\Documents\CoquiTTS\TTS\lib\site-packages\torch\nn\utils\parametrizations.py)

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      You're welcome. Did you update all python packages before starting the training?
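
      The failing import suggests a mismatch between the TTS checkout and the installed torch build; as far as I know, torch.nn.utils.parametrizations.weight_norm only exists in newer PyTorch releases, so a quick check like this (an assumption, not an official fix) can tell whether the torch in the venv is too old:

      import torch

      print(torch.__version__)
      try:
          from torch.nn.utils.parametrizations import weight_norm  # newer API used by this TTS version
          print("parametrizations.weight_norm is available")
      except ImportError:
          print("only the legacy torch.nn.utils.weight_norm exists - upgrade torch or use an older TTS checkout")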

  • @OmriDaxia
    @OmriDaxia Год назад

    This is an awesome tutorial, thank you for doing all the trial and error that I kept running into.
    I do have one problem though. I've used your modified training script and only changed the directories, but I'm still getting a permission error:
    PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:/TTS/ThorstenTut/ljsAlex01-April-26-2023_05+12PM-0000000\\events.out.tfevents.1682554375.DESKTOP-IUNHJ2B'
    Is there any workaround for this? It's pointing to one of the files it just generated, which means it's not being used by any other process, so it must be that multithreading problem you mentioned still being an issue somehow.

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      Thanks for your nice feedback 😃. I ran into that permission thing once, too, and I'm not sure how I solved it. I'll check my notes for this video and think about how I solved it; when I remember I can share it here. Maybe running the command line prompt as local admin would be a first thing to try.

    • @zsoltvastagh7023
      @zsoltvastagh7023 Год назад

      @@ThorstenMueller I have the same problem. Please let me know if you have found a solution to the error. Thank you very much!

    • @thefurrowzor
      @thefurrowzor Год назад

      Any updates regarding this issue?

    • @OmriDaxia
      @OmriDaxia Год назад

      @@thefurrowzor nope, still stuck here. Not sure what to do

    • @ThorstenMueller
      @ThorstenMueller  Год назад

      @@thefurrowzor Might this issue help you? It worked for me while testing for this tutorial. Hopefully it'll work for you too. If so, I could add the link to the video description.
      github.com/coqui-ai/TTS/issues/1711