Local voice cloning with 6 seconds audio | Coqui XTTS on Windows

  • Published: 21 Nov 2024

Comments • 218

  • @toykotokyoto
    @toykotokyoto 11 месяцев назад +13

    another great video, Thorsten 👏 We have a happy update... you can now use unlimited audio for the 0-shot clone :D no longer are you limited to just 6 seconds. The HuggingFace space is still hard coded to max out at 30 seconds though... so we don't overload their servers 😆

    • @ThorstenMueller
      @ThorstenMueller  11 месяцев назад +5

      You're very welcome and thanks for the update 😊.

    • @juanjesusligero391
      @juanjesusligero391 11 месяцев назад

      This is great news! :D You probably should make another video comparing the quality differences between the 6 seconds and 30 seconds input audio! (or maybe more, if you can change that max value in the local installation) ^^ @@ThorstenMueller

    • @ThorstenMueller
      @ThorstenMueller  11 месяцев назад +4

      @@juanjesusligero391 An audio samples comparison video with different audio input length is already in the making 😉.

    • @tsunderes_were_a_mistake
      @tsunderes_were_a_mistake 10 месяцев назад

      Does the output sound better with longer audio? I tried the Japanese version on hugging face and output sounded robotic.

    • @ThorstenMueller
      @ThorstenMueller  9 месяцев назад

      ​@@tsunderes_were_a_mistake In my german model i didn't encounter a change depending on the text length. But i did not exactly check this specific aspect. If you think this would be helpful i can give it a more specific try (with a german model). But i can't say anything about the Japanese model.

  • @juanjesusligero391
    @juanjesusligero391 11 месяцев назад +6

    I was exactly like you, I also had too high expectations for Coqui XTTS, haha ^_^
    While the outcome wasn't quite what I was expecting, the results are still quite impressive, especially considering they are based on just a 6-second sample. I was also really happy to read in the comments that the devs are working on improvements, like allowing for voice samples longer than 6 seconds.
    I loved the video! Thanks a lot for your work, Thorsten! ^^

    • @ThorstenMueller
      @ThorstenMueller  11 месяцев назад +1

      Thanks a lot for your nice feedback 🥰.

  • @Reincarnated_Recap
    @Reincarnated_Recap 6 месяцев назад +1

    omg, the quality is so good compared to all the other voice-cloning TTS

  • @MohanPoornachandra
    @MohanPoornachandra Месяц назад

    Thanks a lot. I had been wanting to train a model for many days and was struggling with various errors. This solved everything.

  • @schakuun1995
    @schakuun1995 11 месяцев назад

    Great video! I'm really getting into TTS and it's so exciting to see what's possible now. It's incredible how something that needed hours of data a year ago can now be done in just 6 seconds. It's fascinating to watch this tech evolve

    • @ThorstenMueller
      @ThorstenMueller  11 месяцев назад

      Thank you for your nice feedback 😊. I'm really curious to see where quality is going in near future.

  • @secondaccount5512
    @secondaccount5512 11 месяцев назад

    Great video, expectations after listening to the interview with Josh were high, but XTTS is still kinda new, so I am excited for the future improvements.

  • @davidtindell950
    @davidtindell950 2 месяца назад +1

    Thank You Yet Again! P.S. In addition to "Scheiß Encoding" ... I am a fan of: "CAUTION I TEST IN PRODUCTION".

  • @nuborn.studio
    @nuborn.studio 9 месяцев назад

    Nice tool, and great respect to the developer! I think the idea is great, though personally I couldn't do much with this quality. But hey, for 6 seconds of input that's an amazing result, I think!

    • @ThorstenMueller
      @ThorstenMueller  9 месяцев назад +1

      I can only agree with that 😊.

  • @nerdynav
    @nerdynav 11 месяцев назад

    Hi Thorsten, I am a computer engineer and AI YouTuber myself (who isn't nowadays? haha :P). Just wanted to say that you make great tutorials on AI voice. I stumbled on this tutorial while exploring Coqui and it is the best tutorial I found. Thanks for taking the time to do these.
    Also, a subscriber asked me for a resource on Coqui TTS tutorials on Reddit, so I shared your channel! Keep up the great work.

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      Hi 👋. Thanks for your kind feedback on my content 😊. You're right, we are not alone on AI content 😆.

    • @ThatGuyNamedBender
      @ThatGuyNamedBender 9 месяцев назад

      Pretty much 95% of youtube and the working class are against AI lmfao but keep daydreaming

  • @Cmapukan
    @Cmapukan 8 месяцев назад

    Thanks for the good explanation and clear example. I wish you prosperity and new opportunities. I apologize for my broken English.

    • @ThorstenMueller
      @ThorstenMueller  8 месяцев назад +1

      Thank you for your nice comment. I wish you all the best, too 😊.

  • @davidtindell950
    @davidtindell950 Месяц назад +1

    Using my local PC GPU: Cloned Voice WORKED WELL ... and ... sounded 'somewhat ' like me BUT actually BETTER than me ( bolder and stronger ) !!!!

    • @elplayeravefenix2280
      @elplayeravefenix2280 Месяц назад +1

      this work for you actually??????

    • @davidtindell950
      @davidtindell950 Месяц назад +1

      @@elplayeravefenix2280 Yes. Not very well but it ‘worked’. On other projects I have found that more voice samples worked better but takes time. Ok.

  • @МихаилЮрков-т1э
    @МихаилЮрков-т1э 8 месяцев назад

    Thanks for the informative video and interesting presentation.
    Please make a guide on how to train a model on a custom dataset.

    • @ThorstenMueller
      @ThorstenMueller  8 месяцев назад

      Thanks for your nice feedback 😊. This topic is already on my (growing) TODO list.

  • @CatonSilver
    @CatonSilver 10 месяцев назад +1

    amazing video! I am wondering if it's possible to train a given voice and then just use that voice for future use. In the "clone your voice locally" section, the code requires the reference audio as an input. I'm thinking in terms of efficiency and that if you plan to use the same voice over and over, you shouldn't need to train the model each time.

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      Good question. I hadn't thought about that until now.
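
      As a side note on reuse: XTTS does not retrain on the reference clip, it only conditions on it at inference time, so the model can be loaded once and then generate many outputs against the same reference file. A minimal Python sketch using the high-level API (the model name and file paths are assumptions, not taken from the video):

          from TTS.api import TTS

          # Load the multilingual XTTS model once (exact model name may differ from the video)
          tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

          texts = ["First sentence to speak.", "Second sentence to speak."]
          for i, text in enumerate(texts):
              tts.tts_to_file(
                  text=text,
                  speaker_wav="my_reference.wav",  # hypothetical 6-second reference clip
                  language="en",
                  file_path=f"output_{i}.wav",
              )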

  • @humanperson8418
    @humanperson8418 Месяц назад

    It has a clear English bias, but overall sounds pretty good.

  • @MarcoManzo
    @MarcoManzo 11 месяцев назад

    Great! I was looking forward to this, only got it running on linux. Thank you for the tech support ;-)

    • @MarcoManzo
      @MarcoManzo 11 месяцев назад

      😂 maybe cuda is exactly my problem on windows🤷‍♂

    • @ThorstenMueller
      @ThorstenMueller  11 месяцев назад

      Thanks and you're welcome 😊. I'm happy if people find my videos helpful.

  • @__________________________6910
    @__________________________6910 11 месяцев назад

    Sir, your explanation is very easy to understand.

  • @КравчукІгор-т2э
    @КравчукІгор-т2э Месяц назад

    Thank you very much. Very good video. German orderliness in everything!

  • @64jcl
    @64jcl 11 месяцев назад

    Quite amazing that they can do this with such a short clip. I had the same results as you with English; it doesn't really sound like me even though I tried to speak my best English. :) How would you compare it with Piper with regards to TTS performance? Of course Piper is quite difficult to train for new voices, but it's even free to use commercially. I wish there was some simpler way to clone voices with it; that would be golden. I have looked at your video for this, but preparing the training set seems like a chore.

    • @ThorstenMueller
      @ThorstenMueller  11 месяцев назад

      Thanks for your comment 😊. I didn't compare the performance of XTTS and Piper TTS. I guess if you want the best free voice clone I'd go with Piper TTS right now, but the effort is higher, as you said.

  • @chrispeters8295
    @chrispeters8295 8 месяцев назад

    Thank you for the super informative video! You're awesome!

    • @ThorstenMueller
      @ThorstenMueller  8 месяцев назад

      Wow, thanks a lot for your nice feedback 😊.

  • @AmrAli-ig2mk
    @AmrAli-ig2mk 7 месяцев назад

    Thanks a lot for your efforts. you are doing great work, keep it up.

    • @ThorstenMueller
      @ThorstenMueller  7 месяцев назад +1

      Thank you a lot for your kind feedback - this keeps me motivated 😊

  • @DrFukuro
    @DrFukuro 11 месяцев назад

    I really like your videos, even though many are unfortunately only in English. Could you imagine making a more general overview video on speech synthesis sometime? Even after days of research, a layperson only gets an incomplete picture; it would be great if a professional like you explained the following topics in a bit more depth for interested viewers:
    What exactly are Coqui, XTTS, Tortoise and eSpeak / espeak-ng, what do they do, and how do they differ from Mbrola and its voices? (Can I use tts instead of Mbrola in scripts? Yes/no, and how/why?)
    Example questions about XTTS:
    What is a multilingual voice as opposed to the Thorsten voice?
    What exactly is voice cloning as opposed to voice transfer?
    What are Coqui speakers and what do they do?
    What is the difference between fine-tuning the XTTS model and simply providing a speaker_wav reference?

    • @ThorstenMueller
      @ThorstenMueller  11 месяцев назад

      Thank you very much for your great feedback and the suggestion 😊. I really like this topic. When you work on a subject for so long and so intensively, these "basics" somehow become so normal that you don't think about them anymore. I've put the topic on my TODO list. Many thanks for that 😊.

  • @amp3253
    @amp3253 11 месяцев назад +1

    Could you help, please?
    tts : The term 'tts' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
    At line:1 char:1
    + tts --list_models
    + ~~~
    + CategoryInfo : ObjectNotFound: (tts:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад +1

      Did you use a Python venv? Is it activated when you try to run the "tts" command? Does "pip list" show you an installed TTS package?
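
      A quick way to check the environment (a minimal Python sketch, not from the video) is to print which interpreter is running and whether the TTS package is visible to it:

          import importlib.util
          import sys

          # For an activated venv this path should point into the venv folder
          print("Python:", sys.executable)

          # True if the Coqui TTS package is importable from this environment
          print("TTS installed:", importlib.util.find_spec("TTS") is not None)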

  • @starbuck1002
    @starbuck1002 11 месяцев назад

    I also experimented a bit with Coqui XTTS. I came to the conclusion that it's not worth it.
    1. Coqui XTTS can't come close to the leading competitors in terms of clone quality.
    2. In my opinion, Coqui XTTS is not worthwhile at this quality and this price, if you also look at the competitors' quality and pricing!
    Still, thanks again for your video, Thorsten!

    • @ratside9485
      @ratside9485 11 месяцев назад

      What price? $1 a day for companies, otherwise it's free.

  • @hashtag_
    @hashtag_ 12 дней назад

    For anyone coming recently, the tts repo isn't maintained anymore according to an issue post on the github. It results in an error when running 'pip install tts'. This fork worked for me instead: 'pip install coqui-tts'

    • @ThorstenMueller
      @ThorstenMueller  7 дней назад

      Thanks for that fork hint 👍🏻. Maybe an issue with a (too new) python version.

  • @anarmustafayev9145
    @anarmustafayev9145 11 месяцев назад

    That's exactly what we were looking for. Many thanks 👍

  • @tobiasd2755
    @tobiasd2755 3 месяца назад

    Very well explained.
    However, I had hoped from the video not just to create a single speech output, but to save my own model so that it shows up under tts --list_models, for example, or so that I can at least specify it with --model_name.
    Is that possible too?

    • @ThorstenMueller
      @ThorstenMueller  2 месяца назад

      Thank you very much 😊. The "--list_models" option shows information from the .models.json file in the repo. You could try adding your model to that file locally. So you've already trained your own model?

  • @PlayGameToday
    @PlayGameToday 6 месяцев назад

    Hello, Thorsten! The title of the video doesn't really capture the point. Unfortunately, I didn't find out from your video how to start the GUI for Coqui TTS. The title mentions XTTS, and I was hoping that I could run the Gradio GUI shown at the beginning of your video. Too bad you don't have a video tutorial on how to deploy that handy voice-generation GUI from the demo on a local machine.

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад

      Do you mean the Huggingface UI from the video?

    • @PlayGameToday
      @PlayGameToday 6 месяцев назад

      @@ThorstenMueller Yes

  • @timo1949
    @timo1949 9 месяцев назад

    A very, very good channel! 👍 I was wondering: what is the reason for the rather low sampling rate of 22,050 Hz in the ThorstenVoice dataset? Simply faster processing of the data?

    • @ThorstenMueller
      @ThorstenMueller  9 месяцев назад

      Thank you very much for your great feedback 😃. In the tests, hardly any difference was audible in the audio output, but the computational effort at 44 kHz, for example, was noticeably higher.

    • @timo1949
      @timo1949 9 месяцев назад

      @@ThorstenMueller Thanks for the info. ElevenLabs also only wants 128 kbps MP3 for professional voice cloning and says no disadvantage is noticeable. Very interesting how the AI processes that.

  • @callmefred
    @callmefred 6 месяцев назад +1

    It's sad that they've discontinued the project.

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад +1

      Yes, and they did not just discontinue the project; Coqui AI, the company behind XTTS, shut down entirely.

  • @terryjones2213
    @terryjones2213 8 месяцев назад +1

    What is your python version?

  • @dempa3
    @dempa3 Месяц назад +1

    This seems very useful, but when I run "pip install tts", I get "Error compiling Cython file", and the operation breaks.

    • @ThorstenMueller
      @ThorstenMueller  Месяц назад

      Strange, which python version are you using?

  • @IvarDaigon
    @IvarDaigon 8 месяцев назад

    I've been using Coqui for months and it's amazing that it simulates breathing at all, but breathing is typically the most distorted part of the generated audio, which can make it sound unnatural. I'm wondering whether removing the breathing from the source audio would improve the quality of the cloned voice, or whether the distorted breathing is just a symptom of the underlying model.

    • @ThorstenMueller
      @ThorstenMueller  8 месяцев назад

      I've no idea how this could work. Maybe it helps if you use audio tools to cut out your breathing from the recording you provide to XTTS. Or maybe there are audiofilters like sox or ffmpeg that can remove breathing sounds from the generated audio.

  • @Gute_Nacht_Kurzgeschichten
    @Gute_Nacht_Kurzgeschichten 7 месяцев назад

    Great explanation 👍 How can I clone my voice so that it reads whole texts to me, e.g. a PDF file or a Word document, or is it limited to just 6 seconds?

    • @ThorstenMueller
      @ThorstenMueller  7 месяцев назад

      Thank you very much for the praise, that makes me really happy 😊. There is (I believe) no ready-made solution for text/Word/PDF input, but in general you can generate longer output. You may have to split the input text, but certainly much more than 6 seconds is possible.

  • @LeSchurke
    @LeSchurke 6 месяцев назад

    Nice video ;)
    and hey, how are you?
    Is it better when the reference voice is longer than 6 seconds,
    or doesn't it matter, or is it even worse? 00:43

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад

      Great, glad you like the video :)
      According to my talk with Coqui AI co-founder Josh Meyer, the model is optimized for a 6-second audio input. Before trying longer audio input, try using other 6-second clips.

  • @Name-is2bp
    @Name-is2bp 6 месяцев назад +1

    did you make a tutorial on how to install and use cuda?

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад +1

      No, not yet. But interesting idea. I've added it on my TODO list 😊.

  • @quamagi
    @quamagi Месяц назад

    I think it cloned your voice very well with the little data the AI had.

    • @ThorstenMueller
      @ThorstenMueller  Месяц назад

      I'll be happy if it works well in Spanish with little input data.

  • @adityapatil6723
    @adityapatil6723 6 дней назад

    This error originates from a subprocess, and is likely not a problem with pip.
    error: subprocess-exited-with-error
    I'm getting this error, can someone please help?

  • @madhushantan1887
    @madhushantan1887 Месяц назад

    Hi, now that Coqui is shutting down, can we no longer use the model via the API? I'm having trouble using the model like that. For the import code “from TTS.api import TTS” I get "module not found".

    • @ThorstenMueller
      @ThorstenMueller  Месяц назад

      Might be a problem with your local installation, too. Does "pip list" show a TTS package?

  • @bobbyboe
    @bobbyboe 11 месяцев назад

    Hi Thorsten, it looks so easy when you do it. I installed and started Coqui via Pinokio, expecting to somehow get to this GUI locally. Pinokio also says "running", but under the usual localhost addresses in the browser I find nothing. Then there is a "server" button; I pressed it and get the answer: .........Connected! Everything gives the impression that it's running as it should... but for me the experience ends there, because I don't know where Coqui could show itself... a pity, really. Pinokio is normally a good entry point for non-coders.

    • @ThorstenMueller
      @ThorstenMueller  11 месяцев назад

      Do you mean the GUI from Huggingface?

    • @bobbyboe
      @bobbyboe 11 месяцев назад

      @@ThorstenMueller Yes, I meant any GUI in general.

  • @64jcl
    @64jcl 11 месяцев назад

    Btw, how do I get the gpu parameter to work? I have a 3000-series GPU, but even if I set gpu=True it says CUDA is not available. Also, I have noticed that the voice cloned from my own speech sometimes shifts to a British accent and sometimes to an American one (likely because my accent is neither). But it also means it is impossible to get consistent results with this. Is there some way to save a snapshot of whatever it decided "the voice" was and reuse that as input for subsequent generations? If not, it is quite useless and just a fun demo really.

    • @ThorstenMueller
      @ThorstenMueller  11 месяцев назад +1

      Did you install CUDA and is it working? There are Python code snippets available to check whether CUDA is working.
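
      One such snippet, as a minimal sketch: PyTorch can report directly whether it sees a usable CUDA device.

          import torch

          print("CUDA available:", torch.cuda.is_available())
          if torch.cuda.is_available():
              print("Device:", torch.cuda.get_device_name(0))       # e.g. the 3000-series card
              print("PyTorch built for CUDA:", torch.version.cuda)  # CUDA version of the installed wheel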

  • @ricardorey259
    @ricardorey259 10 месяцев назад

    Hello, good video, do you know how to remove the character limit restriction when writing?
    Warning: The text length exceeds the character limit of 239 for language 'es', this might cause truncated audio.

    • @ThorstenMueller
      @ThorstenMueller  9 месяцев назад

      Thanks for your nice feedback 😊. Hmm, not really. Earlier we sometimes ran into a "max_decoder_steps" limit which caused truncated audio, but I'm not sure if this applies here too.

  • @TomiTom1234
    @TomiTom1234 11 месяцев назад

    Can you please tell me what program did you use to run the codes on @15:28 ?

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад +1

      Sure, it's a code editor from Microsoft, called "Visual Studio Code".

  • @spiritual_audiobooks
    @spiritual_audiobooks 6 месяцев назад

    What do you say to Applio TTS? Maybe the best Open Source TTS?

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад

      I hadn't heard about Applio TTS. You're saying it's worth giving it a try?

  • @akemixx._0
    @akemixx._0 9 месяцев назад

    Is it possible to use AI even with texts in another language? I would really like to know because I want to dub a game with this tool.

    • @ThorstenMueller
      @ThorstenMueller  9 месяцев назад

      I'm not sure about that. I'd recommend asking in the Coqui community, but as Coqui AI (the company) has shut down, I'm not sure how fast you might get an answer.

  • @saadjutt1660
    @saadjutt1660 3 месяца назад

    Is there any way we can push this trained model to huggingface? Like once we give the audio sample and next time when pushed to huggingface hub we only need to pass the text to generate the audio with respective voice?

    • @ThorstenMueller
      @ThorstenMueller  3 месяца назад

      Do you mean the actual model or a space to use the model out of the box?

  • @PlayGameToday
    @PlayGameToday 6 месяцев назад

    What parameters do I need to include to make the audio output higher quality? It looks like only a 96 kbps bitrate.

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад +1

      Normally the generated output has the same sample rate as the voice dataset the model has been trained on. Maybe you can use tools like ffmpeg to adjust the sample rate afterwards, but I doubt this will increase the quality.

    • @PlayGameToday
      @PlayGameToday 6 месяцев назад

      @@ThorstenMueller I need to train my own model in 48KHz, so the output will be more quality

  • @MYODM.
    @MYODM. 3 месяца назад

    Can I hire you for a few hours? I need help with a project that’s deeply personal and I would like to go the local hosting route.

    • @ThorstenMueller
      @ThorstenMueller  2 месяца назад

      Feel free to contact me here (with some additional info). www.thorsten-voice.de/en/contact/

  • @gonzaloorellanatech
    @gonzaloorellanatech 2 месяца назад

    How can we get a faster response?... Better hardware? RAM? Processing? ... Thanks for the video!

    • @ThorstenMueller
      @ThorstenMueller  2 месяца назад +1

      First, you're welcome :). Do you use cpu or gpu? Because gpu (CUDA) provides faster response.

    • @gonzaloorellanatech
      @gonzaloorellanatech 2 месяца назад

      @@ThorstenMueller Thanks for your response. Yeah!... GPU, but my notebook is only for development... I need better processing for the audio files from voice-cloning TTS.

  • @Aiolia_Games
    @Aiolia_Games 7 месяцев назад

    Can I use this voice to narrate a video on YouTube?

  • @marcinziajkowski3870
    @marcinziajkowski3870 6 месяцев назад

    Can we create a ready-to-use object instead of the "speaker_wav" list being passed every time we generate "output.wav", to speed up the process?

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад +1

      As I'm not sure, I'd recommend asking in the Coqui community on GitHub. But as Coqui AI (the company) has shut down, I'm not sure how fast you might get a reaction.
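
      For reference, the lower-level Xtts class appears to expose the speaker conditioning separately from generation, so the latents could be computed once and reused instead of passing speaker_wav every time. This is a rough sketch from memory of the Coqui TTS XTTS docs; the exact method names, arguments and checkpoint paths are assumptions and should be checked against the installed version:

          import torch
          import torchaudio
          from TTS.tts.configs.xtts_config import XttsConfig
          from TTS.tts.models.xtts import Xtts

          # Placeholder paths to a locally downloaded XTTS v2 checkpoint
          config = XttsConfig()
          config.load_json("xtts_v2/config.json")
          model = Xtts.init_from_config(config)
          model.load_checkpoint(config, checkpoint_dir="xtts_v2/", eval=True)

          # Compute the speaker conditioning once from the reference clip ...
          gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=["my_reference.wav"])

          # ... then reuse it for every generation
          for i, text in enumerate(["First sentence.", "Second sentence."]):
              out = model.inference(text, "en", gpt_cond_latent, speaker_embedding)
              torchaudio.save(f"output_{i}.wav", torch.tensor(out["wav"]).unsqueeze(0), 24000)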

  • @congtaihu1287
    @congtaihu1287 7 месяцев назад

    Thank you for this video! I am running into problems. When I execute the script, it shows "AssertionError: CUDA is not availabe on this machine." But I have CUDA 12.3 and a compatible torch, and my other AI software runs well. I have no idea what is happening. Please help!

    • @ThorstenMueller
      @ThorstenMueller  7 месяцев назад

      Does it work if you use it with "use_cuda false" in general?

  • @jab4li
    @jab4li 2 месяца назад

    If I install XTTS on my computer, can I use unlimited characters? Because the demo version on Huggingface has a 200-character limitation.
    Thanks.

    • @ThorstenMueller
      @ThorstenMueller  2 месяца назад +1

      This should be the case. The limitation is part of their Huggingface space and should not apply locally.
      huggingface.co/spaces/coqui/xtts/blob/d3b67acd01a3f63524371ad7d35a044ac0e75f60/app.py#L200

    • @jab4li
      @jab4li 2 месяца назад

      @@ThorstenMueller Nice, i'm gonna try it. Thanks!

  • @ignacioalonsol
    @ignacioalonsol 6 месяцев назад

    Has anyone made a comparison between xtts and piper training? I'm curious on what's better quality @thorsten?

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад +1

      Personally I prefer Piper. But I trained my Piper models with way more input data than the 6 seconds of input for XTTS.

  • @Chriscs7
    @Chriscs7 8 месяцев назад

    What is better this or Tortoise TTS (Ecker Voice Clone) ?

    • @ThorstenMueller
      @ThorstenMueller  8 месяцев назад

      Hard to say, as i didn't give Tortoise TTS a closer look, but it's still on my todo list.

  • @RossDCurrie
    @RossDCurrie 5 месяцев назад

    "ERROR: Failed building wheel for tts" - What version of python are you running?

    • @ThorstenMueller
      @ThorstenMueller  5 месяцев назад

      This error often occurs when you use an older version of pip. Did you run "pip install pip setuptools wheel -U" before installing Coqui?

    • @RossDCurrie
      @RossDCurrie 5 месяцев назад

      @@ThorstenMueller ​ This may have been the issue. Played around with it a bit and got it working again, but can't recall exactly which thing I did differently. Thanks for the reply though!
      If you're looking for content ideas, one thing I am struggling with is how this all fits together now, in June 2024. Specifically - when I start the server and hit the local webserver, I get a very different UI than what I see in other videos on XTTS. And I know there are all different UIs for XTTS - there's a fine tuning one, a web UI, RVC, etc. and some of them have bits that don't work, and it sounds like Coqui has abandoned the project now and... it's hard to catch up on it all when coming into it for the first time, and it changes so rapidly.
      So I guess what I'm trying to figure out is - if I want to build an AI voice clone of me, today, what's the strategy/stack you recommend?

  • @nomadhgnis9425
    @nomadhgnis9425 10 месяцев назад

    I have a question for you. If I wanted to pause for a number of seconds between sentences, how can I do that? Piper is really cool. Thanks.

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      Normally this is an aspect of SSML (Speech Synthesis Markup Language), which is by now not supported by Coqui and Piper. Maybe you can try a workaround and add multiple dots (....) to create a pause. But i didn't try it out myself.

    • @nomadhgnis9425
      @nomadhgnis9425 10 месяцев назад

      @@ThorstenMueller thanks. will try that.

    • @nomadhgnis9425
      @nomadhgnis9425 10 месяцев назад

      @@ThorstenMueller Just tried it. I put dots where I wanted to pause, but it does not work. It only responds to one dot.

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      ​@@nomadhgnis9425 Okay, then maybe it's a workaround to create multiple tts wave files and merge them together including pauses. That's not an optimal way but it could do the job.

    • @nomadhgnis9425
      @nomadhgnis9425 10 месяцев назад

      @@ThorstenMueller I found a way. I am using Debian. I had to create a 3-second silent wav file, split the paragraphs into different wav files and then merge them together with the silent wav where I need it. I did this with a bash script. So problem solved. Do you know where I can get more voice files other than the ones listed?
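
      The same merge can also be scripted in Python, e.g. with pydub; a small sketch under the assumption that the per-paragraph WAV files already exist (file names are placeholders):

          from pydub import AudioSegment

          chunks = ["part1.wav", "part2.wav", "part3.wav"]  # per-paragraph TTS outputs
          pause = AudioSegment.silent(duration=3000)        # 3 seconds of silence

          combined = AudioSegment.empty()
          for i, path in enumerate(chunks):
              combined += AudioSegment.from_wav(path)
              if i < len(chunks) - 1:
                  combined += pause

          combined.export("merged.wav", format="wav")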

  • @ari4340
    @ari4340 9 месяцев назад

    Hello! I've been using this on hugging face for a few months, but today when I went to the page this error appears: Runtime error
    Scheduling failure: not enough hardware capacity
    Container logs:
    Fetching error logs...
    Any idea of what's happening? Thank you!

    • @ThorstenMueller
      @ThorstenMueller  9 месяцев назад +1

      According to the error message the XTTS container does not have enough compute power on Huggingface platform. This might be a temporary problem or might relate to the shutdown of Coqui AI as a company.

    • @ari4340
      @ari4340 9 месяцев назад

      @@ThorstenMueller Thanks for your reply! I hope it's not the latter; it's the only free online option that I knew of 😓

  • @rogerperez9856
    @rogerperez9856 10 месяцев назад

    Hello, do you know why when converting a text of about 500 words it takes about 25 minutes?

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      I didn't try it with such long texts. Is it faster when you split it into smaller pieces and put the chunks together in post generation?

  • @alexlavertyau
    @alexlavertyau 10 месяцев назад

    I have tried some voice cloning tools and provided my voice as reference audio, but none of the results sound anything like me... :( I have an Australian accent, but the generated voices come out with American accents; not sure what I'm doing wrong.

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад +1

      I guess you're doing nothing wrong. Maybe the English model has been trained on a voice dataset with hours of native English speakers, and one phrase doesn't have enough "power" to change the accent. Normally I'd recommend asking in the Coqui TTS community, but as Coqui is shutting down, it might take some time to get an answer, maybe because of other priorities.

  • @MrScesher
    @MrScesher 10 месяцев назад

    Hi Thorsten,
    I can't get it to run. I always receive "No module named 'TTS.api'; 'TTS' is not a package" Even though the tts package is installed. Pip lists it in the installed packages.
    The few threads I found are no help. Maybe you have an idea?

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      This is strange. If "pip list" shows the tts package then it seems that everything is installed correctly. Are you running your python script really in the right python venv? Can you run "tts --help" in the command line successful?

    • @MrScesher
      @MrScesher 10 месяцев назад

      @@ThorstenMueller The tts command in the console works. tts --list_models too.
      And yes i am running the created venv.

    • @MrScesher
      @MrScesher 10 месяцев назад

      @@ThorstenMueller I managed to get it running briefly when I use the setup from the git repo. But it only works in that terminal, and after closing it everything is gone with it. That's not a solution, because the setup takes too long.

  • @PhantasyAI0
    @PhantasyAI0 10 месяцев назад

    I love your videos bro but you gotta speak a bit faster XD I have to play the video at 1.5x speed haha still love the videos!

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад +1

      Hehe, thanks for your suggestion. I'll keep it in mind for next videos. As a non-native english speaker i have to think a little while for the right words 😆.

  • @Zimba-box
    @Zimba-box 9 месяцев назад

    I got this error output at the wheel -U step: "ERROR: Could not build wheels for tts, which is required to install pyproject.toml-based projects". How do I fix that?

    • @ThorstenMueller
      @ThorstenMueller  9 месяцев назад

      Did you update pip to latest version first - "pip install pip setuptools wheel -U"?

  • @IngridUterus
    @IngridUterus 9 месяцев назад

    Hey, I installed this via Pinokio because I couldn't get it running any other way. However, I don't know how to switch coqui-tts to GPU. Which file do I have to open? I'd also like to prevent the ghost voices. Do you know where I have to change something for that? I know it's possible, because I use a Telegram bot that works with Coqui and runs flawlessly, though with a strict character limit. Oh right, the character limit :D where can I change that as well? Thanks in advance

    • @ThorstenMueller
      @ThorstenMueller  9 месяцев назад +1

      For the Coqui TTS models there is a command-line parameter "--use_cuda". With that, the GPU should be used. Regarding the length, you can try opening the model's configuration file and increasing the value of "max_decoder_steps" (though I haven't tried that with XTTS myself). Good luck 😊.

    • @IngridUterus
      @IngridUterus 9 месяцев назад

      @@ThorstenMueller Thanks. I'll try that tonight. Where exactly do I find the configuration file? Is it the configs.py in the TTS folder? Is there also a way to avoid the errors at the end of sentences and in the gaps between sentences? A kind of ghost voice often appears there too, which sounds really strange xD

    • @ThorstenMueller
      @ThorstenMueller  9 месяцев назад

      @@IngridUterus Did you find the config file?

    • @IngridUterus
      @IngridUterus 9 месяцев назад

      @@ThorstenMueller Yes, I found a better variant for coqui-tts that is much easier for beginners. I can only recommend it: Alltalk_tts

  • @tsunderes_were_a_mistake
    @tsunderes_were_a_mistake 10 месяцев назад

    I tried it on huggingface with Japanese but it sounded robotic. Can you make a tutorial on how to finetune xtts on local?

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      Thanks for your topic suggestion. I've added it on my TODO list but it might take some time.

  • @EfficioIgnisVitae
    @EfficioIgnisVitae 10 месяцев назад

    I'm getting this issue where when I try to check for models this happens:
    LLVM ERROR: Symbol not found: __svml_cosf8_ha
    Anyone know what's going on here?

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      That's strange. Maybe recreate your python venv and reinstall. Maybe there's an error in your installation.

  • @saadjutt1660
    @saadjutt1660 3 месяца назад

    Can I still use this tutorial, since Coqui is shut down? Plus, can I use it for cloning an Urdu voice?

    • @ThorstenMueller
      @ThorstenMueller  3 месяца назад

      Honestly, I'm not sure about the future of XTTS (model, code and Huggingface space) because of their shutdown. But right now the code and space are still available, so it should still work as described; please let me know if you experience bigger problems.

  • @john_blues
    @john_blues 11 месяцев назад

    Is this able to pull text from a text file? I have a Tortoise version that can do it, and it is helpful for long form text.

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      IMHO this isn't supported by now. But finding a suitable solution for that is on my TODO list.

    • @john_blues
      @john_blues 10 месяцев назад

      @@ThorstenMueller For some reason my reply keeps getting deleted. Anyhow, I run a local TTS that can pull from a text file. Maybe it will help you. It is by neonbjb on Github.

    • @ThatPain1
      @ThatPain1 8 месяцев назад

      @john_blues You can totally read in one or multiple files via Python, transform the text as you like, and use XTTS to generate a synthetic speech audio file from it.
      I'm currently using it to create a sort of audiobook from a fanfiction.
      Removing periods at the ends of sentences improved the result quite a lot.
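
      A minimal sketch of that workflow (model name, file names and the naive sentence split are assumptions, not something shown in the video):

          from TTS.api import TTS

          with open("story.txt", encoding="utf-8") as f:
              text = f.read()

          # Naive sentence split; a real project might use a proper sentence tokenizer
          sentences = [s.strip() + "." for s in text.split(".") if s.strip()]

          tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
          for i, sentence in enumerate(sentences):
              tts.tts_to_file(
                  text=sentence,
                  speaker_wav="my_reference.wav",  # hypothetical reference clip of the target voice
                  language="en",
                  file_path=f"part_{i:04d}.wav",
              )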

  • @GESTOR-SITES
    @GESTOR-SITES 6 месяцев назад

    How do I fix
    "ERROR: Could not build wheels for tts, which is required to install pyproject.toml-based projects"?
    ChatGPT cannot help me.
    Is it necessary to downgrade Python?

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад

      Did you update the python dependencies in your environment? So running "pip install setuptools wheel pip -U"

  • @orcunaicovers17
    @orcunaicovers17 6 месяцев назад

    It says Cuda is not available on this machine

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад

      I'm working on a video about CUDA. If you want i can post an update here when it's online 😊.

    • @orcunaicovers17
      @orcunaicovers17 6 месяцев назад

      @@ThorstenMueller I've solved the problem. Torch and CUDA version should be compatible with each other

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад

      @@orcunaicovers17 Happy you could solve it 😊.

  • @asanostudio
    @asanostudio 9 месяцев назад

    Have you made a video tutorial on creating a voice model for Indonesian, or on how to add a voice model? I want to make an Indonesian voice model.

    • @ThorstenMueller
      @ThorstenMueller  8 месяцев назад

      No. But as Coqui (the company) shut down, I'm not sure about further development of their code. Maybe it's worth taking a look at Piper TTS for training an Indonesian TTS model. ruclips.net/video/b_we_jma220/видео.html

  • @yousfalaadi5322
    @yousfalaadi5322 Месяц назад

    can i use a big text dataset?

  • @tapikoBlends
    @tapikoBlends 20 дней назад

    Love this channel 😊😊😊

  • @chrsl3
    @chrsl3 11 месяцев назад

    Amazing result.

  • @ratside9485
    @ratside9485 11 месяцев назад

    Can you also show how to fine-tune it? But locally? Thanks

    • @ThorstenMueller
      @ThorstenMueller  11 месяцев назад +1

      Thanks for your topic suggestion 😊. I've put it on my TODO list.

    • @ratside9485
      @ratside9485 11 месяцев назад

      @@ThorstenMueller There is now also a web UI for fine-tuning on GitHub 🙌 it works quite well. The only remaining problem is changing the settings, temperature and co; I spent hours experimenting and sentences always get skipped.

  • @TNMPlayer
    @TNMPlayer 10 месяцев назад

    For some reason my terminal doesn't run in the venv.

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      Could you successfully create a venv and just can't activate it or can't you create it?

    • @TNMPlayer
      @TNMPlayer 10 месяцев назад

      @@ThorstenMueller the venv created just fine but I couldn’t open a terminal within it

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      @@TNMPlayer That's strange. Do you use the .bat or powershell (.ps1) file to activate the venv?

    • @TNMPlayer
      @TNMPlayer 10 месяцев назад

      @@ThorstenMueller I used the .ps1

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад

      @@TNMPlayer Maybe try out the .bat version, this could have an effect.

  • @characters1210
    @characters1210 7 месяцев назад

    Can I make the code clone an Arabic voice and read Arabic text?

    • @ThorstenMueller
      @ThorstenMueller  7 месяцев назад

      I've no experience using Arabic with XTTS. Did you already try it using their Huggingface space?

  • @animations.ki.anokhi.duniya
    @animations.ki.anokhi.duniya 7 месяцев назад

    Coqui TTS is shutting down?

    • @ThorstenMueller
      @ThorstenMueller  7 месяцев назад

      Sadly, yes. I've made a short about it. ruclips.net/user/shortsQMruRTlQu7I?si=JyDY8ziFJC8omAPY

  • @NoxmilesDe
    @NoxmilesDe 11 месяцев назад

    Is there a TTS for Android?

    • @ThorstenMueller
      @ThorstenMueller  11 месяцев назад

      IMHO there's no support for Coqui and Piper TTS on Android yet. But this would be really cool 😎. Did you already ask in their communities?

  • @stefanporath8392
    @stefanporath8392 9 месяцев назад

    Hello Thorsten,
    great video tutorials, but XTTS is not for me. No support for Windows and there never will be. No chance on older Macs with Nvidia cards because of missing drivers. No support on Linux without CUDA. I was really looking forward to this, but I simply don't have the time to fiddle around for days or weeks. Thank you.

  • @Schawum
    @Schawum 3 месяца назад

    --- Hello, please do the tutorial again in German, because that would really interest me a lot. But I don't understand a word of English.

    • @ThorstenMueller
      @ThorstenMueller  3 месяца назад

      Hello, would the automatically translated German subtitles help you for a start?

    • @Schawum
      @Schawum 3 месяца назад

      @@ThorstenMueller Those are always turned off for me, because I can't follow the video while reading. So that doesn't really help me.

  • @Bonk1971
    @Bonk1971 11 месяцев назад

    Not for commercial use. We need a truly open solution.

    • @juanjesusligero391
      @juanjesusligero391 11 месяцев назад

      Yeah, it's a shame it's not 100% open. Fortunately, we'll always have Tortoise TTS :)

    • @chryseus1331
      @chryseus1331 10 месяцев назад

      Who cares it's not like they're going to sue you if you do.

    • @juanjesusligero391
      @juanjesusligero391 10 месяцев назад +1

      @@chryseus1331 They could, though. If you have a company and want to use software commercially, I wouldn't recommend ignoring its license.

  • @insanitytoons
    @insanitytoons 6 месяцев назад

    Cloning a voice with a sample of just 6 seconds, even though it's not 100% identical: for me that's an AI that really needs to be improved. The AIs that need dozens of hours to clone a voice didn't interest me much. I did several tests using samples longer than 30, 60 or 80 seconds in various languages and some were perfect. I also copied dozens of voices available on websites and the results were also very good. I suggest saving each generated audio to a different file, because the generated audio will never be the same as the previous one.

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад +1

      Josh Meyer (co-founder of Coqui AI) mentioned in my XTTS interview that 6 seconds audio input duration should be perfect for XTTS model. ruclips.net/video/XsOM1WZ0k84/видео.html

  • @JamBassMusic
    @JamBassMusic 9 месяцев назад

    Thank you!!

  • @developerzava
    @developerzava 6 месяцев назад

    TTS is available on python 3.12?

    • @ThorstenMueller
      @ThorstenMueller  6 месяцев назад

      According to their README, Python 3.11 is the highest supported version. As Coqui AI has shut down, I'm not sure if or when this will be adjusted to a newer Python version.

  • @alexeyshmelev9115
    @alexeyshmelev9115 3 месяца назад

    "all you need is 6 second audio" is just nonsense. It is not enough and the result is miles away from anything close to the original.

    • @ThorstenMueller
      @ThorstenMueller  3 месяца назад

      I agree, at least based on my personal tests with my foreign (German) pronunciation. The result was far from being a high-class voice clone. Have you seen my interview with Josh (Coqui AI co-founder)? ruclips.net/video/XsOM1WZ0k84/видео.html

  • @Silberschweifer
    @Silberschweifer 2 месяца назад

    Oh no, the video is out of sync.

    • @Silberschweifer
      @Silberschweifer 2 месяца назад

      Do you clap your hands when recording?

    • @ThorstenMueller
      @ThorstenMueller  2 месяца назад

      No, but thanks for the idea of optimizing video/audio sync by clapping 👍.

  • @רחלישדה-ה4מ
    @רחלישדה-ה4מ 8 месяцев назад

    Is a GPU a must?

    • @ThorstenMueller
      @ThorstenMueller  8 месяцев назад

      Generally (not sure for XTTS specifically) CPU might work, but way slower than using a CUDA-enabled GPU.

    • @רחלישדה-ה4מ
      @רחלישדה-ה4מ 8 месяцев назад

      If I want to clone my own voice, do I need to train this? How? @@ThorstenMueller

    • @ThorstenMueller
      @ThorstenMueller  8 месяцев назад

      @@רחלישדה-ה4מ I'd recommend you taking a look to Piper TTS for that. ruclips.net/video/b_we_jma220/видео.html

    • @רחלישדה-ה4מ
      @רחלישדה-ה4מ 7 месяцев назад

      thanks!@@ThorstenMueller

  • @michaelroberts1120
    @michaelroberts1120 8 месяцев назад

    This is only interesting to developers and programmers. Regular hobbyists will find this video useless, because Coqui has no GUI or server.

    • @ThorstenMueller
      @ThorstenMueller  8 месяцев назад

      Coqui TTS has a simple web UI if you run it locally where you can synthesize audio.

  • @tonysolar284
    @tonysolar284 10 месяцев назад

    coqui is now dead

    • @ThorstenMueller
      @ThorstenMueller  10 месяцев назад +1

      Sadly yes, at least the company, let's see what's happening with the code and community.

  • @MuhammadChanif-cp2ut
    @MuhammadChanif-cp2ut 6 месяцев назад

    Anjai

  • @FrankGlencairn
    @FrankGlencairn 11 месяцев назад

    Unfortunately, without a UI this is a damn nightmare for anyone who is not a programmer.

    • @starbuck1002
      @starbuck1002 11 месяцев назад

      Well, then just use the UI! xD

    • @ratside9485
      @ratside9485 11 месяцев назад

      You can use Pinokio with automatic installation; it has the web UI from Huggingface.

    • @FrankGlencairn
      @FrankGlencairn 10 месяцев назад

      @@ratside9485 Unfortunately, I always get an error message during the installation.