Best AI Voice Generator | 2024.08

  • Published: 10 Sep 2024

Comments • 67

  • @guilherme1556
    @guilherme1556 1 month ago +5

    I loved this type of content, Thorsten. You made it so easy for me to test some TTS models I wanted to use in some home automation projects. You are the best Thorsten, thank you so much 🎉 🎉🎉

    • @ThorstenMueller
      @ThorstenMueller  27 days ago +1

      Wow, thanks for your amazing feedback 🥰.

  • @MitchRSA
    @MitchRSA 1 month ago +1

    I remember back in 1995, using the MAC TTS for the first time at the age of 12. That sense of wonder and awe... you took me back there... Thank you Thorsten!

  • @ashwinsveta
    @ashwinsveta 29 days ago +1

    Thank you so much for your effort in this, it really helped me ❤

    • @ThorstenMueller
      @ThorstenMueller  27 days ago

      Thanks for your very nice feedback and you're welcome 😊.

  • @clemeaux.1
    @clemeaux.1 29 days ago +3

    Hello Thorsten, greetings ("Moin") from the north of Germany, and many thanks for this video! Regarding your question about what I'd like to see covered in the upcoming videos on the different TTS models: I guess I'm not the only one who would be interested in how (or whether) those TTS engines can be integrated into desktop apps that run LLMs locally, such as Open-WebUI, Text-Generation-WebUI (Oobabooga), LM-Studio, Koboldcpp, etc. Since locally running applications are a frequent topic of your videos, this is probably an obvious subject for you anyway...

    • @ThorstenMueller
      @ThorstenMueller  27 days ago

      Guude, or rather Moin to the north 😊.
      I've added your suggestions to my "catalog" for the detail videos. Thank you very much for that.

  • @pedroorden
    @pedroorden 16 days ago

    Thanks Thorsten, greetings from Buenos Aires, Argentina

  • @greggwelker4733
    @greggwelker4733 12 days ago

    Fantastic Thorsten, very useful and informative.

    • @ThorstenMueller
      @ThorstenMueller  6 days ago

      Thank you so much for your nice feedback 😊.

  • @BalamuruganCRA
    @BalamuruganCRA 17 days ago

    Thank you, man, for this wonderful information

  • @iknowwhy2629
    @iknowwhy2629 22 days ago +1

    Hi. Thank you for your videos. I'm kind of new to this, so I don't know much about it. Is there any "good" TTS for people who have AMD GPUs and use Windows? If there is, can you connect it to something like KoboldAI, and how?

    • @ThorstenMueller
      @ThorstenMueller  15 days ago

      Thanks for your feedback 😊. In that case I'd try Piper TTS, which has "good" quality, performs well on CPU (so without NVIDIA/CUDA GPUs), works on Windows, and should be possible to integrate into other processes.
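
As a rough sketch of what such an integration could look like, the Python snippet below calls the Piper command-line tool as a subprocess and feeds it text on standard input. It assumes a `piper` executable is on the PATH and that a voice model file has already been downloaded. The helper names `build_piper_command` and `synthesize` are hypothetical and not part of Piper itself.

```python
import subprocess


def build_piper_command(model_path: str, output_path: str) -> list[str]:
    """Build the argument list for one Piper CLI call (hypothetical helper)."""
    return ["piper", "--model", model_path, "--output_file", output_path]


def synthesize(text: str, model_path: str, output_path: str) -> None:
    """Send text to Piper on stdin; Piper writes a WAV file to output_path.

    Assumes the `piper` executable is installed and on the PATH.
    """
    subprocess.run(
        build_piper_command(model_path, output_path),
        input=text.encode("utf-8"),
        check=True,  # raise CalledProcessError if Piper exits with an error
    )
```

Another program would then call something like `synthesize("Hello from Piper.", "path/to/voice.onnx", "hello.wav")`, where the model path is a placeholder for whichever voice was downloaded.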

  • @willthecat3861
    @willthecat3861 1 month ago

    I'd like to hear more about 'integration' of TTS... for reading text, not just for amusing myself by cloning my voice.

    • @ThorstenMueller
      @ThorstenMueller  6 days ago

      You mean use cases for TTS, such as screenreaders or voice assistants?

  • @nastastic
    @nastastic 3 days ago

    Which one would be able to create a cartoon character voice? I tried a couple of Hugging Face models but had no luck getting a sample voice in to work on building a new voice.

    • @ThorstenMueller
      @ThorstenMueller  1 day ago

      Most Spaces on Hugging Face provide a sort of zero-shot voice cloning from a few seconds of audio input, which mostly does not lead to a great clone.
      If you want to duplicate a voice (mind permissions and legal aspects), try Piper TTS (see the tutorials on my channel). Or try Parler TTS by Hugging Face, which lets you describe your (cartoon) voice in a prompt. The right prompt might produce the voice output you are looking for.

  • @Insidestoryland
    @Insidestoryland 12 days ago

    Is there any way to train a voice model on my own voice, then save the parameters of my voice to a file, and the next time I need text-to-speech use only these parameters to generate the voice, with Coqui-TTS and this model? Help me please. I searched all over the internet and did not find any solution.

    • @ThorstenMueller
      @ThorstenMueller  6 days ago

      Yes, it is possible to create your own TTS voice clone and then just input any text you want to synthesize. Is this what you mean? Do you know my Piper TTS voice clone tutorial? ruclips.net/video/b_we_jma220/видео.htmlsi=ZE8bSEVpum6ddMBr

  • @porky1118
    @porky1118 1 month ago

    1:39 I don't really care how some TTS sounds. Most importantly, I care how easy it is to use.
    I currently use TTS to convert dialog-heavy stories into audio. So I need support for multiple voices in a single audio file, or at least a way to generate the text of multiple people at once.
    Currently I use a Rust program, which uses Piper. It can convert a multi-person text document into speech. I specify the voices in a separate Markdown-inspired file.
    When generating the speech a second time, only the edited segments are regenerated. If I edit the parameters of a voice, only the segments using this voice are regenerated.
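
The regeneration behaviour described above can be sketched roughly as follows. This is not the commenter's Rust program, only an illustrative Python outline of one way to get that effect: each segment is keyed by a hash of its text plus the voice's parameters, so editing a line or a voice changes only the affected keys. All names (`segment_key`, `plan_regeneration`, the voice parameter dictionaries) are hypothetical.

```python
import hashlib
import json


def segment_key(text: str, voice_params: dict) -> str:
    """Hash the segment text together with its voice settings."""
    payload = json.dumps({"text": text, "voice": voice_params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_regeneration(segments, voices, cache):
    """Return the indices of segments whose audio is not in the cache.

    segments: list of (speaker, text) pairs
    voices:   dict mapping speaker -> voice parameter dict
    cache:    set of keys for audio that was already generated
    """
    stale = []
    for index, (speaker, text) in enumerate(segments):
        if segment_key(text, voices[speaker]) not in cache:
            stale.append(index)
    return stale
```

Only the indices returned by `plan_regeneration` would need to be sent to the TTS engine again; all other segments can reuse their stored audio.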

    • @ThorstenMueller
      @ThorstenMueller  26 days ago +1

      As far as I know, Piper TTS has SSML support on its project roadmap. This should make it easier to switch between voices within one sentence by using XML-based tags.

    • @Ravisidharthan
      @Ravisidharthan 10 days ago

      What are you using?

    • @porky1118
      @porky1118 10 days ago

      @Ravisidharthan Piper with a custom program called d2s, on my GitLab.
      I just made it public, but it doesn't have any documentation yet.

    • @ThorstenMueller
      @ThorstenMueller  2 days ago

      @Ravisidharthan Primarily Piper

  • @thegtlab
    @thegtlab 28 days ago

    Best open source library for fine-tuning custom voices? I'm currently using AllTalk TTS and the models come out decent, just wondering if there is anything better.

    • @ThorstenMueller
      @ThorstenMueller  26 days ago

      Thanks for the hint about "AllTalk TTS". I've heard of it a few times but haven't taken a closer look. Do you think it's worth one?

    • @thegtlab
      @thegtlab 26 days ago

      @ThorstenMueller 100%. It includes a ton of documentation and helpful tips, and the installer is just one click. Fine-tuning a model is a breeze... they walk you through the process step by step.

  • @RoshnaOmer94
    @RoshnaOmer94 27 days ago

    Thank you for going over these models! I really enjoyed it!
    I have a question about Parler TTS. I want to train it on languages like Arabic that don't use English letters. Do you think that could be possible? I tried using Common Voice as an example but failed.

    • @ThorstenMueller
      @ThorstenMueller  26 days ago +1

      Thanks for your nice feedback 😊. I'm not sure about their support for languages with non-Latin letters, like Arabic. I will take a closer look at training a model from scratch using Parler TTS with my German "Thorsten-Voice" dataset. Maybe I'll find something on this process for Arabic.

    • @RoshnaOmer94
      @RoshnaOmer94 26 days ago

      @ThorstenMueller Thank you so much! Looking forward to it!

  • @garthok6224
    @garthok6224 21 days ago

    I wonder which one is better for training a Spanish model. I want to convert books to audio with a better voice than Android. Any guidance?

    • @ThorstenMueller
      @ThorstenMueller  2 days ago

      In my opinion you might look at Piper TTS. You can listen to Spanish voices here: rhasspy.github.io/piper-samples/

  • @safnasthegreat7153
    @safnasthegreat7153 25 days ago

    Could you do a video about how to train TTS for our native languages? There are videos, but they are old now and there have been some updates. We would really appreciate it if you did one for both Linux and Windows.

    • @ThorstenMueller
      @ThorstenMueller  15 days ago

      Thanks for your feedback. For which TTS software would you like an updated tutorial (Coqui or Piper)? First I'll look at how to create your own TTS voice with other TTS software (like Parler TTS). But I can try to update an existing tutorial to a newer release of the software in the near future.

  • @Aristocle
    @Aristocle 7 days ago

    Which of these are multilingual? In particular, which ones speak Italian?

    • @ThorstenMueller
      @ThorstenMueller  6 days ago +1

      Toucan TTS supports "nearly all" (around 7k) languages.

  • @suhass9837
    @suhass9837 28 days ago

    Is it possible to use two speakers? Can you help us find models that support two speakers?

    • @ThorstenMueller
      @ThorstenMueller  26 days ago

      What do you mean by "two speakers"? Do you mean switching between two different voices in one sentence?

    • @suhass9837
      @suhass9837 26 days ago

      @ThorstenMueller Yes, you're right.

  • @Dseen4u
    @Dseen4u 21 days ago

    How can I learn voice cloning and voice accents?

    • @ThorstenMueller
      @ThorstenMueller  15 days ago

      If your training data contains accents, the TTS model will learn them from the training data. Do you know my Piper TTS voice cloning tutorial? ruclips.net/video/b_we_jma220/видео.htmlsi=VbiQrIY9CwEdX7z5

  • @NLPprompter
    @NLPprompter 1 month ago

    Developers who do open source may not know that they can change someone's life for the better. I have a blind friend, and she has never been happier than when listening to humanlike speech. She said maybe someday she could get emotional speech driven by the context of the paragraph being read. Imagine, she said, reading (listening to) a novel with automatic voice switching and emotion that accurately follows the story...

    • @ThorstenMueller
      @ThorstenMueller  27 days ago +1

      Thanks for your feedback. I agree that open source can change the world for the better. I'm pretty optimistic that emotional speech will come (in the near future), which your blind friend can hopefully use for TTS novel reading.

    • @NLPprompter
      @NLPprompter 27 days ago

      @ThorstenMueller Yes, what a beautiful future already.

  • @helloworld7796
    @helloworld7796 1 month ago

    Is PiperTTS still the best to do training?

    • @ThorstenMueller
      @ThorstenMueller  27 days ago

      Right now I'd say yes. But this might change once I have tested "Parler TTS" and "Toucan TTS" with their training features.

    • @helloworld7796
      @helloworld7796 27 days ago

      @ThorstenMueller Thanks, I will take a look at them as well

  • @Ravisidharthan
    @Ravisidharthan 10 days ago

    What is the best option for offline use on a Mac?

    • @ThorstenMueller
      @ThorstenMueller  6 days ago

      Piper or Coqui TTS. But I have not yet taken a closer, more detailed look at the models mentioned in the video, so my recommendation might change afterwards ;-).

  • @Mystinarium
    @Mystinarium 10 days ago

    Hello Thorsten, I sent you an email and would be glad if you could take a look 😅. It's about your great program, and I have a problem with it. Don't worry, I'm the problem, not your program. 😇 Thank you.

    • @ThorstenMueller
      @ThorstenMueller  2 days ago

      Hello, I've replied to your email 😊.

  • @Dseen4u
    @Dseen4u 21 days ago

    I am a beginner. How can I learn AI voice cloning and accents?

    • @Dseen4u
      @Dseen4u 15 days ago

      Sir, please tell me how I can learn.

    • @Dseen4u
      @Dseen4u 15 days ago

      Sir, please, I am a beginner. I don't know anything about this, so tell me the first step.

  • @lennoyl
    @lennoyl 28 days ago

    I stupidly thought Parler would speak French, but it doesn't seem to...

    • @ThorstenMueller
      @ThorstenMueller  27 days ago

      As the name sounds a little bit French (at least to my German ears), I understand your thought :-). According to their Space ("trained using 45k hours of narrated English audiobooks"), the available model is English only. But in my opinion you can use their project to create a TTS voice for any language. I'll try to find out when working on the Parler TTS detail video.

  • @softvision3000
    @softvision3000 26 days ago

    Nice German accent. 😂

  • @AltMarc
    @AltMarc 29 days ago

    The whole video is pretty pointless: I can't find out which one is better, cloning your foreign accent doesn't help much either, and the programming language/OS isn't useful (it would be better to know whether it uses CPU/CUDA/Metal and how fast its inference is)... Try cloning the voice of the Professor in Futurama.
    Your T-shirt sums it up.

  • @not_lexxzaa
    @not_lexxzaa 21 days ago +1

    So I want to ask about a tool that can extract a voice from a person. For example, say I want a person to use their own voice in their specific language. The tool would record the voice first and extract it automatically. Once that is done, the voice could be turned into an AI-generated voice with the same sound and accent from just a few words.
    With this, we could test by typing a few words for text-to-speech: the custom AI voice that was extracted would speak the text in the exact same voice and accent. Is there a specific tool for that?

    • @ThorstenMueller
      @ThorstenMueller  2 days ago

      So, you're talking about zero-shot or speech-to-speech tech? Not voice cloning from scratch, but imitating an individual voice and speech flow based on an existing model?

    • @not_lexxzaa
      @not_lexxzaa 2 days ago

      @ThorstenMueller Yes. Currently Coqui doesn't support many Asian languages, so it's limited. I just want to know, for example, how to implement a custom voice that can clone a voice with that same accent.