Best AI Voice Generator | 2024.08

  • Published: Dec 25, 2024

Comments • 105

  • @MitchRSA
    @MitchRSA 4 months ago +3

    I remember back in 1995, using the Mac TTS for the first time at the age of 12. That sense of wonder and awe... you took me back there... Thank you Thorsten!

  • @guilherme1556
    @guilherme1556 4 months ago +7

    I loved this type of content, Thorsten. You made it so easy for me to test some TTS models I wanted to use in some home automation projects. You are the best Thorsten, thank you so much 🎉 🎉🎉

    • @ThorstenMueller
      @ThorstenMueller  4 months ago +1

      Wow, thanks for your amazing feedback 🥰.

  • @greggwelker4733
    @greggwelker4733 3 months ago

    Fantastic Thorsten, very useful and informative.

    • @ThorstenMueller
      @ThorstenMueller  3 months ago

      Thank you so much for your nice feedback 😊.

    • @elshaddaiadonai8528
      @elshaddaiadonai8528 3 months ago

      @@ThorstenMueller Which of the free TTS options is monetizable on YouTube, in your experience?

  • @EdTimTVLive
    @EdTimTVLive 20 days ago

    It is a nice and useful video. Thank you. I am looking at various options right now.

  • @pedroorden
    @pedroorden 4 months ago

    Thanks Thorsten, greetings from Buenos Aires, Argentina

  • @Storytelling-by-ash
    @Storytelling-by-ash 4 months ago +1

    Thank you so much for your effort in this, it really helped me ❤

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      Thanks for your very nice feedback and you're welcome 😊.

  • @BalamuruganCRA
    @BalamuruganCRA 4 months ago

    Thank you, man, for this wonderful information

  • @clemeaux.1
    @clemeaux.1 4 months ago +3

    Hello Thorsten, and a hearty Moin, Moin from the north of Germany! Many thanks for this video! Regarding your question about what I'd like to see covered in the upcoming videos you are planning on the individual TTS systems: I guess I'm not the only one who would be interested in how (or whether at all) these TTS engines can be integrated into desktop apps that run LLMs locally, such as Open-WebUI, Text-Generation-WebUI (Oobabooga), LM-Studio, Koboldcpp, etc. Since locally running applications are a frequent topic in your videos, this is probably an obvious topic for you anyway...

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      Guude, or rather Moin to the north 😊.
      I have added your suggestions to my "catalog" for the detail videos - thanks a lot for that.

  • @porky1118
    @porky1118 4 months ago

    1:39 I don't really care how some TTS sounds. Most importantly, I care how easy it is to use.
    I currently use TTS to convert dialog-heavy stories into audio. So I need support for multiple voices in a single audio file, or at least a way to generate the text of multiple people at once.
    Currently I use a Rust program, which uses Piper. It can convert a multi-person text document into speech. I specify the voices in a separate Markdown-inspired file.
    When generating the speech a second time, only the edited segments are regenerated. If I edit the parameters of a voice, only the segments using this voice are regenerated.
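
A rough sketch of the "only regenerate what changed" idea described above, written in Python rather than Rust. It is not the commenter's actual program: the names `segment_key`, `render`, and the `synthesize` callback are hypothetical, and the in-memory dict stands in for whatever on-disk cache a real tool would use. The idea is to key each segment by a hash of its voice parameters plus its text, so that editing a line or a voice setting invalidates only the affected segments.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

# One dialog segment: (speaker name, text spoken by that speaker).
Segment = Tuple[str, str]


def segment_key(voice_params: dict, text: str) -> str:
    """Hash the voice settings together with the text of one segment."""
    payload = json.dumps({"voice": voice_params, "text": text}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def render(
    segments: List[Segment],
    voices: Dict[str, dict],
    cache: Dict[str, bytes],
    synthesize: Callable[[dict, str], bytes],
) -> List[bytes]:
    """Return audio for every segment, calling synthesize only on cache misses."""
    audio = []
    for speaker, text in segments:
        key = segment_key(voices[speaker], text)
        if key not in cache:
            # Either the text or this speaker's voice parameters changed.
            cache[key] = synthesize(voices[speaker], text)
        audio.append(cache[key])
    return audio
```

In a real setup, `synthesize` would wrap the actual TTS call (assumed here, not shown), and the returned chunks would be concatenated into one audio file.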

    • @ThorstenMueller
      @ThorstenMueller  4 months ago +1

      AFAIK Piper TTS has SSML support on its project roadmap. That should make it easier to switch between voices within one sentence by using XML-based tags.

    • @Ravisidharthan
      @Ravisidharthan 3 months ago

      What are you using?

    • @porky1118
      @porky1118 3 months ago

      @@Ravisidharthan Piper with a custom program, which is called d2s on my GitLab.
      I just made it public, but it doesn't have any documentation yet.

    • @ThorstenMueller
      @ThorstenMueller  3 months ago

      @@Ravisidharthan Primarily Piper

  • @musakurel
    @musakurel 19 hours ago

    Which ones can we use with Swift and Core ML? Is it possible to run them locally in Swift?

  • @iknowwhy2629
    @iknowwhy2629 4 months ago +1

    Hi, thank you for your videos. I'm kinda new to this, so I don't know much about it. Is there any "good" TTS for people who have AMD GPUs and use Windows? If there is, can you connect it to something like KoboldAI, and how?

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      Thanks for your feedback 😊. In that case I'd try Piper TTS, which has "good" quality, performs well on CPU (so without NVIDIA/CUDA GPUs), and works on Windows. You should also be able to integrate it into other processes.

  • @ŁukaszMadajczyk
    @ŁukaszMadajczyk 1 month ago

    Hi Thorsten,
    How many hours/steps did you spend training on your DE dataset until it became a usable model in Coqui TTS?
    I'm trying to train a model on my own dataset (35 minutes of audio). I start hearing some voice at 10k steps, but it is far from what I would like to get...

    • @ThorstenMueller
      @ThorstenMueller  1 month ago +2

      I used my Thorsten-Voice datasets containing over 20k recordings, and training took over 2 months (around 500k steps) on an NVIDIA Jetson AGX device. You might hear better, more human-sounding results after 100k steps.

  • @thegtlab
    @thegtlab 4 months ago

    Best open source library for fine-tuning custom voices? I'm currently using AllTalk TTS and the models come out decent, just wondering if there is anything better.

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      Thanks for the hint about "AllTalk TTS". I've heard of it a few times but haven't taken a closer look. Do you think it's worth one?

    • @thegtlab
      @thegtlab 4 months ago

      @@ThorstenMueller 100%. It includes a ton of documentation and helpful tips, and the installer is just one click. Fine-tuning a model is a breeze... they walk you through the process step by step.

  • @GabrielLucas-hy5uq
    @GabrielLucas-hy5uq 2 months ago

    Hi Thorsten, I use TTS with a different intention. My English pronunciation is not good, so I record audio of myself speaking in English and use it as input for inference, generating audio of the same sentence.
    I currently use Coqui TTS. Out of 100 audio clips that I generate from the same sentence, 7 have an intonation and emotion similar to the original audio 🤣.
    Would you have any recommendations for another TTS that does this better?

    • @ThorstenMueller
      @ThorstenMueller  1 month ago

      Maybe take a look at F5 TTS. I'm working on a video about it 😉.

  • @RoshnaOmer94
    @RoshnaOmer94 4 months ago

    Thank you for going over these models! I really enjoyed it!
    I have a question about Parler TTS. I want to train it on languages like Arabic that don't use English letters. Do you think that could be possible? I tried using Common Voice as an example but failed.

    • @ThorstenMueller
      @ThorstenMueller  4 months ago +1

      Thanks for your nice feedback 😊. I'm not sure about their support for languages with non-Latin letters, like Arabic. I will take a closer look at training a model from scratch using Parler TTS with my German "Thorsten-Voice" dataset - maybe I'll find something on this process for Arabic.

    • @RoshnaOmer94
      @RoshnaOmer94 4 months ago

      @@ThorstenMueller Thank you so much! Looking forward to it!

  • @garthok6224
    @garthok6224 4 months ago

    I wonder which one is better for training a Spanish model. I want to convert books to audio with a better voice than Android's. Any guidance?

    • @ThorstenMueller
      @ThorstenMueller  3 months ago

      IMHO you might look at Piper TTS. You can listen to Spanish voices here: rhasspy.github.io/piper-samples/

  • @louiereyes1306
    @louiereyes1306 3 months ago

    Thanks Thorsten! I'm interested in Parler. Is there a way to extend the number of characters it can process? My use case is converting short stories into audiobooks. I only know basic Python.

    • @ThorstenMueller
      @ThorstenMueller  3 months ago

      You are welcome 😊. I am just giving Parler a closer look, but right now I cannot give you a good answer. I will keep you updated on my Parler progress and let you know when I might be able to answer your question.

  • @AbdulAzizKhan-m8d
    @AbdulAzizKhan-m8d 2 months ago

    Nice explanation ❤. Is there a TTS with voice cloning that runs on a low-end PC?

  • @rvanner
    @rvanner 20 days ago

    What's the best TTS for use in an Apple and Android app locally (i.e. no server connection)?

    • @ThorstenMueller
      @ThorstenMueller  4 days ago

      That's a good question. Honestly, I have not taken a closer look at TTS on smartphones, so I can't tell you (yet).

  • @ŁukaszMadajczyk
    @ŁukaszMadajczyk 2 months ago

    Hello Thorsten, is it possible for you to show how to install and use the Bark multilingual TTS model?

    • @ThorstenMueller
      @ThorstenMueller  2 months ago

      Thanks for your comment. "Bark" is already on my TODO list ;-).

  • @helloworld7796
    @helloworld7796 4 months ago

    Is Piper TTS still the best for training?

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      Right now I'd say yes. But this might change once I have tested "Parler TTS" and "Toucan TTS" with their training features.

    • @helloworld7796
      @helloworld7796 4 months ago

      @@ThorstenMueller Thanks, I will take a look at them as well

  • @nastastic
    @nastastic 3 months ago

    Which one would be able to create a cartoon character voice? I tried a couple of Hugging Face models but had no luck getting a sample voice in to work on building a new voice.

    • @ThorstenMueller
      @ThorstenMueller  3 months ago

      Most spaces on Hugging Face provide a sort of zero-shot voice cloning from a few seconds of audio input. This mostly does not lead to a great clone.
      If you want to duplicate a voice (take care of permissions and legal aspects), try Piper TTS (see tutorials on my channel). Or maybe try Parler TTS by Hugging Face, which provides a prompt to describe your (cartoon) voice. Maybe the right prompt will create the voice output you are looking for.

  • @safnasthegreat7153
    @safnasthegreat7153 4 months ago

    Could you do a video about how to train TTS for our native languages? There are videos, but they are old now and there have been some updates. We would really appreciate it if you covered both Linux and Windows.

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      Thanks for your feedback. For which TTS software would you like an updated tutorial (Coqui or Piper)? First I'll look at how to create your own TTS voice with other TTS software (like Parler TTS). But I can try to update an existing tutorial to a newer release of the software in the near future.

  • @suhass9837
    @suhass9837 4 months ago

    Is it possible to use two speakers? Can you help us find models that support two speakers?

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      What do you mean by "two speakers"? Do you mean switching between two different voices in one sentence?

    • @suhass9837
      @suhass9837 4 months ago

      @@ThorstenMueller Yes, you're right.

  • @Marshaal__27
    @Marshaal__27 3 months ago

    Hey there Thorsten, I just came across your channel and it's so amazing. I found the stuff I was looking for, these TTS models. But I have a question: is there one where an NVIDIA graphics card is not necessary and that sounds very human-like, with easy setup and ideally a UI? Thank you.

    • @ThorstenMueller
      @ThorstenMueller  3 months ago

      Thanks for your kind feedback 😊. Without a GPU/NVIDIA CUDA I would say use Piper TTS. It produces good, natural results and performs well on CPU - even on newer revisions of a Raspberry Pi.

    • @Marshaal__27
      @Marshaal__27 3 months ago

      @@ThorstenMueller Thank you so much, I will give it a try.

  • @iseahosbourne9064
    @iseahosbourne9064 2 months ago

    Hey Thorsten, what is the best overall voice cloning AI tool, both locally and remotely? RVC, Tortoise TTS Fast, Coqui, so-vits, XTTS?

    • @ThorstenMueller
      @ThorstenMueller  2 months ago

      Right now I like Piper TTS. It runs locally and fast - even on newer revisions of a Raspberry Pi. Training takes some time.

    • @timcollins2421
      @timcollins2421 2 months ago

      @@ThorstenMueller What would you recommend for Windows? I can't get anything at all to work for GPU-based local voice cloning. I would really appreciate some guidance for late 2024, as most repos seem to be falling apart and have no documentation.

    • @RDUBTutorial
      @RDUBTutorial 2 months ago

      I see you are on a Mac for this video. What do you recommend that is easy to install and runs locally on a Mac M2?

    • @ThorstenMueller
      @ThorstenMueller  2 months ago

      @@timcollins2421 Have you tried Coqui TTS (no longer under active development) on Windows for voice cloning? ruclips.net/video/bJjzSo_fOS8/видео.htmlsi=h-oQMhlcUiO7FL3P

  • @ColinNardo-le3bl
    @ColinNardo-le3bl 1 month ago

    Hi Thorsten!
    I want to make a portfolio website where people can talk to me. I'd have a text-to-text model that knows everything about me, and its output would go to a TTS with my own voice, telling it what to say each time. My problem is hosting. I don't understand how the APIs of these TTS models work or how I'd be able to host one, as most GPU hosting websites charge hourly rates, which seem very expensive. What do I do? Maybe I've got the wrong approach.

    • @ColinNardo-le3bl
      @ColinNardo-le3bl 1 month ago

      I also forgot to mention that I do have a mini PC I can run 24/7, but it doesn't have a GPU.

    • @ThorstenMueller
      @ThorstenMueller  1 month ago

      If you have a PC running 24/7, you should clone your voice using Piper, which runs locally and performs well even without a GPU. Do you know my video about that? ruclips.net/video/b_we_jma220/видео.html

    • @ColinNardo-le3bl
      @ColinNardo-le3bl 6 days ago

      @@ThorstenMueller A bit of an update: it works amazingly!! I have a trained voice of myself, and CPU inference is pretty fast. I can talk to a small model connected to RAG that then outputs to Piper. It's amazing, tysm!! Now I'm only dealing with slight hallucinations…

  • @dondixon4206
    @dondixon4206 2 months ago

    Hi Sir, your video is fantastic!!! Well done!!! The most valuable feature of TTS for me is the ability to highlight words or generate visemes (or even phonemes) in real time as the text is spoken. This functionality is incredibly important to my work, and I am wondering if any voices or systems provide this capability. Specifically, I am looking for a method to capture spoken words, phrases, or syllables as they are being generated and display them in real time.
    While I have had success with SAPI 5 on Windows for this purpose, I have been unable to find similar solutions for Linux, particularly on my Raspberry Pi setup. My goal is to run my TTS locally with a childlike voice and to extract key elements such as word highlighting or real-time phoneme generation. Any guidance or support on achieving these tasks would be greatly appreciated. Thank you!

    • @ThorstenMueller
      @ThorstenMueller  1 month ago

      Thanks for your nice feedback 😊. I am not aware of a TTS solution that highlights words while speaking. But as Piper TTS has a streaming function, it might be worth taking a look.

  • @hifidrache5366
    @hifidrache5366 3 months ago

    Hello Thorsten, unfortunately the program you present here cannot do German at all. But that is not your fault. I hope there will be better models soon. Unfortunately, xtts in version 2.02 sometimes swallows words or makes some up while generating. So far I have not found an approach that is stable. But I will keep watching this.

  • @ennergie
    @ennergie 3 months ago +1

    Dear Thorsten. As someone with ADD, I find it very hard to read long texts. I can process information much better when I hear it. In my experience, the best speech synthesis when things have to be quick is unfortunately still Edge on Windows. But I regularly look for a better computer voice. XTTS was a clear improvement in terms of intonation. Unfortunately, words were sometimes swallowed. I follow your videos closely and eagerly await your test of Meta Voice etc. I think your work is important and I am very grateful for your effort. Keep it up.

    • @ThorstenMueller
      @ThorstenMueller  3 months ago

      Thank you very much for your lovely feedback 😊. Right now I am looking at Parler TTS first. But I will take a look at MetaVoice as well.

  • @Dseen4u
    @Dseen4u 4 months ago

    How can I learn voice cloning and voice accents?

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      If your training data contains accents, the TTS model will learn them from the training data. Do you know my Piper TTS voice cloning tutorial? ruclips.net/video/b_we_jma220/видео.htmlsi=VbiQrIY9CwEdX7z5

  • @Aristocle
    @Aristocle 3 months ago

    Which of these are multilingual? In particular, which ones speak Italian?

    • @ThorstenMueller
      @ThorstenMueller  3 months ago +1

      Toucan TTS supports "nearly all" (around 7k) languages.

  • @NLPprompter
    @NLPprompter 4 months ago +1

    Developers who do open source... they don't know they might change someone's life for the better... I have a blind friend, and she has never been as happy as when listening to human-like speech... She said maybe someday she could get emotional speech driven by the context of the paragraph being read. She said: imagine reading (listening to) a novel with automatically switching voices and emotions that accurately follow the story...

    • @ThorstenMueller
      @ThorstenMueller  4 months ago +1

      Thanks for your feedback. I agree that open source can change the world for the better. I'm pretty optimistic that emotional speech will come in the near future, and I hope your blind friend can then use it for novel reading.

    • @NLPprompter
      @NLPprompter 4 months ago

      @@ThorstenMueller Yes, what a beautiful future already.

  • @Insidestoryland
    @Insidestoryland 3 months ago

    Is there any way to train a voice model on my own voice, then save the parameters of my voice to a file, and next time I need text-to-speech use only these parameters to generate the voice (Coqui TTS with this model)? Help me please. I searched all over the internet and did not find any solution.

    • @ThorstenMueller
      @ThorstenMueller  3 months ago

      Yes, it is possible to create your own TTS voice clone and then simply feed it any text you want to synthesize. Is this what you mean? Do you know my Piper TTS voice clone tutorial? ruclips.net/video/b_we_jma220/видео.htmlsi=ZE8bSEVpum6ddMBr

  • @lennoyl
    @lennoyl 4 months ago

    I stupidly thought Parler would speak French, but it doesn't seem to...

    • @ThorstenMueller
      @ThorstenMueller  4 months ago

      As the name sounds a little bit French (at least to my German ears), I understand your thought :-). According to their space ("trained using 45k hours of narrated English audiobooks"), the available model is English only. But IMHO you can use their project to create a TTS voice for any language. I'll try to find out when working on the Parler TTS detail video.

  • @Ravisidharthan
    @Ravisidharthan 3 months ago

    What is the best option for Mac offline?

    • @ThorstenMueller
      @ThorstenMueller  3 months ago

      Piper or Coqui TTS. But I have not yet taken a closer, more detailed look at the models mentioned in the video, so my recommendation might change afterwards ;-).

  • @gotonethatcansee
    @gotonethatcansee 2 months ago

    Link for Piper ONNX?

    • @ThorstenMueller
      @ThorstenMueller  2 months ago

      Piper is the software; each voice has its own ONNX model file. You might check this: github.com/rhasspy/piper/blob/master/VOICES.md
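
To illustrate the software/model split mentioned above, here is a minimal, hedged sketch of calling the Piper command-line tool from Python with a downloaded voice. It assumes a `piper` executable on the PATH and the `--model` / `--output_file` flags as documented in the project's README (check `piper --help` for your version). The file names `voice.onnx` and `out.wav` and the helper names `piper_command` and `speak` are placeholders, not part of Piper. Each voice normally ships as an `.onnx` file plus a matching `.onnx.json` config file stored next to it.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def piper_command(model: Path, output_wav: Path, executable: str = "piper") -> List[str]:
    """Build the argument list for one Piper run (text is passed on stdin)."""
    return [executable, "--model", str(model), "--output_file", str(output_wav)]


def speak(text: str, model: Path, output_wav: Path, executable: str = "piper") -> None:
    """Synthesize text to a WAV file, assuming Piper is installed locally."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"'{executable}' not found on PATH")
    # Piper reads the text to synthesize from standard input.
    subprocess.run(
        piper_command(model, output_wav, executable),
        input=text.encode("utf-8"),
        check=True,
    )
```

A call would look like `speak("Hello world.", Path("voice.onnx"), Path("out.wav"))`, with `voice.onnx` replaced by a model file picked from the voices list linked above.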

  • @Dseen4u
    @Dseen4u 4 months ago

    I am a beginner. How can I learn AI voice cloning and accents?

    • @Dseen4u
      @Dseen4u 4 months ago

      Sir, please tell me how I can learn this

    • @Dseen4u
      @Dseen4u 4 months ago

      Sir, please, I am a beginner and I don't know about this. Tell me the first step.

  • @Mystinarium
    @Mystinarium 3 months ago

    Hello Thorsten, I wrote you an email and would be happy if you could take a look 😅. It's about your great program, and I have a problem with it. Don't worry, I'm the problem, not your program. 😇 Thank you.

    • @ThorstenMueller
      @ThorstenMueller  3 months ago

      Hello, I have replied to your email 😊.

  • @willthecat3861
    @willthecat3861 4 months ago

    I'd like to hear more about 'integration' of TTS... for reading text... not just for amusing myself by cloning my voice.

    • @ThorstenMueller
      @ThorstenMueller  3 months ago

      You mean use cases for TTS, such as screenreaders or voice assistants?

  • @softvision3000
    @softvision3000 4 months ago

    Nice German accent. 😂

  • @AltMarc
    @AltMarc 4 months ago

    The whole video is pretty pointless: I can't find out which one is better, cloning your foreign accent doesn't help much either, and the programming language/OS isn't useful (it would be better to know whether it uses CPU/CUDA/Metal and how fast its inference is)... Try cloning the voice of the Professor in Futurama.
    Your T-shirt sums it up.

  • @not_lexxzaa
    @not_lexxzaa 4 months ago +1

    So I want to ask about a tool that can extract a voice from a person. For example, a person speaking their specific language wants to use their own voice. The tool would record the voice first and automatically extract it. After that, from just a few words, the voice can be turned into an AI-generated voice with the same sound and accent.
    Then, if we type a few words into text-to-speech, the custom AI voice that was extracted would speak them in exactly that voice and accent. Is there a specific tool for that?

    • @ThorstenMueller
      @ThorstenMueller  3 months ago

      So, you're talking about zero-shot or speech2speech tech? Not voice cloning from scratch, but imitating an individual voice and speech flow based on an existing model?

    • @not_lexxzaa
      @not_lexxzaa 3 months ago

      @@ThorstenMueller Yes. Currently Coqui doesn't support many Asian languages, so it's limited. I just want to know, for example, how to implement a custom voice that can clone a voice with that same accent.