I remember back in 1995, using the Mac TTS for the first time at the age of 12. That sense of wonder and awe... you took me back there... Thank you, Thorsten!
🤖 - you're very welcome 😊.
I loved this type of content, Thorsten. You made it so easy for me to test some TTS models I wanted to use in some home automation projects. You are the best, Thorsten, thank you so much 🎉 🎉🎉
Wow, thanks for your amazing feedback 🥰.
Fantastic Thorsten, very useful and informative.
Thank you so much for your nice feedback 😊.
@@ThorstenMueller Which of the free TTS systems is monetizable on RUclips, from your experience?
??
It is a nice and useful video. Thank you. I am looking at various options right now.
Thanks for your nice comment 😊.
Thanks Thorsten, greetings from Buenos Aires, Argentina.
Thank you so much for your effort in this, it really helped me ❤
Thanks for your very nice feedback and you're welcome 😊.
Thank you, man, for this wonderful information.
You're very welcome 😊.
Hello Thorsten, and a hearty Moin, Moin from the north of Germany - thanks for this video! Regarding your question about what I'd like to see covered in the upcoming videos on the different TTS systems you're planning: I guess it's not only me who would be interested in how (or whether at all) those TTS engines can be integrated into desktop apps that run LLMs locally, such as Open-WebUI, Text-Generation-WebUI (Oobabooga), LM-Studio, Koboldcpp, etc. Since locally running applications are the topic of several of the videos on your channel, this is probably an obvious topic for you anyway...
Guude, and Moin back to the north 😊.
I've added your suggestions to my "catalog" for the detail videos - thank you very much for that.
1:39 I don't really care how some TTS sounds. Most importantly, I care how easy it is to use.
I currently use TTS to convert dialog-heavy stories into audio. So I need support for multiple voices in a single audio file, or at least a way to generate the text of multiple people at once.
Currently I use a Rust program which uses Piper. It can convert a multi-person text document into speech. I specify the voices in a separate markdown-inspired file.
When generating the speech a second time, only the edited segments are regenerated. If I edit the parameters of a voice, only the segments using this voice are regenerated.
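For anyone curious how such a multi-voice pipeline can look, here is a minimal sketch (not the commenter's d2s tool; the "speaker: text" dialogue format, file paths and voice names are made up, and it assumes the Piper CLI and the listed .onnx voices are installed):

```python
# Split a dialogue file into per-speaker segments, synthesize each one with the
# Piper CLI, and skip segments whose voice and text have not changed since the
# last run (so only edited segments are regenerated).
import hashlib
import subprocess
from pathlib import Path

VOICES = {  # speaker name -> Piper voice model (placeholder paths)
    "alice": "voices/en_US-lessac-medium.onnx",
    "bob": "voices/en_US-ryan-medium.onnx",
}

def synthesize_dialogue(dialogue_file: str, out_dir: str = "segments") -> list[Path]:
    """Read 'speaker: text' lines and synthesize each with that speaker's Piper voice."""
    Path(out_dir).mkdir(exist_ok=True)
    wav_files = []
    for i, line in enumerate(Path(dialogue_file).read_text(encoding="utf-8").splitlines()):
        if not line.strip():
            continue
        speaker, text = line.split(":", 1)
        model = VOICES[speaker.strip().lower()]
        # Cache key from voice + text: edited segments get a new name and are regenerated.
        digest = hashlib.sha256(f"{model}|{text}".encode()).hexdigest()[:16]
        wav = Path(out_dir) / f"{i:04d}_{digest}.wav"
        if not wav.exists():  # unchanged segments are reused
            subprocess.run(
                ["piper", "--model", model, "--output_file", str(wav)],
                input=text.strip().encode(),
                check=True,
            )
        wav_files.append(wav)
    return wav_files  # concatenate the WAVs afterwards, e.g. with ffmpeg or pydub
```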
As far as I know, Piper TTS has SSML support on its project roadmap. That should make it easier to switch between voices within one sentence using XML-based tags.
What are you using?
@@Ravisidharthan Piper with some custom program, which is called d2s on my GitLab.
I just set it public, but it doesn't have any documentation yet.
@@Ravisidharthan Primarily Piper
Which ones can we use with Swift CoreML? Is it possible to run them locally in Swift?
Hi, thank you for your videos. I'm kinda new to this, so I don't know much about all this. Is there any "good" TTS for people that have AMD GPUs and are using Windows? If there is, can you connect it to something like KoboldAI, and how?
Thanks for your feedback 😊. In that case I'd try Piper TTS, which has "good" quality, runs performantly on CPU (so without NVIDIA/CUDA GPUs), works on Windows, and should be easy to integrate into other processes.
Hi Thorsten,
How many hours/steps did you spend training on your DE dataset to get a usable model in Coqui TTS?
I'm trying to do some model training with my dataset (35 minutes of audio) and I start hearing some voice at 10k steps, but it is far from what I would like to get...
I used my Thorsten-Voice datasets containing over 20k recordings, and training took over 2 months (around 500k steps) on an NVIDIA Jetson AGX device. You might be able to hear better, more human-sounding results after 100k steps.
Best open source library for fine-tuning custom voices? I'm currently using AllTalk TTS and the models come out decent, just wondering if there is anything better.
Thanks for the hint about "AllTalk TTS". I've heard of it a few times but haven't taken a closer look. You think it's worth one?
@@ThorstenMueller 100%, it includes a ton of documentation and helpful tips, and the installer is just one click. Fine-tuning a model is a breeze... they walk you through the process step by step.
Hi Thorsten, I use TTS with a different intention: my English pronunciation is not good, so I record audio of myself speaking in English and use it as a reference to generate audio of the same sentence.
I currently use Coqui TTS; out of 100 audios that I generate from the same sentence, 7 have an intonation and emotion similar to the original audio 🤣.
Would you have any recommendations for another TTS that can do the same better?
Maybe take a look at F5 TTS. I'm working on a video about it 😉.
Thank you for going over these models! I really enjoyed it!
I have a question about Parler TTS. I want to train it on languages like Arabic that don't use English letters, do you think that could be possible? I tried using Common Voice as an example but failed.
Thanks for your nice feedback 😊. I'm not sure about their support for languages with non-Latin letters, like Arabic. I will take a closer look at training a model from scratch using Parler TTS with my German "Thorsten-Voice" dataset - maybe I'll find something on this process for the Arabic language.
@@ThorstenMueller Thank you so much! Looking forward to it!
I wonder which one is better for training a Spanish model. I want to convert books to audio with a better voice than Android's. Any guidance?
IMHO you might look at Piper TTS. You can listen to Spanish voices here: rhasspy.github.io/piper-samples/
Thanks Thorsten! I'm interested in Parler; is there a way to extend the number of characters it can process? My use case is converting short stories into audiobooks. I only know basic Python.
You are welcome 😊. I am just giving Parler a closer look, but right now I can't give you a good answer. I will keep you updated on my Parler progress and let you know when I might be able to answer your question.
Nice explanation ❤. TTS voice cloning + running on a low-end PC?
What's the best TTS for use in an Apple and Android app locally (i.e. no server connection)?
That's a good question. Honestly, I have not taken a closer look at TTS on smartphones, so I can't tell you (yet).
Hello Thorsten, is it possible for you to show how to install and use the Bark multilingual TTS model?
Thanks for your comment. "Bark" is already on my TODO list ;-).
Is Piper TTS still the best for training?
Right now I'd say yes. But this might change once I have tested "Parler TTS" and "Toucan TTS" with their training features.
@@ThorstenMueller Thanks, I will take a look at them as well
Which one would be able to create a cartoon character voice? I tried a couple of Hugging Face models, but had no luck getting a sample voice to work for building a new voice.
Most Spaces on Hugging Face provide a sort of zero-shot voice cloning from a few seconds of audio input, which mostly does not lead to a great clone.
If you want to duplicate a voice (take care of permissions and legal aspects), try Piper TTS (see the tutorials on my channel). Or maybe try Parler TTS by Hugging Face, which lets you describe your (cartoon) voice with a prompt. The right prompt might create the voice output you are looking for.
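For anyone who wants to try that prompt-described approach, here is a minimal sketch following the usage shown on the Parler-TTS model card (the description text, model size and output path are assumptions; tweak the description until it matches your cartoon character):

```python
# Generate speech with Parler TTS, where a free-text "description" controls how
# the voice sounds and the "prompt" is what it says.
import torch
import soundfile as sf
from transformers import AutoTokenizer
from parler_tts import ParlerTTSForConditionalGeneration

device = "cuda:0" if torch.cuda.is_available() else "cpu"
repo = "parler-tts/parler-tts-mini-v1"

model = ParlerTTSForConditionalGeneration.from_pretrained(repo).to(device)
tokenizer = AutoTokenizer.from_pretrained(repo)

description = "A very high-pitched, squeaky, fast and cheerful cartoon-like voice with very clear, close-sounding audio."
prompt = "Hi there, welcome to my secret clubhouse!"

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio = generation.cpu().numpy().squeeze()
sf.write("cartoon_voice.wav", audio, model.config.sampling_rate)
```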
Could you do a video about how to train TTS for our native languages? There are videos, but those videos are old now and there have been some updates. We would really appreciate it if you did this for both Linux and Windows.
Thanks for your feedback. For which TTS software would you like an updated tutorial (Coqui or Piper)? First I'll take a look at other TTS software (like Parler TTS) and how to create your own TTS voice with it. But I can try to update an existing tutorial to a newer release of the software in the near future.
Is it possible with two speakers? Can you help us find models that support two speakers?
What do you mean by "two speakers"? Do you mean switching between two different voices in one sentence?
@@ThorstenMueller Yes, you're right.
Hey there Thorsten, I just came across your channel and it's so amazing - I found the stuff I was looking for, these TTS models. But I have a question: is there one where an NVIDIA graphics card is not necessary, that sounds very human-like, with an easy setup and probably a UI? Thank you.
Thanks for your kind feedback 😊. Without a GPU/NVIDIA CUDA I would say use Piper TTS. It produces good and natural results and runs performantly on CPU - even on newer revisions of a Raspberry Pi.
@@ThorstenMueller Thank you so much, I will give it a try.
Hey Thorsten, what is the best overall voice cloning AI tool, both locally and remotely? RVC, Tortoise-TTS-fast, Coqui, so-vits, XTTS?
Right now I like Piper TTS. It runs locally and fast - even on newer revisions of a Raspberry Pi. Training takes some time, though.
@@ThorstenMueller What would you recommend for Windows? I can't get anything at all to work for GPU-based local voice cloning. Would really appreciate some guidance for late 2024, as most repos seem to be falling apart and have no documentation.
I see you are on a Mac for this video. What do you recommend for ease of install that runs locally on a Mac M2?
@@timcollins2421 Have you tried Coqui TTS (no longer under active development) on Windows for voice cloning? ruclips.net/video/bJjzSo_fOS8/видео.htmlsi=h-oQMhlcUiO7FL3P
Hi Thorsten!
I want to make a portfolio website where people can talk to me. I'd have a text-to-text model that knows everything about me, and its output would go to a TTS of my own voice each time. My problem is hosting. I don't understand how the APIs of these TTS models work and how I'd be able to host them, as most GPU hosting websites offer per-hour rates which seem very expensive... What do I do? Maybe I've got the wrong approach...
I also forgot to mention I do have a mini PC I can run 24/7, but it doesn't have a GPU.
If you have a PC running 24/7, you could clone your voice using Piper, which runs performantly and locally even without a GPU. Do you know my video about that? ruclips.net/video/b_we_jma220/видео.html
@@ThorstenMueller A bit of an update, but it works amazingly!! I have a trained voice of myself and CPU inference is pretty fast. I can talk to a small model connected to RAG that then outputs to Piper, it's amazing, tysm!! Now I'm only dealing with slight hallucinations…
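For others attempting a similar setup, here is a minimal sketch (not the commenter's actual code) of exposing a local Piper voice over HTTP on a CPU-only mini PC; Flask, the port and the model path are assumptions:

```python
# Tiny HTTP endpoint: POST text, get back a WAV synthesized by a local Piper voice.
import os
import subprocess
import tempfile
from flask import Flask, request, send_file

app = Flask(__name__)
VOICE_MODEL = "voices/my_cloned_voice.onnx"  # placeholder path to the trained Piper voice

@app.route("/speak", methods=["POST"])
def speak():
    text = request.get_json(force=True).get("text", "").strip()
    if not text:
        return {"error": "no text given"}, 400
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    # Piper reads the text from stdin and writes a WAV file; this runs fine on CPU.
    subprocess.run(
        ["piper", "--model", VOICE_MODEL, "--output_file", wav_path],
        input=text.encode(),
        check=True,
    )
    return send_file(wav_path, mimetype="audio/wav")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

A POST to /speak with JSON like {"text": "Hello there"} then returns a WAV file the website can play back.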
Hi Sir, your video is fantastic!!! Well done!!! The most valuable feature of TTS for me is the ability to highlight words or generate visemes (or even phonemes) in real time as the text is spoken. This functionality is incredibly important to my work, and I am wondering if any voices or systems provide this capability. Specifically, I am looking for a method to capture spoken words, phrases, or syllables as they are being generated and display them in real time.
While I have had success with SAPI 5 on Windows for this purpose, I have been unable to find similar solutions for Linux, particularly on my Raspberry Pi setup. My goal is to run TTS locally with a childlike voice and to extract key elements such as word highlighting or real-time phoneme generation. Any guidance or support on achieving these tasks would be greatly appreciated. Thank you!
Thanks for your nice feedback 😊. I am not aware of a TTS solution that highlights words while speaking. But since Piper TTS has a streaming function, it might be worth taking a look at that.
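As an illustration of that streaming function, here is a minimal sketch using Piper's --output-raw mode piped into aplay on Linux / Raspberry Pi (the model path is a placeholder and a 22050 Hz voice is assumed; note this streams audio as it is generated, it does not provide word-level timings by itself):

```python
# Stream Piper's raw PCM output straight into aplay so playback starts
# while synthesis is still running.
import subprocess

text = "This sentence starts playing while the rest is still being synthesized."

piper = subprocess.Popen(
    ["piper", "--model", "voices/en_US-lessac-medium.onnx", "--output-raw"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
)
# aplay plays the raw 16-bit mono PCM stream as it arrives (22050 Hz voice assumed).
aplay = subprocess.Popen(
    ["aplay", "-r", "22050", "-f", "S16_LE", "-t", "raw", "-"],
    stdin=piper.stdout,
)
piper.stdin.write(text.encode())
piper.stdin.close()
aplay.wait()
```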
Hello Thorsten, unfortunately the program you present here can't speak German at all. But that's not your fault. I hope there will be better models soon. Unfortunately, XTTS in version 2.02 sometimes swallows words or makes up new ones during generation. So far I haven't found a way to make that stable. But I will keep an eye on it.
Dear Thorsten. As someone with ADD, I find it very hard to read long texts. I can process information much better when I hear it. In my experience, the best speech synthesis when it has to be quick is unfortunately still ege on Windows. But I regularly look for a better computer voice. XTTS was a clear improvement in terms of intonation; unfortunately, words were sometimes swallowed. I follow your videos closely and eagerly await your test of MetaVoice, etc. I think your work is important and I am very grateful for your effort. Keep it up.
Thank you very much for your nice feedback 😊. Right now I'm looking at Parler TTS first, but I will also take a look at MetaVoice.
How do I learn voice cloning and voice accents?
If your training data contains accents, the TTS model will learn them from the training data. Do you know my Piper TTS voice cloning tutorial? ruclips.net/video/b_we_jma220/видео.htmlsi=VbiQrIY9CwEdX7z5
Which of these are multilingual? In particular, which ones speak Italian?
Toucan TTS supports "nearly all" (around 7k) languages.
Developers who do open source... they might not know they can change someone's life for the better... I have a blind friend, and there has never been a happier moment for her than listening to human-like speech... She said maybe someday she could get emotional speech driven by the context of the paragraph it reads; imagine her reading (listening to) a novel with automatic voice switching, emotionally accurate to the story...
Thanks for your feedback. I agree that open source can change the world for the better. I'm pretty optimistic that emotional speech will come (in the near future), which your blind friend can hopefully use for TTS novel reading.
@@ThorstenMueller yes, what a beautiful future already.
Is there any way to train a voice model on my own voice, then save the parameters of my voice to a file, and the next time I need text-to-speech just use these parameters to generate the voice with Coqui-TTS and this model? Help me please. I searched all over the internet and did not find any solution.
Yes, it is possible to create your own TTS voice clone and then just input any text to synthesize. Is this what you mean? Do you know my Piper TTS voice clone tutorial? ruclips.net/video/b_we_jma220/видео.htmlsi=ZE8bSEVpum6ddMBr
I stupidly thought Parler would speak French, but it doesn't seem to...
As the name sounds a little bit French (at least to my German ears), I understand your thought :-). According to their Space, it was "trained using 45k hours of narrated English audiobooks", so the available model is English only. But IMHO you can use their project to create a TTS voice for any language. I'll try to find out when working on the Parler TTS detail video.
What is the best option for Mac offline?
Piper or Coqui TTS. But I haven't given the models mentioned in the video a closer, more detailed look yet, so my recommendation might change afterwards ;-).
Link for the Piper ONNX models?
Piper is the software; each voice has an ONNX model file. You might check this: github.com/rhasspy/piper/blob/master/VOICES.md
I am a beginner; how do I learn AI voice cloning and accents?
Sir, tell me how I can learn, please.
Sir, please, I am a beginner and I don't know about this. Tell me the first step.
Hello Thorsten, I sent you an email and would be happy if you could take a look 😅. It's about your great program and I have a problem there. Don't worry, I am the problem, not your program. 😇 Thank you.
Hello, I have replied to your email 😊.
I'd like to hear more about 'integration' of TTS... for reading text... not just for amusing myself by cloning my voice.
You mean use cases for TTS, such as screenreaders or voice assistants?
Nice German accent. 😂
Thänk ju 😆
The whole video is pretty pointless: you can't find out which one is better, cloning your foreign accent doesn't help much either, and the programming language/OS isn't useful (it would be better to know whether it uses CPU/CUDA/Metal and how fast its inference is)... Try cloning the voice of the Professor from Futurama.
Your T-shirt sums it up.
That was mean Yoooo
It's not as pointless as ur comment
So I want to ask about a tool that can extract a voice from a person. For example, if I want a person with their specific language to use their own voice: the tool would let you record the voice first and automatically extract it. Once that happens, that voice can be turned into an AI-generated voice with the same voice and accent from just a few words.
From this, we can test by typing a few words for text-to-speech. That specific custom AI-generated voice that was extracted will then speak the text in that exact voice and accent. Is there a specific tool for that?
So, you're talking about zero-shot or speech-to-speech tech? Not voice cloning from scratch, but imitating an individual voice and speech flow based on an existing model?
@@ThorstenMueller Yes; currently Coqui doesn't support Asian languages much, so it's limited. I just want to know, for example, how to implement a custom voice that can be voice-cloned with that same accent.
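For reference, a minimal sketch of that zero-shot approach with Coqui XTTS v2 (the Coqui TTS Python package, no longer actively developed as noted above; the reference WAV and language code are placeholders, and language/accent coverage is limited, so check the model card for your target language):

```python
# Zero-shot voice cloning with Coqui XTTS v2: a few seconds of reference audio,
# no training from scratch.
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

tts.tts_to_file(
    text="A short test sentence spoken in the cloned voice.",
    speaker_wav="reference_speaker.wav",  # a few seconds of the target voice
    language="en",                        # must be one of XTTS v2's supported languages
    file_path="cloned_output.wav",
)
```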