Thank you very much for that tutorial. That's something we've been waiting for! You are a very good youtuber!
Awesome 👏 I was literally trying that local approach last night because the online Colab notebook would time out when rendering long text, so I used my own computer + 1080 Ti Nvidia card so I can leverage the CUDA lib. But then I got a memory allocation issue I was stumbling around … Your timing is just perfect once again! Keep up the good work dear Martin 👍
Thank you! :-) Glad it was helpful!
I had the same issue on Colab :( 5 hours in and lost everything
can you help with a step-by-step guide on how you got it all setup and running on your windows pc, please?
Disappointing tutorial. Guides the viewer into unsuccessfully installing on an M1 Mac, and then switches to a remote VPS. Hard to follow and I wasn't able to set this up successfully.
Same ):
THE ONLY FUCKING TUTORIAL AROUND MATE!
HEADACHE!
THANK YOU
Haha glad you liked it! :-)
@@martin-thissen yep, worked like a treat! I literally spent 1 week of back and forth trying to install this, and from the looks of it, it was due to installing Python 3.9 instead of 3.8, but I'm glad it's installed now.
@@iseahosbourne9064 Oh wow, glad it was helpful and you can finally start using the model! 🙂
@@iseahosbourne9064 how did you get it to run on windows?
@@bllaqattitude759 Or tts fast
Extremely underrated youtuber.
You definitely deserve more subscribers.
Thank you, appreciate it! :-)
Thank you for this, Martin. I was literally trying to run this locally, so it's great timing!
Glad I could help! :-)
@@martin-thissen Have you tried the ozen toolkit? ruclips.net/video/lnIq4SFFXWs/видео.html.
Thank you for this content. I was struggling with dependencies on Windows and this gave me the solution. Definitely subscribed to your channel!
THX/THANKS!! I'm learning so much from your videos right now
Can't believe I missed this thank you!
I do wish someone will eventually create a GUI interface for tortoise-tts.
Martin's got a web interface in his 5x faster voice cloning video that's just about the most user-friendly interface you can ask for!
@@greenockscatman A web interface offers no privacy, which is a total no-go
@@greenockscatman where can I find this?
I think I've discovered one.... tortoise tts is now available as a plugin for blender - complete with gui. Its kinda neat, but operates within Blender though
@@grrinc Really? I work with blender A ton! I would love to know if there's a proper download for that.
You look like "Tom Holland" (Spider-Man) 😆.
Your videos are amazing. Good work.
good video, nice tutorial, all information, success
Martin, what do you do for longer text, for example 2000 words? Or should you split the text training data?
Great Video, Martin! Keep it up:) Tom's voice sounds a lot like Tom Hanks to me.
I appreciate you making a tutorial on this. After watching I am still very confused about the process. I think you focused too much on specific installation steps first, without even briefly explaining the process of cloning itself until the very end of the tutorial, which makes the tutorial very hard to go through. It would help a lot if you first explained which modules are required and what we are essentially training the model to do, and then showed how to install them. For instance, I've never used PyTorch and it's not clear to me whether it's needed or not. Ofc I'm gonna do more research, but I just wanted to give some input on how you could make stuff more accessible in the future.
Agree
You are a god, thank you so much for this, I was struggling so much with the version mismatches
Loved the video! Keep it up man
Awesome. great walkthrough
Thank you! :-)
Very nice this lecture sir
Hi Martin, first off, thank you for the video, it's the clearest one I've yet to find on setting up Tortoise.
Having said that, I followed your instructions and the link to your GitHub, and I still can't get this to work. What causes me problems is twofold. 1) You keep going with something that will not work, and only afterward explain what you had to fix, which means that by then I'm already confused as to what you are doing. While I appreciate seeing the issues you had, as a teaching method, something more streamlined, only showing the correct process, would be helpful. 2) You take for granted that the watcher has the same knowledge as you do, telling us to do something in what you called 'VI' or 'ID'? I couldn't quite tell what you said, but I also have no idea what either of them is, and you don't link to something that can help us understand. Then you simply rattle off information, copy-paste things, and it works.
Following what I think is the equivalent on your GitHub only gets me 'command not recognized' type responses.
Would it be possible to create local installation instructions, in text or video, for the person who knows nothing about this? The one who accidentally came across a voice AI video and thinks it could help in his work, but has no knowledge of programming at all?
again, thank you for the clear video
I second this!
So nice sir
Subscribed 😊
🙌
Great Tutorial!!! THNXX!
Great live-code type tutorial (more realistic than sound-bite videos). Request: since a lot of development requires venv package management and not conda, could you include some of those details for people doing production deployment? Many thanks!
This is kind of frustrating to operate, I managed to get it up and running but I closed the miniconda prompt and now I can't get back to starting it again.
What about the part of cloning an existing voice. That would have been nice.
I still had dependency version mismatch errors. 4 hours later, I got it running by adding the following:
conda install -c cctbx202208 numpy
conda install -c main llvm
pip install pydantic==1.9.1
Thanks, your adjustments and pytorch-cuda-11.8 worked for me with WSL2 and RTX 4070.
Thanks a lot! Wasted half an hour trying to figure this out.
Thank you very useful, Love you !
how did you get it to run on windows?
Please run this program on pycharm😂 I think it may useful for my college project
Remove tortoise from cache
Good job!
Thank you for this guide. Can you explain how you used Lambda Cloud to run this (as per your comment in the video)? I have a M1 Macbook Pro and am constantly frustrated trying to run GenAI without the GPU. Thanks.
As AI voice technology advances, it becomes increasingly difficult to distinguish between human and machine-generated speech.
It already is. And it's already being used for scams.
where is the colab?
10:46 How do I open the file on Windows? I got: "python: can't open file 'tts.py': [Errno 2] No such file or directory" :(
Were you able to figure it out?
@@bllaqattitude759 I uninstalled everything and then installed everything again.
Just for you to know, Tortoise on local pc is very slow at generating long texts, VERY SLOW.
Thank you, could you please detail how to do this on a Mac? I have an M3 Mac, no GPU. Thank you
can this voice generator do in another languages?
Hello Martin, and thanks for this awesome and *only* tutorial that works, given that this stuff is not for beginners.
Could you please make a video on how to make a Tortoise model on windows? The idea is to clone my own voice, but then I want to use the model to generate text, without running the same training process every time. Is that possible?
unrelated question, is there a limit to the amount of text tortoise can process? cause i want to make entire pdf files into audiobooks, and i can see how that would become an entire problem xd
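One common way around length limits for audiobook-style input is to split the text into short chunks and synthesize them one at a time. Below is a minimal sketch of that splitting step only. The function name `split_into_chunks` and the 200-character default are assumptions made for illustration, not values taken from Tortoise itself, and the sentence splitting is deliberately naive.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    max_chars is a hypothetical limit chosen for illustration.
    A single sentence longer than max_chars is kept as its own
    oversized chunk rather than being cut mid-sentence.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for i, chunk in enumerate(split_into_chunks("One. Two. Three.", max_chars=9)):
        print(i, chunk)
```

Each chunk could then be passed to whatever text-to-speech call you are using, and the resulting audio files joined afterwards.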
Thank you for the tutorial on this, but I was wondering if you can connect this to an ai assistant of sorts and have it look up information or respond back to you with the cloned voice without having to type out the prompts you have it turn from text to speech.
Does it work on a MacBook Air M1?
+1 For Vi.
@martin-thissen Hi, no detailed explanation to run the program after installation on Windows, how to start it up after installation, and then generate speech, or what particular script to run anytime we want to generate a speech, it would be nice to see a run guide after installation, thank you..
I suspect it may not be using the GPU. I have an NVIDIA RTX 4090 in my machine, but GPU utilization does NOT rise above 2% throughout the execution. It is working very slowly. What to do??
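A quick first check for this kind of symptom is whether the PyTorch build in the active environment can see the GPU at all. The sketch below only reports what PyTorch detects and does not change any Tortoise settings. The helper name `describe_torch_device` is made up for illustration. Run it inside the same conda environment you use for Tortoise.

```python
import importlib.util


def describe_torch_device() -> str:
    """Report which device PyTorch can see, without assuming it is installed."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this PyTorch build was compiled against.
        return (
            f"CUDA available: {torch.cuda.get_device_name(0)} "
            f"(PyTorch built with CUDA {torch.version.cuda})"
        )
    # torch.version.cuda is None for CPU-only builds.
    return f"CUDA not available (PyTorch CUDA build: {torch.version.cuda}); work will run on the CPU"


if __name__ == "__main__":
    print(describe_torch_device())
```

If this reports that CUDA is not available, a CPU-only PyTorch build or a driver mismatch is a likely place to look, though other causes are possible.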
Awesome content but the installation process just gave me anxiety. Now going through existential crisis. Setting up Django seems better than this.
Hello Martin, would this approach give results as good as HiFi-GAN voice modeling, for generating audio that is very close to the original audio input in terms of quality and naturalness? I mean ending up with a natural voice that is as close as possible to the original tone and pitch of the imported audio, instead of a completely synthesized-sounding result?
Hey, overall I can really recommend using the Tortoise-TTS model for voice cloning, because the results are really good. But the model has a few downsides. First, it lacks diversity of speech (e.g., accents). And second, it's really slow, especially if you compare it to the HiFi-GAN model. I personally haven't worked with the HiFi-GAN model yet, so I can't really say whether the results are better than the ones made with the HiFi-GAN model. But if you don't want to set everything up on your local computer, you can also use a Colab notebook, I made a video about it: ruclips.net/video/FN3yxL0Rr0c/видео.html
The lack of links in the description makes this tutorial so time consuming to follow.
How can I continue training my pre-trained model later? I want to train it for 20000 epochs for the most realistic quality, but I need to resume the same pre-trained model day by day. Is that possible on Easygui Colab? I have Colab Pro.
How do I train a new voice using long audio (e.g. a 10 min .wav file)?
I get this warning in the console:
"Text length too long (200 < 10578), using segments: Voice sample.wav
Audio not segmented, segmenting: Voice sample.wav
Sliced segments: 1 => 160."
I wait and sliced segments are always 1 => 160 and nothing happens
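If the built-in segmenting step stalls, one workaround is to cut the long recording into short clips yourself before feeding it in. The sketch below uses fixed-length slicing with Python's standard `wave` module, which is a simpler technique than whatever segmentation the tool in this comment performs. The function name `slice_wav` and the 10-second default are assumptions for illustration, and cuts may land mid-word.

```python
import wave
from pathlib import Path
from typing import List


def slice_wav(source: str, out_dir: str, seconds_per_clip: float = 10.0) -> List[Path]:
    """Cut a WAV file into consecutive fixed-length clips.

    seconds_per_clip is a hypothetical default; adjust it to whatever
    clip length your tool expects. The last clip may be shorter.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []

    with wave.open(source, "rb") as reader:
        params = reader.getparams()
        frames_per_clip = int(reader.getframerate() * seconds_per_clip)
        index = 0
        while True:
            frames = reader.readframes(frames_per_clip)
            if not frames:
                break
            clip_path = out / f"{Path(source).stem}_{index:03d}.wav"
            with wave.open(str(clip_path), "wb") as writer:
                # Copy channels, sample width and rate; the frame count in the
                # header is corrected automatically when the file is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(clip_path)
            index += 1
    return written


if __name__ == "__main__":
    import sys

    # Usage: python slice_wav.py input.wav output_folder
    if len(sys.argv) == 3:
        for clip in slice_wav(sys.argv[1], sys.argv[2]):
            print(clip)
```

This only handles uncompressed PCM WAV files, which is what the `wave` module reads.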
👍👍👍👍👍👍👍
So you didn't install PyTorch on your Mac, right?
Does it work for languages other than English?
Unfortunately the Tortoise-TTS model can only generate English speech. You can insert text of other languages, but it would be pronounced wrong and would sound off (I tried it for German). Since the model was trained with a multi-speaker English dataset, it won't be able to generate proper speech of other languages. The challenge here is to first create a multi-speaker dataset for a particular language, similar to the English dataset used. Then the model would need to be trained or fine-tuned on this dataset.
Great work - the Web UI link generated in Colab is asking for the public IP of the tunnel creator? I tried my IPs but that does not work
should we stick with python 3.8 or use the latest?
Hi, I got everything working OK as in your video, but I can't figure out how to upload and use my wav files. I think I need to use util but don't know how. But at least I have the rest working, thanks for a great video
please do a video how to set it up on cpu only. I have an M2 Mac and it is an absolute nightmare to get it working. And this is coming from a DevOps engineer
or at least upload instruction how to do so somewhere
Sometimes I wonder why they don't put all this stuff in a setup.exe rather than requiring master's-degree-level coding
I'm on WIndows and I get lost at this point 8:28 , how do I create the file and use it on Windows?
Hey, can we create our own voices with tortoise tts? Like, if I want to reproduce a character's voice, can I use samples from this voice to recreate it?
That is literally what Tortoise TTS was created for
You skipped over a lot of steps for the Windows install, like the PATH environment options, which aren't even recommended by the installer, so you might not be able to access conda that easily, etc.
It's only with cc window ? Like elevenlab or others : select voice, adjust sliders, text box...
i'm currently using Cuda 12.0 do i have to downgrade to 11.8 for this model to work?
I accidentally followed along and installed conda with Python 3.9. How do I uninstall it and install conda with Python 3.8 instead?
how to add more languages ?
Want complete tutorial for mac m1.
Hello Martin, can you suggest another service besides Lambda?
simply doesnt work
Thank you for the tutorial, I have the following error "ModuleNotFoundError: No module named 'pydantic.typing'", can you help?
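Errors like this often come down to which package versions ended up in the environment. Another commenter in this thread reported getting things running after pinning `pydantic==1.9.1`. The sketch below only prints the installed versions of a few packages named in this thread so you can compare them with that pin. It does not diagnose the cause by itself, and the package list is an assumption you may need to adjust.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from comments in this thread; extend as needed.
    for name, version in installed_versions(
        ["pydantic", "numpy", "numba", "llvmlite", "torch", "streamlit"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the activated conda environment, otherwise it reports the versions from a different Python installation.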
I tried it on a local installation but it took so long just to generate 7-second clips. It utilized the GPU and it's just very slow compared to Coqui TTS.
Could you explain how to create custom-language Coqui-TTS model files? There are many different small languages even within one country, and they need their own custom models. Thank you very much!
im getting alot of errors in the model folder im guessing? any way we can talk and get this set up? i have a discord we could chat in.
I can hear you're a real genius, but I totally don't understand what you're all talking about.
You don't just walk a path here on how to, but make 56744 jumps to other paths in between.
Next time stay on the subject...
any luck trying to get it to run on windows? cause its so many hoops and no details aside installing
Okay, I have a few issues right off the bat. I could not find Python 3.9, so I went with the closest version I could find, which is Python 3.9.4. Another issue I'm dealing with currently is that whenever I use the command to install the requirements text file, the metadata will not be downloaded.
Does anybody have any fixes for that?
Can you please do a tutorial on Tortoise-TTS fast? I am at an absolute loss
Could you do the Real Time Voice Cloning Spanish installation tutorial (by AlexSteveChungAlvarez)? Since it has the Spanish language
it keeps saying "Numba requires at least version 14.0.0 of LLVM."
PyTorch can’t run on the CPU?
Will it work on mac m1 or not?
Does it work for windows?
What are the hardware requirements for running the Tortoise-TTS model on your local computer? Do you need a GPU?
Torch can run on CPU, and Tortoise TTS should be able to run on CPU as TTS can. But I also ran into the local model problem that this guy ran into, so who knows, maybe somebody got it to work.. somehow (without using a cloud GPU)? It seems to have a path reference issue to the model for me. The difference indeed might be that he's using a GPU here. So I dunno if anyone has got it running locally on CPU successfully?
@@julsius doesn’t it require an Nvidia graphics card because it uses CUDA?
@@nonameman7114 last I remember, you don't need to run CUDA for that. You can run it on the CPU. But for me there was an issue with finding the model or downloading it, the same one he had in the video. There are a bunch of other AI libraries that struggle or don't run w/o CUDA though.
@@julsius in that case I might try it out. I’m guessing I’ll be sacrificing speed and quality of the voice without the Nvidia card? Kinda like how Blender works.
@@nonameman7114 I think I saw a comment here from someone running it on CPU and it took hours. But yeah, it might not sacrifice quality, but certainly time/performance (that depends on how it's coded). GPUs can do vector math more efficiently, which is why all AI stuff is better done on a GPU, and with Nvidia that means CUDA, which, yeah, is the same hardware used for graphics processing.
What kind of gpu is needed for this?
not working for windows
Too technical for me. I'll need to go find another tutorial. :(
I will make one and post it in the comments. His tutorial skips over a lot tho. Like, you need to install Python and everything else. Completely glosses over that shit.
@@madalinecheshirentddev4276 Thanks! I'll keep an eye out! 👍
Can this be done using Google Colab?
Absolutely! Feel free to check out my video where I used Colab for exactly this: ruclips.net/video/FN3yxL0Rr0c/видео.html&ab_channel=MartinThissen
Can you do this with so-vits-svc sometime? Ideally locally and on Colab 👀
I've added it to my list :-)
@@martin-thissen Hi, no detailed explanation to run the program after installation on Windows, how to start it up after installation, and then generate speech, or what particular script to run anytime we want to generate a speech, it would be nice to see a run guide after installation, thank you..
Thanks for doing the tutorial with Conda. I'll use this method and your video to show others how to use the tech...
...Because with the other method, I basically had to learn a lot about coding and I'm actually not happy about it at all. I learn enough languages doing CNC engineering and it's fucking cringe. None of my peers have time to learn all of this either. I would like it if people were specialized in each field, preferably able to make a foolproof step-by-step guide at the least, so people not in the sector can use the tech
I assume this won't work without an Nvidia gpu
Yes, unfortunately you need a Nvidia GPU for this :/
just use the mrq fork. there's finetuning also. tortoise is fun, but way too slow, even with a 3090, and the model is not great.
Is it better compared to Tortoise-TTS-fast?
probably called tortoise for a reason :) Where's the cheetah version? :)
Wish they had a exe =[
Free and open source?
You lost us at "in this video"
Better you buy windows machine with nvidia gpu first 😁
Tortoise sounds great, but with no other languages or phonemes, it is useless for anything.
I think that English is the language of global communication, so it's no wonder that models are first developed for English language. But this is just the beginning. I'm sure we'll soon see models that have multilingual capabilities. Whisper from OpenAI (which does speech-to-text) already supports many languages. But stay tuned to my channel, I will definitely do more videos about multi-language speech synthesis.
Can anyone help me please, I know very very little about Python. I'm getting this error when it tries to open in the browser:
File "C:\Users\\miniconda3\envs\tts-fast\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "C:\Windows\System32\tortoise-tts-fast\scripts\app.py", line 10, in
from tortoise.api import MODELS_DIR