Run Tortoise-TTS On Your Local Computer 🔊 | Tutorial | Voice Cloning

Martin Thissen

58K views

Published: Sep 12, 2024

Comments • 166

@Ilovepapayas 1 year ago ⁺⁴
Thank you very much for that tutorial. That's something we've been waiting for! You are a very good youtuber!
@plouismarie 1 year ago ⁺⁵
Awesome 👏 I was literally trying that local approach last night because the online Colab sheet would time out on long text to render, so I used my own computer + a 1080 Ti Nvidia card so I could leverage the CUDA lib. But then I got a memory allocation issue I was stumbling around… Your timing is just perfect once again! Keep up the good work, dear Martin 👍
@martin-thissen 1 year ago ⁺¹
Thank you! :-) Glad it was helpful!
@tommyshadow66 1 year ago
I had the same issue on Colab :( 5 hours in and lost everything
@bllaqattitude759 1 year ago
Can you help with a step-by-step guide on how you got it all set up and running on your Windows PC, please?
@iseahosbourne9064 1 year ago ⁺⁴
THE ONLY FUCKING TUTORIAL AROUND MATE!
HEADACHE!
THANK YOU
@martin-thissen 1 year ago
Haha glad you liked it! :-)
@iseahosbourne9064 1 year ago ⁺¹
@@martin-thissen yep, worked like a treat! I literally spent 1 week going back and forth trying to install this, and from the looks of it, it was due to installing Python 3.9 instead of 3.8, but I'm glad it's installed now.
@martin-thissen 1 year ago ⁺¹
@@iseahosbourne9064 Oh wow, glad it was helpful and you can finally start using the model! 🙂
@bllaqattitude759 1 year ago
@@iseahosbourne9064 how did you get it to run on windows?
@iseahosbourne9064 1 year ago
@@bllaqattitude759 Or tts fast
@Gadzislaw007 1 year ago ⁺²¹
I appreciate you making a tutorial on this. After watching, I am still very confused about the process. I think you focused too much on the specific installation first without even briefly explaining the process of cloning itself until the very end of the tutorial, which makes the tutorial very hard to go through. It would help a lot if you first explained which modules are required and what we are essentially training it to do, and then showed how to install them. For instance, I've never used PyTorch and it's not clear to me whether it's needed or not. Ofc I'm gonna do more research, but I just wanted to give some input on how you could make stuff more accessible in the future.
@Daniel-fl4si 1 year ago
Agree
@Devalinor 1 year ago ⁺¹³
Extremely underrated youtuber.
You definitely deserve more subscribers.
@martin-thissen 1 year ago ⁺²
Thank you, appreciate it! :-)
@juliana.2120 9 months ago ⁺¹
THX/DANKE!! I'm learning so much from your videos right now
@NerdManReturns 1 year ago ⁺⁵³
I do wish someone will eventually create a GUI interface for tortoise-tts.
@greenockscatman 1 year ago ⁺²
Martin's got a web interface in his 5x faster voice cloning video that's just about the most user-friendly interface you can ask for!
@bigglyguy8429 1 year ago ⁺⁵
@@greenockscatman A web interface offers no privacy, which is a total no-go
@ElMatero6 1 year ago
@@greenockscatman where can I find this?
@grrinc 1 year ago ⁺¹
I think I've discovered one... Tortoise TTS is now available as a plugin for Blender, complete with GUI. It's kinda neat, but it operates within Blender though
@ElMatero6 1 year ago ⁺¹
@@grrinc Really? I work with blender A ton! I would love to know if there's a proper download for that.
@UniquelyCaptivating 1 year ago ⁺⁸
Martin, what do you do for longer text, for example 2000 words? Or should you split the text training data?
@CesarCalvoCobo 1 year ago ⁺⁴
Thank you for this content. I was struggling with dependencies on Windows and this gave me the solution. Definitely subscribed to your channel!
@tautegu 1 year ago ⁺²
Thank you for this, Martin. I was literally trying to run this locally, so it's great timing!
@martin-thissen 1 year ago
Glad I could help! :-)
@tautegu 1 year ago
@@martin-thissen Have you tried the ozen toolkit? ruclips.net/video/lnIq4SFFXWs/видео.html.
@ILIKECYDIA 1 year ago ⁺¹
Can't believe I missed this thank you!
@russelschuster8036 1 month ago
good video, nice tutorial, all information, success
@nothde9865 1 year ago ⁺³
This is kind of frustrating to operate, I managed to get it up and running but I closed the miniconda prompt and now I can't get back to starting it again.
@EvilSpeculator 1 year ago ⁺⁶
Disappointing tutorial. Guides the viewer into unsuccessfully installing on an M1 Mac, and then switches to a remote VPS. Hard to follow and I wasn't able to set this up successfully.
@langstonreese7077 7 months ago
Same ):
@DevangRPatel 4 months ago
You look like "Tom Holland" (Spider-Man) 😆.
Your videos are amazing. Good work.
@ThePlayingTiger 1 year ago ⁺⁶
Hi Martin, first off, thank you for the video, it's the clearest one I've yet to find on setting up Tortoise.
Having said that, I followed your instructions and the link to your GitHub, and I still can't get this to work. What causes me problems is twofold. 1) You keep going with something that will not work and only afterward explain what you had to fix, which means that by then I'm already confused as to what you are doing. While I appreciate seeing the issues you had, as a teaching method, something more streamlined that only shows the correct process would be helpful. 2) You take for granted that the viewer has the same knowledge as you do, telling us to do something in what you called 'VI' or 'ID'? I couldn't quite tell what you said, but I also have no idea what either of them is, and you don't link to something that can help us understand. Then you simply rattle off information, copy-paste things, and it works.
Following what I think is the equivalent on your GitHub only gets me 'command not recognized' type responses.
Would it be possible to create local installation instructions, in text or video, but for the person who knows nothing about this? The one who accidentally came across a voice AI video and thinks it could help in his work, but has no knowledge of programming at all?
Again, thank you for the clear video
@bigglyguy8429 1 year ago ⁺³
I second this!
@bowenzhang4565 1 year ago
Great Video, Martin! Keep it up:) Tom's voice sounds a lot like Tom Hanks to me.
@onoff5604 7 months ago
Great live-code type tutorial (more realistic than sound-bite videos). Request: Since a lot of development requires venv env package management and not conda, could you include some of those details for people doing production deployment? Many thanks!
@pancelalkov2070 1 year ago
You are a god, thank you so much for this, I was struggling so much with the version mismatches
@StoriesWithAPurpose 1 year ago ⁺¹
where is the colab?
@blarfasel 1 year ago ⁺¹
What about the part of cloning an existing voice. That would have been nice.
@JeremyBoucherEsq 2 months ago
Thank you for this guide. Can you explain how you used Lambda Cloud to run this (as per your comment in the video)? I have a M1 Macbook Pro and am constantly frustrated trying to run GenAI without the GPU. Thanks.
@cadcaetutorial2039 1 year ago ⁺¹
Very nice this lecture sir
@wnrandom98 1 year ago ⁺²
unrelated question, is there a limit to the amount of text tortoise can process? cause i want to make entire pdf files into audiobooks, and i can see how that would become an entire problem xd
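Editor's note: several commenters ask how to handle long inputs such as whole documents. One common workaround is to split the text into sentence-sized chunks and synthesize each chunk separately. The sketch below only shows the splitting step; the function name `split_text` and the 200-character default are illustrative assumptions, not a limit documented for Tortoise-TTS.

```python
import re


def split_text(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    max_chars=200 is an assumed default for illustration only.
    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            # Adding the sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be passed to whatever text-to-speech call you already use, and the resulting audio clips concatenated in order.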
@zskater1234 1 year ago
Loved the video! Keep it up man
@Some1uNo 1 year ago ⁺¹
Awesome. great walkthrough
@martin-thissen 1 year ago
Thank you! :-)
@cadcaetutorial2039 1 year ago ⁺¹
So nice sir
@TheAfronymous 3 hours ago
Thank you, could you please detail how to perform this when you have a Mac architecture ? I have a M3 mac, no GPU. Thank you
@thepermen 1 year ago ⁺¹
Can this voice generator work in other languages?
@swannschilling474 1 year ago ⁺¹
Subscribed 😊
@martin-thissen 1 year ago ⁺¹
🙌
@AntoniusTertius 1 year ago ⁺²
10:46 How do I open the file on Windows? I got: "python: can't open file 'tts.py': [Errno 2] No such file or directory" :(
@bllaqattitude759 1 year ago
Were you able to figure it out?
@AntoniusTertius 1 year ago
@@bllaqattitude759 I uninstalled everything and then installed everything again.
Just for you to know, Tortoise on local pc is very slow at generating long texts, VERY SLOW.
@Agamer2907 1 year ago ⁺¹
Thank you for the tutorial on this, but I was wondering if you can connect this to an ai assistant of sorts and have it look up information or respond back to you with the cloned voice without having to type out the prompts you have it turn from text to speech.
@GlenBland 1 year ago ⁺³
I still had dependency version mismatch errors. 4 hours later, I got it running by adding the following:
conda install -c cctbx202208 numpy
conda install -c main llvm
pip install pydantic==1.9.1
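Editor's note: before and after applying pins like the ones above, it can help to see which versions are actually installed in the active environment. The sketch below uses only the standard library; the `REPORTED_PINS` dictionary simply repeats the `pydantic==1.9.1` pin from this comment, and the helper names are hypothetical. It reflects one user's working setup, not an official requirement.

```python
from importlib import metadata

# Pin quoted from the comment above; one commenter's working setup,
# not an official requirement of the project.
REPORTED_PINS = {"pydantic": "1.9.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_with_pins(pins=REPORTED_PINS):
    """Map each pinned package to a tuple (installed_version, matches_pin)."""
    report = {}
    for package, wanted in pins.items():
        found = installed_version(package)
        report[package] = (found, found == wanted)
    return report


if __name__ == "__main__":
    for package, (found, ok) in compare_with_pins().items():
        print(f"{package}: installed={found}, matches pin={ok}")
```

Run it inside the same conda environment you use for the model, otherwise it reports the versions of a different interpreter.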
@reinhardmatzka6537 1 year ago ⁺¹
Thanks, your adjustments and pytorch-cuda-11.8 worked for me with WSL2 and RTX 4070.
@KofaOne 1 year ago
Thanks a lot! Wasted half an hour trying to figure this out.
@Moyil 1 year ago
Thank you very useful, Love you !
@bllaqattitude759 1 year ago
how did you get it to run on windows?
@magenta6 1 year ago
Great Tutorial!!! THNXX!
@bllaqattitude759 1 year ago ⁺¹
@martin-thissen Hi, no detailed explanation to run the program after installation on Windows, how to start it up after installation, and then generate speech, or what particular script to run anytime we want to generate a speech, it would be nice to see a run guide after installation, thank you..
@premnathart3496 1 year ago ⁺¹
Does it work on a MacBook Air M1?
@HtopSkills 1 year ago ⁺⁸
As AI voice technology advances, it becomes increasingly difficult to distinguish between human and machine-generated speech.
@xanksauri89 1 year ago ⁺³
It already is. And it's already being used for scams.
@amitparmar8076 4 months ago
I doubt it is using the GPU. I have an NVIDIA RTX 4090 in my machine, but GPU utilization does NOT go above 2% throughout the execution. It is running very slowly. What to do??
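Editor's note: a first check for a report like this is whether PyTorch can see the GPU at all. One possible cause of low utilization (an assumption, not a diagnosis of this commenter's machine) is a CPU-only PyTorch build in the active environment. The helper name `pick_device` is hypothetical; `torch.cuda.is_available()`, `torch.cuda.get_device_name()`, and `torch.version.cuda` are standard PyTorch attributes.

```python
def pick_device():
    """Return "cuda" if PyTorch sees a CUDA GPU, "cpu" if it does not,
    or None when PyTorch is not installed in this environment."""
    try:
        import torch
    except ImportError:
        return None
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    if device is None:
        print("PyTorch is not installed in this environment.")
    else:
        import torch

        print(f"PyTorch {torch.__version__} will use: {device}")
        # torch.version.cuda is None for CPU-only builds.
        print(f"Built against CUDA: {torch.version.cuda}")
        if device == "cuda":
            print(f"GPU 0: {torch.cuda.get_device_name(0)}")
```

If this prints `cpu` on a machine with an Nvidia card, the installed PyTorch build or the driver setup is worth checking before looking at the model itself.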
@frattuncbas 1 year ago ⁺¹
How can I continue training my pre-trained model later? I want to train it for 20000 epochs for the most realistic quality, but I need to run the same pre-trained model day by day. Is that possible on Easygui Colab? I have Colab Pro
@ParAculam 1 year ago ⁺¹
Hallo Martin, and thanks for this awesome and *only* tutorial that works, given that this stuff is not for beginners.
Could you please make a video on how to make a Tortoise model on windows? The idea is to clone my own voice, but then I want to use the model to generate text, without running the same training process every time. Is that possible?
@Chriscs7 7 months ago
How to train a new voice using long audio (ex 10 min .wav file)
I get this warning in the console
"Text length too long (200 < 10578), using segments: Voice sample.wav
Audio not segmented, segmenting: Voice sample.wav
Sliced segments: 1 => 160."
I wait and sliced segments are always 1 => 160 and nothing happens
@AntoniusTertius 1 year ago
I'm on Windows and I get lost at this point (8:28). How do I create the file and use it on Windows?
@InfernalPasquale 1 year ago
Great work - the Web UI link generated in Colab is asking for the public IP of the tunnel creator? I tried my IPs but that does not work
@premnathart3496 1 year ago ⁺¹
Please run this program on PyCharm 😂 I think it may be useful for my college project
@MoronDe 1 year ago
Remove tortoise from cache
@everybodyguitar5271 4 months ago
So you didn't install PyTorch on your Mac, right?
@klaurcschwackerberg1880 1 year ago ⁺¹
Hello Martin, would using this way of working be as good as the results of using HiFi-GAN voice modeling, to achieve the goal of being able to generate audio that is very close to the original audio input in terms of quality and naturalness? I mean to end up with a result that has the natural voice that is as close to the original tone and pitch from the imported audio, instead of a completely synthesized sound version in the result?
@martin-thissen 1 year ago
Hey, overall I can really recommend using the Tortoise-TTS model for voice cloning, because the results are really good. But the model has a few downsides. First, it lacks diversity of speech (e.g., accents). And second, it's really slow, especially if you compare it to the HiFi-GAN model. I personally haven't worked with the HiFi-GAN model yet, so I can't really say whether the Tortoise-TTS results are better than the ones made with the HiFi-GAN model. But if you don't want to set everything up on your local computer, you can also use a Colab notebook, I made a video about it: ruclips.net/video/FN3yxL0Rr0c/видео.html
@simonhaddow5052 1 year ago
Good job!
@123arskas 1 year ago ⁺²
Awesome content but the installation process just gave me anxiety. Now going through existential crisis. Setting up Django seems better than this.
@philosopherlogic 1 year ago
The lack of links in the description makes this tutorial so time consuming to follow.
@GregFliesVR 1 year ago
I'm getting a lot of errors in the model folder, I'm guessing? Any way we can talk and get this set up? I have a Discord we could chat in.
@Dante02d12 1 year ago ⁺¹
Hey, can we create our own voices with tortoise tts? Like, if I want to reproduce a character's voice, can I use samples from this voice to recreate it?
@langstonreese6919 7 months ago
That is literally what Tortoise TTS was created for
@curtis2962 1 year ago ⁺¹
should we stick with python 3.8 or use the latest?
@CaptainSnackbar 1 year ago
i'm currently using Cuda 12.0 do i have to downgrade to 11.8 for this model to work?
@MultiGameView 1 year ago
I accidentally installed conda with the Python 3.9 version. How do I uninstall it and install conda with the Python 3.8 version instead?
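Editor's note: conda itself usually does not need to be reinstalled for this. A separate environment with its own interpreter can be created with `conda create -n <name> python=3.8`, where the environment name is up to you. The sketch below checks which interpreter is active; the 3.8 target is an assumption taken from commenters in this thread, and the helper name `python_matches` is hypothetical.

```python
import sys


def python_matches(expected=(3, 8), actual=None):
    """Return True when the interpreter's major.minor version equals `expected`.

    expected=(3, 8) is an assumption based on this comment thread.
    `actual` defaults to the running interpreter and exists mainly for testing.
    """
    if actual is None:
        actual = sys.version_info[:2]
    return tuple(actual[:2]) == tuple(expected)


if __name__ == "__main__":
    running = ".".join(str(part) for part in sys.version_info[:3])
    if python_matches():
        print(f"Python {running}: matches the assumed 3.8 target.")
    else:
        print(f"Python {running}: differs from the assumed 3.8 target.")
```

Running it after `conda activate <name>` shows whether the environment you activated is the one with the intended Python version.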
@touma-san91 1 year ago ⁺¹
You skipped over a lot of steps for the Windows install, like the PATH environment settings and them not even being recommended, so you might not be able to access conda that easily, etc.
@latlov 1 year ago ⁺²
Does it work for languages other than English?
@martin-thissen 1 year ago ⁺³
Unfortunately the Tortoise-TTS model can only generate English speech. You can insert text of other languages, but it would be pronounced wrong and would sound off (I tried it for German). Since the model was trained with a multi-speaker English dataset, it won't be able to generate proper speech of other languages. The challenge here is to first create a multi-speaker dataset for a particular language, similar to the English dataset used. Then the model would need to be trained or fine-tuned on this dataset.
@PPDanYT 10 months ago
It's only with cc window ? Like elevenlab or others : select voice, adjust sliders, text box...
@anthonyfink633 1 year ago
Hello Martin, could you suggest another service besides Lambda?
@ciekawiarniapl 7 months ago
how to add more languages ?
@ziyadomar63 1 year ago
Thank you for the tutorial, I have the following error "ModuleNotFoundError: No module named 'pydantic.typing'", can you help?
@tuapuikia 1 year ago
I tried a local installation, but it took so long just to generate 7-second clips. It utilized the GPU and it's just very slow compared to Coqui TTS.
@antongritsyk3070 8 months ago
please do a video how to set it up on cpu only. I have an M2 Mac and it is an absolute nightmare to get it working. And this is coming from a DevOps engineer
@antongritsyk3070 8 months ago
or at least upload instruction how to do so somewhere
@TheWolverine1984 1 year ago
+1 For Vi.
@matthewpeck5607 1 year ago ⁺²
simply doesnt work
@user-kz1hh3jz9t 1 year ago
Could you explain how to create custom-language Coqui-TTS model files? There are many different small languages, even within one country, and they need their own custom models. Thank you very much!
@Touhou2006 9 months ago
Sometimes I wonder why not they put all this stuff in a setup.exe than doing some master degree coding
@Bald_Fred 1 year ago
Okay, I have a few issues right off the bat. I could not find Python 3.9, so I went with the closest version I could find, which is Python 3.9.4. Another issue I'm dealing with currently is that whenever I use the command to install the requirements text file, the metadata will not be downloaded.
Does anybody have any fixes for that?
@MADguyfps 1 year ago
Can you please do a tutorial on Tortoise-TTS fast? I am at an absolute loss
@scem-currentspecialist 1 year ago
Does it work for windows?
@terjeoseberg990 1 year ago
PyTorch can’t run on the CPU?
@amitnatty4923 1 year ago
Want complete tutorial for mac m1.
@peterpitcard 1 year ago
What kind of gpu is needed for this?
@damiangarcia285 1 year ago
Could you do the Real Time Voice Cloning Spanish installation tutorial (by AlexSteveChungAlvarez)? Since it has the Spanish language
@amitnatty4923 1 year ago
Will it work on mac m1 or not?
@curtis2962 1 year ago
it keeps saying "Numba requires at least version 14.0.0 of LLVM."
@timg8757 1 year ago
Hi, I got everything working OK as in your video, but I can't figure out how to upload and use my wav files. I think I need to use util but don't know how. But at least I have the rest working. Thanks for a great video
@jodyray895 1 year ago
👍👍👍👍👍👍👍
@kamildvoracek 1 year ago
not working for windows
@ShahirNaga 1 year ago
Can this be done using Google Colab?
@martin-thissen 1 year ago
Absolutely! Feel free to check out my video where I used Colab for exactly this: ruclips.net/video/FN3yxL0Rr0c/видео.html&ab_channel=MartinThissen
@NNokia-jz6jb 1 year ago
Free and open source?
@DeineFakten 1 year ago
Could you do this with so-vits-svc sometime? Ideally both locally and on Colab 👀
@martin-thissen 1 year ago
I've added it to my list :-)
@bllaqattitude759 1 year ago
@@martin-thissen Hi, no detailed explanation to run the program after installation on Windows, how to start it up after installation, and then generate speech, or what particular script to run anytime we want to generate a speech, it would be nice to see a run guide after installation, thank you..
@Quamel 1 year ago
I assume this won't work without an Nvidia gpu
@martin-thissen 1 year ago
Yes, unfortunately you need a Nvidia GPU for this :/
@BeyondtheAgesStudios 1 year ago
Too technical for me. I'll need to go find another tutorial. :(
@madalinecheshirentddev4276 1 year ago ⁺²
I will make one and post it in the comments. His tutorial skips over a lot tho. Like, you need to install Python and everything else. Completely glazes over that shit.
@BeyondtheAgesStudios 1 year ago
@@madalinecheshirentddev4276 Thanks! I'll keep an eye out! 👍
@SetoFPV 1 year ago
Better you buy windows machine with nvidia gpu first 😁
@winnieoriana4398 1 year ago ⁺¹
I can hear you're a real genius, but I totally don't understand what you're all talking about.
You don't just walk a path here on how to, but make 56744 jumps to other paths in between.
Next time stay on the subject...
@bllaqattitude759 1 year ago
any luck trying to get it to run on windows? cause its so many hoops and no details aside installing
@leavemealoneandgoaway 1 year ago
just use the mrq fork. there's finetuning also. tortoise is fun, but way too slow, even with a 3090, and the model is not great.
@MADguyfps 1 year ago
Is it better compared to Tortoise-TTS-fast?
@MichaelAsgian 1 year ago
probably called tortoise for a reason :) Where's the cheetah version? :)
@wisevirginsmedia 1 year ago
You lost us at "in this video"
@link1797 1 month ago
Wish they had a exe =[
@konstantinrebrov675 1 year ago
What are the hardware requirements for running the Tortoise-TTS model on your local computer? Do you need a GPU?
@julsius 1 year ago
torch can run on cpu and tortoise tts should be able to run on cpu as TTS can. but i also ran into the local model problem that this guy ran into, so who knows, maybe somebody got it to work.. somehow (without using a cloud GPU)? it seems to have a path reference issue to the model for me. the difference indeed might be that hes using a GPU here. so i dunno if anyone has got it running locally on cpu successfully?
@nonameman7114 1 year ago
@@julsius doesn’t it require an Nvidia graphics card because it uses CUDA?
@julsius 1 year ago
@@nonameman7114 last i remember, you dont need to run cuda for that. you can run it on the CPU. but for me there was an issue with finding the model or downloading it, the same one he had in the video. there are a bunch of other AI libraries that struggle or dont run w/o CUDA though.
@nonameman7114 1 year ago
@@julsius in that case I might try it out. I’m guessing I’ll be sacrificing speed and quality of the voice without the Nvidia card? Kinda like how Blender works.
@julsius 1 year ago
@@nonameman7114 i think i saw a comment here of someone running it on cpu and it took hours. but yeh it might not sacrifice quality but certainly time/performance (that depends how its coded). GPU can do vector math more efficiently which is why all AI stuff is better done on GPU and with Nvidia that means CUDA which yeh is same with graphics processing.
@ShreddedSteel 1 year ago
Thanks for doing the tutorial with Conda. I'll use this method and your video to show others how to use the tech...
...Because with the other method, I basically had to learn a lot about coding and I'm actually not happy about it at all. I learn enough languages doing CNC engineering and it's fucking cringe. None of my peers have time to learn all of this either. I would like it if people were specialized in each field, preferably able to make a foolproof step-by-step guide at the least, so people not in the sector can use the tech
@tiwale6387 1 year ago
Tortoise sounds great, but with no other languages or phonemes, it is useless for anything.
@martin-thissen 1 year ago
I think that English is the language of global communication, so it's no wonder that models are first developed for English language. But this is just the beginning. I'm sure we'll soon see models that have multilingual capabilities. Whisper from OpenAI (which does speech-to-text) already supports many languages. But stay tuned to my channel, I will definitely do more videos about multi-language speech synthesis.
@trepidstation 1 year ago
Can anyone help me please? I know very, very little about Python. I'm getting this error when it tries to open in the browser - File "C:\Users\\miniconda3\envs\tts-fast\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "C:\Windows\System32\tortoise-tts-fast\scripts\app.py", line 10, in
from tortoise.api import MODELS_DIR
