RIP ELEVENLABS! Create BEST TTS AI Voices LOCALLY For FREE!
- Published: 8 May 2024
- Say goodbye to expensive AI voice generators like ELEVENLABS! In this ULTIMATE guide, I'll show you how to create the BEST text-to-speech AI voices on your local computer FOR FREE! From super-quick 10-second voice cloning to crafting the most realistic and customizable voices you've ever heard, no matter who you are or what your goals are, you will get the best results for your needs.
What do you think of these FREE LOCAL TTS Methods? Let me know in the comments!
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
✨ PATREON LINK: / aitrepreneur
Xtts webui: github.com/aitrepreneur/xtts-...
Xtts Finetune Webui: github.com/aitrepreneur/xtts-...
Xtts RVC UI: github.com/aitrepreneur/XTTS-...
RVC VIDEO: • CLONE ANY AI Voices fo...
Base Repo:
Xtts webui: github.com/daswer123/xtts-webui
Xtts Finetune Webui: github.com/daswer123/xtts-fin...
Xtts RVC UI: github.com/Vali-98/XTTS-RVC-UI
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
►► My PC & Favorite Gear:
i9-12900K: amzn.to/3L03tLG
RTX 3090 Gigabyte Vision OC : amzn.to/40ANaue
SAMSUNG 980 PRO SSD 2TB PCIe NVMe: amzn.to/3oBR0WO
Kingston FURY Beast 64GB 3200MHz DDR4 : amzn.to/3osdZ6z
iCUE 4000X - White: amzn.to/40y9BAk
ASRock Z690 DDR4 : amzn.to/3Amcxph
Corsair RM850 - White : amzn.to/3NbXlm2
Corsair iCUE SP120 : amzn.to/43WR9nW
Noctua NH-D15 chromax.Black : amzn.to/3H7qQSa
EDUP PCIe WiFi 6E Card Bluetooth : amzn.to/40t5Lsk
Recording Gear:
Rode PodMic : amzn.to/43ZvYlm
Rode AI-1 USB Audio Interface : amzn.to/3N6ybFk
Rode WS2 Microphone Pop Filter : amzn.to/3oIo9Qw
Elgato Wave Mic Arm : amzn.to/3LosH7D
Stagg XLR Cable - Black - 6M : amzn.to/3L5Fuue
FetHead Microphone Preamp : amzn.to/41TWQ4o
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Special thanks to Royal Emperor:
- Totoro
- TNSEE
- RG
- Gluthoric
- Peter Bernaiche
Thank you so much for your support on Patreon! You are truly a glory to behold! Your generosity is immense, and it means the world to me. Thank you for helping me keep the lights on and the content flowing. Thank you very much!
#tts #texttospeech #voiceclone #aivoices #voicecloning
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
WATCH MY MOST POPULAR VIDEOS:
RECOMMENDED WATCHING - All LLM & ChatGPT Video:
►► • CHATGPT
RECOMMENDED WATCHING - My "Tutorial" Playlist:
►► bit.ly/TuTPlaylist
Disclosure: Bear in mind that some of the links in this post are affiliate links and if you go through them to make a purchase I will earn a commission. Keep in mind that I link these companies and their products because of their quality and not because of the commission I receive from your purchases. The decision is yours, and whether or not you decide to buy something is completely up to you.
HELLO HUMANS! Thank you for watching & do NOT forget to LIKE and SUBSCRIBE For More Ai Updates. Thx
How come you are not using AI speech in your videos? I'm not trying to be mean or judgmental or anything, but you have a strong accent and it's sometimes hard to understand, even though I love your content.
Not bad, but Tortoise is much more accurate IMHO. Check out the video 'how charsi became a blacksmith'; it was made with Tortoise TTS.
Would love to see how this can all be done programmatically as a next step.
Hi. If I want to add another language for training the voice for TTS, is there any workaround, or another model that can be used like the one in this video?
XTTS has such good voice generation. If you repeat a sentence, it sounds different every time, and if you combine it with your RVC voice model you get the best thing ever. The future of AI must be open source, and it is good that you show everybody how to use this powerful AI technology. Greetings from Germany 🙂
Yeah, I wouldn't say 'sounding different each time' is a good thing when you want consistency, though. Sorry for being negative; so far my open-source TTS journey has not been great. I feel like I'm lucky if I can generate 2 sentences in a consistent pitch and accent. Maybe I need to try with an RVC model like you suggest? Or just wait for the tech to improve a bit more.
@@v11cu96 The RVC method would be recording the sentences the way you'd like, or using XTTS to generate an audio file to be used in RVC... My problem is how to make it emotional, for example: What! [Angry] Why didn't you tell me? [Desperate]
@heady2905 Does this TTS actually work for you? I got errors during installation and it won't launch.
This comment sounds like AI.
@@MAST I figured it out! I only had Visual Studio installed and not the tools.
Hey @Aitrepreneur - Quick heads up on the XTTS-RVC-UI install on Win 10. It installs protobuf 5.26.1 by default, which didn't work for me. Downgrading to protobuf 3.20.0 fixed the issue. Just thought this might help others running into the same problem!
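The protobuf tip above can be checked from Python before launching. A minimal sketch, assuming the 3.20.0 version reported in that comment is the known-good one; it flags any installed protobuf with a newer major version (a heuristic of this example, not a documented compatibility rule) and only reports. The downgrade itself is still `pip install protobuf==3.20.0`, run inside the app's venv.

```python
from importlib import metadata
from typing import Optional, Tuple

KNOWN_GOOD = "3.20.0"  # version the commenter downgraded to


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '5.26.1' into (5, 26, 1), ignoring suffixes like 'rc1' or '+cpu'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_downgrade(installed: str, known_good: str = KNOWN_GOOD) -> bool:
    """True if the installed major version is newer than the known-good one."""
    return parse_version(installed)[:1] > parse_version(known_good)[:1]


def installed_version(package: str) -> Optional[str]:
    """Version string of an installed distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("protobuf")
    if found is None:
        print("protobuf is not installed in this environment")
    elif needs_downgrade(found):
        print(f"protobuf {found} is newer than {KNOWN_GOOD}; consider the downgrade")
    else:
        print(f"protobuf {found} looks fine")
```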
I installed it, but the cmd window keeps closing. When I launch it, cmd appears and closes quickly.
@@yoniwoker Try editing the .bat file: add a new line at the bottom saying "pause", save, then run it again. Then you should be able to see the error before the window closes :)
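Adding "pause" works; another option is to start the launcher from Python and keep its output in a log file. A minimal sketch: `run_webui.bat` in the usage comment is a hypothetical file name (use whichever .bat you are launching), and this only helps when the launcher exits early. A launcher that keeps running, like a web UI server, won't show anything here until it stops.

```python
import subprocess
from pathlib import Path
from typing import List


def run_and_log(cmd: List[str], log_path: Path) -> int:
    """Run cmd to completion, print and save its combined output, return the exit code."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so tracebacks land in the same log
        text=True,
        errors="replace",  # don't crash on console bytes invalid in the locale encoding
    )
    log_path.write_text(result.stdout, encoding="utf-8")
    print(result.stdout, end="")
    return result.returncode


# Usage on Windows (hypothetical launcher name):
#   code = run_and_log(["cmd", "/c", "run_webui.bat"], Path("launch_log.txt"))
#   print("exit code:", code)
```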
It got hella complicated by the halfway point. I wish you had used a different person's voice; Obama's voice almost already sounds like a robot, and the last sample you showed us honestly didn't sound that great.
Whenever I see his logo image I know I want to do what he's doing, but I'm not going to be able to... Lol
Lol for real?
I mean you can stop the video when it is complicated and follow it step by step. And if it is really unclear just google the words he says.
I would not use Audacity to append the voice to a 2-minute clip, though. There is a reason they want 2 minutes.
Yeah, I think this is still too hard to do. I could follow it step by step, but dealing with errors is not worth the hassle if I am doing this for fun.
@@derekthemagician This is the only thing that makes it worth doing rn. As soon as it becomes easy, it won't stand out much.
For everyone having setup.py errors:
Run the install.bat file.
Wait for everything to be installed.
See the error.
Close the cmd window.
Open a new cmd inside the folder.
Type "call venv\Scripts\activate".
Type "pip install tts".
Wait for everything to be installed, then close cmd.
Run the install.bat file again.
No more setup.py errors. Don't ask me why, because I don't know.
This finally solved my Finetune install. Now I just need a couple of days to figure out the RVC version; installing things through the command prompt is a party... lol
👍Thanks man, you solved my problem here.
What the hell. That fixed it. I also have no idea why though
totally not working for me
@@diehgo_sp Sounds strange, did you do it correctly? I know for sure that installing the tts package fixes setup.py errors. But maybe you are having a different error, or simply did the procedure wrong, so could you tell me exactly what error you are having, and where?
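The setup.py workaround in this thread boils down to running pip with the app's own venv interpreter. A minimal Python sketch of the middle steps, assuming the venv folder is named "venv" as in that comment; it calls the venv's interpreter directly instead of relying on "activate", and you still rerun install.bat afterwards. "TTS" is the Coqui TTS package name on PyPI (pip treats "tts" the same way). Why this clears the setup.py errors isn't explained in the thread.

```python
import os
import subprocess
from pathlib import Path
from typing import List


def venv_python(project_dir: Path) -> Path:
    """Path to the interpreter inside <project_dir>/venv on Windows or Linux/macOS."""
    if os.name == "nt":
        return project_dir / "venv" / "Scripts" / "python.exe"
    return project_dir / "venv" / "bin" / "python"


def pip_install_command(project_dir: Path, package: str = "TTS") -> List[str]:
    """Same effect as activating the venv and typing 'pip install tts'."""
    return [str(venv_python(project_dir)), "-m", "pip", "install", package]


def install_into_venv(project_dir: Path, package: str = "TTS") -> None:
    """Run pip with the venv's interpreter; fail loudly if the venv is missing."""
    cmd = pip_install_command(project_dir, package)
    if not Path(cmd[0]).exists():
        raise FileNotFoundError(f"no venv interpreter at {cmd[0]}; run install.bat first")
    subprocess.run(cmd, check=True)


# Usage, from the app folder after the first install.bat run:
#   install_into_venv(Path("."))
```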
I like the videos, but the installations never work. :-(
Between this in six months, the SD3 finetunes, the tools that are finally getting us to consistent characters, and the Llama 3 finetunes,
this year's SillyTavern video is going to be bonkers.
SD3 sadly won't work with ControlNet due to its lack of a UNet architecture, but hopefully something similar ships soon.
How do we integrate this with SillyTavern so we can speak back and forth with the voice we generated?
I don't know what you're talking about, but I cloned these repos and so, um, now I need to know what you're talking about, 'cuz this is rad and does Tortoise-level quality at Bark speeds. God, we're nerds. (sonic checking watch gif) Also expecting an open-source Udio clone (odel-may eak-lay) to get git got soon. Hoping! Is MusicGen still the top dog?
Woah, perfect timing on the vid, was just looking this stuff up today. Thanks homie!
Awesome video! I had been waiting for just this for ages! Thank you very much!
This video is 17 minutes of "But wait, there's more!" and it's SOO good. Thanks!
Cool! I was waiting for this! Happy colored Greetinx!
Thanks so much for this one! Cannot wait to try!! 😊
Unfortunately, it can't be used: many errors occur during installation when following the steps given. Code errors, things not found, different versions: things that are difficult for non-programmers to understand. What a shame.
You are correct, buddy. Only a handful could get it running.
I started debugging with ChatGPT, followed every step and error, and after 7 hours got it working. Just don't give up, especially if it's for a business, which is why I'm using it.
I discovered the main problem with my installation, and after solving it, I installed everything without any errors. It's the Microsoft Visual C++ 14 package. It's not enough to just install Visual Studio; you have to install the package along with it, and that's not very intuitive. Look for the video "Fix: Microsoft Visual C++ 14.0 or greater is required in Python" from the "Hey, Let's Learn Something" channel, which helped me. It's very simple. Then come back here and thank our friend from the channel who introduced us to this wonder. The three programs installed and are working without errors!
After two hours, I still can't get it to launch. It opens, then closes.
Is this approach scalable to large text sizes? Like, if I tried to TTS an entire book, would that take infinite VRAM or endless dealing with 2 minute chunks or something, or would it just work?
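Long inputs are usually handled by splitting the text and synthesizing it piece by piece; if each chunk is synthesized separately and written to disk, memory use follows the chunk size rather than the book size. A minimal sketch of such a splitter, not taken from any of the three repos: it breaks on sentence-ending punctuation and packs sentences into chunks of at most MAX_CHARS characters. The 250 value is an assumption; check the limit your front end or XTTS itself warns about.

```python
import re
from typing import List

MAX_CHARS = 250  # assumed per-chunk budget, not a documented XTTS constant


def _split_long(sentence: str, max_chars: int) -> List[str]:
    """Break an over-long sentence on spaces; a single huge word is kept whole."""
    pieces: List[str] = []
    piece = ""
    for word in sentence.split():
        candidate = f"{piece} {word}".strip()
        if len(candidate) > max_chars and piece:
            pieces.append(piece)
            piece = word
        else:
            piece = candidate
    if piece:
        pieces.append(piece)
    return pieces


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split on . ! ? followed by whitespace, then pack sentences up to max_chars."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        for piece in _split_long(sentence, max_chars):
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent to the TTS model as its own request and the resulting audio files concatenated in order.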
Awesome.. I always appreciate TTS and Voice cloning videos
@TomiTom1234 did this actually work for you? none of them work for me after installing.
@@LJames-ez9lr Two did work, but XTTS-RVC-UI didn't; I get a red error after installing it, sadly.
@@LJames-ez9lr What errors are you getting?
I'm guessing this doesn't work on AMD cards?
Kudos to Coqui TTS for making this available!
XTTS fine-tuning isn't working for everyone! This tutorial is broken.
Great job 👍
How about customizing LLM next?
Thank you for the great video! Is there a proper ATS as well? Or an app that does dubbing like 11Labs?
Thanks! Great for this open-❤source version
As I'm chatting in SillyTavern, I notice that the command windows try to reference emotion voice models, such as joy.pth with joy.index or surprise.pth with surprise.index. Sure, it still works without them, but do you know if those will have to be custom trained models for that character in that emotion, or is there some generalized emotion model somewhere that can be copy/pasted to multiple characters?
Thanks! I found the Colab for the first one. Are there Colabs for the other 2?
Hell yes, this is exactly what I wanted!
Why, every time I install XTTS FineTune, does it not create the two folders base_models and finetune_models? When I run start.bat it opens, but obviously I am not able to train any model.
Thank you for the information.
Perfect timing!
Goodbye Eleven Labs. They were overpriced and closed source. This XTTS model is amazing. Thank you, Mr. Aitrepreneur. As a little goodbye kiss to Eleven Labs, I'm going to clone my favourite voices that they have. Lol.
Have fun ;)
Aaaaand 11Labs dropped SOTA SongGen
Sucks that they didn't put One Lab aside for something more traditional like textile manufacturing.
Which Python version did you use for this? Trying to get DeepSpeed to work, but it's saying Python 3.9 might be too high of a version xD
Having an 8GB card running an LLM, SD and this thing for some roleplaying in SillyTavern is getting silly.
I hope next-gen low-to-mid-range cards have a decent amount of VRAM.
The 4060 Ti has a 16GB variant; the 5060 will probably be the same, just at a higher cost.
Mine's not working and keeps spitting out Gradio errors despite following the instructions to the letter. WebUI and finetune both.
Damn, fine-tuning was a really heavy problem for me; now I got it, thanks.
12:45 Where did you get the index and .pth files?
Why do you not just use the rvc enhancement option in the xtts WebUI directly? Is it slower or of lesser quality compared to the full RVC version?
Great stuff! Anything for Mac users? Just asking... 😃
You are the AI mentor of this generation, thanks for the hard work
Thanks for the tutorial. However, there are 3 envs here; is it possible to put them all in the same env?
Could you increase the tts quality and likeness of the second model even more with a longer audio clip than 2 minutes? Or doesn't it make any difference above this length?
I'm wondering the same thing. Did you test it? 👀
Would it be possible to integrate this into a python script?
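The XTTS v2 model behind these web UIs can also be called from Python through the Coqui TTS package. A minimal sketch based on the `TTS.api` usage shown in the Coqui TTS README, assuming `pip install TTS` and a working PyTorch install; "reference.wav" and the line_001.wav file naming are hypothetical, and it loads the base XTTS v2 model, not a model fine-tuned in the finetune web UI. The first call downloads the model weights and may ask you to accept the model license.

```python
from pathlib import Path
from typing import List, Tuple

# Model id used for XTTS v2 in the Coqui TTS model zoo.
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def plan_outputs(lines: List[str], out_dir: Path) -> List[Tuple[str, Path]]:
    """Pair every non-empty line of text with a numbered WAV path (hypothetical naming)."""
    texts = [line.strip() for line in lines if line.strip()]
    return [(text, out_dir / f"line_{i:03d}.wav") for i, text in enumerate(texts, start=1)]


def synthesize(jobs: List[Tuple[str, Path]], speaker_wav: str, language: str = "en") -> None:
    """Clone the voice in speaker_wav and write one WAV per planned line."""
    # Heavy imports stay inside so plan_outputs works without torch or TTS installed.
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS(XTTS_MODEL).to(device)
    for text, path in jobs:
        path.parent.mkdir(parents=True, exist_ok=True)
        tts.tts_to_file(text=text, speaker_wav=speaker_wav, language=language, file_path=str(path))


# Usage (needs the TTS package; a GPU is optional but much faster):
#   jobs = plan_outputs(["Hello from a Python script.", "A second line."], Path("out"))
#   synthesize(jobs, speaker_wav="reference.wav")
```

Loading the model once and looping over lines avoids paying the model load time for every sentence.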
Wow, a fine-tuning web UI is awesome! In all other apps my 8GB VRAM GPU was not nearly enough for training, but with this, VRAM usage is minimal like you said, training is pretty fast and the quality is amazing!
One question: what if you want to train one model to be good with multiple voices? Is that even possible, or do you need to train a new model for every new voice you use?
Thanks!
I got it installed but when i try to upload the sample audio, it throws an error "FileNotFoundError: [WinError 2] The system cannot find the file specified"
Can I use my 1060 toaster on this, or do I need a 4080?
Yeah, but can it do foreign languages as easily as ElevenLabs' Multilingual v2 or v3?
Is there a V3 for ElevenLabs? I only see V2.
@@FluorescentApe there’s “multilingual v3” now
@@planetmuskvlog3047 That's weird, I can't see it. Maybe only a select portion of people can use it?
I'm curious if you can use this tts to read ebooks on your phone?
Is there a difference between your GitHub and the original xtts-webui?
Unfortunately, the tutorial is outdated and nothing works anymore. Sad :(
OK: it still works, but I have to install the dependencies manually; don't know why.
How much VRAM is needed?
Do i need to install CUDA on my pc as well?
I have tried all possible configurations in xtts-finetune-webui and all the models I create, when I want to use them in XTTS-RVC-UI, I get an error:
"ValueError: Incorrect format for ./rvcs/model.pth. Use a voice model trained using RVC v2 instead"
Has it happened to anyone else? How can I fix it?
Say, our AI overlord: a mad lad who goes by gradientai on Hugging Face
made Llama 3 work with a 1 million+ token context size.
Since you have a 4090 now,
I am curious whether you could see how far it can go on our local machines.
Will this work on an RPi 5? There's no CUDA on there, that's why I ask. Is it ARM compatible?
Any ideas on how to make the voices have emotions like happiness, sadness, anger, etc.?
Is there a way to convert these trained models into a .tts SAPI5 voice so that other applications can use them?
Sooo... can it be stitched into SillyTavern somehow?
SillyTavern already has a way of adding an XTTS extension, which has live real-time streaming. You can find that in the plugins tab in SillyTavern.
But then you'd have to go through a complicated process, at least for me, to put ST in staging, etc.
I wish K had covered exactly this. How do you take your uber TTS model and then run it in SillyTavern?
@@42ndMoose is there a tutorial for that?
I wonder what happens if you train on reversed audio, or animal sounds or musical instruments etc...
What is the current state-of-the-art FOSS multilingual TTS, in terms of quality and speed? Thanks
Can I do this process on my 3060 Ti?
ERROR: Could not find a version that satisfies the requirement torch==2.1.0 (from versions: none)
I'm getting this error and cannot move forward...
Also, do you need to have an NVIDIA card for this too? Because I have an AMD one.
RVC still creates weird artifacts. Is there any way to add emotions? That's what I really want. PlayHT can kinda do it, but I haven't seen any other TTS services do it.
Cannot install the requirements.txt stuff. Hopefully somebody knows a good fix.
What I would like is a high quality declick/denoise model, mostly against saliva/pop noises from sensitive mics.
Is it possible to use my own voice and morph it with some pre-trained models?
Since Coqui shut down, I expect XTTS to stop having support. Especially for new languages and the sort. Is there an alternative to xtts?
It doesn't have Vietnamese. What should I do to train my voice? Can you help me?
What do I do if I get this error message? ERROR: Could not find a version that satisfies the requirement torch==2.1.0 (from versions: 2.2.0+cu118, 2.2.1+cu118, 2.2.2+cu118, 2.3.0+cu118)
ERROR: No matching distribution found for torch==2.1.0
Same here!
I get this error as well. Google isn't being helpful so far.
@@kevins_campfire Changing 2.1.0 to 2.2.0 seems to make it install.
@@kevins_campfire It probably has to do with the cu118. Do you have CUDA version 11.8 installed? You can try to cheat it by modifying the requirements file...
@@AltMarc I have CUDA 11.8 installed and added to my PATH, but it still doesn't work :/
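The workaround suggested in this thread (editing the torch pin in the requirements file) can be scripted. A minimal sketch, assuming a plain requirements.txt with "torch==..." lines; it keeps a local tag like "+cu118" so the CUDA 11.8 wheel is still requested. Whether 2.2.0 works with the rest of the app is only reported as "seems to be installing" above, and torchaudio pins normally have to move together with torch, which this helper does not do for you.

```python
import re


def repin(requirements: str, package: str, new_version: str) -> str:
    """Rewrite 'package==X[+local]' lines to 'package==new_version[+local]'.

    Other packages (including ones that merely start with the same name, such as
    torchaudio for torch), comments and non-'==' specifiers are left untouched.
    """
    pattern = re.compile(
        rf"^(\s*{re.escape(package)}\s*==\s*)([^\s+#;]+)(\+[^\s#;]+)?",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + new_version + (m.group(3) or ""), requirements)


# Usage (back up the file first):
#   from pathlib import Path
#   req = Path("requirements.txt")
#   req.write_text(repin(req.read_text(), "torch", "2.2.0"))
```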
Hi alien.
Do you have any tips for making XTTS WebUI voice2voice work on an older 1070 GPU?
I load the voice model, choose the language and then generate.
Error --> ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
Bye and ty
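That error text matches the one CTranslate2 raises when float16 is requested on a GPU without efficient float16 support, which suggests the Whisper transcription step (faster-whisper runs on CTranslate2) rather than XTTS itself; that attribution is an assumption. A minimal sketch of picking a compute type the card actually reports as supported; the preference order is this example's choice, and you would still need to find where the app sets its compute type.

```python
from typing import Iterable

# CTranslate2 compute type names, most preferred first (this sketch's ordering:
# speed first, full float32 precision as the last resort).
PREFERENCE = ("float16", "int8_float16", "int8", "float32")


def pick_compute_type(supported: Iterable[str]) -> str:
    """Return the first preferred compute type the device reports as supported."""
    available = set(supported)
    for name in PREFERENCE:
        if name in available:
            return name
    return "float32"


# Usage, assuming faster-whisper is the component raising the error:
#   import ctranslate2
#   from faster_whisper import WhisperModel
#   compute_type = pick_compute_type(ctranslate2.get_supported_compute_types("cuda"))
#   model = WhisperModel("large-v2", device="cuda", compute_type=compute_type)
```

Swap "int8" and "float32" in PREFERENCE if you would rather trade speed for precision on an older card.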
I'm having an issue when starting xtts-webui. It says "ImportError: DLL load failed while importing transformer_inference_op: The specified module could not be found." Does anybody know how to solve it?
I get "ERROR: No matching distribution found for torch==2.1.1+cu118" when running install.bat, despite installing the prerequisites.
Ever find a fix??
@@LuminRL Nope
Hello, how can I access the files of the first installation?
not working for me
I had a setup with Tortoise TTS + RVC, but this seems better. Thankfully it also works on Linux; from just watching the video I thought it might not. My Tortoise thing doesn't. I'll try it later.
Thoughts on how this compares to StyleTTS 2? And can you capture / translate emotions like *sigh*, laughter and sarcasm?
Now I would really like to see a tutorial on how to fine-tune your own language model and use it in LLMs.
Hi there, is there a way to dub locally like this? I'm looking for this.
How do all these compare to appolio?
For animation I want the phoneme timing for the audio (for lipsync). You get one with the Amazon TTS. Any ideas anyone?
Thanks for the video, can you do a video installing the new local CHATGPT?
Now, with that model, how can I use it in SillyTavern?
Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
Hi, do you plan to make a video on how to use LLMs and other AI things using ROCm for Radeon and Windows users?
Well, I don't have an AMD GPU, so I can't really show that.
I've settled on the fact that I either have to wait a few months for AMD (or quants to enable CPU) support or get an Nvidia card.
Nvidia is still on my shit list and I won't be doing that. The company's practices haven't changed.
There's an issue with install.bat in the finetune webui.
Minimum system requirements?
Brilliant!
Where can I find AI voices already made by the community?
Google "RVC models".
It wasn't a very good idea to repeat 37 seconds of reference audio at the start. Having used 11Labs' solution for a long time, I find that even the ultimate version you have here doesn't sound as good as their English V1 model.
Yeah, it wasn't a good idea, just me being lazy, but it still worked OK.
Not sure I agree. The final result is very similar to ElevenLabs quality, and it's free and unlimited. If you want to pay to use 11Labs, it's your choice; I'm giving another option to people who can't afford it or just want to save money for a very similar level of quality.
For once, I’m not clickbaited and I’m happy i dropped by
I get lost at the 6:00 point when you start talking about xtts-finetune-webui. Where is that supposed to be?
Never mind. I got it! :)
Does this work with any gpu? Or just nvidia
I have AMD and CUDA is not having it.
where can i find ultimate tts auto installer
Does it have an API to use for a chat box?
I wish they added support for Swedish and more languages. Is there any website that has a file for the Swedish language?
How can you tell that this is not a real voice? If this is used for a scam, we need a way to find out. Maybe there is a way, something like a watermark?
Hi Bro, how do I add other languages to this software?
Sadly I can't get this to work. On the finetune part, it crashes with an error that it can't load the cublas64_12.dll. Anyone got an idea?
Same issue here, did you solve it?
Edit: NVM, I just needed to install the CUDA Toolkit from NVIDIA, and all good.
@@nocommentgamez2571 Sadly no. Went on a wild goose chase and figured I'd come back to it in a few weeks.
what is the latency like on this?
Any good tutorial video needs links to the required downloads.
exactly
Can someone give me advice? I see a lot of open-source AI software.
What are the minimal specs to be able to run most of it?
8 GB of video memory? More? If so, how many GB of GPU memory is enough?
Linux or Windows?
This is the BIG fail of his videos, he never says the GPU VRAM requirements.
@@ZeroCool22 agree
He uses a 4090 24GB. You can run the simple parts with 8GB, but if you want to train, you want at least 16GB. He has an older video where you can use 10 seconds of audio, which runs fine and can even swap the singer in a song or audio, but you need to download the new singer. The training will work but will take like 30 hours on a 3070 8GB. So I'd recommend something like a 4060 Ti 16GB, since most of us can't swing a $1600 4090 lol
@@ZeroCool22 Intentionally
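For the VRAM question in this thread, PyTorch can report what your card has. A minimal sketch; the 8 GB and 16 GB thresholds come from the reply above and are one commenter's rule of thumb, not published requirements of XTTS or the web UIs. It only sees GPUs that your PyTorch build can use through torch.cuda.

```python
from typing import Optional

# Thresholds from the reply above (8 GB for the simple parts, 16 GB suggested
# for training); a commenter's rule of thumb, not an official requirement.
BASIC_GB = 8
TRAINING_GB = 16


def vram_verdict(total_gb: Optional[float]) -> str:
    """Map total VRAM in GiB to a rough verdict; rounding absorbs e.g. 7.99 -> 8."""
    if total_gb is None:
        return "no CUDA GPU detected"
    gb = round(total_gb)
    if gb >= TRAINING_GB:
        return "enough for inference and training"
    if gb >= BASIC_GB:
        return "inference OK; training may be slow"
    return "below the suggested minimum"


def detect_vram_gb() -> Optional[float]:
    """Total memory of CUDA device 0 in GiB, or None without torch or a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


# Usage:
#   print(vram_verdict(detect_vram_gb()))
```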
Have you gotten your hands on Story Diffusion yet?
Difficult to follow the installation instructions
Does this work in SillyTavern?
This is what I wanted to know the entire video. How do we integrate this with SillyTavern so we can speak back and forth with the voice we generated?