My AI Audiobook Maker - Demo and Installation
- Published: Oct 21, 2024
- Links referenced in the video:
Audiobook GitHub - github.com/Jar...
Install Tortoise TTS - • Local AI Voice Cloning...
Hardware for my PC:
Graphics Card - amzn.to/3pcREux
CPU - amzn.to/43O66Ir
Cooler - amzn.to/3p98TwX
RAM - amzn.to/3NBAsIq
SSD Storage - amzn.to/42NgMFR
Power Supply (PSU) - amzn.to/430bIhy
PC Case - amzn.to/447499T
Motherboard - amzn.to/3CziMXI
Alternative prebuilds to my PC:
Corsair Vengeance i7400 - amzn.to/3p64r22
MSI MPG Velox - amzn.to/42MnJHl
Cheapest recommended PC:
Cyberpower 3060 - amzn.to/3XjtZoP
Come join The Learning Journey!
Discord - / discord
Github - github.com/Jar...
TikTok - / jarodsjourney
If you found anything helpful, please consider supporting me and the content I am trying to produce!
www.buymeacoff...
A neat feature for this application would be adding multiple speakers, to give extra voices for things like character dialogue.
This would be a dream! We would be able to make books into audio theater!
I just recently stumbled upon your channel and now you've already released your audiobook maker. Can't wait to test it out! So far I've used RVC TTS to create audiobooks, which works, but this looks far superior.
Edit: "To-do: Find a way to do "multiple speakers" for dialogue in the book" - Awesome! Just wanted to mention that multiple speakers would be the icing on the cake.
Yeah, I really want to see multiple speakers for dialogue!
A new feature I just thought up that would be a game changer: the ability to edit the text inside the audiobook maker. Sometimes sentences get cut off or some characters are not recognized, so being able to edit the text and add or delete sentences, without going back and forth between the file and the program and then regenerating everything, would save a lot of time.
Even though books will take me a good 20+ hours with my GPU, this project has been a total lifesaver for me.
Thanks so much for your hard work on this.
Appreciate it and thanks for becoming a member! 🤝
@@Jarods_Journey
I'm sorry to distract you, but what could this mean? I downloaded your 2.5 GB archive, but I get:
Failed to connect to port 7860, trying next port
Calling API with sentence:
Do you know the solution, or the reason?
Hi, did you solve that problem?
@@PROJECTSSourceEngineLessons
Now I can have shadow-Sama voice to read me a bedtime story
You are incredible, Jarod. Thank you for this
My pleasure!
Genius is an understatement - thank you!
dude, I came to your TTS tutorial because I was doing exactly this. thanks for making this open source
Just an FYI for anybody who's getting "Data" or port errors when trying to get this working: use version 2.0 of Tortoise, not the latest 3.0 release. Hope this helps, folks.
thank you kind sir
Thanks!
Just hoping the algorithm pushes him to the top with this hard work
Every time I try to unzip, I get "Unsupported compression method" for the t64-arm, w64-arm, cli-arm64 and gui-arm64 executables. I redownloaded the zip in case the download was incomplete, but it happened again. Is this a problem?
Hmm, make sure you downloaded the correct 7-Zip version for your computer; otherwise, you might need to try another unzipper.
Same, did you find the fix?
Edit: We just need to update WinRAR or 7-Zip.
Edit 2: But the port connection still doesn't work...
Can't wait to try this over the weekend.
You are the GOAT, thank you so much! Always wanted a tool like this and now it's here.
I knew you would make it! :) I'm proud of you
I salute you fellow Re:Zero fan. We could easily make decent quality audiobook versions of the webnovel with this tool
Amazing stuff, dude. Thanks.
Appreciate it a lot, thank you Doug!
🎉 Great stuff as always! 🎉
THIS is what I was looking for. THANK YOU! :)
Incredible work! So helpful, really appreciate your sharing this! Thank you!
Amazing work. Will you also be making a version for mac in the future?
Your hard work has finally paid off, will definitely try this one out, thanks again for this! Also quick question, since the AI Hub Discord server has been taken down, do you happen to know anywhere else that provides voice models w/ indexes?
Thank you. They're back up, but I believe you have to go to the AI Brazil page first and then to the AI Hub Discord.
Hi, what's the URL for the AI Brazil page? @@Jarods_Journey
That's insanely high quality.
Is there a limit on text generation, or could I cut and paste a whole book into this and let it run overnight?
Sentences above 300 characters get split out, I believe, but all the splicing is handled. As long as you have a GPU, you can generate as much as you'd like.
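A minimal sketch of the kind of splitting described in the reply above, assuming a 300-character limit per TTS request. This is not the audiobook maker's actual code. The logs further down suggest the app uses nltk's punkt tokenizer; this sketch swaps in a simple regex sentence splitter so it runs with only the standard library. The names MAX_CHARS, split_sentences, chunk_sentence and prepare_text are hypothetical.

```python
import re

# Assumption: 300 characters is the cutoff mentioned in the reply above;
# the real app's splitting rules may differ.
MAX_CHARS = 300


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentence(sentence: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split one overlong sentence on word boundaries so each piece fits max_chars."""
    if len(sentence) <= max_chars:
        return [sentence]
    pieces, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                pieces.append(current)
            current = word  # a single word longer than max_chars is kept whole
    if current:
        pieces.append(current)
    return pieces


def prepare_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Return TTS-sized chunks in reading order."""
    chunks = []
    for sentence in split_sentences(text):
        chunks.extend(chunk_sentence(sentence, max_chars))
    return chunks
```

Each returned chunk would be sent to TTS separately and the resulting clips joined in order. Splitting on word boundaries, rather than at a hard character offset, avoids cutting words in half.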
This is just spectacular! I have been having trouble getting a good voice quality cloned. I have watched many of your videos, but I do not believe I have found one that shows me how to clone with the best possible quality. It still sounds too robotic. Do you have any advice?
It really depends on the voice you're training it from; it usually adheres pretty closely to it. "Robotic" can mean a lot of things, so it's hard to diagnose: it could be monotone, the prosody could be off, or it could be garbled.
My advice: when training, do short training sessions first, then continue for longer if it's sounding good. That way you can listen to it before going all the way.
@@Jarods_Journey Thanks. Do you have a video that goes over these details? I have an accent, so I am thinking that may be part of the problem.
This is amazing! Question, is the generation speed faster than Tortoise?
It uses mrq's tortoise, so the same speed!
@@Jarods_Journey Awesome, thank you and thanks for your hard work!
Thanks a lot man, this is great. We continue to follow you, you always surprise us, thank you very very much.
Thank you so much dude, this is so sick.
works really great...thank you....just wondering if this would be faster using the Coqui XTTS
Thank you, Jarod 👋☺️
Ofc :)!
Hi, I executed start_package; a prompt window blinked and then nothing happened. Any idea? Thanks.
Is there any way to run anything locally that compares to ElevenLabs? I have an RTX 4090 GPU. Is ElevenLabs using Tortoise? What tips can you give?
For tortoise, does the voice that needs to be setup have to be one that I've trained myself?
Yup, either trained, or using the default autoregressive Tortoise model. If you just use the default, you can put 1-3 wav files of about 10 seconds each in the voices folder and it'll generate with those too.
@@Jarods_Journey What name would I put in the "tort.yaml" if I choose the default autoregressive tortoise model?
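A small sanity check for the "1-3 wav files of about 10 seconds" advice above, using Python's standard wave module. The folder path you pass in, the 5-second tolerance and the function names are assumptions for illustration; they are rules of thumb taken from the reply, not checks Tortoise is known to perform.

```python
import wave
from pathlib import Path


def wav_durations(voice_dir: str) -> dict[str, float]:
    """Return {filename: duration in seconds} for every .wav file in voice_dir."""
    durations = {}
    for path in sorted(Path(voice_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            durations[path.name] = wf.getnframes() / wf.getframerate()
    return durations


def check_voice_folder(voice_dir: str, target_s: float = 10.0,
                       tolerance_s: float = 5.0) -> list[str]:
    """Return human-readable warnings; an empty list means the folder looks reasonable.

    Assumption: 1-3 clips of roughly target_s seconds, per the reply above.
    """
    warnings = []
    durations = wav_durations(voice_dir)
    if not 1 <= len(durations) <= 3:
        warnings.append(f"expected 1-3 .wav clips, found {len(durations)}")
    for name, secs in durations.items():
        if abs(secs - target_s) > tolerance_s:
            warnings.append(f"{name}: {secs:.1f}s is far from ~{target_s:.0f}s")
    return warnings
```

Note that wave only reads uncompressed PCM .wav files; clips in other formats would need converting first.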
Hi, Tortoise TTS is too slow for my pc. Can you make a tutorial to do it with Edge TTS instead? Thank you in advance sir.
What sort of model files are you supposed to put in the audiobook voice models folder, and how exactly does the model chosen in the Tortoise app affect this?
It seems like the model chosen in the audiobook app is the true voice model you hear. Why is this, and why are two different models needed? I tried moving my .pth models from the finetunes folder into the audiobook voices folder so that I could use them, but when I generate with them it causes an error. If I use your sample azasu/pth file, though, it works. What is the difference? Am I copying the wrong file?
You mention something about RVC, which I think might be related to all this, but I don't know what that is, nor can I find your video on it (assuming you've made one).
You need an RVC model for the voice models in the audiobook maker. I've got several videos on the channel for it, and if you search YouTube for "RVC voice training", you should be able to find them.
Tortoise models need to stay within Tortoise, as I'm treating Tortoise as its own API in this case.
Dude, so few views, you deserve more. Good luck, Jarod. Greetings from Russia.
Thank you!
so cool dude. big win.
Thank you!
Good job. This is really cool. Is it possible to hack in some sort of control over the intonation? It would be nice to be able to control the pitch per phoneme so you could manually fit the emotional tone of the text.
Unfortunately not. It uses a random seed for generation, and there's no way to determine what it's going to output.
Can you make a video on how to use this tool in Google colab?
I can't wait to try this out! I've been looking for a way to convert school books into audio so I can study in more ways. Go figure: as a college student who is only interested in learning, I don't have hundreds of dollars to pay some gatekeeper.
Congrats and awesome work! Can you also include a kill switch to stop the process of generation?
Thank you, sounds like a good option!
Hey Jarod, thank you!!! Do you plan to make a tutorial on how to train voice models in other languages?
Eventually, when I get there
Are there any free voice datasets? I just want to experiment without training, which my PC is probably too slow to handle.
Can this be done in multiple languages, or just English for now?
"The audio quality of this is spectacular, especially since it's combining Tortoise and RVC."
Wait, wait, wait! So, that means I can use my voice from RVC (the path and index files), combine it with Tortoise, and make an audiobook with a perfect voice?!
I have a problem with your 'Tortoise' video tutorial: I couldn't get cmd to recognize 'pyfastmp3decoder' despite putting the folder in 'ai-voice-cloning\modules', not even after running pip install properly. Seriously, you are a delirium for beginners.
Amazing app, do you have any pointers to train a tortoise voice for spanish?
Not ATM, but I will when I start doing those types of training! You do need a custom tokenizer
Haven't been able to get it to work for me. Still checking to see what I may be doing wrong. I'm using an Nvidia 3060. Is the voice's index the wav files generated from a trained voice?
Other than that, to add to the bucket list: maybe a function that, after compiling the audio, lets you go back and change the voice of the dialogue to emulate other people speaking.
Hop on over to the GitHub issues if you need assistance, maybe it's a bug!
Index is the index trained from RVC and is located in the logs folder after training. It's not necessary though
@@Jarods_Journey I got it fixed. I think it's because everything wasn't up to date, so I updated everything and reinstalled from scratch. I tested with the Azasu voice and it worked out of the box. Thx bro! 😁
I referenced one of your videos in my last video (your RTX 3060 12gb vs the RTX 4090)
Haha appreciate it, not many people focusing on the AI hardware stuff needed, it's all gaming so it's good to see more content out there for it
Hey man, over the past few videos I've noticed that your voice is not in sync with the video. In my case that was because I use voice matter, which causes a delay (sound processing). Pay attention to that, as it's easy to fix right away when you import the camera video.
I saw the comment, but it doesn't look out of sync to my eyes; if it is, it's very slight, maybe +/- .10ms or so. I do run multiple audio channels, which is probably where this would stem from. Appreciate you looking out for it :)!
I've just installed it and tried it.
Is there a way to have a constant voice? Even when I set a specific seed in the tort file, the voice changes with every regeneration of the same clip.
Thank you sir
Oh, cool. Way easier than manually inputting small bits into Tortoise and then manually giving them a pass through RVC later.
What lip-sync software do you use?
Not working for me. I'll wait for an all-in-one program. Thanks.
Is there a voice that I can use commercially?
Awesome... bro.
Is an Nvidia GPU absolutely necessary?
The voices are clear. Does it take cloned voices at 44.1khz?
It'll use RVC trained 48k and 40k
@@Jarods_Journey any steps on how to use existing audio ( like we do in tortoise) to have it used as a voice in 44 or 48khz) I like tortoise but 24khz is the biggest dealbreaker
str8 bangers
Any way to run this with an Iris Xe GPU? I guess CUDA wouldn't work, right?
Omg coool!
Any chance of getting this running on a mac?
So now I've set up Windows from scratch, totally fresh, and installed all the files in the Windows directory, but I still get the same error... can anyone help?
It doesn't work. It says there's too much text and to use smaller amounts. I've already split the book into 14 parts. Still the same error. Any advice?
Hey Vidney, I won't be able to offer much guidance here, as that's an older version of the audiobook maker. When I finish up the latest one, if this still occurs, it might be something to look into. Usually, Tortoise only likes a certain length of text being sent to it; otherwise it outputs audio that is too long for it to process.
@@Jarods_Journey ok thanks
This is amazing. Can someone explain the difference between the diffusion samplers, and which gives better quality?
You can check this out here: stable-diffusion-art.com/samplers/
To be honest, I have had both great and bad results with either; it doesn't seem to affect much, but I have gotten the "best" from DDIM.
This is amazing.
I'm getting this error at the moment [in my TorToise window]:
KeyError: 'insert_trained_name'
Not sure what's going on. I'm going to ask ChatGPT lol
Any help is appreciated, I can't wait to try this out.
same here?
Wow, amazing stuff, man!
By any chance, are there any premade models for this on some websites or something? I've noticed it takes different weight files compared to traditional RVC.
Thank you! The audiobook maker supports RVC weights, but Tortoise is a little different and requires Tortoise models. I haven't seen anyone online hosting these types of models yet, though.
Thanks
Appreciate the support! 🙏🙏
Hello, I'm interested to know whether these audiobooks will be accepted by audiobook publishers, especially Findaway Voices and ACX, which are very strict. Any update on that?
Depends on jurisdiction, country, etc!
It will be in the USA. @@Jarods_Journey
What will the use case for these audiobooks be? Personal use or the marketplace? @@Jarods_Journey
The quality of the audiobook is impressive, so I wanted to know whether a book generated here will be treated the same way as a human-narrated one. @@Jarods_Journey
thank youuu
>nvidia graphics card
>running on windows
😭
well i can try and get some friends to generate some for me
Whenever I download the big TTS file through cmd or the Visual Studio terminal, it stops at 99.99%. I don't know why, please help me. So many errors occur once it reaches 99.99%.
For Tortoise, I'm not sure what may be happening. You may need to wait it out and see if it finishes, or try the Tortoise installation again after deleting the folder.
👌👌👌
Trying again: the audiobook maker won't generate because it can't reach TTS. TTS starts on port 7680. The audiobook maker tries to reach it but starts with port 7681, or at least it seems like that, because TTS is open in the browser. Can you help, Jarod?
Did you fix this problem? Mine is running on 7860, but the audiobook maker can't connect and tries other ports from 7861 upward!
@@feryseryn Nope, still not running. Would be so cool to have it fixed.
Same here. @Jarods_Journey, any solution?
Did you resolve this? I have the same problem :(
@@yilinglaozu2754 Yes. 1) It works with the previous version of AI Voice Cloning (2.0), which is way more stable, but even there 2) you need to make absolutely sure AI Voice Cloning (TTS) is running and ready for generation BEFORE the audiobook maker requests a file from it. Also, the audiobook maker's text file needs to be inside the audiobook maker folder. Look for the IP address in the voice cloner's code: it must match the one the audiobook maker is listening to at startup. Otherwise, don't even try; work on getting the setup right first.
Friend, are you planning to add support for languages other than English?
In time, when I get there.
unable to download audio book file
Why do I get an error when running "Start AudioBook Generation"?
error:
C:\audiobook_maker_v1.1>.\runtime\python.exe .\audio_book_app_2_0.py
2024-01-27 10:45:45 | INFO | rvc.configs.config | Found GPU NVIDIA GeForce GTX 1050 Ti
[nltk_data] Downloading package punkt to ./assets...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to ./assets...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to ./assets...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to ./assets...
[nltk_data] Package punkt is already up-to-date!
Calling API with sentence:
Failed to connect to port 7860, trying next port
Calling API with sentence:
Failed to connect to port 7861, trying next port
Calling API with sentence:
Failed to connect to port 7862, trying next port
Calling API with sentence:
Failed to connect to port 7863, trying next port
Calling API with sentence:
Failed to connect to port 7864, trying next port
Calling API with sentence:
Failed to connect to port 7865, trying next port
Attempt 1 failed, retrying...
Calling API with sentence:
Failed to connect to port 7860, trying next port
Calling API with sentence:
Failed to connect to port 7861, trying next port
Calling API with sentence:
Failed to connect to port 7862, trying next port
Calling API with sentence:
Failed to connect to port 7863, trying next port
Calling API with sentence:
what song is in the background?
I believe it's: You and me - sharou
@@Jarods_Journey ty
What about people who speak different languages? Thanks.
Not yet, but hopefully in the future~!
This is so cool! Just sent you a DM on Discord
Will this work with 8 GB NVIDIA GPU?
It should work, but it will most likely be slower, so you'd have to queue it up and just let it go.
Wow. I'll try. Can you add models voices Trump and Biden please?
Can we use other languages? Like turkish
English only atm, other languages tbd
@@Jarods_Journey Can I add a model myself, like Turkish, and how do I do it? I'm trying to do AI video dubbing locally right now.
It sounds amazing, except I am on AMD.
Is this available in Arabic, Persian, Urdu?
Why does that kind of stuff always need Nvidia? What's wrong with AMD?
Most AI projects rely on libraries that take advantage of CUDA, which is Nvidia proprietary. There's a lot more that goes into it, but this is the basic reason.
@@Jarods_Journey Damn, when I looked up the differences between AMD and Nvidia I didn't see anything about that, just graphic stuff that didn't seem to matter too much :(
I'll keep that in mind the next time I build a PC, thanks.
Takeaway #2: your father's advice. He's very correct.
Amd users go reeeeee
I tried it in French and Arabic. I laughed a lot because I heard languages that do not exist at all. Please, how do I run it in other languages with your program or Tortoise TTS?
Once I get to training other languages, I'll make some videos on it :)!
@@Jarods_Journey Thank you. I forgot the English language with the same settings as yours. Great smoothness and natural pronunciation. How long does it take to train your model please and what is the best way to make models like it?
Hi Jarod's Journey, how are you, friend? Can you help me?
1. I tried to run Tortoise TTS in Hindi, but the OpenAI Whisper large model does not give good output. The Hindi output has an English accent and also isn't read properly. It needs a dedicated model trained for Hindi. Is it a tough task to train a Hindi Whisper model? If you think it isn't, can you help me train one? I will pay you for this.
And if "Tortoise TTS + RVC" gives better results, do you have any GUI-based project I could use for Hindi? If yes, please help me with that too.
2. I want to install and use your audiobook maker, but for Hindi. Can you help me with this as well? I will pay you for this.
Techgigachad.
Wee
:)
So i had this working for one file, now this is all I get. Any suggestions?
C:\audiobook_maker>call .\venv\Scripts\activate.bat
2023-10-16 17:20:50 | INFO | rvc.configs.config | Found GPU NVIDIA GeForce RTX 3060 Ti
Calling API with sentence:
An unexpected error occurred: 'data'
Calling API with sentence:
Failed to connect to port 7861, trying next port
Calling API with sentence:
Failed to connect to port 7862, trying next port
Calling API with sentence:
Failed to connect to port 7863, trying next port
Calling API with sentence:
Failed to connect to port 7864, trying next port
Calling API with sentence:
Failed to connect to port 7865, trying next port
Attempt 1 failed, retrying...
I had it working for a couple of days and loved it. Yesterday I started getting this too, even though nothing changed. I thought I must have a corrupted file or something, so I reinstalled Tortoise. The same thing is still happening. I also notice that my 16 GB GPU goes to 98% VRAM use, and when I kill the terminals for Tortoise and the audiobook maker, Python is still running in the background and using 50% of my VRAM. I have to kill it using Task Manager.
@@carnacthemagnificent2498 @jarods_journey yeah i reinstalled all of it and it still is not working. Sad panda, because this is an amazing piece of software that Jarod has created.
@@carnacthemagnificent2498 I reinstalled python and it is now working again.
@@Saepak1977 Can you give me a tutorial on how to reinstall Python?
Or any tips for solving the error I have?
I get the same error as you when I run "Start AudioBook Generation":
error:
C:\audiobook_maker_v1.1>.\runtime\python.exe .\audio_book_app_2_0.py
2024-01-27 10:45:45 | INFO | rvc.configs.config | Found GPU NVIDIA GeForce GTX 1050 Ti
[nltk_data] Downloading package punkt to ./assets...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to ./assets...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to ./assets...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to ./assets...
[nltk_data] Package punkt is already up-to-date!
Calling API with sentence:
Failed to connect to port 7860, trying next port
Calling API with sentence:
Failed to connect to port 7861, trying next port
Calling API with sentence:
Failed to connect to port 7862, trying next port
Calling API with sentence:
Failed to connect to port 7863, trying next port
Calling API with sentence:
Failed to connect to port 7864, trying next port
Calling API with sentence:
Failed to connect to port 7865, trying next port
Attempt 1 failed, retrying...
Calling API with sentence:
Failed to connect to port 7860, trying next port
Calling API with sentence:
Failed to connect to port 7861, trying next port
Calling API with sentence:
Failed to connect to port 7862, trying next port
Calling API with sentence:
Failed to connect to port 7863, trying next port
Calling API with sentence: