If you run into any issues or have any ideas, please open up a new issue here: github.com/JarodMica/audiobook_maker/issues
Try to make it as descriptive as possible if it's an issue and the same goes with improvements.
You need to add E2-F5-TTS imo.
Defo I actually watched hoping it was using E2-F5 TTS!
Mate, just want to say I have been following you for some time and really appreciate your tutorials on AI voice cloning/TTS. Probably the best out there for this niche.
Appreciate it :)!
I was just following your old tutorial when I checked your channel and saw this. Good luck in all your future endeavors. Funnily enough, one of the things I will be using your audiobook maker for is to turn the Re:Zero web novel into audiobooks.
Ayee I approve the choice :)!
Exactly what I wanted to do too 😅
When I try to install, the Git commands fail in CMD. Git is installed, I checked, but it says nothing and just continues. When I paste that path, it says it can't find the path in tortoise. I hit the problem at 9:38 here: the git submodules are not installing.
(venv) C:\Users\fylte\Desktop\audiobook_maker-master\modules\tortoise_tts_api> git submodule status
(venv) C:\Users\fylte\Desktop\audiobook_maker-master\modules\tortoise_tts_api
This is the response I got (the command returns nothing).
Thank you so much for your work. This project is just amazing. It would be cool to have the option to export it as an M4B file, instead of an mp3, or to have the option to export every chapter as a separate audio file.
I bought you a coffee for the audiobook maker! Thank you so much for this.
This is exciting stuff. I'm more than happy to in effect pay once for the project if it's then onwardly supported/developed. ;)
Sweet, thanks for the update. I cannot wait until there's some AI agent which can parse the different characters in books so we can feed it into this.
This is a great tool for the project I need it for, which is a short story. I cannot wait to use it. I just need a tool to help clone and create voice models to read the story as different characters.
I'm looking for some kind of immersive reader with a decent text-to-speech system that also highlights the words, so the text is there as support if you want to follow along. Any suggestions for this?
Probably speechify tbh?
Balabolka checks all of your needs. If you set it up correctly, you are good to go for 100 years. (Speaking from experience) 😂
(Just get a decent Natural voice, or see how to use the Edge Natural voices in Balabolka.)
Read Aloud using Edge web browser
Is that Kenjiro Tsuda speaking English? So cool, like, 99% clone 🤯
yup!
Hi Jarod, are any foreign languages available, like French? Thanks.
For each of the open source engines (tortoise, xtts, styletts, f5tts), whatever languages those support will be supported. This includes custom models that a user may have trained.
@@Jarods_Journey That means we can add text in Spanish (my interest) and that will do? I love your work, and for sure I will buy you a coffee! Thank you very much for this!
Thank you for creating this 🥳🎉 Is it only my impression, or is RVC functioning worse than in the cloner? Most of my models don't give the genuine voices they used to. Is there a way to adjust this?
Incredible project and amazing achievements tbh, congrats man. My only issue is that no matter what model I choose, my voices always end up super dark pitched (like Sauron lol). Any clues as to why? I've played around with pitch and pitch methods to no avail. Tried over 4 custom trained models. EDIT: This only happens with RVC enabled. EDIT2: I feel so stupid, it was the sample rate I had to change. Cheers!
Yeah, it's currently a small bug in the RVC library! I'll have to fix it, but SR can be lowered to 0 to resolve it for now.
Great, will buy later.
On the text files, it would make sense to allocate voices in there.
E.g., if generating from AI, ask it to use the format:
V1: audio text1
V2: audio text 2
V1: audio text3
Then these would auto-map to the selected voice indexes.
E.g., if the first voice is me, all lines with V1: will use this voice.
This would save a lot of time versus manually selecting each voice per line.
Even if the story has already been written, OpenAI could reformat the text.
Yeah, I'm thinking about how to incorporate it. I could support a custom speaker import option, but I have to think on how I want to make this option available in the audiobook maker
@@Jarods_Journey On the voice selection per line: loads of different colours look very confusing. Have a separate column simply with the speaker name and/or an image of the speaker.
PS For anyone else... pay the $14.99, it just worked. No spending hours setting up environments and pip installing for ages.
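The "V1:"/"V2:" prefix idea suggested above could be sketched roughly like this. This is a hypothetical illustration only: the prefix format, regex, and voice names are made up and are not the audiobook maker's actual behavior.

```python
import re

# Hypothetical mapping from script prefixes to voices chosen in the app.
VOICE_MAP = {"V1": "narrator", "V2": "alice"}

# Matches lines like "V1: some text" and captures the prefix and the text.
PREFIX = re.compile(r"^(V\d+):\s*(.*)$")

def assign_voices(lines, voice_map, default="narrator"):
    """Return (voice, text) pairs; unprefixed lines fall back to the default voice."""
    out = []
    for line in lines:
        m = PREFIX.match(line.strip())
        if m and m.group(1) in voice_map:
            out.append((voice_map[m.group(1)], m.group(2)))
        else:
            out.append((default, line.strip()))
    return out

script = [
    "V1: It was a dark and stormy night.",
    "V2: Who goes there?",
    "V1: Only the wind.",
]
print(assign_voices(script, VOICE_MAP))
```

Each resulting pair could then be routed to the corresponding voice index instead of selecting a voice per line by hand.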
I will be purchasing the package to practice using it soon. I would like it to have a language selection option, not only for the entire audiobook, but for some sentences as well. I am interested in Latin Spanish, and with variations of accents, for example Argentine. Would you add this functionality to this project? Thank you very much again for all this incredible work.
I have a use case where I have a database of around 1000 different lines or paragraphs. Is there a way to just jump to a specific line and play it (maybe through some sort of API or through the interface?) and specifically map the entries to specific labels? (Not necessarily 1-1000, but maybe with some numbers skipped, or even labels like A1, A2, etc.) Think choose-your-own-adventure; that's close to the use case I would try this for.
I'm not quite sure I understand the use case here, but there's only a scroll bar in the table right now that you can use to go up and down. Custom labeling other than speakers is not supported atm.
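For a use case like the one asked about above, a rough user-side sketch of mapping custom labels (A1, A2, B7, ...) to generated audio files might look like the following. Everything here is hypothetical: the labels, file paths, and helper names are illustrative, and nothing like this exists in the app itself.

```python
from pathlib import Path

def build_index(entries):
    """entries: iterable of (label, audio_path) pairs -> lookup dict.

    Labels don't have to be contiguous numbers; anything hashable works.
    """
    index = {}
    for label, path in entries:
        if label in index:
            raise ValueError(f"duplicate label: {label}")
        index[label] = Path(path)
    return index

def lookup(index, label):
    """Return the audio file for a label, or None if that label was skipped."""
    return index.get(label)

# Illustrative paths only; a real index would point at the generated files.
idx = build_index([
    ("A1", "out/line_0001.wav"),
    ("A2", "out/line_0002.wav"),
    ("B7", "out/line_0042.wav"),
])
print(lookup(idx, "B7"))
```

The returned path could then be handed to any audio player, which would give the jump-to-label playback described in the comment.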
So can I download this audio after it is done and upload it to my phone to listen?
Yup! It's all yours so do with it what you will
Great project!
Could you please explain how the RVC settings and the Tortoise settings are different? I put my RVC model in the settings and check "Use s2s Engine," but the result is still the random voice from Tortoise.
That's really impressive. I couldn't watch the full video yet, and maybe you talk about it inside, but did you have time to include the E2/F5 TTS voice cloning app you showed in one of your videos? Their "podcast" option needs the text formatted with the name of each speaker as the first word of their sentence, like "speaker1: ..., speaker2: ...", and then you give them a 10-second audio sample for each voice, and they do what you showed at the start of the video. Really impressive. But I've only tried English; as they say, it works with English and Chinese. I haven't yet tried Japanese or French, and I'm not sure it would work great for me. I also don't know how to train a voice with their tech.
F5 will be included in the audiobook maker; other people seem hard at work adding more languages for it right now.
Let's say I am not happy with how the Narrator is saying a sentence and its emotion. Can I use my own voice in combination with the Narrator's voice to improve the emotion with which it says a sentence? How can I do that?
Amazing Tool btw! Love your content
Hmmm, why would I need to purchase the install package when someone submitted a pull request with an open source installer on your GitHub?
I think if you run various software on a cloud machine like Google Colab, Lightning AI, Kaggle, etc., then everyone will have the opportunity to use it, because not everyone has a PC with a high-end configuration.
It's possible to outsource the generation to cloud compute, but unfortunately, I don't wanna play around with making an application compatible with cloud machines as I'd have to maintain it and I personally don't use much cloud myself. I'm a big fan of having things locally and as open source gets better, models also get more efficient.
I thought I recognized that first voice. So much more familiar speaking in japanese lol.
If you've watched any anime in the past 5 years, you'll have encountered him lol
@@Jarods_Journey Yeah, ever since he showed up in "Demon Lord Retry" I've been seeing him in literally every anime.
Getting a lot of errors trying to install the RVC files. I bought the packaged files and it seems I don't have something right. Please help.
Hey David, please open up a new issue on the GitHub issues tab and share the error that you're getting in the terminal so that we might be able to figure out what's going on: github.com/JarodMica/audiobook_maker/issues
When I restart this project it shows "Configuration file/tts_config.json not found"
But
Hey, why not use ChatGPT's advanced voice and then switch the voice later with ElevenLabs?
Expensive?
@@mucool328 like $20?
@@mucool328 it's only $20.
@@mucool328 To create an audiobook, I'd spend about $50 to get better quality.
@@mucool328 Are you serious?
Hey, is it possible for the program to automatically select different voices for a txt e-book, rather than just dialing them in manually?
Imagine writing a book then.
Yes, a proof of concept has been shown with ChatGPT's ability to label sentences. But I need a specific format, so I'm working through some ideas on that.
Still, can you make it easier to use?
Great! I wish I could use it in German. Future update for multilanguage maybe?
Possibly! XTTS would support that, I believe, but that one is the last engine I'll be adding in.
@@Jarods_Journey That would be brilliant. I write short stories for my nieces (3 and 6 years old) and have already recorded several of them. Unfortunately, I am increasingly lacking the time for this, which is why I have been watching your videos about the audio book maker for a long time.
I have an Nvidia GPU but how important is the 8gb of vram? I have a GTX 1660 super which has 6. Will this just not work?
That should be able to work, you might top out though if using tortoise TTS. If you're familiar with styletts2, when I release the engine for that, it should be able to inference on that without issues.
For some reason it's not finding tortoise when I try to load voices
My question is what's the limit per word, because with ElevenLabs it starts breaking down past 800 words.
Give or take 15-20 seconds max for a tortoise tts example, 20-30 seconds with styletts, and up to 30 seconds with f5tts. Not too sure on the breakdown when it comes to words though.
Where do I see the PC requirements for running this?
Nvm, I'll give it a try with an old GTX 1070 8GB. I don't need to generate that much anyway.
Question: I only have 6 GB of VRAM but 64 GB of DDR5 system RAM.
So will it work on my system, or does it only use VRAM? 🧐
1. It's a laptop, not a PC, so no GPU upgrade 😭
2. I run a 40B-parameter LLM model on it without any hiccups (70B at 40 wpm), and it works because when VRAM fills up it utilizes system RAM.
So will this work the same way, i.e. use system RAM after VRAM is completely utilized?
6 GB of VRAM should work; I think you'll just be topping out a bit with tortoise tts. I think all of the engines I'm planning on adding can inference with at most 4 GB of VRAM needed. It will overflow to RAM though if VRAM gets completely utilized, afaik.
@@Jarods_Journey Thanks man! I'm currently doing LLM training on the laptop, and all the TTS models I've been training are given input only in IPA, not typed text, so I'm getting better results in terms of pronunciation. But training a voice from an audio clip is not working due to the VRAM limitation, so I've upgraded my physical RAM to compensate for it. Hope it works.
Why is each line read in a different voice? There seem to be 2-3 voices and each line is read by one of them.
In the video? Well, I selected them. If you're running with random, it will change voices as well.
@@Jarods_Journey I don't understand how to not run it on random. With one narrator and nothing added, the tortoise panel doesn't allow any option other than random. How can I set it up so every sentence is read in a single voice?
@@zanshibumi did you find a solution? Having the same problem
@@akum4501 No. I assume it's an inevitable consequence of generating each sentence's audio independently.
So a 4gb Nvidia won't cut it, right?
Possibly, I don't think you'll be able to do it too well with tortoiseTTS, but when I finish the styleTTS2 engine, 4gb would be fine
Any chance this runs on AMD RX 6800 XT?
Unfortunately not, AMD support is limited on most of these engines and as well, I don't have AMD to test on either. Sorry!
Thinking of purchasing; two quick questions. Are you planning to implement E2/F5 TTS at some point? It's way more expressive! Also, will it work on Apple Silicon? (I'm on an M1 chip!) Thanks for a great project!
E2/F5 will be implemented soon, currently finishing up styletts then I'll work on that.
Unfortunately, no Mac support atm! It may work if you hack around, but I don't have a mac and haven't tested that use case.
I've accidentally paid for this on buy me a coffee page but I'm a pay monthly user. Can I be refunded the $14.99 please?
Which languages will it support?
If you're familiar with these open source engines, it supports whichever language your chosen engine supports. The parser is designed for English right now though, so English has the best compatibility.
I've been using this with the new F5TTS engine, and it captures the person so much better for some people. You basically have to enable "use duration prediction model?" to get the speaker to nail the sentence at a normal pace, but if the sentence is too long it starts skipping words... I never experienced this with the gradio demo they released. I also thought we would escape the issue of long wav files having to be converted for every new sentence. Luckily I was storing all the voices in that same folder, split into 30-second segments, so I just needed to rename the folder to F5TTS. One question I do have: how do I re-enable deepspeed for tortoise? Is it as simple as uninstalling 2.4 and installing pytorch 2.3? Is it even worth it?