How I Do Voice Cloning in Other Languages with Tortoise TTS - Dataset and Tokenizer
- Published: 20 Sep 2024
- Links referenced in the video:
Github - github.com/Jar...
Karpathy's tokenizer video - • Let's build the GPT To...
Timestamps:
0:40 - Explaining the process
1:23 - ytdlp script
3:30 - Transcription script with whisperx
7:20 - Merge folders after transcription
8:30 - Resampling to 22 kHz
13:25 - Uploaded scripts :)!
14:03 - Making a tokenizer in another language
16:30 - What is the tokenizer for?
21:05 - Quick explanation on tortoise cleaners
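The resampling step at 8:30 can be illustrated with a minimal sketch. This is not the script from the video — real pipelines would use ffmpeg or librosa — but linear interpolation shows what converting audio to Tortoise's expected 22,050 Hz rate actually does (`resample` here is a hypothetical helper):

```python
# Minimal sketch: resample a mono signal to 22050 Hz by linear
# interpolation. Illustrates the rate conversion only; real dataset
# prep should use ffmpeg or librosa, which apply anti-aliasing filters.

TARGET_SR = 22050

def resample(samples, src_sr, dst_sr=TARGET_SR):
    """Linearly interpolate `samples` from src_sr to dst_sr."""
    if src_sr == dst_sr:
        return list(samples)
    n_out = int(len(samples) * dst_sr / src_sr)
    out = []
    for i in range(n_out):
        pos = i * src_sr / dst_sr          # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

if __name__ == "__main__":
    one_second_at_44k = [0.0] * 44100
    print(len(resample(one_second_at_44k, 44100)))  # 22050
```

In practice `ffmpeg -ar 22050` or `librosa.resample` does the same conversion with proper filtering; naive interpolation like this can alias high frequencies.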
Hardware for my PC:
Graphics Card - amzn.to/3pcREux
CPU - amzn.to/43O66Ir
Cooler - amzn.to/3p98TwX
RAM - amzn.to/3NBAsIq
SSD Storage - amzn.to/42NgMFR
Power Supply (PSU) - amzn.to/430bIhy
PC Case - amzn.to/447499T
Mother Board - amzn.to/3CziMXI
Alternative prebuilds to my PC:
Corsair Vengeance i7400 - amzn.to/3p64r22
MSI MPG Velox - amzn.to/42MnJHl
Cheapest recommended PC:
Cyberpower 3060 - amzn.to/3XjtZoP
Come join The Learning Journey!
Discord - / discord
Github - github.com/Jar...
TikTok - / jarodsjourney
If you found anything helpful, please consider supporting me and the content I am trying to produce!
www.buymeacoff...
Will this be on GitHub later? Also, I appreciate your effort in making these kinds of videos. Keep up the good work.
I don't understand where I went wrong. I'm training a Vietnamese model. I used about 1 hour of my own voice for training and created a tokenizer with your Python file for the Vietnamese language ("vi"). Then I tested it with a sentence that was already in the audio samples. It produced sound in my voice, but the output was meaningless — not Vietnamese at all. Please tell me where I went wrong?
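The tokenizer step behind this question (14:03 in the video) is worth illustrating. The video's actual tokenizer is BPE-trained, but the core requirement is simpler: the vocabulary must cover every character of the target language, or symbols like Vietnamese diacritics get dropped. A hedged character-level sketch (`build_char_vocab` is a hypothetical helper, not the video's script):

```python
# Hedged sketch: build a character-level vocabulary from transcripts so a
# custom tokenizer covers every symbol in the target language (e.g.
# Vietnamese diacritics). The real tokenizer is BPE-trained; this only
# shows why an English-only vocabulary mangles other languages.
import json

def build_char_vocab(transcripts, specials=("[STOP]", "[UNK]", "[SPACE]")):
    """Map special tokens, then every distinct character, to integer ids."""
    chars = sorted({c for line in transcripts
                      for c in line.lower() if not c.isspace()})
    vocab = {tok: i for i, tok in enumerate(specials)}
    for c in chars:
        vocab[c] = len(vocab)
    return vocab

if __name__ == "__main__":
    vocab = build_char_vocab(["Xin chào", "tiếng Việt"])
    print(json.dumps(vocab, ensure_ascii=False))
```

If a character from your dataset is missing from the vocabulary, the model can never emit or condition on it — one plausible cause of garbled non-English output.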
Thanks 🙏 you deserve more subscribers.
I deserve more subscribers too, not only Jarod.
Is there some place we can just download some working voices? I don't need a "specific" voice, just something as polished as possible. I'm wondering if I can use this to do higher-quality TTS for listening to documents or ebooks. The processing time seems like it will make that impossible regardless, but I'd like to have some reasonable voices in the can just to play with. I've tried making a couple of voices; they work, but they're not great. I just want to download a polished sample voice if possible.
Wow! Outstanding! Can you please tell me: when taking the playlists for training, were those from a single speaker or several?
Nice explanation
Jarod, what do you think: if there is a Hugging Face dataset that contains audio tracks and transcription text for them, is it possible to use such a dataset with this project, without having to extract the audio yourself?
P.S. A very useful video, especially the part about how english_cleaners breaks non-English languages. I'm going to put together a Slavic tokenizer.
P.P.S. I'm looking forward to the second part of the preparation!
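On the Hugging Face dataset question above: regardless of where the audio comes from, training ultimately wants paired clips and transcripts on disk. Assuming the pairs are already extracted (the `datasets` loading itself is not shown), writing an LJSpeech-style metadata file — the `path|text` format Tortoise fine-tuning setups commonly expect — is only a few lines; `write_metadata` is a hypothetical helper:

```python
# Hedged sketch: write an LJSpeech-style metadata file from (wav, text)
# pairs. Whether the pairs come from yt-dlp + whisperx or a Hugging Face
# dataset, this "relative/path.wav|transcript" layout is a common target.
def write_metadata(pairs, path="train.txt"):
    """pairs: iterable of (wav_relpath, transcript) tuples."""
    with open(path, "w", encoding="utf-8") as f:
        for wav, text in pairs:
            f.write(f"{wav}|{text.strip()}\n")

if __name__ == "__main__":
    write_metadata([("wavs/0.wav", "xin chào"),
                    ("wavs/1.wav", "tiếng Việt")])
```

The exact filename and delimiter depend on the training config you point at it, so check your setup before relying on this layout.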
I'm also thinking about verifying the transcriptions of audio tracks from such datasets, since I've seen cases where the transcription and the audio don't match (sometimes people mess around and record unrelated sounds). The idea is to exclude tracks that mostly don't match what's in the transcript.
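The mismatch check described above can be sketched cheaply: re-transcribe each clip with ASR (e.g. whisperx, not shown here), compare the dataset's transcript against the ASR output, and drop pairs that disagree too much. `keep_pair` and the 0.8 threshold are illustrative assumptions, not a tuned recipe:

```python
# Hedged sketch: filter out audio/transcript pairs whose given transcript
# disagrees with an ASR re-transcription. difflib gives a cheap,
# dependency-free similarity; edit distance or WER would be more standard.
from difflib import SequenceMatcher

def keep_pair(given, asr, threshold=0.8):
    """True if the given transcript and the ASR output are similar enough."""
    ratio = SequenceMatcher(None, given.lower(), asr.lower()).ratio()
    return ratio >= threshold

if __name__ == "__main__":
    print(keep_pair("xin chào các bạn", "xin chào các bạn"))
    print(keep_pair("xin chào các bạn", "completely unrelated sounds"))
```

Character-level similarity is crude — a proper pipeline would compute word error rate after text normalization — but even this catches the "unrelated sounds" clips the comment describes.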
Hi sir!
You're doing a great job with TTS. Are you planning to release the Hindi TTS model?
Won't be releasing the model, but the code to train it will all be available.
Thank you! What is your recommendation for dataset length for a high-quality result?
Do we need to make a new tokenizer if the language only uses Latin characters?
Suppose I have Hindi-language audio with transcriptions I created manually or with a script,
You changed the code to the point that many things are broken. For now, it's unusable.
Can you share the code you used in this video?
I appreciate your work, but it's complicated to understand. Could you please explain with simple examples?