Hey Yannic!
just wanted to thank you for the excellent content that you provide.
Keep it up man :)
Never enjoyed paper explanations this much.
Thanks, Yannic!
Wow... Why not more TTS paper explanations?
Yannic, thanks for doing this! Quick question: instead of fiddling with the aligner, why didn't they start training on small samples, e.g. one phoneme long, and then gradually increase the sample length to 2, 3, etc. as the loss drops? There seems to be too much black magic going on in training a TTS model. Do you have a suggestion for the cleanest architecture that works well? Is there a good review of one-step TTS models? How can a speaker embedding be integrated into such a model for voice cloning? Sorry for so many questions…
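For concreteness, the "start short, then grow" idea in the question above can be sketched as a simple length curriculum. This is a hypothetical illustration, not something from the paper: `LengthCurriculum`, `crop`, the loss threshold, and the moving-average smoothing are all assumptions. Note also that cropping the audio to match a phoneme prefix needs some notion of which frames belong to which phoneme, so this does not fully remove the need for alignment.

```python
from typing import Optional, Sequence


class LengthCurriculum:
    """Hypothetical curriculum: grow the maximum training crop length
    (in phonemes) each time the smoothed loss drops below a threshold."""

    def __init__(self, start_len: int = 1, max_len: int = 64,
                 threshold: float = 0.5, smoothing: float = 0.9):
        self.current_len = start_len
        self.max_len = max_len
        self.threshold = threshold
        self.smoothing = smoothing
        self._ema: Optional[float] = None  # exponential moving average of the loss

    def update(self, loss: float) -> int:
        """Feed one training loss; return the crop length to use next."""
        if self._ema is None:
            self._ema = loss
        else:
            self._ema = self.smoothing * self._ema + (1.0 - self.smoothing) * loss
        if self._ema < self.threshold and self.current_len < self.max_len:
            self.current_len += 1
            self._ema = None  # longer crops must bring the loss down again
        return self.current_len


def crop(phonemes: Sequence[str], length: int) -> Sequence[str]:
    """Take a phoneme prefix of the given length (the matching audio crop is assumed)."""
    return phonemes[:length]
```

Resetting the moving average after each increase is one design choice among many; a fixed step schedule would be an equally plausible alternative.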
Awesome. Superb explanation. Love the channel and content 👍👏🙂
Thank you! It includes so many ad-hoc tricks. I wonder why it's better than the combination of Tacotron + WaveNet?
Quality-wise it's not better than Tacotron (see the MOS scores in the paper: Tacotron is about 4.5, this approach is about 4.0). But unlike Tacotron, it's not autoregressive, so inference can be much faster.
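To illustrate why non-autoregressive inference can be faster, here is a toy sketch that counts sequential decoder calls. It is an assumption-laden illustration, not this paper's model: the duration-based upsampling shown is the FastSpeech-style "repeat each token by its duration" trick, swapped in for simplicity, and the lambdas stand in for real networks. The total compute still grows with output length; the point is that the non-autoregressive pass can be parallelized, while the autoregressive loop cannot.

```python
from typing import Callable, List, Sequence, Tuple


def autoregressive_decode(
    step: Callable[[List[float]], float], n_frames: int
) -> Tuple[List[float], int]:
    """Each frame is conditioned on all previous frames, so the decoder
    has to be called n_frames times, one after another."""
    frames: List[float] = []
    sequential_calls = 0
    for _ in range(n_frames):
        frames.append(step(frames))
        sequential_calls += 1
    return frames, sequential_calls


def non_autoregressive_decode(
    decoder: Callable[[List[float]], List[float]],
    encoder_out: Sequence[float],
    durations: Sequence[int],
) -> Tuple[List[float], int]:
    """Repeat each token encoding by its (assumed) predicted duration, then
    run the decoder once over the whole expanded sequence."""
    expanded = [h for h, d in zip(encoder_out, durations) for _ in range(d)]
    return decoder(expanded), 1


if __name__ == "__main__":
    enc = [0.1, 0.2, 0.3]
    dur = [2, 1, 3]  # hypothetical predicted durations -> 6 output frames
    _, ar_calls = autoregressive_decode(lambda prev: float(len(prev)), sum(dur))
    _, nar_calls = non_autoregressive_decode(lambda xs: xs, enc, dur)
    print(ar_calls, nar_calls)  # 6 1
```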
Ah, always so fast! I heard Google released pretrained weights for Big Transfer; could you also make a video on BiT?
@Mallow Marsh oh ok, I'll go through his videos then
Whose videos? The comment was deleted. Thanks.
Does anyone know where to find a code implementation of this?
Visual Transformers tomorrow?
Great explanation, thank you so much!
Great work!
You're the best
Nice !
Amazing! Do we have any GitHub code or pretrained model weights available?
I don't think so
Can I try it somewhere?
Not sure. I've linked their website in the description
Awesome.
Thank you!!
Does it work in real time?
Anything can be real time if you have enough compute
I don't think so
@@herp_derpingson the singularity is far 😵
@@YannicKilcher Hi :) Why do you think so? This seems to be a non-autoregressive model, and I think its inference is very fast...
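Whether a TTS system is "real time" is commonly quantified with the real-time factor (RTF): wall-clock synthesis time divided by the duration of the generated audio, where RTF < 1 means audio is produced faster than it plays. The sketch below is a hypothetical helper; `synthesize` stands in for any text-to-samples function and is an assumption, not an API from the paper.

```python
import time
from typing import Callable, List


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = synthesis time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize: Callable[[str], List[float]], text: str, sample_rate: int) -> float:
    """Time a hypothetical synthesize(text) -> samples function and compute its RTF."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)
```

The RTF depends on the hardware it is measured on, which is the sense in which "enough compute" makes almost anything real time.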
People want a realistic TTS voice that sounds like a high-quality human, not a robot voice. Robot voices are usually free, but they sound stupid.
Most businesses use high-quality human voice synthesis.
I think Tacotron sounds better
First !
I absolutely love the content, but I vote against saying "As always if you like this work subscribe". I believe that if people are exploring AI videos, they probably know where the subscribe button is, and if they like the videos, they'll probably subscribe. Plus, we've heard it a billion times in every video on YouTube ever made. It just becomes noise at a certain point. At this point I'm thinking of training an AI to skip every time someone says that.
Nevertheless, they're your videos, and a personal choice, not a democracy. Feel free to disagree. Don't mean to be mean or anything.
Hi, but I don't agree with you. When I'm doing any kind of research, I just open tens of tabs and explore them one by one, and sometimes, if I find the right content, I learn the stuff and leave. Also, YouTube provides analytics which might show that most of his viewers are not his subscribers.
Lakshay Chhabra, you think the majority of people will subscribe because he reminded them to subscribe? I don't doubt that that might occur, as I really have no way of checking that. I agree that some way of determining that from the analytics would be better.
I usually watch a few trial videos before I subscribe, but I must admit being told to subscribe makes me evaluate whether I should subscribe instead of just exiting, as Lakshay suggested. I agree with not telling people where the subscribe button is, though.
@@siyn007 Oh sorry, but I don't think Yannic specified where the subscribe button was. I just meant the reminder about whether or not to subscribe.
I see. Good to know that there's the opposite take there. It's definitely not the end of the world. :D I still love Yannic and his videos.
This is one of the things that, yes, is slightly annoying, but you'd be surprised how many people who aren't subscribed go "oh yes, I could do that". So I try to give you the high level before I say that so that you can decide to skip the video without having to listen to it :)