Hello, Applio developer here. It is a great achievement for us that a program developed by four young people is being compared to a service backed by millions in investment. In our first year of development, we have reached goals that some companies never achieve.
We have many more goals to achieve and many more products to create, always free and open source for the community.
Thanks for your review Bob! 💚🚀
It's a super useful toolset!
@bygimenez Hi, I really like Applio. IMO it's the best TTS with RVC out there, because it's so simple to use while delivering good results. But I can't find any tutorials on how to train a model properly. Bob already helped me a lot with his video, but I need more. Do you have a link where I can find more about Applio?
@@smite4318 Can you tell me where you're getting slowed down in the process? I feel like I go through the process in this video, so I'm curious what's missing for you that I can clarify.
@@BobDoyleMedia Thx for the fast reply :). I watched your other video about Applio, but I had questions about the following training settings: How does the custom pretrained option work? Does a batch size of 1 deliver better results than 4? (I mean, is 4 better than 8, but slower?) Does the number of epochs matter when the overtraining detector ends training earlier? For example, "Luke_136_xxx.pth" is the latest file and I selected 500 epochs beforehand, but with the overtraining detector it doesn't matter, right? Also, I don't know if you did a video about the audio: of course it has to be clean, but does it matter if I choose a 25-minute audio file instead of 50 short audio files with the same total duration? And is it better when the speaker speaks consistently in the same tone, or should it vary?
For German, the Applio voice sounded more synthetic, and the Kits voice had a bit of an English accent but was not as synthetic. Thx for your great work!
I feel like that was where I forgot to change the language to German, so it would have been holding onto whatever the setting was before. That was my bad...
@@BobDoyleMedia Yes, I was about to say the same, but the accent was American and the language Japanese, so I don't know if it was because of that. Unfortunately I cannot find Kits on Pinokio, so I'll go for Applio. Thank you!
In my opinion, the German translated version of Kits sounded more natural.
The Applio version was not bad and was understandable, but sounded like the artificial overdubs you find on many YouTube videos.
Thanks for the comparison.
Greetings
Martin
Thank you for your feedback!
Thanks again for all these videos, they've been great for my channel.
Text to speech in Udio is literally the best text to speech. It’s the most expressive. You should focus on testing Udio PURELY on spoken word. What’s great is it can go from spoken word to musical, it’s extremely expressive. After playing with this for a while I’m convinced it has some understanding of the meaning of what text is there. Which is incredible. Like it can’t just be random.
It may however be less consistent if you need that and you may have to work with it to get the exact thing you want. But I wouldn’t ever use a different one now.
Who'd have thought a music AI could do spoken word, standup, arguments, lectures, etc.?
There's so much creative, detailed stuff you can do here. We've not even scratched the surface!
My hesitation with getting too invested in creating content I'd use commercially with Udio is the legal aspect of ownership and copyright. I've been watching a fair number of videos about the legal aspects and the fine print in the Udio user agreement.
But I agree that Udio creates wonderful TTS results. Comedy routines, musicals with different singers and speech - you know all this. It's amazing.
Udio is not for music?
@@hikmetmertdincer6816 yes, it’s for music. It just happens to also do this
@@BobDoyleMedia Try comparing this to ElevenLabs TTS
The Kits French sounds more accurate/arrogant :D
Seems there's an opportunity there for someone to make an interface where you just choose the 5 languages you want and press "Generate". It would only be a matter of linking these through an API.
Btw, Perplexity seems to do a better job than Google Translate, if the Danish translations are anything to go on. I guess in AI the context is better understood? Or maybe it's just Gemini :)
Thanks so much for your work, Bob ❤😊
Thank you!
Can FaceFusion still be used with a webcam? If so, I couldn't find the option after installation. Why is that, or am I doing something wrong?
This is a great job, very useful. Thank you!
Have you tried other techs and how they compare and/or integrate with these?
Tortoise, OpenVoice, Piper, RVC... and ultimately compare the result of an open source workflow with the exact texts and voice on Eleven Labs.
Applio is basically RVC. It just puts everything together in a nice free package. I've played with Tortoise a lot, but it's been a while. Still not happy with the quality of the clones. Don't know about Piper - will check that out, and haven't looked at OpenVoice either, at least not that I remember.
And I've done a LOT with ElevenLabs. My video on that actually jumpstarted this channel a good bit. I'm just always on the lookout for open source alternatives.
It would be worth doing a side by side with ElevenLabs...and there is also Play.ht (I think they just modified their name a bit), which I've also done a video on.
Bob, I'm 79, have a 24-track home studio, and had a few covers with a London publisher, Rondor, in the early 80s. I also recorded an Elvis tribute song at Rockfield Studios in 1977. The reason I'm telling you all this is that you might think the questions I sometimes ask come from a dozo!! Although I've been in the music business for quite some time, I've been lazy about taking in information, which is much worse now at 79. You are a great presenter who doesn't rush things, and that's great for an oldie like me. I did manage to work out Relay, and I want to know: do you have a tutorial for ACE Studio? Thanks, Frankie
I do. Here's one: ruclips.net/video/7oY8pFhPoK4/видео.html
@@BobDoyleMedia excellent thanks
Please, more free alternatives to Applio and Kits.
I'm German. The Applio version was better. The word "Ort" sounds American. Thank you for your great videos!
Thanks for taking the time to leave your feedback!
Dear Bob, whenever I let Applio speak German, it has a very American accent. But your test sounds very German. After watching your video multiple times, I still can't find the reason. I make sure that the TTS voice is a German one, but it still has an accent. Does the model have to be German? That would make sense, but as you used your own... idk. Do you have any idea? Best regards, Anton
Ironically, as I remember, I think I actually forgot to change the voice to German for the TTS, didn’t I? So I think it was actually working with the Japanese voice or the English voice. I’d have to go back and look closely.
Sadly, the wav2lip result looks terrible and is not usable in any real-life production (just like in the other videos I've searched for about it). It's very sad that we still don't have a reliable audio-to-lipsync technology available for use today.
This whole video was created using FaceFusion for dubbing the song.
ruclips.net/video/8VVmWHvQ_sA/видео.html
Can Applio generate Hindi audio?
"And in seconds you'll see the face is replaced". I didn't see any "face swap". It just looked like you with a subtle mustache added.
If you need to see more dramatic examples: ruclips.net/video/PwA14GOX1mI/видео.htmlsi=sCRZ6tyw4i2BGrCg
In German, Applio sounds better, but still very much like the usual TTS voices.
Horrible lip sync, they need a better one.
French Canadian here ... the French pronunciation on both systems is a bit off. Kits is MARGINALLY better.
Thank you!
The last French example with the lipsync sounds very good. It has a slight American accent, but not as strong as the one in the GPT app. Eleven Labs still sounds better, but this is very good.
Maybe the accent is due to its training?
@@ludoviclebleu If you're talking about the French example I ended up using for the last video, I realized I ALSO forgot to change the language to French in Kits...so that could explain the dialect problem. I think it was still set on Japanese or something.
LOL, ok.
That would make sense, but it's still an American accent, not Japanese. Strange.
Do Portuguese
Brazilian