Applio vs. Kits: Multilingual TTS (and lip sync Face Swap!)

  • Published: Nov 5, 2024

Comments • 45

  • @bygimenez 5 months ago +15

    Hello, Applio developer here. It is a great achievement for us that a programme developed by four young people is being compared to a service backed by millions in investment. In our first year of development, we have reached goals that some companies never achieve.
    We have many more goals to achieve and many more products to create, always free and open source for the community.
    Thanks for your review Bob! 💚🚀

    • @BobDoyleMedia 5 months ago +4

      It's a super useful toolset!

    • @smite4318 2 months ago

      @bygimenez Hi, I really like Applio. IMO it's the best TTS with RVC out there, because it's so simple to use while delivering good results. But I can't find any tutorials on how to train a model properly. Bob already helped me a lot with his video, but I need more. Do you have a link where I can find out more about Applio?

    • @BobDoyleMedia 2 months ago +1

      @smite4318 Can you tell me where you're getting slowed down in the process? I feel like I go through the process in this video, so I'm curious what's missing for you that I can clarify.

    • @smite4318 2 months ago

      @BobDoyleMedia Thx for the fast reply :). I watched your other video about Applio, but I have questions about the following training settings: How does the custom pretrained option work? Does a batch size of 1 deliver better results than 4? (I assume 4 is better than 8, but slower?) Does the number of epochs matter when the overtraining detector stops training earlier? For example, if "Luke_136_xxx.pth" is the latest file and I selected 500 epochs beforehand, then with the overtraining detector it doesn't matter, right? Also, I don't know if you did a video about the audio. Of course it has to be clean, but does it matter if I use one 25-minute audio file instead of 50 short audio files with the same total length? And is it better when the speaker speaks in the same tone throughout, or should it vary?

  • @NeedaNewAlias 6 months ago +3

    For German, the Applio voice sounded more synthetic, and the Kits voice had a bit of an English accent but was not as synthetic. Thx for your great work!

    • @BobDoyleMedia 6 months ago +2

      I feel like that was where I forgot to change the language to German, so it would have been holding onto whatever the setting was before. That was my bad...

    • @ChrissyAiven 8 days ago

      @BobDoyleMedia Yes, I was about to say the same, but the accent was American and the language Japanese, so I don't know if it was because of that. Unfortunately I cannot find Kits on Pinokio, so I'll go with Applio. Thank you!

  • @RockyBMusic 6 months ago +3

    In my opinion, the German translated version from Kits sounded more natural.
    The Applio version was not bad and was understandable, but sounded like the artificial overdubs you find on many YouTube videos.
    Thanks for the comparison.
    Greetings
    Martin

  • @Mimic_217 6 months ago +2

    Thanks again for all these videos, they've been great for my channel.

  • @Edbrad 6 months ago +1

    Text to speech in Udio is literally the best text to speech. It’s the most expressive. You should focus on testing Udio PURELY on spoken word. What’s great is it can go from spoken word to musical, it’s extremely expressive. After playing with this for a while I’m convinced it has some understanding of the meaning of what text is there. Which is incredible. Like it can’t just be random.
    It may however be less consistent if you need that and you may have to work with it to get the exact thing you want. But I wouldn’t ever use a different one now.
    Who’d have thought a music AI could do spoken word, standup, arguments, lectures, etc.?
    There's so much creative, detailed stuff you can do here. We’ve not even scratched the surface!

    • @BobDoyleMedia 6 months ago +1

      My hesitation with getting too invested in creating content I'd use commercially with Udio is the legal aspect of ownership and copyright. I've been watching a fair number of videos about the legal aspects and the fine print in the Udio user agreement.
      But I agree that Udio creates wonderful TTS results. Comedy routines, musicals with different singers and speech - you know all this. It's amazing.

    • @hikmetmertdincer6816 5 months ago

      Isn't Udio for music?

    • @BobDoyleMedia 5 months ago

      @hikmetmertdincer6816 Yes, it’s for music. It just happens to also do this.

    • @NeedaNewAlias 5 months ago

      @BobDoyleMedia Try comparing this to ElevenLabs TTS.

  • @PeterStrmberg007 2 months ago

    The Kits French sounds more accurate/arrogant :D
    Seems there's an opportunity there for someone to make an interface where you just choose the 5 languages you want and press "Generate". It would only be a matter of linking these through an API.
    Btw, Perplexity seems to do a better job than Google Translate, if the Danish translations are anything to go on. I guess in AI the context is better understood? Or maybe it's just Gemini :)

  • @saas_money 5 months ago +2

    Thanks so much for your work, Bob ❤😊

  • @JamieJames-m5r 2 months ago

    Can FaceFusion still be used with a webcam? If so, I couldn’t find the option after the installation. Why is that, or am I doing something wrong?

  • @ludoviclebleu 6 months ago

    This is a great job, very useful. Thank you!
    Have you tried other techs and how they compare and/or integrate with these?
    Tortoise, OpenVoice, Piper, RVC... and ultimately compare the result of an open source workflow with the exact texts and voice on Eleven Labs.

    • @BobDoyleMedia 6 months ago

      Applio is basically RVC. It just puts everything together in a nice free package. I've played with Tortoise a lot, but it's been a while. Still not happy with the quality of the clones. Don't know about Piper - will check that out, and haven't looked at OpenVoice either, at least not that I remember.
      And I've done a LOT with ElevenLabs. My video on that actually jumpstarted this channel a good bit. I'm just always on the lookout for open source alternatives.
      It would be worth doing a side by side with 11 labs...and there is also Play.ht (I think they just modified their name a bit) which I've also done a video on.

  • @frankieAllan-vr6qn 5 months ago

    Bob, I'm 79. I've got a 24-track home studio and had a few covers with a London publisher, Rondor, in the early 80s. I also recorded an Elvis tribute song at Rockfield Studios in 1977. Why I'm telling you all this is that, from the questions I sometimes ask, you might think they come from a dozo!! Although I've been in the music business for quite some time, I've been lazy about taking in information, which is much worse now at 79. You are a great presenter who doesn't rush things, and it's great for an oldie like me. I did manage to work out Relay and want to know: do you do a tutorial for ACE Studio? Thanks, Frankie

    • @BobDoyleMedia 5 months ago

      I do. Here's one: ruclips.net/video/7oY8pFhPoK4/видео.html

    • @frankieAllan-vr6qn 5 months ago

      @BobDoyleMedia Excellent, thanks

  • @SyamsQbattar 2 months ago

    Please cover more free alternatives to Applio and Kits

  • @alexscooterfan 6 months ago +4

    I'm German. The Applio version was better. The word "Ort" sounds American. Thank you for your great videos!

    • @BobDoyleMedia 6 months ago

      Thanks for taking the time to leave your feedback!

  • @idontknowmorenames 5 months ago

    Dear Bob, whenever I let Applio speak German, it has a very American accent. But your test sounds very German. After watching your video multiple times, I still can't find the reason. I make sure that the TTS voice is a German one, but it still has an accent. Does the model have to be German? That would make sense, but as you used your own... idk. Do you have any idea? Best regards, Anton

    • @BobDoyleMedia 5 months ago

      Ironically, as I remember, I think I actually forgot to change the voice to German for the TTS, didn’t I? So I think it was actually working with the Japanese voice or the English voice. I’d have to go back and look closely.

  • @StringerBell 6 months ago

    Sadly, the wav2lip output looks terrible and is not usable in any real-life production (just like in the other videos I've searched for about it). It's very sad that we still don't have a reliable audio-to-lipsync technology available for use today.

  • @FSK2 6 months ago +1

    This whole video was created using FaceFusion to dub the song.
    ruclips.net/video/8VVmWHvQ_sA/видео.html

  • @khajask8113 5 months ago

    Can Applio generate Hindi audio?

  • @Michael_Lak 6 months ago

    "and in seconds you'll see the face is replaced". I didn't see any "face swap". It just looked like you with a subtle mustache added.

    • @BobDoyleMedia 6 months ago

      If you need to see more dramatic examples: ruclips.net/video/PwA14GOX1mI/видео.htmlsi=sCRZ6tyw4i2BGrCg

  • @BStudioT 6 months ago

    In German, Applio sounds better, but still very much like the usual TTS voices.

  • @JessicaSinclairDoomsday2024 1 month ago

    Horrible lip sync; they need a better one

  • @EricLefebvrePhotography 6 months ago

    French Canadian here ... the French pronunciation on both systems is a bit off. Kits is MARGINALLY better.

    • @BobDoyleMedia 6 months ago

      Thank you!

    • @ludoviclebleu 6 months ago +1

      The last French example with the lipsync sounds very good. It has a slight American accent, but not as strong as the one on the GPT app. Eleven Labs still sounds better, but this is very good.
      Maybe the accent is due to its training?

    • @BobDoyleMedia 6 months ago

      @ludoviclebleu If you're talking about the French example I ended up using for the last video, I realized I ALSO forgot to change the language to French in Kits...so that could explain the dialect problem. I think it was still set on Japanese or something.

    • @ludoviclebleu 6 months ago

      LOL, ok.
      That would make sense, but it's still an American accent, not Japanese. Strange.

  • @ElaraArale 6 months ago

    Do Portuguese