End-to-End Adversarial Text-to-Speech (Paper Explained)

  • Published: 20 Sep 2024

Comments • 39

  • @MrAmirhossein1
    @MrAmirhossein1 4 years ago +6

    Hey Yannic!
    just wanted to thank you for the excellent content that you provide.
    Keep it up man :)

  • @HarisGulzar-d9c
    @HarisGulzar-d9c 6 months ago

    Never enjoyed paper explanations this much.
    Thanks, Yannic!

  • @rishabhkumar722
    @rishabhkumar722 7 months ago +1

    Wow... Why not more TTS paper explanations?

  • @zhivebelarus560
    @zhivebelarus560 1 month ago

    Yannic, thanks for doing this! Quick question: instead of fiddling with the aligner, why didn't they start training on smaller samples, say one phoneme long, and then gradually increase the sample length to 2, 3, etc. as the loss drops? There seems to be too much black magic going on in training a TTS model. Do you have a suggestion for the cleanest architecture that works well? Is there a good review of one-step TTS models? How can a speaker embedding be integrated into such a model for voice cloning? Sorry for too many questions…

  • @rvalusa
    @rvalusa 4 years ago +3

    Awesome. Superb explanation. Love the channel and content 👍👏🙂

  • @kimchi_taco
    @kimchi_taco 4 years ago +1

    Thank you! It includes so many ad-hoc choices. I wonder why it's better than the combination of Tacotron+WaveNet?

    • @motherbear55
      @motherbear55 3 years ago

      Quality-wise it’s not better than Tacotron (see the MOS scores in the paper: Tacotron is about 4.5, this approach is about 4.0). But unlike Tacotron, it’s not autoregressive, so inference can be much faster.

  • @alaapdhall8541
    @alaapdhall8541 4 years ago +3

    Ah, always so fast. I heard that Google released pretrained weights for Big Transfer; could you also make a video on BiT?

    • @alaapdhall8541
      @alaapdhall8541 4 years ago

      @Mallow Marsh oh ok, I'll go through his videos then

    • @Haapavuo
      @Haapavuo 2 years ago

      Whose videos? The comment was deleted. Thanks.

  • @revanthadiga329
    @revanthadiga329 2 years ago +1

    Does anyone know where to find a code implementation of this?

  • @hannesstark5024
    @hannesstark5024 4 years ago +1

    Visual Transformers tomorrow?

  • @ushasr2821
    @ushasr2821 3 years ago

    Great explanation. Thank you so much!

  • @avihudekel4709
    @avihudekel4709 3 years ago

    Great work!

  • @henkjekel4081
    @henkjekel4081 6 months ago

    You're the best

  • @myungchulkang5716
    @myungchulkang5716 4 years ago +1

    Nice !

  • @shivamraisharma1474
    @shivamraisharma1474 4 years ago +2

    Amazing! Do we have any GitHub code or pretrained model weights available?

  • @СергейПавлович-г2и
    @СергейПавлович-г2и 4 years ago +1

    Can I try it somewhere?

    • @YannicKilcher
      @YannicKilcher 4 years ago

      Not sure. I've linked their website in the description

  • @ziqiangshi8167
    @ziqiangshi8167 4 years ago

    Awesome.

  • @DinaEl-Kholy--
    @DinaEl-Kholy-- 3 years ago

    Thank you!!

  • @bossgd100
    @bossgd100 4 years ago +1

    Is it working in real time?

    • @herp_derpingson
      @herp_derpingson 4 years ago +5

      Anything can be real time if you have enough compute

    • @YannicKilcher
      @YannicKilcher 4 years ago +2

      I don't think so

    • @bossgd100
      @bossgd100 4 years ago

      @herp_derpingson the singularity is far 😵

    • @koheimatsuura3610
      @koheimatsuura3610 4 years ago

      @YannicKilcher Hi :) Why do you think so? This seems to be a non-autoregressive model, so I think its inference is very fast...

  • @screenapple1660
    @screenapple1660 4 years ago +1

    People want a realistic TTS voice that sounds like a high-quality human voice, not a robot voice. A robot voice is usually free, but it's stupid.
    Most businesses use high-quality human voice synthesis.

  • @snippletrap
    @snippletrap 4 years ago +1

    I think Tacotron sounds better

  • @bossgd100
    @bossgd100 4 years ago +1

    First !

  • @yabdelm
    @yabdelm 4 years ago +5

    I absolutely love the content, but I vote for not saying "As always, if you like this work, subscribe." I believe if people are exploring AI videos, they probably know where the subscribe button is, and if they like the videos, they'll probably subscribe. Plus we've heard it a billion times in every video on YouTube ever made. It just becomes noise at a certain point. At this point I’m thinking of training an AI to skip every time someone says that.
    Nevertheless, they're your videos, and a personal choice, not a democracy. Feel free to disagree. Don't mean to be mean or anything.

    • @lakshay510
      @lakshay510 4 years ago +4

      Hi, but I don't agree with you. When I am doing any kind of research, I just open tens of tabs and explore them one by one, and sometimes if I find the right content I learn the stuff and leave. Also, there are analytics that YouTube provides, which might show that most of his viewers are not his subscribers.

    • @yabdelm
      @yabdelm 4 years ago +1

      Lakshay Chhabra You think the majority of people will subscribe because he reminded them to subscribe? I don’t doubt that that might occur as I really have no way of checking that. I agree that some way of determining that from the analytics would be better.

    • @siyn007
      @siyn007 4 years ago +2

      For me, I usually watch a few trial videos before I subscribe, but I must admit that being told to subscribe lets me evaluate whether I should subscribe instead of just exiting, like Lakshay suggested. I agree with not telling people where the subscribe button is, though.

    • @yabdelm
      @yabdelm 4 years ago +2

      @siyn007 Oh sorry, but I don't think Yannic specified where the subscribe button was. I just meant to point to him saying whether or not to subscribe.
      I see. Good to know that there's the opposite take there. It's definitely not the end of the world. :D I still love Yannic and his videos.

    • @YannicKilcher
      @YannicKilcher 4 years ago +4

      This is one of the things that, yes, is slightly annoying, but you'd be surprised how many people who aren't subscribed go "oh yes, I could do that". So I try to give you the high level before I say that so that you can decide to skip the video without having to listen to it :)