AudioGen: Textually Guided Audio Generation | Text To Audio | Paper Explained

  • Published: Nov 5, 2024

Comments • 13

  • @TheAIEpiphany  2 years ago +11

    Text To Image, Text To Audio, Text To Video - multimodal research is going strong

    • @calebmcirvin7708 2 years ago

      Multimodal algorithms are fascinating! I've recently started playing around with DALL-E 2, and I'm just having a great time making art I wouldn't be able to create on my own and seeing the ideas others in the community have come up with. Textually guided audio generation definitely sounds like it has useful applications as well - can't wait to see the directions additional research takes!

  • @swyxTV 2 years ago +4

    love you for doing all these paper walkthrus man, keep it up

  • @jianghong6444 1 year ago +2

    I believe AudioGen uses the encoder and decoder from EnCodec, a successor to SoundStream made by Meta AI. For some reason EnCodec had not yet been published when AudioGen came out, so the paper does not reference it. If you look at the EnCodec paper, you'll find that the LSTM at the second-to-last layer is their invention.

  • @momirmilutinovic2161 2 years ago +1

    I think I understood the explanation of complex-valued STFTs, so I'll try to back it up with an example of my own.
    The problem with using amplitude as a representation of complex numbers is that it is based purely on the distance of a given point from the origin. This means that 1, -1, i, -i, and every other point on the unit circle (the circle with radius 1 centered at the origin) are treated as the same number, i.e. 1, when they aren't the same. When we use both the real and imaginary parts to represent these complex numbers, they become different values. Otherwise, we end up squashing a 2D plane into a 1D line, so some information must be lost.
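
    A minimal sketch of the point made in this comment, using only Python's built-in complex type. The variable names are illustrative and do not come from the video or the paper; the four points are the ones named in the comment.

```python
# Four distinct points on the unit circle: 1, -1, i, -i.
points = [1 + 0j, -1 + 0j, 1j, -1j]

# Magnitude-only representation: distance from the origin.
# This is the "squashed to 1D" view described above.
magnitudes = [abs(z) for z in points]

# Real/imaginary representation: keeps both coordinates of each point.
real_imag = [(z.real, z.imag) for z in points]

# Count how many distinct values survive in each representation.
distinct_magnitudes = len(set(magnitudes))
distinct_real_imag = len(set(real_imag))

print("distinct magnitudes:", distinct_magnitudes)
print("distinct (real, imag) pairs:", distinct_real_imag)
```

    All four points share a single magnitude, while the (real, imag) pairs stay distinct, which is the information loss the comment describes.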

  • @fahnub 8 months ago

    Thank you for doing this

  • @convolutionalnn2582 2 years ago +1

    I want to do research in RL. I have learned most supervised and unsupervised algorithms and am able to implement them; I know some of the maths and can use it. I am now working through the maths for ML book.
    1) Should I take a deep learning course, or wait until I can derive all the maths behind the supervised and unsupervised algorithms?
    2) Can your blog post "How to start RL" be followed by someone like me, whose aim is research in RL?

  • @Neptutron 2 years ago +2

    Whoa, thank you for bringing this to my attention, I've been waiting for something like this! How did you first hear of this paper? (I'm looking for ways to keep up to date on things lol)

  • @mehular0ra 1 year ago

    Amazing video! Pls do the code walkthrough too

  • @johnpope1473 2 years ago

    I wonder if augmenting the Whisper text encoder is going to yield a breakthrough in this space…

  • @jmoneydroid 2 years ago

    Any idea if work is being done on Human Behavior modality? Imagine this could be hugely powerful for both past triggers, as well as forward predictive analysis

  • @frankhovis 6 months ago

    Simple eh?