Is the New Image-to-Music Algorithm Really That Good?

  • Published: 29 Oct 2022
  • A new algorithm has just surfaced which converts images into short songs. In this video we use this algorithm to make some songs, and then we explore how it works under the hood. TL;DR, it's not very good.
    Discord: / discord
    ------- Links -------
    Image to music generator: huggingface.co/spaces/fffilon...
    Img2Music Tweet: / 1585698118137483276
    CLIP Interrogator huggingface : huggingface.co/spaces/pharma/...
    CLIP Interrogator github: github.com/pharmapsychotic/cl...
    Attempt at reverse-engineering Mubert: medium.com/@alexbainter/how-m....
    Mubert Website: mubert.com/
    Mubert Github: github.com/MubertAI/Mubert-Te...
    Explanation of SBERT: towardsdatascience.com/an-int....
    BLIP paper: arxiv.org/abs/2201.12086
    Great video on BLIP (Yannic you beautiful, beautiful Swiss): • BLIP: Bootstrapping La...
    ------- Music -------
    Music from freetousemusic.com
    ‘Daily’ by ‘LuKremBo’: • (no copyright music) c...
    ‘Marshmallow’ by ‘LuKremBo’: • lukrembo - marshmallow...
    ‘Onion’ by ‘LuKremBo’: • (no copyright music) l...
    ‘Rose’ by ‘LuKremBo’: • lukrembo - rose (royal...
    ‘Snow’ by ‘LuKremBo’: • lukrembo - snow (royal...
    ‘Sunset’ by ‘LuKremBo’: • (no copyright music) j...
    ‘Branch’ by ‘LuKremBo’: • (no copyright music) c...
    Many thanks to LuKremBo
    #aiart #mubert #aimusic #technology #art
  • Science

Comments • 51

  • @Shadow_Shinigami (1 year ago, +7)

    Where is the blood-curdling scream for the intro, though?

    • @lewingtonn (1 year ago, +1)

      I'm hoping the numbers tip toward something less off-putting before the poll closes in 24 hours

    • @lewingtonn (1 year ago)

      🙏📿

    • @nikki7305 (1 year ago, +1)

      @lewingtonn I'm not

  • @ZeroIQ2 (1 year ago, +15)

    When I first heard about image to music, I assumed that the process would be, first convert the image into text (using something like CLIP Interrogator), then (and this is where I went totally wrong) use that text to put different weights on instruments and/or genre and maybe shape the theme of the music based on those weights.
    Like if you had a picture of a vampire, then the style weight from the text would make spooky music. If the picture had drums, then the music would use more drums etc...
    Instead, as your deeper dive shows, the text from the image is just a seed for whatever randomizer they are using. That means you just get a random piece of music, even from exactly the same image, which is a real shame.
    The more I think about it, the more it annoys me, because the tools to make something much better already exist, and this is just a really lazy attempt to cash in on all the current AI stuff.

    • @lewingtonn (1 year ago, +3)

      yeah, I was kind of annoyed too, though to be fair the image-to-music pipeline is just an open-source project by a private individual; at least it's not a big company trying to trick people (yet)

    • @icinemagr4621 (3 months ago)

      In 1986, when I started using computers and programming, I realized that the difference between a WAV and a BMP is the header, 400 bytes at the top of the file.
      So I changed the header of a bitmap and listened to it: NOISE. Then I did the same with AVI files, so I had many noises together.
      The difference between music and noise is repeating patterns, so I gave up, because every image I tried just sounded like NOISE.
      All of this in 1986.
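
The reinterpret-bytes-as-sound experiment described above can be reproduced with Python's standard `wave` module. This is a minimal sketch under stated assumptions: rather than editing a real BMP header in place, it assumes the pixel bytes have already been extracted and wraps them in a fresh WAV header, and it uses a synthetic 64x64 gradient (hypothetical data) as the stand-in image. Each byte becomes one 8-bit mono sample.

```python
import io
import wave

def bytes_to_wav(raw: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap arbitrary bytes in a WAV header so each byte plays as one 8-bit mono sample."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(1)           # 1 byte per sample (8-bit unsigned PCM)
        wav.setframerate(sample_rate) # assumed playback rate
        wav.writeframes(raw)          # raw "pixel" bytes become audio samples
    return buf.getvalue()

# Hypothetical stand-in image: 64x64 grayscale pixels, one byte each.
pixels = bytes((x * 4 + y) % 256 for y in range(64) for x in range(64))
wav_bytes = bytes_to_wav(pixels)

# Read the result back to confirm every pixel became one audio frame.
with wave.open(io.BytesIO(wav_bytes), "rb") as check:
    print(check.getnframes(), check.getframerate())
```

Where the pixel data starts inside a real image file depends on the format, so swapping in an actual BMP would require parsing (or skipping) its header first.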

  • @fjpaz (1 year ago, +1)

    Excellent breakdown of each of the components of this project, much appreciated

  • @cyremur (1 year ago)

    Good takes on information density. Great Excalidraw diagrams. :)

  • @paulaewington3138 (1 year ago, +1)

    I really like this, thanks! Your explanations are great!

  • @drekenproductions (1 year ago, +2)

    Thanks, I will be using this. I like the Google Colab, where you can set it to produce a full-length song and have more control over the interrogate feature, since it sometimes gets things wrong: for example, a spooky demon that the interrogator calls a nice goat.

    • @lewingtonn (1 year ago)

      ... did you watch the video???? don't use it lol it's nonsense

    • @drekenproductions (1 year ago, +1)

      @lewingtonn Mubert works well, but the image-to-music part is dumb. Mubert also sucks, though; I wish I could just pick genres manually. I'm still having to find words that produce the styles I want. Words have no meaning to the AI: it just outputs 3 genres based on whatever you type, whether one word or a whole book. It will always be the same genres, but it's pretty random.

  • @akumaking1 (1 year ago)

    I keep trying the Hugging Face program, but I can never get the end result.

  • @AB-wf8ek (1 year ago)

    I came to this conclusion just by using the plain Mubert text-to-music notebook. After trying a few prompts, it soon became apparent that it wasn't much more than a random music generator.

  • @lightbulb9046 (1 year ago)

    I really appreciate the effort you put into making your videos informative and engaging; I learned a lot from this one! I was wondering if you could do me a favor and share your Excalidraw diagrams?
    Thank you so much for sharing your knowledge with us and keep up the great work!

  • @wywywywywywywy (1 year ago, +1)

    I hope we'll see an open-source alternative to Mubert soon.

  • @Beyondarmonia (1 year ago, +2)

    Wtf? If you already have a list of tags, why wouldn't you just use CLIP to check those directly against the image?
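
Checking tags directly against the image, as suggested above, is essentially zero-shot classification: embed the image and each tag with CLIP, then score every tag by cosine similarity. The sketch below shows only the scoring step and assumes the embeddings already exist; the 3-dimensional vectors, the tag names, and the softmax temperature of 0.1 are hypothetical stand-ins, not real CLIP outputs or settings. Unlike a bare list of genre names, it keeps one weight per tag, which is closer to the per-genre weighting @ZeroIQ2 expected.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def tag_weights(image_emb: list[float], tag_embs: dict[str, list[float]],
                temperature: float = 0.1) -> dict[str, float]:
    """Score every tag against the image and turn the scores into softmax weights."""
    scores = {tag: cosine(image_emb, emb) / temperature for tag, emb in tag_embs.items()}
    peak = max(scores.values())  # subtract the max before exp() for numerical stability
    exps = {tag: math.exp(s - peak) for tag, s in scores.items()}
    total = sum(exps.values())
    return {tag: e / total for tag, e in exps.items()}

# Hypothetical embeddings standing in for CLIP image/text vectors.
image_emb = [0.9, 0.1, 0.2]
tag_embs = {
    "orchestral": [0.8, 0.2, 0.1],
    "reggae":     [0.1, 0.9, 0.3],
    "industrial": [0.2, 0.1, 0.9],
}

weights = tag_weights(image_emb, tag_embs)
for tag, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {w:.2f}")
```

A lower temperature concentrates the weight on the best-matching tag; a higher one spreads it across several, which could let a generator blend genres instead of picking one.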

  • @TheCopernicus1 (1 year ago, +1)

    Struth mate wow!

    • @lewingtonn (1 year ago, +1)

      too right ya bloody drongo!

    • @2PeteShakur (1 year ago)

      @lewingtonn cor blimey guvnor, we got some live ones 'ere! ;)

    • @TheCopernicus1 (1 year ago)

      @lewingtonn lol legend

  • @praxis22 (1 year ago, +1)

    Embedded memes FTW!

  • @IamPotato_007 (1 year ago)

    I can't believe this is happening. Beyond musical fountains ⛲️ 🎶

    • @lewingtonn (1 year ago)

      bro did you watch the video? It's nonsense, it basically just feeds you some random music!

    • @IamPotato_007 (1 year ago, +1)

      @lewingtonn I didn't watch your video because I know it's not real. Music and fountains are real, just like music coming from bowls holding different amounts of water.

    • @lewingtonn (1 year ago)

      @IamPotato_007 huh...

  • @CynicalWilson (1 year ago, +2)

    Now, IF they really keep it that simple, it would in fact be "garbage" (*edit: just repeating the language used in the video; generally, no work by dedicated developers should be labeled that way, IMO). I'd have hoped the process would at least include a few additional steps.
    It would probably make a bit more sense if the music training set included a huge sample of songs from, let's say, Spotify, and took all the lyrics and first ran them through one of the GPT implementations (not sure which one would lend itself best) to have them summarized. You could then have that summary processed by S-BERT, just like the CLIP Interrogator output for the image, compare the image and the song (or at least the summary of its lyrics), and find similarities there. Then you'd probably need versions of the original songs analyzed not for lyrics but for melody, rhythm, speed, harmony and so on, relate those outputs, and create or find music that aligns with them.
    Sorry, I left out a lot of proper verbiage, but it would ultimately be a bit like the "suggestions for other songs" that Spotify provides whenever you listen to a song. They already employ technology that puts every song through an engine that analyzes it and finds similarities, and a good portion of that could probably be integrated here. I'm certainly not saying this would result in fantastic music that actually relates to the input image, but it would at least not be as irrelevant as simply, and sort of randomly, associating the input with a list of 130 "types" of music....

    • @lewingtonn (1 year ago, +2)

      yeah, I agree. The place where it really falls down is the 130 genres; there are a lot of cool ways you could turn a text description into a cool piece of music, or even a recommendation for an existing piece.

    • @Dancing_teeth (1 year ago, +2)

      OpenAI Jukebox works similarly to what you describe, and it cost hundreds of millions of dollars. For now, this neural-network generation approach isn't ready for mass adoption because it's expensive and slow. Our published text-to-music demo is the first step toward a new service, and we know how to make it more relevant and effective. Thank you for all your ideas and comments; we appreciate them. We have been making Mubert for five years already and have been researching generative music for a while. All of the features you mentioned can become real, but give us some time, guys 😁🙏🏻

    • @CynicalWilson (1 year ago, +1)

      @Dancing_teeth I'm sorry, I should have worded my comment differently. Thank you for still taking the time to provide feedback :) Generally, I'm super excited about all these new AI/ML developments that are popping up seemingly all around us, especially this year :) I amended my comment to clarify my use of the negative label "garbage".

    • @Dancing_teeth (1 year ago)

      @CynicalWilson No worries, I understand. Thanks again for the feedback 🙌🏻

  • @xmattar (1 year ago, +1)

    i love you

  • @keisaboru1155 (1 year ago)

    try RIFFUSION! ;D

  • @generalawareness101 (1 year ago, +3)

    No, it is just a gimmick from what I heard.

  • @EliSpizzichino (1 year ago, +1)

    JUST TRASH... thanks for exposing it

  • @MrSongib (1 year ago)

    Man, they could just scrape all the music videos on YouTube, plus the meme videos, and work from there. At this point no one is going to complain about where they got their data, except Universal Music for now; just watch the news today. xD

  • @SethCohn23 (1 year ago, +1)

    Now do HarmonyAI

  • @ISnuckADuck (1 year ago, +1)

    I can show you how to hack into Mubert's private servers and the beta music generator they worked on 3 years ago and used as the basis for their AI.
    On those servers there are a couple of tutorials for the beta testers: videos that are unlisted and can't be found anywhere else.
    From those videos you can get a rough idea of how Mubert is built.
    Let me know if you'd be interested in seeing it.

  • @sakib.9419 (1 year ago)

    Did you say you used to run an AI company?

    • @lewingtonn (1 year ago)

      yeeeeeeeeas... who's asking?

  • @NekoDezzu (1 year ago)

    AI is advancing!!!

    • @lewingtonn (1 year ago)

      does nobody actually WATCH these videos???

  • @metanulski (1 year ago, +2)

    I don't think you are correct. It didn't work because there is no "pancake cat" music. It should work if you use pictures that have at least a little connection to music. Try a concert hall, a man with a flute, or a bar at the beach as input pictures.

    • @lewingtonn (1 year ago)

      I am 100% certain you are wrong here. Whatever image you give it, it will reduce that image to like 5 genres out of a possible 130, without any weighting to those genres or anything. Even if the genre reduction is perfect, it's still borderline nonsensical to reduce an image to 5 genres. You can then generate whatever music you like from those 5 and they'll be equally unrelated to the original image.

    • @lewingtonn (1 year ago, +3)

      Here, to prove my point, I literally googled "bar at the beach" and got this image: hiddencitysecrets.com.au/wp-content/uploads/2017/10/Sandbar-Beach-Bar-Cafe-Best-Top-Melbourne-Port-Melbourne-Restaurant-.jpg, which I then fed to the interface TWICE. The first time I got something sad and orchestral; the second time, something upbeat and industrial. You can try it yourself if you like.

    • @metanulski (1 year ago, +4)

      @lewingtonn I did. When I upload pictures of an orchestra playing, I get orchestral music. If I upload pictures of Bob Marley, I get reggae. It's very simple, but it seems to work.

    • @lewingtonn (1 year ago, +1)

      OK, so that makes sense, if you upload an image that directly corresponds to one of the 130 tags then the track you get should be coherent. But at that point you may as well just ask for the tags directly lol
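
The behavior @lewingtonn describes in this thread (an image caption reduced to about 5 of 130 genre tags, with no weighting) can be sketched as a similarity ranking that throws the scores away. This is a minimal sketch, not the project's actual code: the tag names, the random 8-dimensional vectors, and the `top_k_tags` helper are hypothetical stand-ins for real sentence embeddings (the video's links suggest SBERT) of a CLIP Interrogator caption.

```python
import math
import random

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_tags(query: list[float], tag_embs: dict[str, list[float]], k: int = 5) -> list[str]:
    """Rank tags by similarity to the query and keep only the k names (scores are dropped)."""
    ranked = sorted(tag_embs, key=lambda tag: cosine(query, tag_embs[tag]), reverse=True)
    return ranked[:k]

# Hypothetical stand-ins: 130 genre tags with seeded random 8-d "embeddings".
rng = random.Random(0)
tag_embs = {f"genre_{i:03d}": [rng.gauss(0, 1) for _ in range(8)] for i in range(130)}

# A caption embedding that happens to match one tag exactly,
# like an orchestra photo mapping cleanly onto an "orchestral" tag.
caption_emb = tag_embs["genre_042"]
print(top_k_tags(caption_emb, tag_embs))
```

Because only the names survive, even a query that matches one tag exactly still yields four other tags chosen purely by proximity, with nothing telling the music generator which of the five matters most.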