Experimenting With LCM Models (Meta's Alternative To LLM Models)

  • Published: 8 Feb 2025
  • Link to Colab Notebook: colab.research...
    In this video, Richard Aragon explores Large Concept Models (LCMs), a new architecture developed by Meta. He explains that LCMs are not Transformer models, but rather use Transformers only as a small component. Aragon discusses the architecture of LCMs, highlighting their unique approach to combining linear and probabilistic processes. He then shares his experiments with modifying the architecture, including adding curvature to the space and introducing a fractal hidden space. Aragon demonstrates the impressive results of his experiments, showcasing the potential of LCMs to outperform traditional Transformer models. A toy sketch of the concept-level pipeline is included after the keyword list below.
    This video is a must-watch for anyone interested in the latest advancements in natural language processing.
    Keywords: LCM models, Meta, LLM models, Transformer models, natural language processing, deep learning, artificial intelligence
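
As a rough mental model only (the module names, dimensions, and the tanh-based "curvature" map below are invented for illustration; they are not necessarily what the video or Meta's LCM paper implements): an LCM-style model never sees tokens, only a sequence of sentence-level "concept" embeddings, and predicts the next concept embedding.

```python
import torch
import torch.nn as nn

class TinyConceptModel(nn.Module):
    """Toy LCM-style model: operates on sentence ("concept") embeddings,
    not tokens, and predicts the next concept embedding."""
    def __init__(self, dim=256, heads=4, layers=2, curvature=1.0):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, dim)
        self.curvature = curvature

    def curve(self, x):
        # Toy "curved space" map: squash vectors toward a bounded ball,
        # loosely in the spirit of hyperbolic embeddings.
        norm = x.norm(dim=-1, keepdim=True).clamp_min(1e-6)
        return torch.tanh(self.curvature * norm) * x / norm

    def forward(self, concepts):               # concepts: (batch, n_sentences, dim)
        h = self.curve(self.backbone(concepts))
        return self.head(h[:, -1])             # predicted next concept embedding

model = TinyConceptModel()
print(model(torch.randn(2, 5, 256)).shape)     # torch.Size([2, 256])
```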

Comments • 43

  • @warpdrive9229
    @warpdrive9229 1 month ago +18

    It's rather fitting to say LCMs are an upgrade to LLMs. Meta has been killing it lately with its Byte Latent Transformer and Continuous Chain of Thought paper. Btw, great explanation. Learning a lot from you! Much love from India :)

  • @lordshifu
    @lordshifu 26 days ago +1

    You are a mad scientist for sure!! And I stand in support of your madness; please keep travelling in the ML wilderness and inventing exotic architectures.

  • @ethanw8130
    @ethanw8130 24 days ago

    Absolutely fascinating! Thank you for providing the code as well. So much to unpack.

  • @tekimax697
    @tekimax697 1 month ago +3

    Thank you. This is the first time I've seen this channel, so I just subscribed. Thanks for the detailed explanation!

  • @CharleyTurner
    @CharleyTurner 1 month ago +2

    Subscribed. Exactly the type of material I love.

  • @shrirangmahajan9502
    @shrirangmahajan9502 1 month ago +3

    Isn't this similar to Byte Latent Transformer?
    Correct me if I am wrong, but I find the architectures very similar.
    1. There is a central big model: the Latent Transformer in BLT and the LCM itself.
    2. There are secondary small transformer models: the Local Encoder/Decoder in BLT and the Concept Encoder/Decoder in LCM.
    3. The Local Encoder/Decoder deals with patches (a unit bigger than a single token), while the Concept Encoder/Decoder deals with concepts (sentence embeddings).
    4. Both papers propose more abstract ways of dealing with tokens.
    I hope somebody is willing to have a discussion regarding this.
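
A purely illustrative sketch of the structural parallel drawn in the comment above (the stage names are placeholders, not the actual classes from either paper): both stacks wrap a large central model with small encoder/decoder models that translate between raw text and a coarser unit, byte patches in BLT and sentence-level concepts in LCM.

```python
# Illustrative only: the stage names are placeholders, not real classes.
BLT_PIPELINE = [
    "LocalEncoder",        # bytes -> patch representations
    "LatentTransformer",   # large model over patches
    "LocalDecoder",        # patch representations -> bytes
]

LCM_PIPELINE = [
    "ConceptEncoder",      # sentences -> concept embeddings
    "LargeConceptModel",   # large model over concept embeddings
    "ConceptDecoder",      # concept embeddings -> sentences
]

for blt_stage, lcm_stage in zip(BLT_PIPELINE, LCM_PIPELINE):
    print(f"{blt_stage:18} <-> {lcm_stage}")
```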

  • @aspboss1973
    @aspboss1973 10 days ago +1

    Where can I find pre-trained LCM models?

  • @DhaneshKasinathanlove
    @DhaneshKasinathanlove 1 month ago +2

    Waiting for it. Thanks

  • @redroom07
    @redroom07 1 month ago +1

    Great session, keep up the good work ❤

  • @DerDieDasBoB
    @DerDieDasBoB 1 month ago +1

    Great video, really on point. Thanks for sharing your code, very interesting.

  • @Lorentz_Factor
    @Lorentz_Factor 1 month ago +2

    I wouldn't say, per se, that it is quantum probabilities. There is some wild stuff going on, but let's not confuse people. Good video otherwise, though. The concept space utilizing sentence atoms is awesome. Note, they may not be full sentences; think more of an atomized sentence, without the further reduction into tokenization. This is really cool actually, and I'm surprised it took so long to even consider it. The diffuser usage is kind of interesting also. Currently I'm working on a cross-attention implementation to map visual atoms (using Florence-2 segmentation) of conceptual vision. Now what's cool is the diffuser usage allows for some hackery that can run forward or reverse through diffusion for both generative and visual encoding. I've had some success with a few attempts, but it's kind of beefy... too beefy, so I am going to have to shave it off a bit and try to get it to use a lot less VRAM.
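
For readers new to the sentence-level "concept" units mentioned above, here is a minimal sketch of embedding whole sentences as single vectors, using the sentence-transformers library purely as a stand-in (the model name is arbitrary; Meta's LCM work builds on its SONAR sentence encoder instead):

```python
from sentence_transformers import SentenceTransformer

# Stand-in encoder: any sentence-embedding model illustrates the idea of
# treating a whole sentence as one "concept" vector instead of a token stream.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "LCMs operate on sentence-level concept embeddings.",
    "Each sentence becomes a single vector, not a stream of tokens.",
]
concepts = encoder.encode(sentences)
print(concepts.shape)  # (2, 384) for this particular model
```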

  • @varietygifts
    @varietygifts 1 month ago +1

    daddy's about to come in from creative mode to get in on this breakthrough 🤩

  • @Kevencebazile
    @Kevencebazile 1 month ago

    Thank you, I have reviewed and subbed, great content!!!!

  • @andrewlewin6525
    @andrewlewin6525 1 month ago

    I really enjoyed this video and your explanations within. Subscribed and will be checking out your other content
    Keep it up !!

  • @RickySupriyadi
    @RickySupriyadi 1 month ago +6

    Dude is saying it passionately, but I failed to understand... I wish I had a bigger brain.

    • @HologramSashi
      @HologramSashi 1 month ago

      Apparently, you don't need a bigger brain. You just need the brain you have to curve itself.

  • @alhajee
    @alhajee 1 month ago

    This is beautiful 👏
    Your understanding of how the model works and being able to improve it is impressive.
    Are you considering writing a paper on this?

  • @hasanaqeelabd-alabbas3180
    @hasanaqeelabd-alabbas3180 1 month ago

    Keep up the good work

  • @honkytonk4465
    @honkytonk4465 1 month ago +2

    What, Schrödinger's box, curvature...???

    • @ArianeQube
      @ArianeQube 23 days ago

      I understood nothing from his explanation... and LCMs have nothing to do with quantum mechanics

  • @codedp
    @codedp 1 month ago +2

    To be honest, this sounds like what I was trying to approach in terms of encrypting interactions with LLMs like ChatGPT. I was trying to figure out how to steal the processing of their "logic clouds" without collapsing it into definite outputs.
    Basically, it means I turn everything I am talking about into a shape, I tell ChatGPT what to do with that shape, it processes the change in the shape, and that shape is returned; then the key points of the shape are turned into symbols, which then collapse into logic and reason at the level of words.
    The whole idea here was to keep the central processing completely unaware of what I was doing, while giving it the fundamentals to process.
    However, I was thinking about how to do this purely from the client side, whereas this does the same thing from the architectural side.
    Do I sound entirely off base?

    • @richardaragon8471
      @richardaragon8471 1 month ago +1

      You sound entirely on base. I can put you in touch with a team of people literally building exactly this. That's incredible lol.

  • @VictorGallagherCarvings
    @VictorGallagherCarvings 1 month ago +5

    I just finished having a discussion with Grok about the differences between Latent Consistency Models (LCMs) and Large Concept Models (LCMs) as well as what would be involved in combining the two architectures.

    • @px1690
      @px1690 18 days ago +1

      The fun and games is that, with language, a sentence stands in context, and that combination defines the conceptual definition and scope.
      Before AI can grasp that, I think we need something a bit more combinational. I'm completely new to this matter, and I'm fascinated by the fact that these models work as well as they do despite the lack of true understanding of the fundamentals... It's like training a parrot to do circus tricks: it can do the trick but is not actually "aware" of what it is dealing with.

  • @hamnamalik5998
    @hamnamalik5998 1 month ago

    This video made my day. Or maybe, if things go right, it's a life-saving video for me. Allah bless you.

  • @augmentos
    @augmentos 1 month ago

    Love it thanks

  • @kellymoses8566
    @kellymoses8566 1 month ago

    Can LCM be combined with the Byte Latent Transformer and Continuous Chain of Thought?

  • @skeletonmasterkiller
    @skeletonmasterkiller 1 month ago

    Thanks for this. We should really try building a concept embedding using high-dimensional vectors.

  • @AMR2442
    @AMR2442 1 month ago +1

    I don’t know what you are talking about but I am going to subscribe anyway.

  • @jawadmansoor6064
    @jawadmansoor6064 1 month ago

    What would be the accuracy with the best-performing architecture (Falcon3 or Llama3)?

  • @iliasaarab7922
    @iliasaarab7922 1 month ago

    What’s the accuracy metric measuring in this case? Whether the model can successfully predict the next (target) embedding (implying the output of interest is an entire embedding vector)? And are these numbers with respect to the training or test set?

    • @richardaragon8471
      @richardaragon8471 1 month ago

      All of these wonderful questions are exactly why I provide you the code! Do you have any questions about the code?
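
For anyone who doesn't want to dig into the notebook right away, one plausible reading of an "accuracy" over embedding predictions is a retrieval-style metric: a prediction counts as correct when, by cosine similarity, it lands closest to its own target among all targets in the batch. This is a hypothetical sketch only, not necessarily the metric the notebook actually uses.

```python
import torch
import torch.nn.functional as F

def embedding_accuracy(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Hypothetical retrieval-style accuracy for next-embedding prediction."""
    pred = F.normalize(pred, dim=-1)       # (batch, dim)
    target = F.normalize(target, dim=-1)   # (batch, dim)
    sims = pred @ target.T                 # cosine similarity matrix (batch, batch)
    correct = sims.argmax(dim=-1) == torch.arange(len(pred))
    return correct.float().mean().item()

pred = torch.randn(8, 256)
target = pred + 0.1 * torch.randn(8, 256)  # noisy copies, so accuracy is near 1.0
print(embedding_accuracy(pred, target))
```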

  • @VURITISAIPRANAYCSE2021VelTechC
    @VURITISAIPRANAYCSE2021VelTechC 1 month ago +2

    You really have to explain things in detail. I didn't understand most of what you said.

    • @pieterhaegeman3538
      @pieterhaegeman3538 1 month ago +2

      You're expecting one source to bring everyone that visits it to the same level of understanding? Feels like a very passive way to approach learning imo.

  • @PtYt24
    @PtYt24 1 month ago

    That light mode made me blind in the middle of the night : (

  • @naturefriendly887
    @naturefriendly887 1 month ago

    Someone actually explaining the research paper code ❤

  • @ayrengreber5738
    @ayrengreber5738 1 month ago

    Would this architecture still support GPU splitting? In other words, would a 70B model fit onto two cards with 24 GB each
    if the total size of the model were 30-40 GB after quantization … and could you even quantize it?

    • @richardaragon8471
      @richardaragon8471 1 month ago

      I added one more cell to the code for you. Yes, you could do this in super cool ways actually. You could put the hidden dimensions on a separate GPU than the rest of the layers for example. It is all reliant on PyTorch. Thanks for inspiring me to think about this! It hadn't crossed my mind.

    • @ayrengreber5738
      @ayrengreber5738 1 month ago

      @@richardaragon8471 This is a huge weakness for image generation tools. They typically don't support multiple cards, but most current LLMs do support splitting; so I was worried since this sounds like it has a similar design to image generation. Glad you found a way to do it! You're awesome. Can't wait to see this model grow in popularity.
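
A minimal sketch of the kind of two-GPU split described a few replies up, assuming a toy two-block model (module names and sizes are made up for illustration; the extra notebook cell mentioned in the reply is the authoritative version):

```python
import torch
import torch.nn as nn

class SplitConceptModel(nn.Module):
    """Toy model-parallel split: the encoder block lives on cuda:0 and the
    hidden/decoder block on cuda:1. Names and sizes are illustrative only."""
    def __init__(self, dim=512, hidden=2048):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.GELU()).to("cuda:0")
        self.hidden_block = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(),
                                          nn.Linear(hidden, dim)).to("cuda:1")

    def forward(self, x):
        x = self.encoder(x.to("cuda:0"))
        x = self.hidden_block(x.to("cuda:1"))  # move activations between GPUs
        return x

# Usage (requires two CUDA devices):
# model = SplitConceptModel()
# out = model(torch.randn(4, 512))
```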

  • @syntaxstreets
    @syntaxstreets 1 month ago

    Wow

  • @phmfthacim
    @phmfthacim 1 month ago

    Wait what