Google's NEW TITANS: Transformer w/ RNN Memory

  • Published: 30 Jan 2025

Comments • 29

  • @user-pt1kj5uw3b 12 days ago +31

    Thanks for the video. Don’t listen to the guy asking for shorter videos

    • @patruff 12 days ago +1

      Yeah I would ignore 99% of the comments (except this one and the original one this is replying to)

    • @patruff 12 days ago

      Actually, ignore the above comment; it's redundant

    • @patruff 12 days ago

      This too

  • @irbsurfer1585 12 days ago +15

    It seems everyone is talking about Titans on YouTube right now, but this video stands out as one of the few that not only thoroughly explains its architecture but also provides a deep, insightful dive into its mechanisms and applications. It strikes a perfect balance between technical depth and accessibility, making it an invaluable resource both for newcomers and for those looking to deepen their understanding. Excellent work on breaking down such a complex topic into something comprehensible yet intellectually engaging!

    • @vrc5674 11 days ago

      Honestly, I'm pretty confused. I watched this video twice, pausing and re-reading the slides as I went along. I struggled to ground what was described in actual implementation details: starting from a pre-trained LLM like Llama, where would the individual mechanisms described here be introduced in a Titans system, and when would they be executed? I was trying to picture how a typical session with a Titans LLM system would proceed and what the flow of execution might look like. Hopefully DiscoverAI will do a follow-up video at some point and clear up some of the ambiguities, perhaps with a concrete example. Great video though.
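
      The Titans code had not been published at the time of this video, so the exact flow of a session is open to interpretation. As a rough, hypothetical sketch of the loop described in the video (not the paper's implementation; every class name, placeholder, and hyperparameter below is illustrative): the long-term memory is a small network whose weights are updated at inference time from a "surprise" signal (the gradient of an associative-memory loss), and whatever it recalls is prepended to the current chunk before a frozen, pre-trained transformer block attends over it.

```python
import torch
import torch.nn as nn

class NeuralMemory(nn.Module):
    """Long-term memory: a small MLP whose weights are updated at inference time."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def read(self, queries):
        # Retrieve: run the current hidden states through the memory, M(q).
        with torch.no_grad():
            return self.mlp(queries)

    def write(self, keys, values, lr=1e-2, decay=0.01):
        # "Surprise" = gradient of the associative-memory loss ||M(k) - v||^2.
        # A bigger surprise means a bigger weight update; weight decay acts as forgetting.
        loss = ((self.mlp(keys) - values) ** 2).mean()
        grads = torch.autograd.grad(loss, list(self.mlp.parameters()))
        with torch.no_grad():
            for p, g in zip(self.mlp.parameters(), grads):
                p.mul_(1 - decay).sub_(lr * g)

dim, window = 64, 8
memory = NeuralMemory(dim)
for step in range(3):                               # toy "session": one chunk per step
    chunk = torch.randn(window, dim)                # stand-in for the chunk's hidden states
    recalled = memory.read(chunk)                   # 1) read long-term memory
    context = torch.cat([recalled, chunk], dim=0)   # 2) memory-as-context for attention
    # 3) a frozen pre-trained transformer block would process `context` here (omitted)
    memory.write(chunk, chunk)                      # 4) write the chunk back into memory
print("memory updated over", step + 1, "chunks")
```

      In a real session the read and write steps would run once per incoming chunk, so earlier parts of a long conversation keep influencing later ones even after they fall out of the attention window.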

    • @irbsurfer1585 11 days ago

      @@vrc5674 Grokking has put me a little behind, so I have some catching up to do. As soon as I wrap up Grokking I will be working through transformers2 and TITANS. At that point I will chime back in and provide whatever assistance I can. I apologize for the delay, but I will try to be prompt.

    • @irbsurfer1585 10 days ago

      @@vrc5674 I have decided to apply TITANS to RWKV (one of my favorite SLMs). I will be starting the project soon. I have NO idea if it will work, but if it does, oh boy! I'll share the notebook. I am busy grokking RWKV right now, lol.

    • @irbsurfer1585 8 days ago

      @ I keep flip-flopping on it, but I am again contemplating experimenting with RWKV to build a 4-million-token context window using TITANS concepts. It is soooo tempting to at least try it.

    • @rmt3589 5 days ago

      @@vrc5674 The biggest thing is that transformers make LLMs, to put it simply.
      I wouldn't look at this as working with current LLMs. It's like asking a new factory to work with a different product you bought at the store. Kind of a weird ask, and that may be leading to your issues.

  • @truliapro7112 12 days ago +2

    Sir, you are doing great, and please do not shorten the length of the videos. Your videos are outstanding. Love them, and a lot of appreciation.

  • @ardentiousX 11 days ago

    Dude... You are my new favorite Brain Candy. I'm on a quest for knowledge and found your presentation of information amongst the most absorbable and valuable in my obscenely large word cloud of subjects to explore.

  • @Wobbothe3rd 12 days ago +2

    I always knew RNNs would make a comeback. The human brain itself is an RNN, not a convolutional NN or a transformer.

  • @ibgib 9 days ago

    Ty for doing this one!
    It seems to me that MAG (gating) and MAL (layering) are very similar to a multi-agent architecture where the outer agent passes in relevant context to the inner agent. This is similar to how I'm integrating my git-like ibgib protocol, which is a content-addressed graph mechanism. So context is "compressed" to the graph address with some metadata. Then new contexts can be dynamically composed when passing between agents.
    Thanks again!
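
    To make the MAG/MAL comparison in the comment above concrete, here is a minimal, hypothetical illustration of the two ways the memory output can be combined with attention, as described in the video. The attention, memory, and gate modules are plain linear placeholders, not the actual Titans components.

```python
import torch

def combine_mag(x, attention, memory, gate):
    # MAG (memory as a gate): attention and memory branches run in parallel,
    # then a learned gate mixes their outputs element-wise.
    g = torch.sigmoid(gate(x))
    return g * attention(x) + (1 - g) * memory(x)

def combine_mal(x, attention, memory):
    # MAL (memory as a layer): the memory module is simply an earlier layer
    # whose output is fed into the attention layer.
    return attention(memory(x))

dim = 32
x = torch.randn(4, dim)                 # stand-in for a batch of token states
attention = torch.nn.Linear(dim, dim)   # placeholder for sliding-window attention
memory = torch.nn.Linear(dim, dim)      # placeholder for the neural memory module
gate = torch.nn.Linear(dim, dim)

print(combine_mag(x, attention, memory, gate).shape)  # torch.Size([4, 32])
print(combine_mal(x, attention, memory).shape)        # torch.Size([4, 32])
```

    The gated (MAG) variant lets the model decide, per feature, how much to trust memory versus the current window, while the layered (MAL) variant always routes everything through memory before attention.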

  • @Nnm26 12 days ago +5

    Am I crazy, or is this huge?

  • @davidwynter6856 11 days ago

    What would be extremely useful is a summary video, maybe once a month. It would cover which findings from which papers work together, and which are either superseded by a newer technique or would clash when combined. My intuition suggests that combining TITANS with the in-context learning result (where the 1000-token threshold lets an LLM override what it learned in pre-training with the learning provided in the context), plus the use of StableMax, would be a great combination. But you have deeper knowledge of all of these technologies, so your insights on which of the techniques you cover in your videos work together rather than clash would be amazing.

  • @dudicrous 12 days ago

    Looking forward to the publication of the code! Meanwhile: awesome content, thanks!

  • @oursbrun4243 12 days ago

    12:10 Ahah, thank you! I was about to ask that question 😂

  • @fdavis1555 12 days ago +1

    A step closer to self-improving models, and shortly after that, ASI!

  • @En1Gm4A 12 days ago +1

    The field seems fast if you hopped on at the GPT-3 release date

    • @rmt3589 5 days ago

      Yes. Got my heart set on GPT-J back then. Was gonna use it anyway because of how small it is, but the smaller versions of DeepSeek R1 might change my mind.

    • @En1Gm4A 5 days ago

      @rmt3589 Try LM Studio and choose a DeepSeek model according to your hardware

  • @maertscisum 12 days ago

    I think this is it. This will lead in the right direction.

  • @HeyFaheem 12 days ago +4

    Cool, but many have come and gone (Mamba, Jamba, ...). Let's see if this one sticks around. Even so, adoption is a concern too, because the open-source ecosystem is so tightly integrated with the transformer architecture; it has to do very, very well to start getting adopted.
    Btw, your videos are really good and your presentation is amazing. Keep up the work. The one downside is that they run a little long. It would be great if you could publish written notes, like an article or a blog post. That would be very helpful to read, refer back to, and learn from.

  • @CharlotteLopez-n3i 12 days ago

    Great insight into TITANS! Could this mean a shift in handling model evaluation? Any thoughts on the impact on existing AI systems?

    • @viky2002 12 days ago

      I think we can edit the memory weights ourselves

  • @shaneoseasnain9730 12 days ago +1

    The Titan architecture’s modular approach to short- and long-term memory differs from Kahneman’s model, where immediate recall (System 1) is fast and intuitive, while higher-level reasoning (System 2) is slower and effortful. A hidden layer in both systems determines which mechanism to use, resembling Weick’s concept of sensemaking through storytelling and surprise. This suggests AI could benefit from similar dynamic prioritization. While Titan shows progress, we are still at the early stages of developing architectures that fully mimic human memory and reasoning.

  • @timothywcrane 12 days ago

    Every day there's some new way to train on the data, infer from the data, reinterpret the data, compress the data, etc. Are you working on the packaging and presentation, or, if not in commerce, other use cases for your organic or synthetic data product yet?
