Thanks for the video. Don’t listen to the guy asking for shorter videos
Yeah I would ignore 99% of the comments (except this one and the original one this is replying to)
Actually ignore the above comment it's redundant
This too
It seems everyone is talking about Titans on RUclips right now, but this video stands out as one of the few that not only thoroughly explains its architecture but also provides a deep, insightful dive into its mechanisms and applications. It strikes a perfect balance between technical depth and accessibility, making it an invaluable resource both for newcomers and for those looking to deepen their understanding. Excellent work on breaking down such a complex topic into something comprehensible yet intellectually engaging!
Honestly, I'm pretty confused. I watched this video twice, pausing and re-reading the slides as I went along. I was really struggling to ground what was described in actual implementation details: starting from a pre-trained LLM like Llama, where and when would the individual mechanisms described be introduced and executed in a Titans system? I was trying to picture how a typical session with a Titans LLM system would proceed and what the flow of execution might look like. Hopefully DiscoverAI will do a follow-up video at some point and clear up some of the ambiguities, perhaps with a concrete example. Great video though.
@@vrc5674 Grokking has put me a little behind so I have some catching up to do. As soon as I wrap up Grokking I will be doing transformers2 and TITANS. At that point I can be of more assistance and I will chime back in to provide you with any assistance that I can. I do apologize for the untimeliness but will try to be expedient.
@@vrc5674 I have decided to apply TITANS against RWKV (one of my favorite SLMs). I will be starting the project soon. I have NO idea if it will work, but if it does, oh boy! I'll share the notebook. I am busy grokking RWKV right now. lol
@ I keep flip-flopping on it, but I am again contemplating experimenting with RWKV to build a 4-million-token context window using TITANS concepts. It is soooo tempting to at least try it.
@@vrc5674 The biggest thing is that transformers make LLMs, to put it simply.
I wouldn't look at this as working with current LLMs. It's like asking a new factory to work with a different product you bought at the store. Kind of a weird ask, and that may be what's leading to your issues.
Sir, you are doing great. And do not shorten the length of your videos. Your videos are outstanding. Love them, and a lot of appreciation.
Dude... You are my new favorite Brain Candy. I'm on a quest for knowledge and found your presentation of information amongst the most absorbable and valuable in my obscenely large word cloud of subjects to explore.
I always knew RNNs would make a comeback. The human brain itself is an RNN, not a convolutional NN or a transformer.
Ty for doing this one!
It seems to me that MAG (gating) and MAL (layering) are very similar to a multi-agent architecture where the outer agent passes in relevant context to the inner agent. This is similar to how I'm integrating my git-like ibgib protocol, which is a content-addressed graph mechanism. So context is "compressed" to the graph address with some metadata. Then new contexts can be dynamically composed when passing between agents.
Thanks again!
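To make the "compression to a graph address" idea concrete, here is a minimal, hypothetical sketch (plain SHA-256 addressing and an in-memory store; the real ibgib protocol is richer, and these function names are my own, not its actual API):

```python
import hashlib
import json

# Hypothetical sketch only: the store and function names below are my own
# illustration of content-addressed context passing, not the ibgib API.
STORE = {}  # content address -> full context payload

def put_context(payload: dict) -> str:
    """Store a context payload and return its content address (git-style hash)."""
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    addr = hashlib.sha256(blob).hexdigest()
    STORE[addr] = payload
    return addr

def outer_agent_message(addr: str, metadata: dict) -> dict:
    """The outer agent passes only the compressed address plus lightweight metadata."""
    return {"context_addr": addr, "meta": metadata}

def inner_agent(message: dict) -> str:
    """The inner agent dereferences the address only when it needs the full detail."""
    ctx = STORE[message["context_addr"]]
    return f"Task '{message['meta']['task']}' with {len(ctx['history'])} prior turns in scope"

# Usage: a long history is "compressed" to one address and routed onward.
addr = put_context({"history": ["turn 1 ...", "turn 2 ...", "turn 3 ..."]})
print(inner_agent(outer_agent_message(addr, {"task": "summarize"})))
```

The point of the sketch is just that the address is cheap to pass around, while the full context stays retrievable on demand.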
Am I crazy, or is this huge?
What would be extremely useful is a summary video, maybe once a month. It would cover both which findings from which papers work together and which are either superseded by a newer technique or would clash when combined. My intuition suggests that combining TITANS with the in-context-learning result (where a threshold of around 1000 tokens lets LLMs override what they learned in pre-training with what is provided in the context), plus the use of StableMax, would be a great combination. But you have deeper knowledge of all of these technologies, so your insights on which combinations of the techniques you cover in your videos work together rather than clash would be amazing.
Looking forward to the publication of the code! Meanwhile: awesome content, thanks!
12:10 Ahah, thank you! I was about to ask that question 😂
A step closer to self-improving models and, shortly after that, ASI!
The field only seems fast if you hopped on at the GPT-3 release date.
Yes. Got my heart set on GPT-J back then. Was gonna use it anyway because of how small it is, but smaller versions of DeepSeek R1 might change my mind.
@rmt3589 Try LM Studio and choose a DeepSeek model according to your hardware.
I think this is it. This will lead in the right direction.
Cool, but many have come and gone (Mamba, Jamba, ...). Let's see if this one sticks around. Even so, adoption is a concern: the open-source ecosystem is so tightly integrated with the transformer architecture that this has to do very, very well to start getting adopted.
Btw, your videos are really good. Your presentation is amazing; keep up the work. The one small downside is that they run a little long. It would be great if you could publish written notes, like an article or a blog post. That would be very helpful to read, refer back to, and learn from.
Great insight into TITANS! Could this mean a shift in handling model evaluation? Any thoughts on the impact on existing AI systems?
I think we can edit the memory weights ourselves.
The Titan architecture’s modular approach to short- and long-term memory differs from Kahneman’s model, where immediate recall (System 1) is fast and intuitive, while higher-level reasoning (System 2) is slower and effortful. A hidden layer in both systems determines which mechanism to use, resembling Weick’s concept of sensemaking through storytelling and surprise. This suggests AI could benefit from similar dynamic prioritization. While Titan shows progress, we are still at the early stages of developing architectures that fully mimic human memory and reasoning.
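To illustrate the "which mechanism to use" idea, here is a rough, hypothetical sketch in the spirit of Titans' gated (MAG-style) combination of branches, assuming PyTorch and a simple sigmoid gate; the layer shapes and the gating form are my own choices, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

class GatedMemoryFusion(nn.Module):
    """Hypothetical sketch: a learned gate blends a fast short-term branch
    (e.g. an attention output) with a slow long-term branch (e.g. a neural
    memory readout), per token and per feature."""

    def __init__(self, dim: int):
        super().__init__()
        # The gate sees both branches and emits values in (0, 1).
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, short_term: torch.Tensor, long_term: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([short_term, long_term], dim=-1))
        # g near 1 favors the fast, "System 1"-like branch; g near 0 the slow one.
        return g * short_term + (1.0 - g) * long_term

# Usage with dummy activations: 2 sequences, 8 tokens, 16 features each.
fusion = GatedMemoryFusion(dim=16)
out = fusion(torch.randn(2, 8, 16), torch.randn(2, 8, 16))
print(out.shape)  # torch.Size([2, 8, 16])
```

The analogy to System 1 / System 2 is loose, of course; the gate here only learns a per-feature mixing weight, not anything like deliberate reasoning.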
Every day there's some new way to train on the data, infer from the data, reinterpret the data, compress the data, etc. Are you working on the packaging and presentation yet, or, if not in commerce, on other use cases for your organic or synthetic data product?