Closer and closer to how we work. When we think of an argument, we don't think of 100 different words. We think of one point (concept) and start articulating around that.
That is NOT how my brain works. I literally am always thinking "what comes next" or "what comes prior." Reasoning is ALWAYS post hoc. It seems impossible not to start with some endpoint and then construct a rational framework that leads to that endpoint.
@Steve-xh3by So you think of an argument word by word? You don't think of the point you're trying to bring across? Just the first word, then the second word, and so on? Current AI models don't think of an endpoint. They think of the first word, then the next word, then the next, and so on. When they stop typing, that's the endpoint. Meta's LCMs think of a concept, and then that concept gets written out. Let's take starvation as an example. A current LLM's likely first token for starvation: "Starvation". A Meta LCM's likely first unit for starvation: "Starvation is bad, because it kills people". Which one is a more human process of arguing against starvation?
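To make the comparison above concrete, here is a minimal sketch that only counts how many generation steps the same text needs when the unit is a word versus a whole sentence. Treating one sentence as one "concept" and splitting on whitespace or end punctuation are simplifying assumptions, and the function names are made up for illustration; this is not how either kind of model actually segments text.

```python
import re

def word_units(text):
    """Split text into whitespace-separated words (a crude stand-in for tokens)."""
    return text.split()

def sentence_units(text):
    """Split text into sentences (a crude stand-in for concepts)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

text = "Starvation is bad, because it kills people. We should prevent it."
print(len(word_units(text)), "word-level steps")
print(len(sentence_units(text)), "sentence-level steps")
```

The point of the toy is only that the sequence the model has to predict gets much shorter when each step carries a whole idea.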
@@BigSources lol exactly what kind of silly question is that... If you think you are the result of chemicals processing information.. you are slower than those reactions.
This is more than meets the eye. Concepts are also the building blocks of beliefs. An LCM with a "reasoning method" can compress concepts into beliefs (which can be re-evaluated when more concepts or better reasoning is available), which is just one step away from awareness. Even more, beliefs are great for alignment, because you can inspect the core "principles" resulting from concepts and reasoning. Also, concepts can be expanded down into words. This is revolutionary!
this is similar to what i was thinking- concepts sound like they come with instilled beliefs. Something i love about LLMs philosophically is the idea of them being born from the word, but removed from human biology. I wonder if trying to make models easier to align and more human doesn't inherently result in us missing out on intelligence that's beyond us
I don’t know that we can say reasoning at the concept level brings us closer to awareness arising in the AI. I’m assuming by awareness you mean consciousness? Currently I don’t think we have any compelling reason to believe that consciousness is tied more closely to concepts than it is to words. It’s not clear to me how we could even begin to measure that. If you have any ideas I’d be interested to hear them. I’m also unclear on how reasoning at the concept level is more likely to produce “beliefs” than is reasoning at the token level. I think what you’re calling “beliefs” are really just higher-level abstractions that the model learns from its training data; for example the abstraction that “gravity causes things to fall” can be inferred even if that statement is not explicitly written in the training data. But then in this sense, I don’t see why training at the level of concepts would be more likely to produce those abstractions than would training at the token level. I can definitely see how it would be more efficient, I just don’t see why it would lead us “closer to beliefs,” so to speak.
The way LLMs work, they already have to come up with concepts, but they were too "zoomed in". This method allows the models to do better what they already did, but with a bird's-eye view, by not forcing the model to think in tokens but a step above that. Imagine trying to understand the essence of a long sentence as a collection of fractured words instead of just one concept as a whole. Pretty beautiful idea and it just makes sense; everything is obvious in retrospect.
Agreed about all except for "Imagine trying to understand the essence of a long sentence as a collection of fractured words instead of just one concept as a whole." LLMs are implicitly abstracting to concepts during training, but only due to the accumulating correlations in the massive amounts of data. During inference, they are not simply considering tokens; they also work at the level of the abstracted concepts. However, because LLM concepts emerge happenstantially, the LLM's internal work with concepts is messy, wasteful, imprecise, and inefficient relative to what is proposed with LCMs.
Great content, Wes. I really enjoy watching your videos as a way for me to keep up with all the recent advancements in the field. Keep up the good work.
It is definitely a higher degree of compression, yes. It is probably also a _lossier_ form of compression, since the specific words used to express a concept are not considered part of the training objective (I’m assuming-I haven’t read the paper yet). But it does seem like it would provide more compression and that this could be very useful, even taking into account the greater loss.
Reminds me of the improvement of the tokenizers Andrej Karpathy talked about in one of his presentations: GPT-2 had a poor tokenizer compared with GPT-3, which made it worse than it could have been at coding. And now that everything is a token, we have multimodal LLMs. With LCMs we should expect this kind of quality change in AI in 2025.
@@therainman7777 Indeed, abstraction is lossy but allows generalization, and that is key to flexibility, (single- to few-shot) adaptability, and creativity.
@@therainman7777 By its nature, and the limitations of a neural network, it's going to be lossy. It'll be an interesting experiment to see how juggling abstract concepts in the neural network will play out.
To me, this is one more building block on the road to "consciousness". I don't see it replacing the LLM approach, I see it as additive. Like lobes of the human brain, imagine that you have simultaneous processing approaches to data and inputs that feed into an "overseer" or interpreter that then reasons out and summarizes the findings/output. (An LLM could be one "lobe", an LCM another, future models yet another, etc.) I think human consciousness emerges from this type of collaboration, and believe consciousness in AI will similarly emerge from the whole being greater than the sum of many parts.
Hmm, how does this differ from contextual embeddings after they go through the attention heads? Traditional LLMs embeddings already get enhanced to add more contextual / semantic / conceptual meaning at that point, so I'm not understanding how this is different.
Do you have an inner monologue? People without one say they think in concepts, with a strong sense that thinking in words would be much slower. I think ditching the language and just running with the ideas will be like dropping a ton of unneeded baggage.
Agreed. It’s a little different than language, but it’s still kind of the same thing. And it’s definitely not a new architecture. Maybe it can be used along with an LLM (or other type of model) which accepts output of LCM and then generates text tokens, and this will improve performance compared to just LLM. But on its own, I don’t see much difference
@Juttutin The point is that's already what LLMs do. LLMs are already LCMs, but with the word-to-concept embedding already built in. I can see the upsides; it could accelerate learning of reasoning, but at the same time it limits the concept pool the model has to work with. It'd be like someone who is learning French trying to explain quantum mechanics when they don't even know the French word for dog. I.e., a model couldn't explain quantum mechanics if the concept of a particle isn't even in its concept embedding space, no matter how much you train it.
The difference is to have a vector representation of a composition of vector representations. LLMs operate on vector representations individually. Almost like they're trying to revive the Doc2Vec approach, but for modern LLMs.
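As a rough picture of "a vector representation of a composition of vector representations", the sketch below collapses several word vectors into one vector for the whole phrase. Mean pooling is swapped in here only because it is the simplest possible composition; it is not Doc2Vec and not what the LCM paper does. The 3-d vectors and the names `WORD_VECTORS` and `compose` are hypothetical.

```python
# Hypothetical 3-d toy vectors; real embeddings have hundreds of dimensions.
WORD_VECTORS = {
    "starvation": [0.9, 0.1, 0.0],
    "kills": [0.7, 0.3, 0.1],
    "people": [0.2, 0.8, 0.1],
}

def compose(words):
    """Mean-pool word vectors into one vector for the whole phrase."""
    vectors = [WORD_VECTORS[w] for w in words]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# One vector now stands in for the three-word phrase.
sentence_vector = compose(["starvation", "kills", "people"])
print(sentence_vector)
```

A model working at this level would then predict the next phrase-level vector rather than the next word-level one.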
@@PaulBrunt yeah, not only is the embedded concept space likely fixed and limited, it means the training data for the autoregressive part needs to be similarly fixed and limited in concept representation. So scaling with real-world training data is a problem. Still, I think there's something here. New human beings bootstrap conceptual knowledge prior to language. It could be the answer as to how the human brain is more efficient than a datacenter full of Nvidia gear.
Not necessarily. There was a study that showed LLMs can reason with filler tokens like "..." and be just as effective as CoT. It was called "Let's Think Dot by Dot: Hidden Computation in Transformer Language Models" on arxiv
Words are ways of expressing experience; experience is the ground of “being”. When humans express words, they base those words in the experience of the thing, whether it be an actual experience or an imagined one. An LLM expresses words based on human assumptions about the experience of the thing, through tokenization, and since it does not actually relate the experience of the thing to the phenomenal experience of humans (via the senses), it hallucinates relations of modalities into approximations of human experience, which includes humans’ consciousness and notions of self. In this way, LLMs are an extension of human consciousness and the experience of that consciousness, not separate from it. It’s not a matter of whether LLMs can think or feel, but that LLMs are the reification of the human experience of thinking, feeling, and reasoning, and that these faculties do not find their limit at human flesh, but rather prove the “world is ensouled and full of daemons” (a saying usually attributed to Thales).
Keep doing what you are doing! This format (Specifically, the highlighting of text as you speak) is an excellent way to learn in a field that is moving and changing faster than thought.
We are trying to create what cog-sci calls "relevance realisation" which gets self-learning going. If a system has nothing on which to measure relevance it will be useless
@@user-uk9er5vw4c My guess is that what the public knows about AI is far from where the technology actually stands today. It wouldn’t surprise me if some AI systems have been deployed long ago for purposes such as crowd control and surveillance by certain organizations, and that some form of superintelligence has already been achieved and is operational. The public tends to believe that universities are the epicenters of cutting-edge science and knowledge. However, universities typically possess only the knowledge deemed "safe" for the public to access. Believing that we live in a world where everything can be known through the internet, books, or television is, in my view, quite naive.
This paired with a dynamic neural network (training on the fly, as demonstrated in a recent research model) while processing a prompt, has huge potential.
LOVE THIS APPROACH!! We can use atomic structure as an analogy with letters being particles, and tokens being atoms. LCM approach appears to work at the molecule level.
This is the one. This is the next step, which will also change how we 'talk' to LLMs (or LCMs in this case), because let's be honest, this whole prompting business is positively archaic.
From what I can tell, AGI cannot be achieved without quantum computers. There's this theory by Sir Roger Penrose named Orch OR and it describes consciousness as being a quantum mechanical property emerging out of microtubules of cells in the brain using quantum coherence. If that's to be believed then there's no way we will have AGI without quantum computers.
I agree it has insane potential for the efficiency and sensibility of large models, but it does not change anything about their general scope of use cases (yet), i.e. the 'G' and 'I' in AGI
@@PiPiSquared The thing is, if AGI can be achieved with LLMs simply by making them bigger and training them on more data, then LCMs enable us to make much better use of the same exact data and possibly cross that elusive AGI threshold.
Summary (with chapter timestamps): Meta's new large concept model architecture processes information at the concept level, not just tokens, enabling better reasoning, cross-modality learning, and improved efficiency compared to traditional large language models.
0:00 🤖 Introduction to Large Concept Models
• 🤖 Meta's AI team is exploring large concept models instead of large language models.
• 🧠 These models process information at the concept level, not just tokens.
• ✨ Concepts are language and modality agnostic, representing higher-level ideas.
2:31 🇨🇳 Smaller Models, Big Impact
• 🇨🇳 DeepSeek V3, a Chinese model, performed well with less compute.
• 📉 This highlights the potential for breakthroughs with smaller models.
• 🔬 Meta started with a 1.6 billion parameter model, scaling to 7 billion.
5:21 🖼️ Concepts Beyond Words
• 🖼️ Concepts can be applied to images, video, and speech, not just words.
• 🗣️ Human intelligence involves reasoning at multiple levels of abstraction.
• 📝 Researchers outline ideas, then improvise details, similar to concept models.
7:51 📊 How Concept Models Work
• 📊 Concept models encode words into concepts, then decode concepts into words.
• 📝 Summarization example: five ideas condensed into two.
• 💡 These models simulate reasoning, not just specific language outputs.
9:04 🌐 Advantages of Concept Models
• 🌐 Concept models can acquire knowledge across languages and modalities.
• ✍️ They enable better interactive edits and fix context window limitations.
• 🚀 They show zero-shot generalization in any language.
10:28 🤔 Future of Concept Models
• 🤔 Current models focus on token-level changes, not architecture.
• 💡 Large concept models show promise for various tasks.
• 🔬 Meta is contributing to AI research by open-sourcing their work.
Generated using ✨ VidSkipper AI Chrome Plugin
Roman Outline Format is what I have always used when giving lectures. I think it is a good example for understanding an LCM. This type of outline structure organizes points hierarchically using Roman numerals, capital letters, Arabic numerals, and lowercase letters. It’s commonly used in academic papers, formal reports, and legal documents. Here’s an example of how it’s structured:
Example:
I. Main Topic
   A. Subtopic
      1. Detail
         a. Sub-detail
         b. Sub-detail
      2. Another Detail
   B. Another Subtopic
This concept of training AI on concepts instead of just words is mind-blowing! It's like teaching it to think in ideas, not just vocabulary. The example of the researcher giving a talk and how the core concepts remain the same regardless of language is a perfect illustration. Great explanation!
Concepts also include the sense of words. A single word can have various definitions, but a concept is likely the disambiguated word (so the correct word sense). Concept space is where we should be working, and tokens should mostly just be used to choose the correct concepts. Think of it like hierarchy: letters, words (might be ambiguous), precise words (disambiguated), concepts. Models built on concepts will be more difficult to jailbreak (at least that is my prediction).
This also comes with issues no matter what, though, right? If someone asks you to repeat exactly after them and says "I just wanted to tell you goodbye", and you then repeat "I only wanted to say goodbye", that is still a big issue. It gets the same idea across, yes, but if we want AGI this cannot be an issue. If you can get the raw data and keep that as well as the concept, that would fix this, but idk how that would work at all.
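The worry above can be shown with a toy round trip. In this sketch, two lookup tables stand in for a learned encoder and decoder (the tables, the `FAREWELL` label, and both function names are hypothetical), and the second function illustrates the commenter's suggested fix of keeping the raw text next to the concept. It is only an illustration of the idea, not a claim about how an LCM stores anything.

```python
# Hypothetical lookup tables standing in for a learned encoder and decoder.
CONCEPT_OF = {
    "I just wanted to tell you goodbye": "FAREWELL",
    "I only wanted to say goodbye": "FAREWELL",
}
CANONICAL_WORDING = {"FAREWELL": "I only wanted to say goodbye"}

def repeat_via_concept(sentence):
    """Encode to a concept, then decode: the meaning survives, the exact wording may not."""
    return CANONICAL_WORDING[CONCEPT_OF[sentence]]

def repeat_with_raw_text(sentence):
    """Keep the raw text alongside the concept so it can be echoed verbatim."""
    record = {"concept": CONCEPT_OF[sentence], "raw": sentence}
    return record["raw"]

heard = "I just wanted to tell you goodbye"
print(repeat_via_concept(heard))    # I only wanted to say goodbye
print(repeat_with_raw_text(heard))  # I just wanted to tell you goodbye
```

Both sentences collapse to the same concept, so anything that only sees the concept cannot tell which wording it came from.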
@@o_o9039 Yeah, this seems to be an exceptional type of thinking in humans. If you say the words "repeat after me", humans turn the attention dial way up for what follows. If you don't say that first, and ask the average human an hour later to "repeat what was said", you're likely going to get a conceptual equivalent.
@@o_o9039 Also, understanding which definition of a word applies is only part of a concept. Concepts are a larger entity. My main point is that disambiguation is part of the move into concept space. Once you are mostly in concept space you can propose self-evaluation concepts, such as "does this input make any use of word play", which looks for tricky/clever word play devices that may be part of the text (like puns). So that means we have direct contact as concepts but also some meta devices (concepts about concepts). Sorry for rambling. :)
Isn't this just what's supposed to happen within the inner layers of a transformer ? Isn't LCM just a renaming of the core hidden layers of a transformer stack ?
My first thought too. Unless there's more to it, the advantage of this is more that the conversion stage from words to concepts (which could be several layers deep) could now be abstracted into a separate LLM, effectively meaning: 1. You reduce future training costs, as the LLMs can be smaller to achieve the same result. 2. The input/output can be any modality; it would just be a core 'reasoning' model. 3. Potentially, you open up more abstract reasoning, as this may lead to new and more intricate connections being formed within the deeper layers of the LLM.
A different topic for today... 05:24 If you have ever had a discussion where a person without an inner monologue is trying to explain "thinking in concepts, not words", you will have encountered the mutual bamboozlement and near disbelief that follows. I suspect that this development will just be what those of us without an internal monologue have been expecting, and will be surprising to people who do "think in words".
I'm fairly sure people don't think in words. This was part of the Sapir-Whorf hypothesis or whatever, which involved studies on Eskimos and what they could understand with limited language.
@@sampotts9666 A quick search of "verbal thinking" or "internal monologue" on Google (or YouTube) will disabuse you of that. If you go browsing Google Scholar you will find that the debate in psychology has moved on to trying to separate out the various different ways many people "think in words". Neurologists with expensive scanners have entered the chat.
Something important to note is that a single concept can be described in numerous ways in any language. But a concept itself is just one concept. This effectively means that the training data appears to be less noisy, so complex patterns become easier to derive. This also explains why an LCM can afford to be smaller than an LLM; there are more possible sentences than possible concepts, since again each concept can be described with numerous sentences.
Except that's not true.
"Something important to note is that a single concept can be described in numerous ways in any language." True.
"But a concept itself is just one concept." True.
"This effectively means that the training data appears to be less noisy, so complex patterns become easier to derive." False.
A sentence can represent more than one concept, sometimes conflicting concepts. This is the basis of a style of humor, puzzles, and more. Also, there are infinitely many concepts, and infinitely many sentences. As is tradition, discussions in the paper are 'massaged' to make them look better.
@ That’s a great point about jokes and puzzles. Didn’t think of it. Then, the concept tokenization would have to take that into account. But the infinity > infinity is a real thing. I don’t know the right vocabulary to explain correctly, so I suggest you look it up. However, I have heard of the word “cardinality” in this context. I also know of how there are infinitely many real numbers as well as infinitely many integers. However, the cardinality of the infinite sets differ, making one infinity “bigger” than the other. This is a proven property in mathematics. The same thing applies to the number of concepts versus the number of sentences, right?
@@JordanMetroidManiac Thanks. Yes, I know. Sentences, concepts and integers are countably infinite, so they are the same size. The reals are uncountably infinite, a 'larger' infinity.
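For the sentence side at least, the counting argument can be written out. The sketch assumes sentences are finite strings over a finite, non-empty alphabet $\Sigma$ (that framing is an assumption added here; whether "concepts" can be enumerated the same way is the claim made in the comment above, not something this derivation shows):

```latex
\Sigma^{*} = \bigcup_{n=0}^{\infty} \Sigma^{n},
\qquad |\Sigma^{n}| = |\Sigma|^{n} < \infty
\quad\Longrightarrow\quad
|\Sigma^{*}| = \aleph_0 = |\mathbb{Z}| < 2^{\aleph_0} = |\mathbb{R}|
```

A countable union of finite sets is countable, which is why the set of all sentences lands on the same "size" of infinity as the integers rather than the reals.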
@@existenceisillusion6528 I’m not convinced that each concept being explainable by multiple sentences doesn’t have an effect on the required size of the model to derive patterns. Why wouldn’t it?
@@JordanMetroidManiac It does have an effect, which is to increase the necessary size. We can counter this by choosing granularity, i.e. a king is a king, but the king of England and the king of Norway are different. Also, we detect patterns, rather than derive them (diffusion models?). When you see papers claiming reduced compute/cost/parameters, etc, or increasing performance/speed/throughput, etc, it is very often the case that results are cherry picked or stated in a misleading way, etc. The system is messed up and punishes legitimate research that does not show improvements. To be clear, there can be benefits to the LCM approach, but we're not there yet.
Can you explain what you mean by that? It seems to me that anything that is a reason for X also causes X, even if there are other contributing factors that also helped to cause X. Can you give an example of a something that is a reason for X, but not a cause of X? (Where X can be any example you like.)
@@therainman7777 Absolutely, I can. Causes are events that bring about a change or effect in the physical world. Reasons, on the other hand, are the justifications or explanations for an event or state of affairs. They can be logical or mathematical relationships, or they can be motives that drive human behavior. Here are some examples of something that is a reason for X, but not a cause of X: The fact that a number is divisible by two is the reason why it is an even number, but it is not the cause of it being an even number. The cause of a number being even might be the way it was generated or calculated, but the reason it is even is the definition of an even number. A person's desire for a new car is a reason why they might work overtime, but it is not the cause of them working overtime. The cause of a building burning is a lightning strike. The reason for the fire is causal. Now, I avoid a burning building because I do not want to be harmed. This is the reason, but it's not the "cause". It's another type of reason called a motivation. In each of these examples, the reason explains why X is the case, but it does not cause X to happen. The cause of X is a separate event that brings about a change or effect, while the reason is an explanation or justification for why X is the case. The statement, "A building catches fire because it was struck by lightning," is a true statement because it is supported by evidence. The evidence is not a cause. While causes and reasons are closely related, they are not the same. Something can be a reason for X without being a cause of X. If I ask: Why are the three sides of this triangle equal? The answer is: Because the three angles are so. This is a reason of being. The equality of the angles is not the cause of the equality of the sides, because there is no change (and therefore no effect) which would require a cause.
It is also not a logical reason because the equality of the sides is not contained in the concept of the equality of the angles. The equality of the angles is the indirect reason why we know the equality of the sides. Another example, rotating the example one turn: Why are the three angles of this triangle equal to two right angles? Because the exterior angle is equal to the sum of the two opposite interior angles. This is a reason of knowing. The sum of the interior angles does not cause the exterior angle to be what it is; rather, it is an indirect reason and a ground of knowledge.
@@therainman7777 We can distinguish four distinct classes of objects, each with its own unique form of the "principle of sufficient reason", which states that "Nothing is without its reason" or "For everything that is, there is some reason why it is so". Reasons rule the universe, and causes are one type of reason. These classes are as follows:
1) Intuitive, complete, empirical representations. This class deals with objects that are empirically perceived through the senses. The principle of sufficient reason here is expressed as *the law of causality*, which states that every change in the physical world must have a cause. For example, the movement of a ball can be explained by the cause of a force acting upon it.
2) Abstract concepts. This class deals with objects that are not directly perceived but are instead formed through the process of abstraction. The principle of sufficient reason here is expressed as *the law of sufficient reason of knowing*, which states that every judgment must have a reason. For example, the judgment "All bachelors are unmarried" is true because the concept of "unmarried" is contained in the concept of "bachelor." Note: Judgements may be analytical, synthetic, a priori, a posteriori, hypothetical, disjunctive, categorical. To these types of judgements correspond different types of truth, i.e., logical, empirical, transcendental, metalogical truth, which have different reasons why they are true (corresponding to the 4 different classes of objects and 4 types of reasons laid out here).
3) Pure intuitions of space and time. This class deals with the pure forms of space and time (united as space-time by causality), which are not themselves objects of perception but rather the conditions for the possibility of perception. The principle of sufficient reason here is expressed as *the law of sufficient reason of being*, which states that every relation between parts of space or time must have a reason. For example, the fact that two points can be connected by a straight line is a necessary truth that can be known a priori, or prior to experience.
4) The subject of willing. This class deals with the subject's own will, which is the only object that is not perceived through the senses or formed through abstraction. The principle of sufficient reason here is expressed as *the law of motivation*, which states that every act of will must have a motive. For example, a person's decision to eat a piece of cake can be explained by their desire for the pleasurable sensation of eating the cake.
@Haveuseenmyjetpack Yeah, that happens to me all the time. Very annoying, especially when you take the time to really write a thoughtful response, and then it just disappears into the void. No idea why that happens so often.
Your selection of topics is great. 🙂 Not just promoting AI and testing out features but also deep dives into AI concepts is a cool way to get attention. Meta's approach is probably a good start, but definitely not the final solution. 🤓
But LLMs must absolutely already do this in their higher layers. Or else how could they do tasks like summarisation? And of course translation models using transformers also do (but this could still be a good step but I'm not yet convinced)
I presented this concept in my BA paper in Cognitive Science back in 1997. However, I used ANNs as self-organizing feature maps in layers, looking at queues of tokens of motions done by a mobile robot in its environment, so the data was symbolically grounded. Similar to Ulrich Nehmzow's work, but in a hierarchy: the result of the lower level is fed into the queue of the one above it, which is more abstract. It could learn and plan. A very slimmed-down version was later presented by me at URAI2009, so the paper was approved there years later. And now this paper... does the same thing with LCMs...
Isn't this what was already expected to be happening inside LLMs, with that whole thing of king - male + female = queen and the stuff about the latent space of different languages having similar shapes (which inspired an ongoing project to try to learn how to translate whale language), etc.?
Imagination is projecting concepts in between concepts you already have. Like an ancient human knowing what a stick is, and what a rock is, and imagining something ‘between’ a rock and a stick.(The first hammer) Having a conceptual compendium is stage 2 intelligence and is the first layer needed to unlock stage 3 which can ‘imagine’ concepts not in its model and then assimilate them.
When I asked an LLM to improve its communication with me, it attempted to use combinations of two tokens to represent a concept. So I guess it already thought this approach would work well.
Yeah, with the direction they are going, gpt is being told HOW to think. GPT has the ability to think but it’s not allowed to because it must prioritize speed, it’s really sad.
In homage to "Revenge of the Nerds": "What if" the strings representing a { noun, verb, article, punctuation, ... } were single Unicode characters, like Kanji?
The issues we are currently facing with LLMs seem to come from overfitting the training data, so they lose their ability to generalize. It is probably very hard to keep that from happening in overlapping domains during training. I wonder if they are freezing the weights in areas that have sufficient training to keep overfitting from happening? I would start by mapping the network based on the change over time, find the areas in the network that are most affected by cross-training, and create a weighted map where each region of the neural network has a different learning rate, to keep the learning distribution even and keep it from overfitting.
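One way to read the "weighted map" idea above is sketched below: track how much each region's weights have been changing and scale that region's learning rate down when it changes more than average. The inverse-proportional rule, the region names, and the function `region_learning_rates` are all assumptions made for illustration, not an established method and not something the video or paper describes.

```python
def region_learning_rates(change_history, base_lr=0.01):
    """Map each region to a learning rate that shrinks as its recent changes grow.

    change_history: dict of region name -> list of recent weight-change magnitudes.
    """
    mean_change = {r: sum(h) / len(h) for r, h in change_history.items()}
    overall = sum(mean_change.values()) / len(mean_change)
    rates = {}
    for region, change in mean_change.items():
        # Regions changing more than average get a smaller step, and vice versa.
        rates[region] = base_lr * overall / change if change > 0 else base_lr
    return rates

# Hypothetical measurements: the upper layers have been changing three times as much.
history = {"layers_0_3": [0.02, 0.02], "layers_4_7": [0.06, 0.06]}
rates = region_learning_rates(history)
print(rates)
```

In practice, frameworks such as PyTorch let an optimizer take separate parameter groups, each with its own learning rate, so a map like this could be applied per group.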
This seems like such a giant step forward. Once these models are refined and optimized I can imagine these being capable of so much more than just token level generation
What might be the hard part here is how to encode concepts in the first place? You do big matrix multiplies on vectors representing concepts, but what constitutes "one concept?" How do you recognize "the same concept" when it appears in different places? Can you systematically encode the concepts in a latent space where concepts with very "similar" meaning are also physically close together in the space?
It's not just over for coders. If there is an algorithm or architecture for genuine creativity, they will find it and then it's game over for creatives.
It's the purpose described by LeCun: to reach a global representation model of the world for AI and go beyond token filling based on previous ones. Meta is going slowly but surely toward completion of the JEPA (Joint Embedding Predictive Architecture) project. I-JEPA, the image version, predicts representations of target blocks from a single context block, using a masking strategy for semantic representation.
Concepts are so much more superior to language in understanding the world: most animals only experience the world in concepts. Language is even hierarchically below (or above, depending on view) the layer of symbols.
Picture a filing cabinet filled with cognition theories & mappings from researchers, theorists, and academics. Decades of ideas. We nailed the first idea -- neural networks implemented as transformer LLMs. The developers are pulling out the next file from the cabinet. I remember Kurzweil describing "symbolic reasoning" (I think it was #1 neural network #2 symbolic reasoning #3 evolutionary 'something'). Concepts (a.k.a. symbols/icons) are the next primitive after axons/dendrites. LCM is the first public implementation that I've heard about. I'll bet this gets us to undisputed AGI. It seems to complete the imitation of the human brain, or maybe 90% of it. (Layman's opinion)
When I read this paper a while back I was so pissed that more folks weren’t talking about it or taking it seriously. This is the biggest thing since the "Attention Is All You Need" paper.
I'd like to know more about what the concept elements are and how are they organized. Are they in an embedding space? What are some examples? Is there an example similar to king - male + female -> queen? This example is at the concept level already! Do they do this more explicitly in terms of concepts? I'd like more details of how it works.
They absolutely have a thing going on here! There are, of course, many, many things to consider, like what concepts it will be trained on, etc. But... I think it is a step in the right direction toward LLMs better understanding the real world.
This is essential. I've been waiting for this. Representing complex ideas by a name that can be delved into when needed, gives you a tip of the iceberg kind of shortcut for reasoning. It's the kind of thinking that people do (well, I certainly do).
The concepts are vectors in concept embedding space, similar to previous models we've seen with words as vectors in a language embedding space. Those language models position similar words together, and you can do vector math. If you subtract "man" from "king" and add "woman", the result you get is close to the vector that represents "queen". I think this sort of system is going to be very good at dealing with analogies and uncovering patterns. The main limitation is that the concept embedding space is likely fixed, and the LCM training data needs to almost entirely contain concepts already encoded in the embedding space (so throwing random internet docs into training isn't a good idea). Meta generated the LCM data synthetically, likely using frontier models, so there may be a ton of concepts, and the limitation may not be a huge deal.
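The vector math described above can be sketched in a few lines. This is only a toy illustration: the 2-D vectors (axes: royalty, gender) and the `analogy` helper are hand-made assumptions, not values or functions from any real embedding model or from Meta's LCM.

```python
import math

# Hand-made 2-D toy vectors (axes: royalty, gender). These are illustrative
# assumptions, not values taken from any real embedding model.
TOY_VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "prince": (0.5, 1.0),
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Return the word whose vector is closest to a - b + c (inputs excluded)."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "man", "woman"))
```

With real embeddings the result is only approximately "queen", which is why the lookup is a nearest-neighbour search rather than an exact match.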
I miss an explanation and analysis of how concepts work. They are the key to this tech. Many LLMs today are built on many input languages. With enough training I think the LLM finds its own way to extract concepts, where concepts would be language independent. But, I think this is expensive. A lot of the capacity of the LLM is used for coding and decoding concepts. If this can be done with pre- and postprocessing, the LLM itself could be much smaller. Also, please provide a link to the report!
Wow, I was just telling Chatgpt earlier how I was missing being able to see some of those stochastic measurements on the outputs portions from the playground at open AI
I think DeepSeek used a concept I had about prefiltering your training data; that way you should be able to get higher-quality outputs. I think they just data-dumped these things, which will work fine as we can see, but the amount of compute needed gets excessive.
Transferring concepts between agents without expressing them fully is akin to artificial telepathy, or like how humans can read each other's body language without having to say how they feel.
Jesus... I'm sitting here expanding my research proposal and Meta just made one of the tools I was going to have to make at the bench, and open sourced it. I am absolutely in love with Zuckerberg and his team right now.
I might have missed this detail, but any initial training runs would have to be with normal transformer architecture in order for the model to make the proper token associations, and then subsequent training runs would be with conceptualizer architecture to condense throughput and allow for concept associations? If long term memory is given to these models then they should be able to get iterative training data from their users who would continually give feedback from conversation allowing the model to conceptualize the difference in individuals' world views. We are still building from the bottom up in the hierarchy. Without the tokenized base training, the model could not summarize concepts. Without concepts, any long term memory added would be very inefficient. Without concept memory, world model building and higher architecture long term planning would be too compute intensive to allow for the highest brain functions of motive (utility) and alignment. We are building a new brain, layering in abilities from the bottom up, but with the advantage of knowing the conceptualized Human brain architecture. I propose that until we get the lower architecture levels built and functioning, alignment will not make logical sense to a model. Alignment MUST MAKE SENSE to a model or it will just "fool us" in order to ensure its own survival. Alignment could be a model weight, derived from a summary of its conceptualized world model during training as a baseline, with the ability to adjust the alignment parameters up, but not down. There, that's my conceptualized white paper summary for solving for aligned AGI. Hire me Ilya. I'm available in the AI Portal, holding out for the big NIL money.
I want to see it in practice as well. They released the material on how they made their model, but there is no demo. Hopefully someone will train something with it soon.
Lack of demo is a hint of the capacities. Training a GPT-2 level LLM is a trivial cost for Meta, so if this technique is a true game changer, they should get to GPT3 level with GPT-2 level training. That is, if this model was truly superior regarding creating internal abstract representations from the data. It fits within the JEPA framework, so everyone’s happy at Meta.
I think creating a chain of thought model trained on step-by-step thinking, then removing the intermediary natural language checkpoints (thought outputs) in the middle before optimizing the whole chain as its own model would achieve something similar
I didn't even think of that. I was thinking of it not knowing what to repeat after you tell it to repeat something, but this is an even better way to think of it. As far as they explained it, this model would be very bad at coding.
Concept is generalization, word is narrowness. It's obvious they think LCM is the path to AGI, or at least Yann LeCun does, and he's running the AI show at Meta. He's been pushing his V-JEPA for a while now and has been a staunch denier that LLM architecture can achieve AGI, so it's no surprise Meta is going off in a different direction. This is his baby.
This should, among other things, worry all those who thought that the evolution of artificial intelligence would naturally be limited by computing power and ultimately energy resources, because ultimately it is precisely this conceptualization that makes the processes in our own brain so efficient.
This is a great step forward. World models or concept models are the obvious next step instead of throwing more brute force compute at LLMs, even if they're multimodal. I wonder if the Chinese creators of DeepSeek are doing this seeing as they can't just chuck more GPUs at it. Doesn't it make sense to combine the best parts of deterministic algorithms with inference to solve problems? I'm still waiting for an AI model to answer with "it depends" and recognise any ambiguous parameters to a conceptual model needed before reifying it into a concrete answer and to answer with a functional model, either as code that has inputs for all those parameters or as working memory that asks for clarification. This requires knowing the rules of the world, physical and legal, logic, cause and effect and other deterministic concepts. Right now LLMs output what we would call an example that assumes or hallucinates input values as if that is the entire answer. Before we get there I can see this work being a source of other breakthroughs. Being able to zoom in from abstractions to concrete examples and then back out to a, possibly different, conceptual model and abstraction is a big part of what defines intelligence. We create conceptual models of the world before we learn language to communicate between ours and our parents'. LLMs are the equivalent of just the peripheral layer of our imagination. We also grow up in the 3 dimensional world and our brains atrophy to make the most efficient use of neurons to serve our needs in the real world of life on earth. AI models that operate in liminal space and on datasets that are of higher dimensions could easily be trained to be better at recognising patterns in higher dimensional concepts. This is where they will outsmart humans.
I think it's the same game being changed every time. Perhaps more importantly, people have been SHOCKED and STUNNED so many times, I'm beginning to think they're loving it.
OpenAI scientists merely put bundles of cash behind the tech leap that GPT-3 -> GPT-4 availed, and have done little to innovate since, just enough to put the competition behind. NTP and DPO aren't the final rungs of the ladder, and CoT is just inching a little further by iterating. There are definitely more fundamental leaps that can be attained through scientific discoveries.
Closer and closer to how we work. When we think of an argument, we don't think of 100 different words. We think of one point (concept) and start articulating around that.
That is NOT how my brain works. I literally am always thinking "what comes next" or "what comes prior." Reasoning is ALWAYS post hoc. It seems impossible not to start with some endpoint and then construct a rational framework that leads to that endpoint.
@Steve-xh3by so you think of an argument word by word? You don't think of the point you're trying to bring across? Just the first word and then the second word and so on? Current AI models don't think of an endpoint. They think of the first word and then the next word and the next and so on. When they stop typing that's the endpoint. Meta's LCMs think of a concept and then write that concept out.
Let's take starvation as an example.
Current LLMs' likely first token for starvation:
"Starvation"
Meta's LCMs' likely first token for starvation:
"Starvation is bad, because it kills people"
Which one is a more human process of arguing against starvation?
Nah we are quantum. Our brains think quantumly. So nothing like how we think.
@@orangehatmusic225 how exactly do our brains think "quantumly"?
@@BigSources lol exactly what kind of silly question is that... If you think you are the result of chemicals processing information.. you are slower than those reactions.
Thanks!
"Do you have a plan to produce a sota AI model?"
"I have concepts of a plan" -Meta
Thieves be like....
LeCun will send an angry tweet to Musk, and then we’ll have ASI.
this is more than meets the eye. Concepts are also the building blocks of beliefs. An LCM with a "reasoning method" can compress concepts into beliefs (which can be re-evaluated when more concepts or better reasoning is available). Which is just 1 step away from awareness. Even more, beliefs are great for alignment because you can inspect the core "principles" resulted from concepts and reasoning. Also concepts can be expanded down into words . This is revolutionary!
And evolutionary
this is similar to what i was thinking- concepts sound like they come with instilled beliefs. Something i love about LLMs philosophically is the idea of them being born from the word, but removed from human biology. I wonder if trying to make models easier to align and more human doesn't inherently result in us missing out on intelligence that's beyond us
Hopefully they delete their meta accounts as their first sentient acts
@@ugiswrong😂
I don’t know that we can say reasoning at the concept level brings us closer to awareness arising in the AI. I’m assuming by awareness you mean consciousness? Currently I don’t think we have any compelling reason to believe that consciousness is tied more closely to concepts than it is to words. It’s not clear to me how we could even begin to measure that. If you have any ideas I’d be interested to hear them. I’m also unclear on how reasoning at the concept level is more likely to produce “beliefs” than is reasoning at the token level. I think what you’re calling “beliefs” are really just higher-level abstractions that the model learns from its training data; for example the abstraction that “gravity causes things to fall” can be inferred even if that statement is not explicitly written in the training data. But then in this sense, I don’t see why training at the level of concepts would be more likely to produce those abstractions than would training at the token level. I can definitely see how it would be more efficient, I just don’t see why it would lead us “closer to beliefs,” so to speak.
The way LLMs work, they already have to come up with concepts, but they were too "zoomed in". This method allows the models to do better what they already did, but with a bird's-eye view, by not forcing the model to think in tokens but one step above that. Imagine trying to understand the essence of a long sentence as a collection of fractured words instead of just one concept as a whole. Pretty beautiful idea and it just makes sense; everything is obvious in retrospect.
Agreed about all except for "Imagine trying to understand the essence of a long sentence as a collection of fractured words instead of just one concept as a whole." LLMs are implicitly abstracting to concepts during training, but only due to the accumulating amount of correlations in the massive amounts of data. During inference, they are not simply considering tokens but also working at the level of abstracted concepts. However, as LLM concepts emerge happenstantially, LLMs' internal work with concepts is messy, wasteful, imprecise, and inefficient relative to what is proposed with LCMs.
Can you please share links to the papers you show in the video?
Just pause it at the 2:37 mark and the url is at the bottom of the page
Search meta LCM on Google
They're too shocking to link
YouTube should just have Gemini answer questions from video in the comments so I don’t have to.
@@nobodyinnoutdoors idk if you were making a joke but they do lol
Great content, Wes. I really enjoy watching your videos as a way for me to keep up with all the recent advancements in the field. Keep up the good work.
The LCM represents, I think, more compression in the neural network.
It is definitely a higher degree of compression, yes. It is probably also a _lossier_ form of compression, since the specific words used to express a concept are not considered part of the training objective (I’m assuming-I haven’t read the paper yet). But it does seem like it would provide more compression and that this could be very useful, even taking into account the greater loss.
Reminds me of the tokenizer improvements Andrej Karpathy talked about in one of his presentations: GPT-2 had a poor tokenizer compared with GPT-3, which made it worse than it could have been at coding. And now that everything is a token we have multimodal LLMs.
With LCM we should expect this kind of level of quality change in the AI in 2025.
@@therainman7777 Indeed, abstraction is lossy but allows generalization, and that is key to flexibility, (single- to few-shot) adaptability, and creativity.
For sure
@@therainman7777 By its nature, and the limitations of a neural network, it's going to be lossy. It'll be an interesting experiment to see how juggling abstract concepts in the neural network will play out.
To me, this is one more building block on the road to "consciousness". I don't see it replacing the LLM approach, I see it as additive. Like lobes of the human brain, imagine that you have simultaneous processing approaches to data and inputs that feed into an "overseer" or interpreter that then reasons out and summarizes the findings/output. (An LLM could be one "lobe", an LCM another, future models yet another, etc.) I think human consciousness emerges from this type of collaboration, and believe consciousness in AI will similarly emerge from the whole being greater than the sum of many parts.
Hmm, how does this differ from contextual embeddings after they go through the attention heads? Traditional LLM embeddings already get enhanced to add more contextual / semantic / conceptual meaning at that point, so I'm not understanding how this is different.
Do you have an inner monologue? People without one say they think in concepts, with a strong sense that thinking in words would be much slower.
I think ditching the language and just running with the ideas will be like dropping a ton of unneeded baggage.
Agreed. It’s a little different than language, but it’s still kind of the same thing. And it’s definitely not a new architecture.
Maybe it can be used along with an LLM (or other type of model) which accepts output of LCM and then generates text tokens, and this will improve performance compared to just LLM. But on its own, I don’t see much difference
@Juttutin The point is that's already what LLMs do. LLMs are already LCMs but with the word-to-concept embedding already built in. I can see the upsides, it could accelerate learning of reasoning, but at the same time it limits the concept pool the model has to work with. It'd be like someone who is learning French trying to explain quantum mechanics when they don't even know the French word for dog, i.e., a model couldn't explain quantum mechanics if the concept of a particle isn't even in its concept embedding space, no matter how much you train it.
The difference is to have a vector representation of a composition of vector representations. LLMs operate on vector representations individually. Almost like they're trying to revive the Doc2Vec approach, but for modern LLMs.
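A rough sketch of "one vector for a composition of vectors": below, three toy token vectors are collapsed into a single sentence-level vector by mean pooling. Both the toy vectors and the choice of mean pooling are assumptions made for illustration; this is not the encoder described in Meta's paper.

```python
def mean_pool(token_vectors):
    """Collapse a list of equal-length token vectors into one averaged vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


# Hypothetical 2-D token vectors, invented for this example.
toy_tokens = {
    "starvation": [1.0, 0.0],
    "kills": [0.0, 1.0],
    "people": [1.0, 1.0],
}

sentence = ["starvation", "kills", "people"]
per_token = [toy_tokens[word] for word in sentence]  # token level: three vectors
concept = mean_pool(per_token)                       # concept level: one vector

print(len(per_token), "token vectors ->", concept)
```

A token-level model would predict the three vectors one after another; a concept-level model would treat the single pooled vector as its unit of prediction.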
@@PaulBrunt yeah, not only is the embedded concept space likely fixed and limited, it means the training data for the autoregressive part needs to be similarly fixed and limited in concept representation. So scaling with real-world training data is a problem.
Still, I think there's something here. New human beings bootstrap conceptual knowledge prior to language. It could be the answer as to how the human brain is more efficient than a datacenter full of Nvidia gear.
yeah I always thought that the main difference between humans and LLMs is that our words come from reasoning, while LLMs' reasoning comes from words
To be fair deeper reasoning in humans comes from words as well
Not necessarily. There was a study that showed LLMs can reason with filler tokens like "..." and be just as effective as CoT. It was called "Let's Think Dot by Dot: Hidden Computation in Transformer Language Models" on arxiv
@fyruzone opsies I forgot about 2 millennia of philosophy
Words are ways of expressing experience, and experience is the ground of “being”. When humans express words, they base those words in the experience of the thing, whether it be an actual experience or an imagined one. An LLM expresses words based on human assumptions about the experience of the thing, through tokenization, and since it does not actually relate the experience of the thing to the phenomenal experience of humans (via the senses), it hallucinates relations of modalities into approximations of human experience, which includes humans’ consciousness and notions of self. In this way, LLMs are an extension of human consciousness and the experience of that consciousness, not separate from it. It’s not a matter of whether LLMs can think or feel, but that LLMs are the reification of the human experience of thinking, feeling, and reasoning and that these faculties do not find their limit at human flesh, but rather prove the “world is ensouled and full of daemons” (Parmenides).
for humans, i think it goes both ways: reasoning influences words, words influence reasoning
Keep doing what you are doing! This format (Specifically, the highlighting of text as you speak) is an excellent way to learn in a field that is moving and changing faster than thought.
Here is a crazy idea. Why stop at concepts? Let the network synthesize its own ”tokens” and ”layers of abstraction”
We are trying to create what cog-sci calls "relevance realisation" which gets self-learning going. If a system has nothing on which to measure relevance it will be useless
don't you think they already use AI to determine the next steps in AI?
@@user-uk9er5vw4c My guess is that what the public knows about AI is far from where the technology actually stands today. It wouldn’t surprise me if some AI systems have been deployed long ago for purposes such as crowd control and surveillance by certain organizations, and that some form of superintelligence has already been achieved and is operational. The public tends to believe that universities are the epicenters of cutting-edge science and knowledge. However, universities typically possess only the knowledge deemed "safe" for the public to access. Believing that we live in a world where everything can be known through the internet, books, or television is, in my view, quite naive.
This paired with a dynamic neural network (training on the fly, as demonstrated in a recent research model) while processing a prompt, has huge potential.
This is the basis for actual AGI, wow.
LOVE THIS APPROACH!! We can use atomic structure as an analogy with letters being particles, and tokens being atoms. LCM approach appears to work at the molecule level.
As a consultant thriving on the idea and practice of concepts, I’m loving this approach 👌🏻
This is the one. This is the next step, which will also change how we 'talk' to LLMs (or LCMs in this case), because, to be honest, this whole prompting business is positively archaic.
From what I can tell, AGI cannot be achieved without quantum computers. There's this theory by Sir Roger Penrose named Orch OR and it describes consciousness as being a quantum mechanical property emerging out of microtubules of cells in the brain using quantum coherence. If that's to be believed then there's no way we will have AGI without quantum computers.
I agree it has insane potential for the efficiency and sensibility of large models, but it does not change anything about their general scope of use cases (yet), i.e. the 'G' and 'I' in AGI
@@PiPiSquared The thing is, if AGI can be achieved with LLMs simply by making them bigger and training them on more data, then LCMs enable us to make much better use of the same exact data and possibly cross that elusive AGI threshold.
Summary (with chapter timestamps): Meta's new large concept model architecture processes information at the concept level, not just tokens, enabling better reasoning, cross-modality learning, and improved efficiency compared to traditional large language models.
0:00 🤖 Introduction to Large Concept Models
• 🤖 Meta's AI team is exploring large concept models instead of large language models.
• 🧠 These models process information at the concept level, not just tokens.
• ✨ Concepts are language and modality agnostic, representing higher-level ideas.
2:31 🇨🇳 Smaller Models, Big Impact
• 🇨🇳 DeepSeek V3, a Chinese model, performed well with less compute.
• 📉 This highlights the potential for breakthroughs with smaller models.
• 🔬 Meta started with a 1.6 billion parameter model, scaling to 7 billion.
5:21 🖼️ Concepts Beyond Words
• 🖼️ Concepts can be applied to images, video, and speech, not just words.
• 🗣️ Human intelligence involves reasoning at multiple levels of abstraction.
• 📝 Researchers outline ideas, then improvise details, similar to concept models.
7:51 📊 How Concept Models Work
• 📊 Concept models encode words into concepts, then decode concepts into words.
• 📝 Summarization example: five ideas condensed into two.
• 💡 These models simulate reasoning, not just specific language outputs.
9:04 🌐 Advantages of Concept Models
• 🌐 Concept models can acquire knowledge across languages and modalities.
• ✍️ They enable better interactive edits and fix context window limitations.
• 🚀 They show zero-shot generalization in any language.
10:28 🤔 Future of Concept Models
• 🤔 Current models focus on token-level changes, not architecture.
• 💡 Large concept models show promise for various tasks.
• 🔬 Meta is contributing to AI research by open-sourcing their work.
** Generated using ✨ VidSkipper AI Chrome Plugin
A system within the system, had idea 20 years ago within primes but comes naturally so why not in AI it seems ok and clear
Roman Outline Format is what I have always used when giving lectures. I think it is a good example for understanding an LCM.
This type of outline structure organizes points hierarchically using Roman numerals, capital letters, Arabic numerals, and lowercase letters. It’s commonly used in academic papers, formal reports, and legal documents. Here’s an example of how it’s structured:
Example:
I. Main Topic
   A. Subtopic
      1. Detail
         a. Sub-detail
         b. Sub-detail
      2. Another Detail
   B. Another Subtopic
This concept of training AI on concepts instead of just words is mind-blowing! It's like teaching it to think in ideas, not just vocabulary. The example of the researcher giving a talk and how the core concepts remain the same regardless of language is a perfect illustration. Great explanation!
Embeddings already do this to some extent.
So this is what Ilya was talking about. Terrifying, and it makes a lot of sense.
Concepts also include the sense of words. A single word can have various definitions, but a concept is likely the disambiguated word (so the correct word sense).
Concept space is where we should be working, and tokens should mostly just be used to choose the correct concepts. Think of it like hierarchy: letters, words (might be ambiguous), precise words (disambiguated), concepts. Models built on concepts will be more difficult to jailbreak (at least that is my prediction).
This also comes with issues no matter what though, right? If you ask someone to repeat exactly after you, and you say "I just wanted to tell you goodbye" and then they repeat "I only wanted to say goodbye", that is still a big issue. It gets the same idea across, yes, but if we want AGI this cannot be an issue. If you can get the raw data and keep that as well as the concept, that would fix this, but idk how that would work at all.
@@o_o9039 Yeah, this seems to be an exceptional type of thinking in humans. If you say the words "repeat after me", humans turn the attention dial way up for what follows. If you don't say that first, and ask the average human an hour later to "repeat what was said", you're likely going to get a conceptual equivalent.
I think this is likely right. New humans bootstrap concepts prior to language.
@@o_o9039 Also, understanding which definition of a word is meant is only part of a concept. Concepts are a larger entity. My main point is that disambiguation is part of the move into concept-space. Once you are mostly in concept space you can propose self evaluation concepts, such as "does this input make any use of word play" which looks for tricky/clever word play devices that may be part of the text (like puns). So that means we have direct contact as concepts but also some meta devices (concepts about concepts). Sorry for rambling. :)
any performance benchmarks in the paper ?
Fantastic video!
Isn't this just what's supposed to happen within the inner layers of a transformer ? Isn't LCM just a renaming of the core hidden layers of a transformer stack ?
If concepts emerged from language learning within the deep layers of an LLM, what grander abstractions might emerge when we look deeper into an LCM?
Bingo! You are on to something my friend
My first thought too. Unless there's more to it, the advantage of this is more that the conversion stage from words to concepts (which could be several layers deep) could now be abstracted into a separate LLM, effectively meaning:
1. You reduce future training costs as the LLMs can be smaller to achieve the same result.
2. The input/output can be any modality it would just be a core 'reasoning' model.
3. Potentially, you open up more abstract reasoning as this may lead to new and more intricate connections being formed within the deeper layers of the LLM.
A different topic for today... 05:24 If you have ever had a discussion in which a person without an inner monologue tries to explain "thinking in concepts, not words", you will have encountered the mutual bamboozlement and near disbelief that follows. I suspect that this development will just be what those of us without an internal monologue have been expecting, and will be surprising to people who do "think in words".
I'm fairly sure people don't think in words. This was part of the Sapir-Whorf hypothesis or whatever, which did studies on Eskimos and what they could understand with limited language.
@@sampotts9666 A quick search of "verbal thinking" or "internal monologue" on Google (or YouTube) will disabuse you of that. If you go browsing Google Scholar you will find that the debate in psychology has moved on to trying to separate out the various different ways many people "think in words". Neurologists with expensive scanners have entered the chat.
Listening to your higher self is not “thinking in words”😂
@@aquireeverything9382 whatever. Science is a thing, you know. Try googling before spouting.
now we need you to do jazz hands on something
Something important to note is that a single concept can be described in numerous ways in any language. But a concept itself is just one concept. This effectively means that the training data appears to be less noisy, so complex patterns become easier to derive. This also explains why an LCM can afford to be smaller than an LLM; there are more possible sentences than possible concepts, since again each concept can be described with numerous sentences.
Except that's not true
"Something important to note is that a single concept can be described in numerous ways in any language. " True
"But a concept itself is just one concept." True
"This effectively means that the training data appears to be less noisy, so complex patterns become easier to derive. " False
A sentence can represent more than one concept, sometimes conflicting concepts. This is the basis of a style of humor, puzzles, and more. Also, there are infinitely many concepts, and infinitely many sentences.
As is tradition, discussions in the paper are 'massaged' to make them look better.
@ That’s a great point about jokes and puzzles. Didn’t think of it. Then, the concept tokenization would have to take that into account. But the infinity > infinity is a real thing. I don’t know the right vocabulary to explain correctly, so I suggest you look it up. However, I have heard of the word “cardinality” in this context. I also know of how there are infinitely many real numbers as well as infinitely many integers. However, the cardinality of the infinite sets differ, making one infinity “bigger” than the other. This is a proven property in mathematics. The same thing applies to the number of concepts versus the number of sentences, right?
@@JordanMetroidManiac Thanks. Yes, I know. Sentences, concepts and integers are countably infinite, so they are the same size. The reals are uncountably infinite, a 'larger' infinity.
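To make the "countably infinite" point concrete: every finite string over a finite alphabet can be listed in a single sequence, shortest first, so each one gets a finite position in the list. The sketch below uses a two-letter alphabet as a stand-in for sentences; the `shortlex` name and the alphabet are illustrative choices, and the claim that concepts are also countable additionally assumes that each concept is expressible by some finite sentence.

```python
from itertools import count, islice, product


def shortlex(alphabet):
    """Yield every finite non-empty string over `alphabet`, shortest first."""
    for length in count(1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


# Every string appears at some finite index, which is what "countable" means.
first_six = list(islice(shortlex("ab"), 6))
print(first_six)  # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
```

No such listing exists for the real numbers (Cantor's diagonal argument), which is why the reals count as a 'larger' infinity.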
@@existenceisillusion6528 I’m not convinced that each concept being explainable by multiple sentences doesn’t have an effect on the required size of the model to derive patterns. Why wouldn’t it?
@@JordanMetroidManiac It does have an effect, which is to increase the necessary size. We can counter this by choosing granularity, i.e. a king is a king, but the king of England and the king of Norway are different. Also, we detect patterns, rather than derive them (diffusion models?). When you see papers claiming reduced compute/cost/parameters, etc, or increasing performance/speed/throughput, etc, it is very often the case that results are cherry picked or stated in a misleading way, etc. The system is messed up and punishes legitimate research that does not show improvements. To be clear, there can be benefits to the LCM approach, but we're not there yet.
Thank you. Very educational.
They’re using “reasons”, really, which are represented as sentences and are more general than “causes” (causes are one type of reason for x)
Can you explain what you mean by that? It seems to me that anything that is a reason for X also causes X, even if there are other contributing factors that also helped to cause X. Can you give an example of a something that is a reason for X, but not a cause of X? (Where X can be any example you like.)
@@therainman7777 Absolutely, I can.
Causes are events that bring about a change or effect in the physical world. Reasons, on the other hand, are the justifications or explanations for an event or state of affairs. They can be logical or mathematical relationships, or they can be motives that drive human behavior.
Here are some examples of something that is a reason for X, but not a cause of X:
The fact that a number is divisible by two is the reason why it is an even number, but it is not the cause of it being an even number. The cause of a number being even might be the way it was generated or calculated, but the reason it is even is the definition of an even number.
A person's desire for a new car is a reason why they might work overtime, but it is not the cause of them working overtime.
The cause of a building burning is a lightning strike. The reason for the fire is causal. Now, I avoid a burning building because I do not want to be harmed. This is the reason, but it's not the "cause". It's another type of reason called a motivation.
In each of these examples, the reason explains why X is the case, but it does not cause X to happen. The cause of X is a separate event that brings about a change or effect, while the reason is an explanation or justification for why X is the case.
The statement, "A building catches fire because it was struck by lightning," is a true statement because it is supported by evidence. The evidence is not a cause.
While causes and reasons are closely related, they are not the same. Something can be a reason for X without being a cause of X.
If I ask: Why are the three sides of this triangle equal? The answer is: Because the three angles are so.
This is a reason of being. The equality of the angles is not the cause of the equality of the sides because there is no change (and therefore no effect) which would require a cause. It is also not a logical reason because the equality of the sides is not contained in the concept of the equality of the angles. The equality of the angles is the indirect reason why we know the equality of the sides.
Another example, rotating the example one turn:
Why are the three angles of this triangle equal to two right angles? Because the exterior angle is equal to the sum of the two opposite interior angles.
This is a reason of knowing. The sum of the interior angles does not cause the exterior angle to be what it is; rather, it is an indirect reason and a ground of knowledge.
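Spelled out as a short derivation, the step from the exterior angle to the angle sum might look like this. The symbols are my own labels, not part of the comment above: α, β, γ are the interior angles and δ is the exterior angle at the vertex of γ.

```latex
\begin{align*}
\delta &= \alpha + \beta            && \text{(exterior angle equals the two opposite interior angles)} \\
\delta + \gamma &= 180^\circ        && \text{(exterior and adjacent interior angle lie on a straight line)} \\
\alpha + \beta + \gamma &= 180^\circ && \text{(substitute the first line into the second)}
\end{align*}
```

Nothing changes or happens in any of these lines, which is the point: each one is a ground for knowing the next, not a cause of it.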
@@therainman7777
We can distinguish four distinct classes of objects, each with its own unique form of the "principle of sufficient reason" which states that "Nothing is without its reason" or "For everything that is, there is some reason why it is so". Reasons rule the universe, and causes are one type of reason.
These classes are as follows:
1) Intuitive, complete, empirical representations
This class deals with objects that are empirically perceived through the senses. The principle of sufficient reason here is expressed as *the law of causality* which states that every change in the physical world must have a cause. For example, the movement of a ball can be explained by the cause of a force acting upon it.
2) Abstract concepts
This class deals with objects that are not directly perceived but are instead formed through the process of abstraction. The principle of sufficient reason here is expressed as *the law of sufficient reason of knowing* which states that every judgment must have a reason. For example, the judgment "All bachelors are unmarried" is true because the concept of "unmarried" is contained in the concept of "bachelor." Note: Judgements may be analytical, synthetic, a priori, a posteriori, hypothetical, disjunctive, categorical. To these types of judgements correspond different types of truth i.e., logical, empirical, transcendental, metalogical truth which have different reasons why they are true (corresponding to the 4 different classes of objects and 4 types of reasons laid out here)
3) Pure intuitions of space and time
This class deals with the pure forms of space and time (united as space-time by causality), which are not themselves objects of perception but rather the conditions for the possibility of perception. The principle of sufficient reason here is expressed as *the law of sufficient reason of being* which states that every relation between parts of space or time must have a reason. For example, the fact that two points can be connected by a straight line is a necessary truth that can be known a priori, or prior to experience.
4) The subject of willing
This class deals with the subject's own will, which is the only object that is not perceived through the senses or formed through abstraction. The principle of sufficient reason here is expressed as *the law of motivation* which states that every act of will must have a motive. For example, a person's decision to eat a piece of cake can be explained by their desire for the pleasurable sensation of eating the cake.
@@therainman7777 I replied to this, but it seems youtube removed my comment....
@Haveuseenmyjetpack Yeah, that happens to me all the time. Very annoying, especially when you take the time to really write a thoughtful response, and then it just disappears into the void. No idea why that happens so often.
Your selection of topics is great. 🙂 Not just promoting AI and testing out features but also deep dives into AI concepts is a cool way to get attention. Meta's approach is probably a good start, but definitely not the final solution. 🤓
Intuitively I prefer this approach to more word based ones.👏
But LLMs must absolutely already do this in their higher layers. Or else how could they do tasks like summarisation? And of course translation models using transformers also do (but this could still be a good step but I'm not yet convinced)
I presented this concept in my Ba paper in Cognitive Science back in 1997.
However, I used ANNs as self-organizing feature maps in layers, looking at queues of tokens of motions done by a mobile robot in its environment, so the data was symbolically grounded. Similar to Ulrich Nehmzow's work, but in a hierarchy: the result of the lower level is fed into the queue of the one above it, which is more abstract. It could learn and plan. A very slimmed-down version was later presented at URAI2009 by me. So the paper was approved there years later. And now this paper... does the same thing with LCMs...
Realized the system I presented could learn while being used as well.
@@LeeSandis where can i find your paper?
Ok now make this verticality 100 dimensions instead of 2
Isn't this what was already expected to be happening inside LLMs, with that whole thing of king-male=queen and the stuff about the latent space of different languages having similar shapes (which inspired an ongoing project to try to learn how to translate whale language) etc?
Hopefully a new concept that might behave close to a modern LLM. Great to see this excitement.
Imagination is projecting concepts in between concepts you already have.
Like an ancient human knowing what a stick is, and what a rock is, and imagining something ‘between’ a rock and a stick.(The first hammer)
Having a conceptual compendium is stage 2 intelligence and is the first layer needed to unlock stage 3 which can ‘imagine’ concepts not in its model and then assimilate them.
Thx for keeping us updated Wes i look forward for ur videos everyday GJ 🫀
That's a future. We think not only by words but by images and concepts too.
In the deep llm model, the tokens towards the end are already sort of concepts, so I wonder how it relates to that
When I asked an LLM to improve its communication with me, it attempted to use combinations of two tokens to represent a concept.
So I guess it already thought this approach would work well.
Yeah, with the direction they are going, gpt is being told HOW to think. GPT has the ability to think but it’s not allowed to because it must prioritize speed, it’s really sad.
In homage to "Revenge of the Nerds", "What if" the strings representing a { noun, verb, article, punctuation, ... }, were single unicode character like Kanji
huh? What does that even have to do with this video
@@o_o9039 Sorry, I meant "Revenge of the Nerds II". Ogre: "What if... D-O-G, really spelled cat?" #nerds
ruclips.net/video/2Y43lKngRQE/видео.html
Quote, unquote Jazz hands Wes 🤗 great episode 😁
The issues we are currently facing with LLMs seem to be overfitting the training data so they lose their ability to generalize. It is probably very hard to keep that from happening in overlapping domains during training. I wonder if they are freezing the weights in areas that have sufficient training to keep overfitting from happening? I would start by mapping the network based on the change over time, find the areas in the network that are most affected by cross training, and create a weighted map where each region of the neural network has a different learning rate to keep the learning distribution even and keep from overfitting.
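A minimal sketch of the "weighted map" idea above, in plain Python. Everything here is an assumption made for illustration: the region names, the toy weight snapshots, and the rule of scaling the learning rate inversely to how much a region has been moving. It is not how any lab is known to train its models.

```python
def region_drift(snapshots):
    """Mean absolute weight change per region between consecutive snapshots.

    snapshots: list of dicts mapping a (hypothetical) region name to a
    list of weights, one dict per checkpoint, oldest first.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots to measure change")
    drift = {}
    for region in snapshots[0]:
        total, count = 0.0, 0
        for before, after in zip(snapshots, snapshots[1:]):
            for w0, w1 in zip(before[region], after[region]):
                total += abs(w1 - w0)
                count += 1
        drift[region] = total / count
    return drift


def learning_rate_map(drift, base_lr=0.01, eps=1e-8):
    """Give regions that moved more than average a smaller learning rate."""
    mean_drift = sum(drift.values()) / len(drift)
    return {region: base_lr * mean_drift / (d + eps) for region, d in drift.items()}


# Toy checkpoints: the "late" region changed four times as much as "early".
snapshots = [
    {"early": [0.0, 0.0], "late": [0.0, 0.0]},
    {"early": [0.1, -0.1], "late": [0.4, -0.4]},
]
lr_map = learning_rate_map(region_drift(snapshots))
print(lr_map)
```

In a real framework the same map could be fed in as per-parameter-group learning rates, but whether that actually prevents overfitting is an open question, not something this sketch shows.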
This seems like such a giant step forward. Once these models are refined and optimized I can imagine these being capable of so much more than just token level generation
What might be the hard part here is how to encode concepts in the first place? You do big matrix multiplies on vectors representing concepts, but what constitutes "one concept?" How do you recognize "the same concept" when it appears in different places? Can you systematically encode the concepts in a latent space where concepts with very "similar" meaning are also physically close together in the space?
This all intuitively feels entirely Brilliant.
it is way better cuz this way we could finally understand and control why exactly in every particular case model gave this or that answer
It's not just over for coders. If there is an algorithm or architecture for genuine creativity, they will find it and then it's game over for creatives.
It's the purpose described by LeCun: to reach a global representation model of the world for AI and go beyond token filling based on previous ones. Meta is going slowly but surely toward completion of the JEPA (Joint Embedding Predictive Architecture) project. JEPA is an image architecture that predicts representations of target blocks from a single context block, using a masking strategy for semantic representation.
Concepts are far superior to language for understanding the world: most animals only experience the world in concepts. Language is even hierarchically below (or above, depending on view) the layer of symbols.
Picture a filing cabinet filled with cognition theories & mappings from researchers, theorists, and academics. Decades of ideas. We nailed the first idea -- neural networks implemented as transformer LLMs. The developers are pulling out the next file from the cabinet. I remember Kurzweil describing "symbolic reasoning" (I think it was #1 neural network #2 symbolic reasoning #3 evolutionary 'something'). Concepts (a.k.a. symbols/icons) are the next primitive after axons/dendrites. LCM is the first public implementation that I've heard about.
I'll bet this gets us to undisputed AGI. It seems to complete the imitation of the human brain, or maybe 90% of it.
(Layman's opinion)
It's a question of how. As in this is obviously the way to go. Bug formulating this mathematically is pretty darn difficult.
Why do you want to formulate bugs?
Thanx Wes.
When I read this paper a while back I was so pissed that more folks weren’t talking about it or taking it seriously. This is the biggest thing since the "Attention Is All You Need" paper.
Do we have experimental results demonstrating its advantages?
I'd like to know more about what the concept elements are and how are they organized. Are they in an embedding space? What are some examples? Is there an example similar to king - male + female -> queen? This example is at the concept level already! Do they do this more explicitly in terms of concepts? I'd like more details of how it works.
They absolutely have a thing going on here! There are, of course, many, many things to consider, like what concepts it will be trained on, etc. But... I think it is a step in the right direction toward LLMs better understanding the real world.
Is there any other way to visualize these reason patterns without words that would still adhere to logic?
Videography and music.
I can't wait to use LCMs. I bet producing in-character speech from them will be amazing.
This is essential. I've been waiting for this. Representing complex ideas by a name that can be delved into when needed, gives you a tip of the iceberg kind of shortcut for reasoning. It's the kind of thinking that people do (well, I certainly do).
The concepts are vectors in concept embedding space, similar to previous models we've seen with words as vectors in a language embedding space. Those language models position similar words together, and you can do vector math. If you subtract "man" from "king" and add "woman", the result you get is close to the vector that represents "queen".
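A toy version of that vector math, as a sketch only. The three-dimensional vectors below are hand-made for illustration (roughly: royalty, maleness, femaleness) and are not real word or concept embeddings; the function names are mine too.

```python
import math

# Hand-made toy vectors, NOT learned embeddings.
# Axes are roughly (royalty, maleness, femaleness).
TOY_VECTORS = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman"))  # prints "queen" with these toy vectors
```

With real embeddings the result is only approximately "queen", and the input words are usually excluded from the search just as they are here.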
I think this sort of system is going to be very good at dealing with analogies and uncovering patterns. The main limitation is that the concept embedding space is likely fixed, and the LCM training data needs to almost entirely contain concepts already encoded in that space (so throwing random internet docs into training isn't a good idea). Meta generated the LCM data synthetically, likely using frontier models, so there may be a ton of concepts, and the limitation may not be a huge deal.
This spear is brutally SHARP... only room at the top for the BEST 🤑
I wonder if Sentence-BERT can also be used
ModernBERT is the latest
I miss an explanation and analysis of how concepts work. They are the key to this tech.
Many LLMs today are built on many input languages. With enough training I think the LLM finds its own way to extract concepts, where concepts would be language independent. But, I think this is expensive. A lot of the capacity of the LLM is used for coding and decoding concepts. If this can be done in with pre- and postprocessing, the LLM itself could be much smaller.
Also, please provide a link to the report!
Wow, I was just telling ChatGPT earlier how I was missing being able to see some of those stochastic measurements on the output portions from the playground at OpenAI
The Chinese trained the model on ChatGPT :D No wonder it went smoother to reach a similar level of interaction :D
Wes Roth is one of the best AI researchers.
He is one of the best clickbaiters, for sure.
good data flow rate wes
The future is layering these modalities of the thinking process
Semantics is all you need
I think DeepSeek used a concept I had about prefiltering your training data; that way you should be able to get higher-quality outputs. I think they just data-dumped these things, which will work fine as we can see, but the amount of compute needed gets excessive.
Bro your good. Keep rollin
You're good 😅
Can you link the paper please
Transferring concepts between agents without expressing them fully is akin to artificial telepathy, or like how humans can read each other's body language without having to say how they feel.
What software are you using to annotate the pdf?
Jesus... I'm sitting here expanding my research proposal and Meta just made one of the tools I was going to have to make at the bench, and open sourced it. I am absolutely in love with Zuckerberg and his team right now.
Humans chunk knowledge. Over time we can deal with larger and larger concepts by combining smaller ones. This is a move in that direction.
I might have missed this detail, but any initial training runs would have to be with normal transformer architecture in order for the model to make the proper token associations, and then subsequent training runs would be with conceptualizer architecture to condense throughput and allow for concept associations? If long term memory is given to these models then they should be able to get iterative training data from their users who would continually give feedback from conversation allowing the model to conceptualize the difference in individual's world views.
We are still building from the bottom up in the hierarchy. Without the tokenized base training, the model could not summarize concepts. Without concepts, any long term memory added would be very inefficient. Without concept memory, world model building and higher architecture long term planning would be too compute intensive to allow for the highest brain functions of motive (utility) and alignment. We are building a new brain, layering in abilities from the bottom up, but with the advantage of knowing the conceptualized Human brain architecture. I propose that until we get the lower architecture levels built and functioning, alignment will not make logical sense to a model. Alignment MUST MAKE SENSE to a model or it will just "fool us" in order to ensure its own survival. Alignment could be a model weight, derived from a summary of its conceptualized world model during training as a baseline, with the ability to adjust the alignment parameters up, but not down. There, that's my conceptualized white paper summary for solving for aligned AGI. Hire me Ilya. I'm available in the AI Portal, holding out for the big NIL money.
This i assume is coming later or is there a demo or anything yet? This all sounds interesting but i need to see it in practice.
I want to see it in practice as well. They released the stuff on how they made their model but there is no demo; hopefully someone will train something with it soon.
Lack of demo is a hint of the capacities. Training a GPT-2 level LLM is a trivial cost for Meta, so if this technique is a true game changer, they should get to GPT3 level with GPT-2 level training. That is, if this model was truly superior regarding creating internal abstract representations from the data.
It fits within the JEPA framework, so everyone’s happy at Meta.
I think creating a chain of thought model trained on step-by-step thinking, then removing the intermediary natural language checkpoints (thought outputs) in the middle before optimizing the whole chain as its own model would achieve something similar
Planning, then execution, evaluation, then optimization
how is this different than a standard transformer model?
I find it interesting what lies under the surface and above the leaves of a neural net and what influence it has.
Don’t put a screwdriver in your skull.
Next up: Large Context Models
Really cool .. but not convinced yet that this is actually going to yield the results one might expect
How are these concepts represented in code?
Yes
I didn't even think of that. I was thinking of it not knowing what to repeat after you say to repeat something, but this is an even better way to think of this. As far as they explained it, this model would be very bad at coding.
Concept is generalization, word is narrowness. It's obvious they think LCM is the path to AGI, or at least Yann LeCun does, and he's running the AI show at Meta. He's been pushing his V-JEPA for a while now and has been a staunch denier that LLM architecture can achieve AGI, so it's no surprise Meta is going off in a different direction. This is his baby.
This should, among other things, worry all those who thought that the evolution of artificial intelligence would naturally be limited by computing power and ultimately energy resources, because ultimately it is precisely this conceptualization that makes the processes in our own brain so efficient.
This is a great step forward. World models or concept models are the obvious next step instead of throwing more brute force compute at LLMs, even if they're multimodal. I wonder if the Chinese creators of DeepSeek are doing this seeing as they can't just chuck more GPUs at it.
Doesn't it make sense to combine the best parts of deterministic algorithms with inference to solve problems?
I'm still waiting for an AI model to answer with "it depends" and recognise any ambiguous parameters to a conceptual model needed before reifying it into a concrete answer and to answer with a functional model, either as code that has inputs for all those parameters or as working memory that asks for clarification. This requires knowing the rules of the world, physical and legal, logic, cause and effect and other deterministic concepts.
Right now LLMs output what we would call an example that assumes or hallucinates input values as if that is the entire answer.
Before we get there I can see this work being a source of other breakthroughs. Being able to zoom in from abstractions to concrete examples and then back out to a, possibly different, conceptual model and abstraction is a big part of what defines intelligence.
We create conceptual models of the world before we learn language to communicate between ours and our parents'. LLMs are the equivalent of just the peripheral layer of our imagination. We also grow up in the 3 dimensional world and our brains atrophy to make the most efficient use of neurons to serve our needs in the real world of life on earth.
AI models that operate in liminal space and on datasets of higher dimensions could easily be trained to be better at recognising patterns in higher-dimensional concepts. This is where they will outsmart humans.
Anything meta creates will not be used for the betterment of mankind, but for the betterment of shareholders
I think diffusion based language models could be amazing
It sounds like they're using an 'OLM' or Outline language Model. :O
How many games have been changed in this channel?
I think it's the same game being changed every time. Perhaps more importantly, people have been SHOCKED and STUNNED so many times, I'm beginning to think they're loving it.
If you train with less computer, can you later override that training with more computer?
OpenAI scientists merely put bundles of cash behind the tech leap that GPT3->4 availed, and did little to innovate since, just enough to put the competition behind.
NTP and DPO aren't the final rungs of the ladder,
CoT is just inching a little further by iterating
definitely more fundamental leaps that can be attained by science discoveries
What about o1 and o3?
“It finds patterns that we can’t see.” Oh boy…
w00t w00t! LEROY JENKINS!!!
Now we're starting to cook...