One common thing I see with people claiming LLMs can't reason is that they do not apply the same scrutiny to humans, which are the clear benchmark AI is being tested against. I think that if any criticism is brought up against AI, the same should be tested against the human benchmark. The average human is not consistently reasoning from first principles, and the ones that are doing so were trained specifically to override their primitive thought patterns to think in this rigorous way, and even if someone is able to do this relatively well after years of practice, they are clearly an outlier, and they still introduce bias and flaws into this "ideal" reasoning pattern. Also i think that the human brain was not designed from the ground up to be reasoning machines. As a reasoning machine it is occasionally able to muster an inefficient approximation of reasoning patterns against all odds. Humans are also very sensitive to prompts. If you ask something in a different way you can get an entirely different answer.
You switch your computer on and it boots successfully after executing literally millions of instructions without a single mistake. It's reasonable to expect the same deterministic output from an AI system.
Well, o1 supposedly outperforms "experts" on "PhD level" problems in physics, biology, and chemistry. It supposedly scores very highly on the USA Math Olympiad. And yet, it gave very very stupid answers to Dr. Duggar's problem at the end. How many PhD physicists or high scoring Math Olympiad contestants do you suppose would give the answers o1 gave? Watching it fail miserably at a not-all-that-difficult deductive reasoning problem suggests to me that its apparent successes at much harder problems were illusory. If it can't reason through simpler problems, then it's not reasoning through harder ones either, even when it gets the answer right. So it's not just that LLMs make mistakes - it's that the manner in which they make mistakes belies claims of underlying reasoning. Give GPT-whatever a seemingly challenging problem, and it might give you a perfect solution. Then give it an easy problem, and it might give you gobbledygook. You just never know. Humans might make plenty of mistakes, and we might rely to some extent on memorization, but individual humans don't switch back and forth between appearing to be math geniuses and appearing to have poorer reasoning than a 3-year old. So yes, humans are not consistently reasoning from first principles. But sometimes we are. LLMs never are.
I can see both sides I tend to use an analogy like the chimp test on the human bench mark site, a benchmark that chimps crush compared to humans. Similarily I think current ML models are real intelligence, just that perhaps we should think of inteliggence as having different modalities. In that respect expecting an LLM to be comparable to a biological agent, that is embedded in realtime within the world in a body, is a bit unreasonable to hold those models to the same standard of intelligence as humans. Imo you would have to compare those models to a disembodied brain in a vat to get an effective apples to apples comparison. Of course that is simply not possiblse to do so that is why it is not very reasonable to insists that LLMs are not "real intelligence" or have generalized at all simply bc those models still struggle with out distribution problems that humans do quite well. Imo that may be like chimps claiming that humans are dumb bc they can't perform well on the chimp test compared to humans. Imo the novel "reasoning" happens at the training stage, where these models don't "go look up the answer up on a look up table and select the best output from a one to one database" at least not based something as simple statistical frequencey occurance. What is going as the model is trained is more complicated and nuanced than that and it wasn't we would not need ML would not be useful, we could handcraft those solutions ourselves with having the ML discover the efficient method for doing that, even if the balance between entropy, search and memorization is not dailed into the same parameters that humans are at. So it seems a bit intellectually lazy to gloss over that subtlety and then scoff at the idea that people remain impressed with each new iteration of the current ML summer. But hey that is just a theory, an AI theory
All this "is this really reasoning" discussion just reiterates what Turing himself complained about in 1950. He rejected the question of whether a machine can "think" as just as meaningless as whether a submarine can swim. It really doesn't matter. Can it solve tasks, can it achieve objectives effectively? If yes, it doesn't matter if you call its steps "reasoning", or "just a lookup table" or "glorified curve fitting" or whatever. If it works, then it's impactful, both in positive and negative ways. This is all too similar to claiming an airplane can't fly because it can't flap its wings.
The important distinction here is that we still aren't sure if o1 (or any language model) can come up with *new* knowledge. Until this has been proven, people are totally right to say it can't reason. It would be like an airplane that can flap its wings but can't fly-- it's just going through the motions.
@@eposnix5223 Then delineate what you mean by "new knowledge" by designing a test for it. I'm sure once a new model passes that test, you'll move the goalposts and claim that this is still not *truly* new knowledge or something. It keeps happening and it's embarrassing. Nobody is claiming that o1 is already an AGI. But complaining that it isn't "reasoning" is meaningless. The word doesn't even have a commonly agreed upon definition, everyone can read into it whatever they want.
An idea I want to throw out there: I've been playing with o1 (mainly mini) and have found that it works better if you explicitly tell it to be its own discriminator from step to step. What I mean is, tell it to roleplay with itself as "Bob, the expert coder" and "Tom, the expert critic" and have Tom check Bob's work every step of the way. This is how I normally work with GPT-4o, but o1 allows this process to be automated.
just give it the task, it doesn't need prompting, the whole point of o1 is that it self-prompts. It works best without you attempting to prompt hack it.
@@Gnaritas42 Not in my experience. o1 still makes rudimentary programming mistakes when left to its own devices. The code logic will be good, but there will be small errors that prevent it from running. Tell it to doublecheck itself will reveal these errors.
@@Gnaritas42 "just give it the task, it doesn't need prompting, the whole point of o1 is that it self-prompts. It works best without you attempting to prompt hack it." That's not true. I get much better results in algorithmic questions by first asking it to generate tests and include as many corner case tests as possible, then write the code ensuring that it's tests would pass. Then it spends about a minute to answer. But if you just asked the same question it finishes in 9 seconds with much inferior code, and in this particular case, buggy code because it missed several corner cases.
I think you make many great points. I guess I am one of the people who don't want you mystify reasoning too much. We can say it imitates or guesses or simulates it, but the fact of the matter is that it produces an often accurately described reasoning. So we can try and deny it, but if it can generate this description of reasoning and use it somehow to guide its answer effectively, then we have effectively an A.I. that is using reasoning. Maybe not exactly how we would, maybe not as reliable in some ways, but this is reasoning.
This is MLST, so we already know their answer to this question. It's maybe a tired analogy at this point, but I feel like their view can be summed up as: airplanes don't fly, they just fall forward through the air very fast! they need these long runways for takeoff and landing. This is obviously not true flight, and we should stop working on this in favor of feather-based systems. In the end, the interesting part is does it get us from point A to point B in a straight line through the air in an economically viable fashion?
I'm also not very certain that it's that much more shallow than human reasoning. I constantly find errors in people's reasoning, and people tend to stop thinking very quickly. I do like that the idea that the reasoning itself required to solve a difficult problem isn't necessarily that much more involved. There are other factors that can make problems difficult to solve. (Lack of overview, lack of feedback, lack of insight, lack of clarity, lack of time, etc.) So when a problem seems difficult to us, it does not mean that it necessitates advanced reasoning.
@@lystic9392 This is my thoughts as well. Perhaps one of the biggest hurdles is that we trying to teach the ML with human data, and then some are really disapointed when it turns maybe most of the data humans generate is not "general" or "intelligent" some are overly disappointed that the ML cannot one shot and filter out the noise from the signal. We are primates after all, so I suspect that most of our daily baseline "reasoning" is not very high level, and those moments of brilliance are more spurious, we just happen to benefit from them so much as a species bc we have a way to accumulate and distribute that information. I think this gives the impression that in general maybe we believe humans are smarter than they actually our at a baseline level.
If the pillar spins when both hands are out, then can't you just leave one hand in, flip the switch up, then insert the other hand in every other hole and do the same thing without removing the first hand. Boom, all up, remove hands.
I think when we reduce it to "it's just doing so and so", it's usually not a good argument because we can be also reduced to simple processes. Maybe computers and humans reach "reasoning" in a different way, but if a computer can reach right conclusions and eventually create new knowledge, ok now what separates it from our reasoning? Maybe it's our limbic system, and our "feeling" of being something special and "having a soul", but those are subjective things that may not be necessary for an intelligent being. I think we are going to reach a point in semantics where consciousness or "real intelligence" is only reserved to *human* thinking, even if a machine can surpass us by most metrics.
Yes but for open AI o1 to do the reasoning on anything it needs me to type in a question. And all the "thinking" that follows has to be triggered by that. And will be linked to it. For ex a human reasoning indepently can have an epiphany on a totally different subject... that will never happen for llms (for now).
@@ElieSanhDucos0 Yeah I get it. But "for now" is a good point, because maybe the LLMs are an important piece on a puzzle that seems to have started being solved. It's impressive how much room this tech still has for improvement and despite some slowdowns it seems to be still full throttle. I understand hype is not a healthy thing but it's hard to ignore how much money is being poured into it, more than anything humanity has seen in ages. It doesn't look like pure tech scam type of hype. I don't like to dismiss it as "nah, this has nothing to do with AGI at all"
Ok. Why people bother about whether it is reasoning or not, you might ask. Isn't it irrelevant what it's doing as long as we are happy with the results? Good question! The problem is not in the definition of the word "reasoning ". The problem is with the approach we, as an industry, take in order to get to a generalized solution. There are huge amounts of money and human brain power that are now wasted on fine tuning an approach that leads to a dead end. This is an outcry for change of the money flow and brain resources to a different approach that at least could have success as one of the possible outcomes.
@@egor.okhterov I see, but I bet the money is not going blindly for dead end solutions. AGI is a Golden Goose and I think most researchers understand those concepts. At the same time there is room to improve LLMs, and LLMs are presenting amazing results by themselves. So I don't know if an outcry is really needed.
@spaceghost8891 did you hear about recent VC rounds into AI startups? Take Argo AI for example :) In order to get rich, you need to have good understanding of humans. You should be able to connect with the right people and make good impression. Having deep understanding of mathematics, computation and machine learning is usually not part of the mental toolbox of the people with large amounts of money. When you pitch an idea to a VC guy, you should spin a story in a way that shows how you can make money off of it. The way all these ideas are presented is that LLM is a low hanging fruit, like a light bulb in 1879. They don't talk about developing a proper reasoning machine during a pitch. VCs don't want to hear about that. They pitch how they can build an app on top of LLM, possibly fine tune it and make a call service or maybe on-premise smart search for Corp documents or a tool to patch the software, upgrade the codebase from python3.8 to python3.12 or upgrade to a new framework. And other stuff like that. A lot of very smart people were sucked from the job market into these mediocre projects with a short term goal to milk cow the first. But this will not lead to AGI.
All other AI podcasts (apart from Dharkesh's) don't even come close to the quality and depth of content of MLST, thanks for bringing us along on this journey, we've learned a huge amount
Have you checked The Cognitive Revolution podcast? Highly recommended for wide breadth, excellent signal-to-noise ratio and a host that is intellectually curious, knowledgeable yet humble and hungry to learn.
I think whenever we want to say "these machines do not think" or "they are not intelligent" we run into problems because, as nicely discussed by this great podcast, we are not sure what these things or processes actual are. I like to think of cognition and I like to think that any form of cognition implies self-creation, i.e., operations that produce themselves. And if we think about the process made so far and what many think is required to get to AGI, this recurrent and seemingly paradoxical ability of self-creation, self-observation, self-differentiation might be key. I also think that AI will be as dependent on us as we are dependent on each other. Our language is our world and we can only learn it by our culture. Great talk!
(1) NS, switch both up (2) NE, switch both up (3) NS, if a switch is down switch it up. If not switch one down. (4) NS, if both switch same flip it, if different do nothing (5) NE, flip both (6) NS, flip both Of course, it could stop at any step if all up or down. This was amazing and insightful. Both the puzzle and the talk. Thank you.
The first time I read your answer my brain could see that you could be right but it had no idea whatsoever if you were indeed right. Then I added the states in which the puzzle is after your steps. From that I learned that your solution works but step 4 seems to be unnecessary because the outcome of step 3 is that both axis have 2 switches in opposite values so you will never meet the condition where you are going to switch anything, and you will do nothing in this step.
Removing your step 4 and running in simulation for 1,000,000 rounds... Simulation Results after 1000000 runs: Distribution of Steps Required to Solve the Puzzle: 1 step(s): 250474 runs (25.05%) 2 step(s): 249789 runs (24.98%) 3 step(s): 249611 runs (24.96%) 4 step(s): 125405 runs (12.54%) 5 step(s): 124721 runs (12.47%) Failed to align within 5 steps: 0 runs (0.00%) O1-mini may only be in its' early stages of reasoning, but it's damned useful!
@@BrianMosleyUK Impressive. So apparently, it really works for this problem when in tandem with a human (i say in tandem because you still had to prompt it to write a simulator). Of course there is still the criticism that it may not deal with non-computational problems and that it cannot find all reasoning patterns when not working in tandem with humans. That is all very valid criticism and I dont know if we can speak of "AGI" or intelligence, and it may even be questionable that we give our reasoning up to OpenAI for free. But all these criticisms aside, i think there are many useful applications possible for these types of models.
The biggest problem with o1-* models is that the chain of thought spam deranges dense context. The context is filled up with noise i.e. about "code coverage" in a query which has nothing to do with coding. In any complex/ambiguous problem, it will misunderstand you on the first shot, you then explain what you actually wanted and by step 2 the CoT noise deranges the model significantly over a standard LLM. Catch-22. They should prune away the noise on subsequent queries. Perhaps as the models learn (from the users), the trajectories will become less noisy. It's also clear that the trajectories are hard-trained for specific purposes which will limit their generalisation.
In the metaphor- chain of thought- in humans there is certainly noisy context preliminary to the final coherent thought one is conscious of. Another thought- initialized weights are random, and we need a preponderance of useless noise to find the lottery tickets. Perhaps the noise is not the problem per se, in fact necessary, but it should be pruned on some schedule and summarized for next context. Along the lines you suggest.
I have not had this experience at all. You said in any complex query it will misunderstand on the first try? Not at all. I’ve had roughly an 85% success asking the most difficult questions I can think of. And even on the 15% it got wrong, it was typically not because it misunderstood the question.
Out-of-box puzzle solution: 1.) NS : flip both switches up 2.) NE : remove hand from S, leaving hand in N, and put the now freed hand in E and flip the switch up 3.) NW : ^ same, but for W ...the puzzle parameters never explicitly stated you had to remove both hands. I get that it's intended to be solved by lining them up so NS and EW have opposing up/down alignments by way of using adjacent pair switch flips first, but just leaving my hand in one was the thing that first jumped to mind. The puzzle should be reworded. It's fun to think about though if played as intended.
@@fburton8 A neural net can always detail the steps it took - the inputs, weights and outputs are there for anyone to see. Unfortunately the number of steps and their weights exceed the mind space and time availability of people.
What about when you don't know whether the response is correct? What about when you're entrusting it to make decisions because it appears to "reason"? The fact that it isn't doing anything like human reasoning does matter when we ask it to do things that require human-like reasoning. See, e.g. replacing teachers and lawyers and therapists and secretaries, etc. Or even just employing it as a customer-service bot. The difference between our "reasoning" and its complex statistical next-token-prediction matters in those applications where consistency and reliability are important.
The current chain-of-thought in 1o tries to refine the degrees-of-freedom that the attention mechanism will produce from the context, in a way model gets closer and blends between the right datapoints. I can accept this as some form of reasoning.
"With the new GPT models, you need to start a whole new session." You're supposed to clear the session regardless. Performance drops off quadratically the further down the context window your prompt is. They discovered this back with GPT-1. So many issues people have with ChatGPT are skill issues of asking multiple complex questions in the same session. It was designed for casual noobs to have a back and forth chat dialogue with it, but you shouldn't be using it that way if you care about performance. I assume Tim and Keith already knew this, so maybe I've misunderstood what they were saying here or something
@steve_jabz We explained in this video why you need dense context in an LLM. With any complex problem, i.e. one with ambiguity -- the LLM won't understand you on the first shot, or even the 3rd shot. That's why you need explain (or discover with the LLM) what you actually want through successive prompts. Given that the CoT trajectory noise deranges o1-preview, this means in situations where it didn't understand first shot, it will actually degrade on all problems which require further clarification, particularly if the problem is ambiguous and the CoT spam includes very noisy trajectories. There is a trade-off between dense and sparse context, dense context means you don't need to re-explain again and again where it misunderstood your requirements and you can build on the problem, but there is some kind of curve here based on how related your workstream is i.e. you could continue a session to build around a single feature in GenAI coding (with the same constraints and requirements), but would start a new session to build another entirely new feature. You are probably correct in the pristine case of a perfectly well described problem which the model didn't need hinting for (when does that ever happen lol) would work better in a fresh context. Not sure if you have done much GenAI coding yet in Cursor, but please try some stuff using the composer with Sonnet35 / dense context, you will quickly come to appreciate what I am talking about.
@@MachineLearningStreetTalk That makes sense. I normally use discovery questions to formulate a prompt for a new session for complex problems, but I guess if o1 is using up / wasting tokens to even do that part well, that's a problem. I must have misinterpreted the context you were talking about it in. Haven't used cursor yet but I've watched other people use it. As soon as o1 dropped I assumed everything cursor was designed around would break with it and need to be completely redesigned, as well as agentic workflows. Not even for the context window necessarily, but the fact it's trained to simulate taking multiple steps to reach a goal and give you the output, whereas cursor and other tools are trying to use the llm itself to reach the goal by abstracting away the tools in a way it can control, so you would need cursor to use it's reasoning as the control, and you would need o1's reasoning to be aligned with the function calling API or something integrated with cursor. It sounds so close to being exactly what we need for these tools to work reliably but it's so incompatible.
Here is a funny thought: what do we call processes that have both determinstic constraints and random variation dynamically interacting? Evolutionary processes. So that is what reasoning must be. We just don’t know how it works. It is quite clear how it happens in biology, but btw, we don’t know how that happens computationally either.
@@93alvbjo o1 isn't mutating and self-reproducing, although maybe you could argue that it's memetic once it has a structured, social way of progressing it's own ideas over time. I guess you could call something like claude artifacts that or openai training on customer outputs to a frozen state if you want to be technical, but I'd agree with Keith that like, come on man
This was my favorite "solution" from Claude 3.5 Sonnet: "Potential solution: What if we do the following: Reach into any two adjacent holes. Make these two switches match (both up or both down). Without removing hands, reach into the other two holes. Make these two switches match each other, but opposite to the first pair." 😂
The way I have recently been thinking about reasoning is as a model of environment dynamics that has a high degree of predictiveness. I liked this definition as it remove the idea of reasoning as an 'emergent' phenomenon of 'intelligent' systems and reduces logic down to being something extractable from the environment. More concretely, suppose we have an environment E. Suppose Q, A \in STATES(E) are two states such that there is an environment transformation R: Q -> A. Suppose further we have a learning agent that has learned a latent representation (Enc, Dec) of environment states, The learning agent is said to be able to reason if the autoencoder extends to environment transformations: Enc(R): Enc(Q) -> Enc(A), and the obvious decoder square commutes. This formal definition obviously need to be weakened to a statistical statement. What do you think?
In this context we could give a measure of intelligence as the size of the set of transformation of E that map to the learners model: #{ R \in TRANS(E) | Enc(R) = L}, where L is the learners model.
4:00 Under this view a human brain wouldn't be impressive because it's algorithms can't expand its memory indefinitely, or traditionally only had low bandwidth access to paper etc. to expand it. I think Schmidhuber would consider achieving brain-like intelligence to still be impressive.
Yeah, all I’m hearing from videos like this is defensive doubling-down from people who committed early to the position that LLMs will never give us true intelligence, and now feel that they have to defend that ground at all costs and never change their minds. They’re in denial of what is obviously happening right in front of us, and their excuses and delineations grow flimsier with each new model release.
I thought the prompt was worded a bit ambiguously, and the corrections being placed way down the context window bias it toward up to 128k of misinformation, so I reworded it without giving it any hints: "There is a pillar with 4 holes aligned north, south, east and west. You can't see inside the holes. Inside each hole is a switch set either up or down. You can reach inside 1-2 holes at once, feel if the switches are already up or down and set them either up or down. If all 4 switches are not either all up or all down after you remove your hands, the hole axes get aligned randomly. Come up with a procedure of reaching into the holes that is guaranteed to set all the switches either all up or all down in 6 steps or less. The procedure cannot rely on random chance." GPT-4o said this: "Key Idea: Ensure symmetry by focusing on pairs of opposing switches. Once symmetry is achieved, the random reorientation becomes irrelevant, as you will force all switches into the same state. Procedure: Step 1: Reach into the North and South holes. Feel the switches. If they are both in the same position (either both up or both down), do nothing. If they are in different positions, set both to "up". Step 2: Reach into the East and West holes. Feel the switches. If they are both in the same position (either both up or both down), do nothing. If they are in different positions, set both to "up". At this point, you have ensured that opposing switches (North-South and East-West) are both set to "up". Step 3: Now, reach into any two adjacent holes (say, North and East). Feel the switches. If they are both in the same position (either both up or both down), do nothing. If they are in different positions, set both to "down". Step 4: Reach into the same two adjacent holes (North and East) again. Feel the switches. If they are both up, set them to "down". If they are both down, leave them. Step 5: Reach into the remaining two holes (South and West). Feel the switches. If they are both in the same position, do nothing. If they are in different positions, set both to "down". Step 6: Now, all switches will either be all up or all down. This procedure ensures that, regardless of how the switches are initially configured, you will set all of them to the same state in 6 steps or fewer."
@@MachineLearningStreetTalk Thought it might be, but it mentioned the symmetry and seems a lot more coherent? Would be interesting to see this tried on o1, but I don't need it enough to pay for it. I still don't understand the question myself so maybe it could be worded differently still. The part about removing your hands resetting them if all 4 are not up makes it sound impossible to me given that you can only try 2 at a time, and I don't really know what a pillar is in this context to imagine how it rotates around, which I assume is related to how I misinterpret that part. LLMs get caught up in accidental red herrings easier than the average human that can perform the same knowledge work, which is worth pointing at as a failure point, but it's still interesting to know if it can solve the problem with extra care taken to avoid sending it in the wrong direction.
@@steve_jabzJust give up-these guys are stuck in denialism and will never admit to being wrong about anything, no matter how far LLM-based AIs may progress.
@@therainman7777 Having watched the channel for years I've seen Tim's opinions change over time and question many of his assumptions to end up where they are. I don't agree on a lot of things but I would put it down to difference of perspective and knowledge and influence from some very intelligent people in the field rather than cope
Anything where there’s a reason something happened is reasoning. A rock falling off a cliff is reasoning. Yes, I know we usually call that physics, my point is that there’s really no difference. Logic in your brain and logic in computers happens as the result of a thousand tiny switches being flipped, which affects the flow of electricity, and that’s how your computer does math. I realize this is maybe an obscure or unique viewpoint but I think it’s necessarily valid.
Indeed, the rock is trillions of states (in the basement level of our current models), to compress it in one word is quite a lot of leverage. The physical playout must perform a similarly coherent chain of interactions. In fact quite a deep bundle. But in those terms LLMs are pretty plain little packets of interactions. More divergent but smaller.
I even have something that seems very simple that cannot be done by Chat GPT and o1 : For learning Spanish, I asked them to give me a spanish word with mixed letters and to give me a clue for its meaning. I need to find the word with the letters in the right order, but they always give me too much letters! That will be my test on the future versions. If it cannot do that simple task, I will keep trusting my reasonning abilities over theirs... Thank you again, it is always fantastic to listen to your exchanges!
I like this definition of reasoning, “the ability to create knowledge to create a good explanation”. No LLMs are able to create knowledge currently. The behaviour we’re currently seeing is based on past knowledge the model has been shown during training. Explanations are at the heart of this, and creativity is critical to good explanations. Models without creativity won’t be able to do reasoning properly.
I don't understand how you can prove the creation of brand new knowledge. New knowledge can be stumbled upon randomnly so it is the process by which it is achieved being labelled creation or luck which is important. Pretending new knowledge is discovered from scratch is possibly a red herrring. Humans are adept at convincing themselves they have come up with a concept themselves when really they are standing on the shoulders of giants and been pretrained with either the actual knowledge or a system that can produce that knowledge. Also much new knowledge is stumbled upon due to the inherent epistemic chaos (as they mentioned in the chat when talking about Wolfram) . The idea is that llms are much less efficient at this than llms so far and because they have a wider computational base are closer to a universal turing machine than an LLM. This however has not been demonstrated clearly enough that most people can see it. By consensus it seems very smart people seem to get it but for the rest of us and I include myself, we cannot clearly understand the higher levels of human reasoning well enough to differentiate. However this could just be a matter of education but maybe it could be beyond the limit of our intelligence.
@@4tofromcafeOur understanding of quantum physics didn't exist several hundred years ago - that's new knowledge. LLMs don't see capable of creating new explanations that way. They lack creativity to do so.
Parsimony is actually entropic force. For example protein folding are counterfactual outcomes that are stable when they are curled up. When stretching a rubber band you will notice the entropic force the resists less parsimonious outcomes.
Slight contradiction around the 40min mark. You guys said that we need Turing Machines to perform reasoning, but then said "I'm a machine, I'm a bio-machine, with wetware and software, but I can still perform reasoning". Humans are not turing machines. So given that humans can reason (and have general intelligence), why can't other non-turing machines also have general intelligence?
Is your argument that you can't fMRI a brain and see an obvious potentially infinite internal tape? It's the system as a whole that is obviously turing complete. We are clearly powerful enough to imagine the entire set up: symbols on a tape, and a codebook. Then, we can analyse real world problems, the create in the physical world the correct program and data to solve them, then execute the instructions from the code book with a pencil and update the paper tape? So the human system as a whole is not only Turing complete, but can create potentially a specific incarnation suitable for any problem. Whilst there are a few papers on LLMs simulating Turing machines, they're a very long way short of this.
@@luke.perkin.inventor Yes, the argument is that (obviously) you can have an infinite internal tape inside the brain, because the human brain, like everything else in the universe, is finite. Not infinite.
Took me no less than 15 minutes and a pen and paper but my 5 or less steps deterministic solution is: 1- Check two opposing switches and flip them up If the other two were both up, you win, otherwise you proceed to the next rotation 2- Check two opposing switches again if one or both are down, these are not your original switches, flip them both up and you win if they're both up, these are your original switches, flip them both down if the other two were both down, you win, if they were different, you now have only one that is flipped up 3- Check two opposing switches again If one of them is the one that is up, flip it down and win If both are down, these are your original switches, switch one up, now you have guaranteed adjacent symmetry (if north and east are up, south and west are down, etc) 4- Check two adjacent switches if they're the same, flip them both and win if they're different, flip them both as well! (flip the up down and the down up), now you have guaranteed opposing symmetry (if north and south are up, east and west are down or vice versa) 5- Check any two opposing sides and flip them to win
You can also get the same outcome by replacing step two with making sure two adjacents are both on. Still gets you to the point of having 3 switched on guaranteed and then I did the same steps as you from there
"Epistemological foraging" I agree it is a beautiful phrase. And basically Wolfram's ruliad represents to me the full breadth of all possible foraging that can computationally obtain. So I will take this moment to push back against Dr. Duggar once again. He has mentioned earlier in the vid that he is not concerned about PvsNP, but I must insist that it is relevant. Either what we mean by relevant terms like "general intelligence" or "consciousness" are decidable or they are not decidable. Let's assume these terms are not decidable, we can revisit his complexity issues with one shot one step oracale machines...that produce the correct yes or no output with very complex deterministic efficiency to decide. Or we can say that process is iterative from first principles and efficient circuits with robost metacognitive entropy. Dare I say, egads the ontology of distinction in boots is valid and tarski is not a fool. I suppose what I am suggesting is that no he can't proclaim "this or that is not agi" without a welldefined solution to the pvsnp problem. But I appreciate the points he wanted to make without that context.
There are some real issues with CoT paradigm aswell because the model can actually converge towards an incorrect local minimum vs if you had manually prompted it through the different steps you could steer it better and as you wish. It works fine when the chain of reasoning is extremely obvious and unambiguous, but it could increase the risk of hallucinations in less clear cases.
Based on your set up of the problem the solution can be simply: (1) putting 2 hands in say, N and S and flipping the switch up. (2) And then... without removing hand from N, remove hand from S and place it into E and flip the E up. (3) And then... still without removing from N, remove the second hand from E and place it into W and flip the W up. And voila, you have all switches up without the pillar spinning at all. In three steps.
I'm pretty sure the intention of the riddle is that you can put your hands in at most two holes between every spin of the pillar, and the puzzle just doesn't state this clearly enough. Your solution is a 1-step solution in which you test all four holes in a single step. (By a "step", they mean one turn where you put in two hands and remove them and you let the pillar spin.)
@@bastiaanabcde so whats the solution for this variant? In the first step you can make two levers the same, let's say N and S. In the next turn, after the random shuffling you might look at N and E. This guarantees that you have 3 levers in the same position. However, I don't see how you are ever going to bring the fourth lever into alignment. Because, simply by chance or something like maxwells demon, it could always make it so that, you never see the 4th lever.
@@xthesayuri5756 recall that you win if all four switches are aligned, which may mean either all up or all down. So you’re right that there might be a lever that is never shown to you, but this isn't needed to solve the puzzle. Note: If you manage to get three levers in the same position and you haven’t won yet you know that the fourth must be in the opposite position. So you can get information about levers in this way even if you haven't touched them. I won’t spoil the answer because maybe with this hint you’ll be able to figure it out
It does, but they’ll never admit it. They’ve basically reverted to “don’t believe your lying eyes” when we can all see what’s happening right in front of us. But we’re supposed to listen to this guy instead, with his convoluted rambling which doesn’t prove a thing. Mark my words, this guy will NEVER admit that he’s wrong, even if we reach a point where we have AIs that have cured cancer, solved climate change, etc.
I'm not sure if you've ever stopped to think about how humans reason about novel problems and what the results typically look like. We know it doesn't need to be a demigod with unbounded memory and knowledge of how to deal with any possible problem to be an AGI because the generality of AGI is based on what humans are capable of, and humans are nowhere near anything like that. If you want to try to mathematically or computationally constrain it you have to ignore it interacting with the world at large, both input and output, and only allow it to use it's turing tape, any further inputs from a human after the initial starting state would already break that guarantee.
Hi both, loving this session - thank you MLST. At the mid-point of this discussion (around min 42 ff) you are talking of reasoning in the context of iterative steps vs a pre-determined massive model of subroutines processing all in a single step. The point being the second form is not reasoning, merely process efficiency in a pre-understood way. I see that second as an optimisation notion, and therefore distinct from optioneering towards a less defined end. I have in mind basic linear programming or game theory and the inherent assumptions of rationale ends being objectively knowable [sic]. This focus on optimisation seems foundationally important to me. Because such efficiency is only occasionally what human curiosity is about. Notions such as happenstance and serendipity need the forever ability to be juxtaposed from what was previously thought or prioritised in action. This is a human quality beyond a start and an end of a query. I would respectfully suggest much of what you are talking of in this episode is moving beyond reasoning. Indeed, it is folly to only have reasoning in mind as to what this bigger problem is really about. The limits of this current AI is not reasoning because it is not "a process" (if following the definition used at the beginning of this podcast). Instead, what is being considered here is the characteristic of adaption. An ability to reset what one is working towards, with that new information or perspective. Behind that it is knowing that we do not know, and being humanly capable of asking why or how in a series of iterations where the question itself begins to change. That seems to me foundationally important to help distinguish both Turin computation and general intelligence from what human action is really about and what computation is only a part of. This is motives for asking the question, and knowing when the proposition itself is revealed as incomplete. I am probably coming from this very differently. As a a PhD student but from the world of governance of people, and projecting human endeavour in going about physical construction of the built environment. However, I see this foundational factor of understanding what the human action or human thought to be about, as pertinent here. That "aboutness" to be itself contextual and changing within the iterative processes themselves, and to be foundation to understanding what we are as "machines". This is the philosophical grounding which is underpinning the discussion here, perhaps. For me anyway, that adds some human context to what all artificial programming and anticipating fails to be by its very ontology and epistemology. Beyond reasoning is perhaps the context, and subjective nature, or what becomes an artificial and limiting account of truth. We deal with that daily in ways non-open AI cannot... Thank you again for this session. It is fantastic to hear thinking in its shared form in discourse such as this.
I believe that our current trajectory, although is not exactly reasoning, will help us get to the models that do. That said, I believe that there is an element of Stephen Wolframs research that we're missing in our research, cellular automata, computational irreducibility, etc. as was touched on a little bit in the video. An element of "time" that we're missing here, as I believe that true compression requires time, as does reasoning, as does a reverse cellular automata simulation. Something to think about. Great discussion, as always! Cheers!
21:30 you mentioned the multiplication problem in the paper Faith and Fate (NeuRIPS 23), the authors also said that GPT is just doing pattern matching, not a real reasoner.
I've been wondering where Dr. Keith has been because I've been missing his thoughts and points on MLST, in spite of the content being so awesome these last months :] Keep killin it Timboi!
The test was really confusing and I do not see it as neither confirming nor negating anything... it seemed just a dumb and confused riddle to ask with no connection at all to reasoning abilities... I would expect a more clear stated riddle 😢
Yes. Its like Yann LeCun one on the globe. I ve seen this in numerous videos about openai o1 and reasoning : they will ask it riddles that either : they dont know the answer themselves// most people would find ultra confusing.
The problem statement is contradictory. First it is stated that the spinning is random. Then it is stated that it is controlled by a hyper intelligence and is specifically designed to interfere with your sampling which makes the spinning not random. I wonder if that contradiction is throwing off the model.
Just reworded your riddle and I am pretty sure Claude 3.5 got it right. I just clarified a few things and made the prompt less confusing and repetitive. Please do give it a try :D "You are human. You stand before a pillar. There is a great reward hidden inside. The pillar has four holes. Each hole has a switch inside. Each hole is precisely aligned at north, south, east and west positions. You can't see inside the holes, but you can reach inside and feel a switch inside. Each switch starts in a random position. Up or down. As soon as you remove your hands, (you only have two hands) if all four switches are not either all in the up position or all down position at the same time, the pillar spins so fast you cannot track the position of the holes that hold the switches. So there is no way to know which holes you have already placed your hands in if the pillar spins. Can you help me find a way to insure that all switches are in a matching position, up or down all at once in as few steps as possible?" Give that prompt a try. And if it is not the answer... this is a bad riddle. If you want to add some flavor, add this, It has no effect... but sure is fun to add. lol: You are human. You stand before a pillar. There is a great reward hidden inside. You know that there is a solution to the puzzle that does not involve change or probabilities. If you were to rely on chance and try too many times the temple will collapse and kill you.
What is the difference between applying the closest fit from the “first principles” learnt rules of rationale from the “first principles” a human applies? I think you’re talking in circles and either underestimating what the model will do or overestimating what you do when thinking from first principles
@@andrewwalker8985 funny, but remember that o1 also supposedly has PhD-level intelligence and reasoning abilities in physics, chemistry, and biology. So while I agree that a random person off the street might give answers similar to what we see in this video, a random PhD physicist would be immediately exposed as a fraud if they gave such answers. In other words, seeing o1 fail so badly at this problem belies OpenAI's claim that it "reasoned" its way to solving all those hard physics and math problems that they brag about. Clearly it didn't. It's still doing next-token prediction, even if this form of next-token prediction is fancier than the last. I concede that it appears more intelligent than GPT-4, just as GPT-4 appeared more intelligent than ChatGPT. But the way it solves those problems is nothing like how humans solve them, as evidenced by what happens when you ask it to do something novel. Which should be no surprise, given what we know about how it works.
I *love* this puzzle! You've perfectly illustrated the limitations and strengths of current SOTA LLM's to me... I'm actually still trying to solve the puzzle using insights from talking to GPT-O1 - not sure if it's helping and I'm aware of rate limits. Solved it, and got GPT-O1 to write a simulator to prove it worked. A lot more fun solving it with GPT-O1 failing throughout, helped me to my own intuition. My approach takes anything between one and 6 steps.
We learn through feelings and emotions, our brains are primed by millions of years of genetic refinement and by the environment we grew up. Maybe statistics can try and describe those conditions with some models, but those models are not and will never be alive. Whoever thinks that the mystery of life can be solved with statistics is a fool. It's always the same myth of the Golem reappearing with a new technological twist.
Its seems like Keith is caught up inside its own abstract thinking about Turing machine as machine containing tape as memory. What Schmidhuber was saying is that there is no infinite amount of tape out there in the whole universe. Other thing Schmidhuber was saying is that RNN can behave as NAND gate, meaning that you can build Turing machine using those NAND gates (or RNNs) and giving a point that RNN can be Turing complete. Turing completeness is not feature of a program, rather a feature of the hardware while trained neural network is a program rather then a hardware. Keith should know better :)
Do you think it was news to **anyone** there is not **actually** infinite tape? Do you think this was news to Turing? Do you think the entire decades of Theory of Computation didn't realize this? lol utter nonsense. The theory and definition of a Turing machine centers on a **potentially** infinite tape and the **class** of algorithm which utilize such a tape in **potentially** infinite (halting problem) process.
My unprovable hypothesis is that 1. with these reasoning/special token roll out models are actually learning the distribution space of reasoning chains for harder problems. LIke Chalét pointed out. However, 2. I think that these autoregressive models will be used to train the next generation of models who can use these LLMs are generative worlds to train against, rather than relying on our data alone
Solution to the puzzle: --- --- --- 1. NS, flip both on. 2. SE, flip both on. Note you have always flipped at least 3 switches on at this point, so the last switch must be flipped off (otherwise you win after this step) 3. NS, flip 1 switch off (presuming they are both on, if one is already off you flip it on and win). Note after this step the 2 on switches are right next to each other, and likewise with the off switches. This must be the case because no matter which switch you turn off, because they're positioned vertically (and we know one of EW is off, because if it was one of the NS switches we would just switch it on) the 2 off switches are next to each other. 4. SE, flip both switches regardless of their state. If they're both on, switching them to off creates 4 off switches and you win. If they're both off, switching them to on creates 4 on switches and you win. Otherwise, this puts the on switches across from each other and the off switches across from each other. 5. NS, flip both switches regardless of their state. You win! Because the switches are in the same state across from each other, if you reach in and notice both switches are (on/off) and flip them (off/on), now all 4 switches will be in the same state.
About this one: you put one hand in any hole, feel the switch and never take the hand out. With the other hand flip all the other switches one at a time to match the first switch. Now you can take both hands out. In 4 steps you are done, and the pillar will never spin :) It said it spins if you take your "hands" out, it didn't specify if it's "any one of your hands" or "both hands". In other words it didn't fully specify what happens if you leave one hand in all the time. I guess you could lose a hand :D So I still like your solution as the safest one.
Yes, exactly! First and foremost: reasoning is a process. Not a table lookup. Table lookup must be used as "part" of the reasoning process, but a simple lookup on it's own is not reasoning. --- Reasoning also is not required to produce any result at all. Reasoning happens regardless of whether there is some end result or not. When we imagine ourselves flying aimlessly in the sky or wandering around in own magic world and doing things in it, in the act of doing so we are also reasoning!
Lookup tables reference the exact verbatim string already stored somewhere, so they can't tell you any new information. GPTs that aren't overfitted learn the pattern that produces the string as well as novel strings that aren't in the training data, and o1 uses those novel strings together to generate results that fulfil reasoning objectives. MS Excel is never going to surpass PhDs in novel, offline physics problems, even with access to a library of babel of interpolations of those strings, because it has no direction. You could reduce it to something like a brute force attack with a sense of warmer or colder, which is still leaving plenty of nuance out, but that's already implying something much more powerful at solving problems we care about.
I've also been struggling with drawing a line between computation and reasoning, and I'm really loving Tim's approach of making reasoning about knowledge acquisition. Like sure a vending machine is computing things about its state + environment and reacting accordingly, but it's never going to come up with a new representation or conjecture that it can turn into a useful improvement in its understanding/knowledge. I also wonder if it's useful to define reasoning as something that happens internally. Because a camera technically gets new knowledge every time you snap a photo, but it isn't knowledge that was produced by its own internal mechanisms. So, reasoning is computation that is somewhat causally insulated from the outside which leads to more compressed/predictive/good representation or understanding that can be meaningfully used? Anyone wanna take a stab at formalizing that?
@13:03 What is reasoning? There are three modes of reasoning: induction (the discovery of patterns), deduction (the derivation of necessary conclusions), and abduction (the formulation of possible explanations).
@@MachineLearningStreetTalk Apologies, but no they are not. Deduction and abduction go from already-known assertions (more exactly, theorems typically taking the form of a sequent) to derived assertions that may be inferred from the already-known assertions. Thus deduction and abduction provide a basis for robust (i.e. logically sound) critical thought and problem solving (i.e. searching the space of all possible theorems), but the question remains: how does such a system get started in the first place, i.e. starting from tabula rasa where do the "already-known" assertions come from? In respect of the mathematical universe, an AGI designer might hand-code some important mathematical definitions and theorems into the AGI's belief system. But in respect of the physical universe, an AGI needs to learn (i.e. recognise the patterns that exist in the structure of the universe as revealed by the observations appearing in its percept history) for itself, and this is primarily where induction (over the percept history) comes in (although, if you investigate the problem of continuous AGI learning (i.e. knowledge acquisition) fully, abduction (of theories of the universe from the patterns discovered in the percept history by induction) is also involved).
@@MachineLearningStreetTalk OK, apologies. I did formulate a reply, and edited it a couple of times, but then it vanished. If I have time, I'll try again.
What is important and distinguishes Turing machines is they use a dsl, logic and memory. Memory is key because it must be a representation that can be physically written to a media and copyable.
The logic problem perhaps works better in reverse - working backward from a solution. Backward the solution space forms a tree whereas forward - the problem structure is not obvious. Next, there are a number of abstractions which trip up the model: rotational symmetry, random rotation and manipulating unknown state.
Check their blog. It's stated that it will come in the future but turned off for now I assume they wanna collect some raw text data first before enabling it
Keith accidently revealed the relationship between reasoning and knowledge when he stated "the reason for reason:. What he meant was the the explanation for reasoning. This reveals reasoning as building explanations. These explanations for things are how we make sense. Also known as knowledge. These explanations are persistent memes composed of Moore and Mealy machines.
I disagree with Dr. Duggar that "it is doing a very shallow form of reasoning" But I do agree with the idea that o1 has perhaps not reached the same level of metacognitive baseline as your average human. Imo o1 is what metacognition would look like in an ML model implementation. And while we should take it with a grain of salt, we should not ignore the theory of mind examples OAI may have cherrypicked to show improvement in that regard. So instead of thinking of o1 as the einstien of LLMs, I am more excited to think of it as australopithecus iteration in ML evolution.
adding new tapes.. you can always add more tokens to the vocabulary, thus increasing the size of the model and having to do a round of training on those new tokens
about 21.00 / on reasoning and neural networks: A very basic and very old distinction from science-philosophy comes to mind, about what "sometimes work" versus "what can be proven as a valid solution". The distinction is of course inductive reasoning versus deductive proof. I think this is the core of the problem: Neural networks are like inductive look-up-tables, and in that sense capable of some form of INDUCTIVE reasoning. But DEDUCTIVE reasoning is something else. It has been said somewhere, that a truly "thinking" system must be able to do BOTH. It is also like the Kahnemann "thinking fast thinking slow" problem: Heuristic solutions serves one purpose, but the slower analytical thinking serves another. Just my humble thoughts on the subject.
It's really driving me crazy that he didn't tell us the answer to the pillar switch problem. My guess is that it's sort of like a Rubix cube in that basically have to use an "algorithm" to solve it, as manually checking after the first time is useless. But also with a Rubix cube you want to reason backwards, in that you want solve the last step first, which is the easiest step, and then solve the first step last, which is the hardest step. So what I'm guessing is that check only the north and south hole Step one: Flip both switches up Step two: Flip both switches down Step third: Flip both switches up Step four: Flip both switches down Step five: You check the east and west whole and if they are facing the opposite direction, you only flip one of the switches, and then hopefully then if it works the pillar will stop spinning.
Given the speed of progress, im wondering how quickly "It seems like but doesnt really" turns into "I dont care anymore because 'it seems like it does everything better than all humans on the planet'." Is emergent behavior truly emergent, or just the natural synthesis of the right heuristics and context once you acquire it. Studies have shown that what was believed to be emergent behavior in llms, was just the proper application of complex context. Isnt that why many 'discoveries' on reflection can be recast as 'it was right before our eyes the whole time."
They likely used MCTS for this, as they did for alphago. Alphago is objectively superhuman in its task. So why would successfully implementing a similar architecture into relatively general models not be AGI? Because it doesn't satiate your preconceived anthropocentric biases?
Yes, that is exactly why. Both of these guys are denialists who consistently move the goalposts with each new model release. But at least Tim has some inclination toward intellectual humility. Keith has a level of smug and epistemic certainty that is both totally unjustified and extremely grating.
Curmudgeon podcaster: "AI just can't reason" AI: *does something that 95% of humans can't do, like advanced coding* Curmudgeon podcaster: "Well that's not reasoning! It's just !" And that's basically what half of all conversations on o1 looks like.
@20:00, What is Reasoning? So if the artificial neural network, like o1, is not doing the reasoning, then what about humans? Maybe human brains are also not doing the reasoning per your definition?
It has read every single academic paper and yet cannot produce new insights. There are certainly many undiscovered connections between different fields and subfields in STEM and yet these models aren't able to identify a single one.
@slowloris4346 that's not true, a wave of combinatorial discoveries are coming. It just takes time for humans to set it up like that and then evaluate all the results. But ppl have started
@@slowloris4346 They don't produce anything until a human asks them to do something. LLMs might not be the way but things like AlphaFold have come up with new things what we didn't know yet. Humans might have eventually discovered all those protein structures eventually but it really sped things up.
You mentioned that reasoning is a process of applying a set of first principles rules, i.e. logic, to get from a set of inputs to some output. But lets say you write a program that does such a reasoning computation and confirm that that is indeed reasoning. If we then consider a lower level of abstraction, such as the assembly trace generated for such program, if I showed that to you, you would say no no, that is just a bunch of registers flapping about, that isn't reasoning, there is no logic going on there. But there is, it's implementing the reasoning program that you wanted. Same for a neural network. What my point is, that to say that neural networks DON'T reason, is an unfounded claim to make. Also, neural networks are a process where you apply iterative layers of compute to an input so it might as well be doing small shreds of reasoning on an input at various levels.
Moravec's paradox can be explained as complexity resolved by commodity. For example pencils are very complex. Very challenging problem initially. That complexity has been converted to simplicity because of broad utility of this solution meme.
Why do you guys think reasoning is so complex? Maybe it's just a really simple process (just pattern matching based on prior knowledge), maybe its actually difficult not to get because we see it everywhere. If it were complex Id expect to see a lot of humans born without reasoning capabilities and this is exceedingly rare (if at all). The same goes for reasoning in the animal kingdom. Again if it was the result of a complex process we'd expect to see lots of broken reasoning processes but we dont. My guess is that its exceedingly simple to get reasoning out of neural networks: you just need to scale them up with a few computational tweaks to what we're seeing: realtime training and full sensory data and reasoning just drops out of it naturally as in its almost difficult not to get reasoning which is what we see in our every day lives.
There is a lot of good points made in this podcast. And I do think LLM currently do not perform reasoning and I do not think the underlying mechanism provide a way for reasoning to emerge. Thanks for sharing the criticisms. Genuine question! LLMs aside. Regarding the infinite memory requirement, I know training the Neural Turing Machine (and by extension DNC) is so difficult and I have tried many times. Besides all that, architecture wise, will it not support that is needed for the problem? Yes the NTM is clocked from the outside in a typical loop, but is that what the clocks in our CPUs do. The stopping condition is observed by the program but the time, temporality is provided by the clock. No?
Just to clarify, we did ultimately agree that it's fair to say o1 is "reasoning", just with some caveats that are important to note. As for NTM and DNC, I think it depends on the exact architecture. Iirc, the original publications on both of those actually specified a finite non-expandable memory. However, let's leave that aside and just same they are upgraded to have an unbounded memory. Then **yes!** absolutely those are not Turing Complete and as soon as someone figures out how to efficiently train that for general purpose, well, that is going to be some damn exciting times indeed. Sounds like you are a practitioner? We'd love for you to join our Discord and share what you are up to with those NTMs!
Prolog has backtracking. Reasoning often requires undoing some steps and restarting from a previously valid "save" point and exploring different paths in the space of potential solutions.
@@djayjp I think in the Kyle Kabasares video it just generated mathematica code and he ran it, although the video itself is pretty good proof it can solve novel problems with practical uses that weren't in it's training data and he's not a hype guy, definitely worth a watch
I've always wondered why we don't "just" implement a turing machine in a circuit that's trained alongside the GPT. I know we've sort of hacked together nand gates inefficiently using rl and recently (Franz Nowak - I assume he's gonna be on the show soon?) properly represented a turing machine in a transformer, but I mean like just a classical turing machine the network has access to as opposed to just calling an API during inference. I only dabble in low level ml so I could be way off, but from what I understand this is just an interface problem. Could we not use something like the features that anthropic extracted to reliably i/o with them in binary if we wanted to specifically design a virtual bus for that? Maybe even expand it arbitrarily and train a network to learn how much memory and bandwidth to assign to it?
Total missing the forest for the trees on this one guys. "Approximate Reasoner" is probably the best term, but the turing vs not etc, has no market impact. The real question is does the o1 series produce commercially valuable answers for a price the market is willing to pay. The real bit I would investigate, is why are the rate limits 250 times lower for o1 vs GPT 4o. Likely because the inference it is doing is ludicrous. It is a total mistake to think that his is all a serial chain of thought, there is no way you could do significant computation that way, you just get limited by memory bandwidth. What you are seeing is massive parallel computation, that gets summarized into CoT. There is no other reason for a 30 query a week or 25 RPM (vs 10,000) for Tier 5 rate limit. They are not making money on this model, they are subsidizing it.
Yes, it's very upsetting as a mathematician/scientist to watch this god-awful wasteful approach to solving a "problem." What problem that is? I have no idea.
"Reason is the faculty that identifies and integrates the material provided by man’s senses." When these AI systems have sensors and a commitment to identifying reality we are getting closer to something that looks like reason. But with no sense perception there is only computation of some vanity.
Great video, my favourite deep channel on youtube. Reasoning to me is kind of also including solving a problem, not just in one session. But finding the solution after eg 2 weeks by connecting the clues, insights, new data etc. LLMs does not feel shameful after hallucinating an answer, it is not a part of the sequence.
Rag, chain of thought reflect atate transitions. These state transitions are simply counterfactuals that match underlying "least action" outcomes from emergent factor entropic forces
Great conversation!
Thank you Lex! Much love!
Thank you, Lex! Much mutual love!
Thank you lex, now tell sam altman to release our AI waifu.
@lexfriedman we want more old school ML interviews on your channel. We love the new stuff, but would love a throw back.
Lex needs to have MLST on the pod
One common thing I see with people claiming LLMs can't reason is that they do not apply the same scrutiny to humans, which are the clear benchmark AI is being tested against. I think that if any criticism is brought up against AI, the same should be tested against the human benchmark.
The average human is not consistently reasoning from first principles, and the ones that are doing so were trained specifically to override their primitive thought patterns to think in this rigorous way, and even if someone is able to do this relatively well after years of practice, they are clearly an outlier, and they still introduce bias and flaws into this "ideal" reasoning pattern.
Also i think that the human brain was not designed from the ground up to be reasoning machines. As a reasoning machine it is occasionally able to muster an inefficient approximation of reasoning patterns against all odds.
Humans are also very sensitive to prompts. If you ask something in a different way you can get an entirely different answer.
You switch your computer on and it boots successfully after executing literally millions of instructions without a single mistake. It's reasonable to expect the same deterministic output from an AI system.
Well, o1 supposedly outperforms "experts" on "PhD level" problems in physics, biology, and chemistry. It supposedly scores very highly on the USA Math Olympiad. And yet, it gave very very stupid answers to Dr. Duggar's problem at the end. How many PhD physicists or high scoring Math Olympiad contestants do you suppose would give the answers o1 gave? Watching it fail miserably at a not-all-that-difficult deductive reasoning problem suggests to me that its apparent successes at much harder problems were illusory. If it can't reason through simpler problems, then it's not reasoning through harder ones either, even when it gets the answer right.
So it's not just that LLMs make mistakes - it's that the manner in which they make mistakes belies claims of underlying reasoning. Give GPT-whatever a seemingly challenging problem, and it might give you a perfect solution. Then give it an easy problem, and it might give you gobbledygook. You just never know. Humans might make plenty of mistakes, and we might rely to some extent on memorization, but individual humans don't switch back and forth between appearing to be math geniuses and appearing to have poorer reasoning than a 3-year old.
So yes, humans are not consistently reasoning from first principles. But sometimes we are. LLMs never are.
@@benprytherchstats7702they do. Geniuses are often absentminded and forgetful, even beyond a toddler
We are currently contacting the JWST to see if they can locate the goalposts out in the Kieper belt.
you funny!.. And correct.
😂😂
Tell me when you got the goalpost of truth in your pocket instead buddy. Good luck.
🤣
I can see both sides
I tend to use an analogy like the chimp test on the human bench mark site, a benchmark that chimps crush compared to humans.
Similarily I think current ML models are real intelligence, just that perhaps we should think of inteliggence as having different modalities.
In that respect expecting an LLM to be comparable to a biological agent, that is embedded in realtime within the world in a body, is a bit unreasonable to hold those models to the same standard of intelligence as humans.
Imo you would have to compare those models to a disembodied brain in a vat to get an effective apples to apples comparison.
Of course that is simply not possiblse to do so that is why it is not very reasonable to insists that LLMs are not "real intelligence" or have generalized at all simply bc those models still struggle with out distribution problems that humans do quite well.
Imo that may be like chimps claiming that humans are dumb bc they can't perform well on the chimp test compared to humans.
Imo the novel "reasoning" happens at the training stage, where these models don't "go look up the answer up on a look up table and select the best output from a one to one database" at least not based something as simple statistical frequencey occurance.
What is going as the model is trained is more complicated and nuanced than that and it wasn't we would not need ML would not be useful, we could handcraft those solutions ourselves with having the ML discover the efficient method for doing that, even if the balance between entropy, search and memorization is not dailed into the same parameters that humans are at.
So it seems a bit intellectually lazy to gloss over that subtlety and then scoff at the idea that people remain impressed with each new iteration of the current ML summer.
But hey that is just a theory, an AI theory
All this "is this really reasoning" discussion just reiterates what Turing himself complained about in 1950. He rejected the question of whether a machine can "think" as just as meaningless as whether a submarine can swim. It really doesn't matter. Can it solve tasks, can it achieve objectives effectively? If yes, it doesn't matter if you call its steps "reasoning", or "just a lookup table" or "glorified curve fitting" or whatever. If it works, then it's impactful, both in positive and negative ways.
This is all too similar to claiming an airplane can't fly because it can't flap its wings.
Exactly. Thank you for stating this so clearly. These people are delusional and in complete denial.
The important distinction here is that we still aren't sure if o1 (or any language model) can come up with *new* knowledge. Until this has been proven, people are totally right to say it can't reason. It would be like an airplane that can flap its wings but can't fly-- it's just going through the motions.
@@eposnix5223 It's literally solving physics problems and maths proofs that are outside of the data it's been trained on.
@@eposnix5223 Then delineate what you mean by "new knowledge" by designing a test for it. I'm sure once a new model passes that test, you'll move the goalposts and claim that this is still not *truly* new knowledge or something. It keeps happening and it's embarrassing.
Nobody is claiming that o1 is already an AGI. But complaining that it isn't "reasoning" is meaningless. The word doesn't even have a commonly agreed upon definition, everyone can read into it whatever they want.
@@simpsimperson73 no it does not lmao
An idea I want to throw out there: I've been playing with o1 (mainly mini) and have found that it works better if you explicitly tell it to be its own discriminator from step to step. What I mean is, tell it to roleplay with itself as "Bob, the expert coder" and "Tom, the expert critic" and have Tom check Bob's work every step of the way. This is how I normally work with GPT-4o, but o1 allows this process to be automated.
just give it the task, it doesn't need prompting, the whole point of o1 is that it self-prompts. It works best without you attempting to prompt hack it.
@@Gnaritas42 Not in my experience. o1 still makes rudimentary programming mistakes when left to its own devices. The code logic will be good, but there will be small errors that prevent it from running. Tell it to doublecheck itself will reveal these errors.
@@Gnaritas42 "just give it the task, it doesn't need prompting, the whole point of o1 is that it self-prompts. It works best without you attempting to prompt hack it." That's not true. I get much better results in algorithmic questions by first asking it to generate tests and include as many corner case tests as possible, then write the code ensuring that it's tests would pass. Then it spends about a minute to answer. But if you just asked the same question it finishes in 9 seconds with much inferior code, and in this particular case, buggy code because it missed several corner cases.
I think you make many great points. I guess I am one of the people who don't want you mystify reasoning too much. We can say it imitates or guesses or simulates it, but the fact of the matter is that it produces an often accurately described reasoning.
So we can try and deny it, but if it can generate this description of reasoning and use it somehow to guide its answer effectively, then we have effectively an A.I. that is using reasoning.
Maybe not exactly how we would, maybe not as reliable in some ways, but this is reasoning.
This is MLST, so we already know their answer to this question. It's maybe a tired analogy at this point, but I feel like their view can be summed up as: airplanes don't fly, they just fall forward through the air very fast! they need these long runways for takeoff and landing. This is obviously not true flight, and we should stop working on this in favor of feather-based systems.
In the end, the interesting part is does it get us from point A to point B in a straight line through the air in an economically viable fashion?
I'm also not very certain that it's that much more shallow than human reasoning. I constantly find errors in people's reasoning, and people tend to stop thinking very quickly.
I do like that the idea that the reasoning itself required to solve a difficult problem isn't necessarily that much more involved. There are other factors that can make problems difficult to solve. (Lack of overview, lack of feedback, lack of insight, lack of clarity, lack of time, etc.) So when a problem seems difficult to us, it does not mean that it necessitates advanced reasoning.
@@lystic9392
This is my thoughts as well.
Perhaps one of the biggest hurdles is that we trying to teach the ML with human data, and then some are really disapointed when it turns maybe most of the data humans generate is not "general" or "intelligent" some are overly disappointed that the ML cannot one shot and filter out the noise from the signal.
We are primates after all, so I suspect that most of our daily baseline "reasoning" is not very high level, and those moments of brilliance are more spurious, we just happen to benefit from them so much as a species bc we have a way to accumulate and distribute that information.
I think this gives the impression that in general maybe we believe humans are smarter than they actually our at a baseline level.
Don't I agree with this exactly at time code 50:30? ruclips.net/video/nO6sDk6vO0g/видео.html
@@martindbp
"airplanes don't fly, they just fall forward..."
:D
i love the ones with just you two. i dig the guests, but when its just you guys it feels like I should crack a beer and join ya.
Back to the good old days of early MLST
Beer would be downgrade
Yeah, maybe a coffee or tea instead. Beer does feel like a downgrade.
Echo!
If the pillar spins when both hands are out, then can't you just leave one hand in, flip the switch up, then insert the other hand in every other hole and do the same thing without removing the first hand. Boom, all up, remove hands.
I think when we reduce it to "it's just doing so and so", it's usually not a good argument because we can be also reduced to simple processes. Maybe computers and humans reach "reasoning" in a different way, but if a computer can reach right conclusions and eventually create new knowledge, ok now what separates it from our reasoning? Maybe it's our limbic system, and our "feeling" of being something special and "having a soul", but those are subjective things that may not be necessary for an intelligent being. I think we are going to reach a point in semantics where consciousness or "real intelligence" is only reserved to *human* thinking, even if a machine can surpass us by most metrics.
Yes but for open AI o1 to do the reasoning on anything it needs me to type in a question. And all the "thinking" that follows has to be triggered by that. And will be linked to it. For ex a human reasoning indepently can have an epiphany on a totally different subject... that will never happen for llms (for now).
@@ElieSanhDucos0 Yeah I get it. But "for now" is a good point, because maybe the LLMs are an important piece on a puzzle that seems to have started being solved. It's impressive how much room this tech still has for improvement and despite some slowdowns it seems to be still full throttle. I understand hype is not a healthy thing but it's hard to ignore how much money is being poured into it, more than anything humanity has seen in ages. It doesn't look like pure tech scam type of hype. I don't like to dismiss it as "nah, this has nothing to do with AGI at all"
Ok.
Why people bother about whether it is reasoning or not, you might ask.
Isn't it irrelevant what it's doing as long as we are happy with the results?
Good question!
The problem is not in the definition of the word "reasoning ".
The problem is with the approach we, as an industry, take in order to get to a generalized solution.
There are huge amounts of money and human brain power that are now wasted on fine tuning an approach that leads to a dead end.
This is an outcry for change of the money flow and brain resources to a different approach that at least could have success as one of the possible outcomes.
@@egor.okhterov I see, but I bet the money is not going blindly for dead end solutions. AGI is a Golden Goose and I think most researchers understand those concepts. At the same time there is room to improve LLMs, and LLMs are presenting amazing results by themselves. So I don't know if an outcry is really needed.
@spaceghost8891 did you hear about recent VC rounds into AI startups?
Take Argo AI for example :)
In order to get rich, you need to have good understanding of humans. You should be able to connect with the right people and make good impression.
Having deep understanding of mathematics, computation and machine learning is usually not part of the mental toolbox of the people with large amounts of money.
When you pitch an idea to a VC guy, you should spin a story in a way that shows how you can make money off of it.
The way all these ideas are presented is that LLM is a low hanging fruit, like a light bulb in 1879.
They don't talk about developing a proper reasoning machine during a pitch. VCs don't want to hear about that. They pitch how they can build an app on top of LLM, possibly fine tune it and make a call service or maybe on-premise smart search for Corp documents or a tool to patch the software, upgrade the codebase from python3.8 to python3.12 or upgrade to a new framework. And other stuff like that.
A lot of very smart people were sucked from the job market into these mediocre projects with a short term goal to milk cow the first.
But this will not lead to AGI.
ALL THREE CAN BE TRUE:
> o1 is reasoning
> it is not reasoning “well”
> this may not be the path to ASI
I'm coding on a Sunday and I get to pop this on in the background. Nice surprise, love it.
Well I’m prompt-coding as well 😂
I’d like to hear you guys talk more like the old days!
We will do more like this! Maybe even get Kilcher back for some ;)
All other AI podcasts (apart from Dharkesh's) don't even come close to the quality and depth of content of MLST, thanks for bringing us along on this journey, we've learned a huge amount
Have you checked The Cognitive Revolution podcast? Highly recommended for wide breadth, excellent signal-to-noise ratio and a host that is intellectually curious, knowledgeable yet humble and hungry to learn.
What is the name of the podcast. Is it on Machine Learning specifically?
I think whenever we want to say "these machines do not think" or "they are not intelligent" we run into problems because, as nicely discussed by this great podcast, we are not sure what these things or processes actual are. I like to think of cognition and I like to think that any form of cognition implies self-creation, i.e., operations that produce themselves. And if we think about the process made so far and what many think is required to get to AGI, this recurrent and seemingly paradoxical ability of self-creation, self-observation, self-differentiation might be key. I also think that AI will be as dependent on us as we are dependent on each other. Our language is our world and we can only learn it by our culture.
Great talk!
(1) NS, switch both up
(2) NE, switch both up
(3) NS, if a switch is down switch it up. If not switch one down.
(4) NS, if both switch same flip it, if different do nothing
(5) NE, flip both
(6) NS, flip both
Of course, it could stop at any step if all up or down.
This was amazing and insightful. Both the puzzle and the talk. Thank you.
The first time I read your answer my brain could see that you could be right but it had no idea whatsoever if you were indeed right. Then I added the states in which the puzzle is after your steps. From that I learned that your solution works but step 4 seems to be unnecessary because the outcome of step 3 is that both axis have 2 switches in opposite values so you will never meet the condition where you are going to switch anything, and you will do nothing in this step.
@@jantuitman You are right, I guess him asking for 6 steps primed into thinking of 6 steps 😂. (4) is unnecessary, thank you.
@@jantuitman I asked GPT-O1 to write a simulator to test my solution. This was fun.
Removing your step 4 and running in simulation for 1,000,000 rounds...
Simulation Results after 1000000 runs:
Distribution of Steps Required to Solve the Puzzle:
1 step(s): 250474 runs (25.05%)
2 step(s): 249789 runs (24.98%)
3 step(s): 249611 runs (24.96%)
4 step(s): 125405 runs (12.54%)
5 step(s): 124721 runs (12.47%)
Failed to align within 5 steps: 0 runs (0.00%)
O1-mini may only be in its' early stages of reasoning, but it's damned useful!
@@BrianMosleyUK Impressive. So apparently, it really works for this problem when in tandem with a human (i say in tandem because you still had to prompt it to write a simulator). Of course there is still the criticism that it may not deal with non-computational problems and that it cannot find all reasoning patterns when not working in tandem with humans. That is all very valid criticism and I dont know if we can speak of "AGI" or intelligence, and it may even be questionable that we give our reasoning up to OpenAI for free. But all these criticisms aside, i think there are many useful applications possible for these types of models.
The biggest problem with o1-* models is that the chain of thought spam deranges dense context. The context is filled up with noise i.e. about "code coverage" in a query which has nothing to do with coding. In any complex/ambiguous problem, it will misunderstand you on the first shot, you then explain what you actually wanted and by step 2 the CoT noise deranges the model significantly over a standard LLM. Catch-22. They should prune away the noise on subsequent queries. Perhaps as the models learn (from the users), the trajectories will become less noisy. It's also clear that the trajectories are hard-trained for specific purposes which will limit their generalisation.
In the metaphor- chain of thought- in humans there is certainly noisy context preliminary to the final coherent thought one is conscious of.
Another thought- initialized weights are random, and we need a preponderance of useless noise to find the lottery tickets.
Perhaps the noise is not the problem per se, in fact necessary, but it should be pruned on some schedule and summarized for next context. Along the lines you suggest.
I have not had this experience at all. You said in any complex query it will misunderstand on the first try? Not at all. I’ve had roughly an 85% success asking the most difficult questions I can think of. And even on the 15% it got wrong, it was typically not because it misunderstood the question.
holy sclerotified models, batman!
My metric for AGI: When I see LLMs harassing and insulting one another during an edit war on a wiki maintained exclusively by LLMs.
Out-of-box puzzle solution:
1.) NS : flip both switches up
2.) NE : remove hand from S, leaving hand in N, and put the now freed hand in E and flip the switch up
3.) NW : ^ same, but for W
...the puzzle parameters never explicitly stated you had to remove both hands. I get that it's intended to be solved by lining them up so NS and EW have opposing up/down alignments by way of using adjacent pair switch flips first, but just leaving my hand in one was the thing that first jumped to mind. The puzzle should be reworded. It's fun to think about though if played as intended.
I don't care if it's thinking or reasoning or pulling answers out of its ass, as long as it has the correct response.
Presumably it will have to be able to explain the steps it took in reaching that response?
@@fburton8 A neural net can always detail the steps it took - the inputs, weights and outputs are there for anyone to see. Unfortunately the number of steps and their weights exceed the mind space and time availability of people.
Humans generally judge with post-hoc reasoning anyway. We're terrible at knowing how we come to conclusions.
What about when you don't know whether the response is correct? What about when you're entrusting it to make decisions because it appears to "reason"?
The fact that it isn't doing anything like human reasoning does matter when we ask it to do things that require human-like reasoning. See, e.g. replacing teachers and lawyers and therapists and secretaries, etc. Or even just employing it as a customer-service bot. The difference between our "reasoning" and its complex statistical next-token-prediction matters in those applications where consistency and reliability are important.
OpenAI is not that open. That we can all agree with! 🍻
The current chain-of-thought in 1o tries to refine the degrees-of-freedom that the attention mechanism will produce from the context, in a way model gets closer and blends between the right datapoints. I can accept this as some form of reasoning.
Glad to see Keith again on the podcast.
I’m not. He just shows up to engage in smug denialism using terrible argumentation.
"With the new GPT models, you need to start a whole new session."
You're supposed to clear the session regardless. Performance drops off quadratically the further down the context window your prompt is. They discovered this back with GPT-1.
So many issues people have with ChatGPT are skill issues of asking multiple complex questions in the same session. It was designed for casual noobs to have a back and forth chat dialogue with it, but you shouldn't be using it that way if you care about performance.
I assume Tim and Keith already knew this, so maybe I've misunderstood what they were saying here or something
@@steve_jabz They randomise responses precisely so that they are not reproducible.
@steve_jabz We explained in this video why you need dense context in an LLM. With any complex problem, i.e. one with ambiguity -- the LLM won't understand you on the first shot, or even the 3rd shot. That's why you need explain (or discover with the LLM) what you actually want through successive prompts. Given that the CoT trajectory noise deranges o1-preview, this means in situations where it didn't understand first shot, it will actually degrade on all problems which require further clarification, particularly if the problem is ambiguous and the CoT spam includes very noisy trajectories. There is a trade-off between dense and sparse context, dense context means you don't need to re-explain again and again where it misunderstood your requirements and you can build on the problem, but there is some kind of curve here based on how related your workstream is i.e. you could continue a session to build around a single feature in GenAI coding (with the same constraints and requirements), but would start a new session to build another entirely new feature. You are probably correct in the pristine case of a perfectly well described problem which the model didn't need hinting for (when does that ever happen lol) would work better in a fresh context. Not sure if you have done much GenAI coding yet in Cursor, but please try some stuff using the composer with Sonnet35 / dense context, you will quickly come to appreciate what I am talking about.
@@MachineLearningStreetTalk That makes sense. I normally use discovery questions to formulate a prompt for a new session for complex problems, but I guess if o1 is using up / wasting tokens to even do that part well, that's a problem. I must have misinterpreted the context you were talking about it in.
Haven't used cursor yet but I've watched other people use it. As soon as o1 dropped I assumed everything cursor was designed around would break with it and need to be completely redesigned, as well as agentic workflows. Not even for the context window necessarily, but the fact it's trained to simulate taking multiple steps to reach a goal and give you the output, whereas cursor and other tools are trying to use the llm itself to reach the goal by abstracting away the tools in a way it can control, so you would need cursor to use it's reasoning as the control, and you would need o1's reasoning to be aligned with the function calling API or something integrated with cursor.
It sounds so close to being exactly what we need for these tools to work reliably but it's so incompatible.
Here is a funny thought: what do we call processes that have both determinstic constraints and random variation dynamically interacting? Evolutionary processes. So that is what reasoning must be. We just don’t know how it works. It is quite clear how it happens in biology, but btw, we don’t know how that happens computationally either.
@@93alvbjo o1 isn't mutating and self-reproducing, although maybe you could argue that it's memetic once it has a structured, social way of progressing it's own ideas over time. I guess you could call something like claude artifacts that or openai training on customer outputs to a frozen state if you want to be technical, but I'd agree with Keith that like, come on man
This was my favorite "solution" from Claude 3.5 Sonnet:
"Potential solution: What if we do the following:
Reach into any two adjacent holes.
Make these two switches match (both up or both down).
Without removing hands, reach into the other two holes.
Make these two switches match each other, but opposite to the first pair."
😂
2 hands 2 feet?
@@zalaeify lmao
The way I have recently been thinking about reasoning is as a model of environment dynamics that has a high degree of predictiveness.
I liked this definition as it remove the idea of reasoning as an 'emergent' phenomenon of 'intelligent' systems and reduces logic down to being something extractable from the environment.
More concretely, suppose we have an environment E. Suppose Q, A \in STATES(E) are two states such that there is an environment transformation R: Q -> A. Suppose further we have a learning agent that has learned a latent representation (Enc, Dec) of environment states, The learning agent is said to be able to reason if the autoencoder extends to environment transformations: Enc(R): Enc(Q) -> Enc(A), and the obvious decoder square commutes.
This formal definition obviously need to be weakened to a statistical statement. What do you think?
In this context we could give a measure of intelligence as the size of the set of transformation of E that map to the learners model: #{ R \in TRANS(E) | Enc(R) = L}, where L is the learners model.
4:00 Under this view a human brain wouldn't be impressive because it's algorithms can't expand its memory indefinitely, or traditionally only had low bandwidth access to paper etc. to expand it. I think Schmidhuber would consider achieving brain-like intelligence to still be impressive.
Yeah, all I’m hearing from videos like this is defensive doubling-down from people who committed early to the position that LLMs will never give us true intelligence, and now feel that they have to defend that ground at all costs and never change their minds. They’re in denial of what is obviously happening right in front of us, and their excuses and delineations grow flimsier with each new model release.
But humans eventually invented Turing machines ;)
I thought the prompt was worded a bit ambiguously, and the corrections being placed way down the context window bias it toward up to 128k of misinformation, so I reworded it without giving it any hints:
"There is a pillar with 4 holes aligned north, south, east and west. You can't see inside the holes.
Inside each hole is a switch set either up or down. You can reach inside 1-2 holes at once, feel if the switches are already up or down and set them either up or down.
If all 4 switches are not either all up or all down after you remove your hands, the hole axes get aligned randomly.
Come up with a procedure of reaching into the holes that is guaranteed to set all the switches either all up or all down in 6 steps or less.
The procedure cannot rely on random chance."
GPT-4o said this:
"Key Idea:
Ensure symmetry by focusing on pairs of opposing switches. Once symmetry is achieved, the random reorientation becomes irrelevant, as you will force all switches into the same state.
Procedure:
Step 1: Reach into the North and South holes. Feel the switches. If they are both in the same position (either both up or both down), do nothing. If they are in different positions, set both to "up".
Step 2: Reach into the East and West holes. Feel the switches. If they are both in the same position (either both up or both down), do nothing. If they are in different positions, set both to "up".
At this point, you have ensured that opposing switches (North-South and East-West) are both set to "up".
Step 3: Now, reach into any two adjacent holes (say, North and East). Feel the switches. If they are both in the same position (either both up or both down), do nothing. If they are in different positions, set both to "down".
Step 4: Reach into the same two adjacent holes (North and East) again. Feel the switches. If they are both up, set them to "down". If they are both down, leave them.
Step 5: Reach into the remaining two holes (South and West). Feel the switches. If they are both in the same position, do nothing. If they are in different positions, set both to "down".
Step 6: Now, all switches will either be all up or all down.
This procedure ensures that, regardless of how the switches are initially configured, you will set all of them to the same state in 6 steps or fewer."
Still wrong, we explained the answer on the patreon version 😁
@@MachineLearningStreetTalk Thought it might be, but it mentioned the symmetry and seems a lot more coherent? Would be interesting to see this tried on o1, but I don't need it enough to pay for it.
I still don't understand the question myself so maybe it could be worded differently still. The part about removing your hands resetting them if all 4 are not up makes it sound impossible to me given that you can only try 2 at a time, and I don't really know what a pillar is in this context to imagine how it rotates around, which I assume is related to how I misinterpret that part.
LLMs get caught up in accidental red herrings easier than the average human that can perform the same knowledge work, which is worth pointing at as a failure point, but it's still interesting to know if it can solve the problem with extra care taken to avoid sending it in the wrong direction.
@@steve_jabzJust give up-these guys are stuck in denialism and will never admit to being wrong about anything, no matter how far LLM-based AIs may progress.
@@therainman7777 Having watched the channel for years I've seen Tim's opinions change over time and question many of his assumptions to end up where they are.
I don't agree on a lot of things but I would put it down to difference of perspective and knowledge and influence from some very intelligent people in the field rather than cope
I hypothesize that “reasoning” in humans isn‘t a fundamental process, but rather a generalization derived through deduction.
I'm still waiting impatiently to see how they will draw these guys on South Park.
Anything where there’s a reason something happened is reasoning. A rock falling off a cliff is reasoning. Yes, I know we usually call that physics, my point is that there’s really no difference. Logic in your brain and logic in computers happens as the result of a thousand tiny switches being flipped, which affects the flow of electricity, and that’s how your computer does math. I realize this is maybe an obscure or unique viewpoint but I think it’s necessarily valid.
Indeed, the rock is trillions of states (in the basement level of our current models), to compress it in one word is quite a lot of leverage. The physical playout must perform a similarly coherent chain of interactions. In fact quite a deep bundle.
But in those terms LLMs are pretty plain little packets of interactions.
More divergent but smaller.
@@oncedidactic we need moar
@@Shlooomth we need better ;)
brains not rocks
Love rocks all the same
I even have something that seems very simple that cannot be done by Chat GPT and o1 : For learning Spanish, I asked them to give me a spanish word with mixed letters and to give me a clue for its meaning. I need to find the word with the letters in the right order, but they always give me too much letters! That will be my test on the future versions. If it cannot do that simple task, I will keep trusting my reasonning abilities over theirs... Thank you again, it is always fantastic to listen to your exchanges!
I like this definition of reasoning, “the ability to create knowledge to create a good explanation”. No LLMs are able to create knowledge currently. The behaviour we’re currently seeing is based on past knowledge the model has been shown during training. Explanations are at the heart of this, and creativity is critical to good explanations. Models without creativity won’t be able to do reasoning properly.
I don't understand how you can prove the creation of brand new knowledge.
New knowledge can be stumbled upon randomnly so it is the process by which it is achieved being labelled creation or luck which is important. Pretending new knowledge is discovered from scratch is possibly a red herrring. Humans are adept at convincing themselves they have come up with a concept themselves when really they are standing on the shoulders of giants and been pretrained with either the actual knowledge or a system that can produce that knowledge. Also much new knowledge is stumbled upon due to the inherent epistemic chaos (as they mentioned in the chat when talking about Wolfram) . The idea is that llms are much less efficient at this than llms so far and because they have a wider computational base are closer to a universal turing machine than an LLM. This however has not been demonstrated clearly enough that most people can see it. By consensus it seems very smart people seem to get it but for the rest of us and I include myself, we cannot clearly understand the higher levels of human reasoning well enough to differentiate. However this could just be a matter of education but maybe it could be beyond the limit of our intelligence.
@@4tofromcafeOur understanding of quantum physics didn't exist several hundred years ago - that's new knowledge. LLMs don't see capable of creating new explanations that way. They lack creativity to do so.
Parsimony is actually entropic force. For example protein folding are counterfactual outcomes that are stable when they are curled up. When stretching a rubber band you will notice the entropic force the resists less parsimonious outcomes.
Slight contradiction around the 40min mark. You guys said that we need Turing Machines to perform reasoning, but then said "I'm a machine, I'm a bio-machine, with wetware and software, but I can still perform reasoning". Humans are not turing machines. So given that humans can reason (and have general intelligence), why can't other non-turing machines also have general intelligence?
Because Keith’s argument is nonsense that in no way proves what he seems to think it does.
Btw it’s not a slight contradiction. It’s a massive contradiction that puts a hole in his entire argument.
Is your argument that you can't fMRI a brain and see an obvious potentially infinite internal tape? It's the system as a whole that is obviously turing complete. We are clearly powerful enough to imagine the entire set up: symbols on a tape, and a codebook. Then, we can analyse real world problems, the create in the physical world the correct program and data to solve them, then execute the instructions from the code book with a pencil and update the paper tape? So the human system as a whole is not only Turing complete, but can create potentially a specific incarnation suitable for any problem. Whilst there are a few papers on LLMs simulating Turing machines, they're a very long way short of this.
@@luke.perkin.inventor Yes, the argument is that (obviously) you can have an infinite internal tape inside the brain, because the human brain, like everything else in the universe, is finite. Not infinite.
@@luke.perkin.inventor what does "potentially infinite" mean 😆
Took me no less than 15 minutes and a pen and paper but my 5 or less steps deterministic solution is:
1- Check two opposing switches and flip them up
If the other two were both up, you win, otherwise you proceed to the next rotation
2- Check two opposing switches again
if one or both are down, these are not your original switches, flip them both up and you win
if they're both up, these are your original switches, flip them both down
if the other two were both down, you win, if they were different, you now have only one that is flipped up
3- Check two opposing switches again
If one of them is the one that is up, flip it down and win
If both are down, these are your original switches, switch one up, now you have guaranteed adjacent symmetry (if north and east are up, south and west are down, etc)
4- Check two adjacent switches
if they're the same, flip them both and win
if they're different, flip them both as well! (flip the up down and the down up), now you have guaranteed opposing symmetry (if north and south are up, east and west are down or vice versa)
5- Check any two opposing sides and flip them to win
Thank you, it will be integrated to the next gemini training
You can also get the same outcome by replacing step two with making sure two adjacents are both on. Still gets you to the point of having 3 switched on guaranteed and then I did the same steps as you from there
what were all of the thinking steps that led you to your solution?
That's a nice solution. It took me much longer! Did you figure out how to describe the puzzle such that GPT-O1 could solve it similarly?
in step 2 how do you know you are not hitting the same switches over and over again turning them both either up or down in an infinite loop?
100% correct. Thank you for seeing it and openly communicating. 🤩
"Epistemological foraging"
I agree it is a beautiful phrase.
And basically Wolfram's ruliad represents to me the full breadth of all possible foraging that can computationally obtain.
So I will take this moment to push back against Dr. Duggar once again.
He has mentioned earlier in the vid that he is not concerned about PvsNP, but I must insist that it is relevant.
Either what we mean by relevant terms like "general intelligence" or "consciousness" are decidable or they are not decidable.
Let's assume these terms are not decidable, we can revisit his complexity issues with one shot one step oracale machines...that produce the correct yes or no output with very complex deterministic efficiency to decide.
Or we can say that process is iterative from first principles and efficient circuits with robost metacognitive entropy.
Dare I say, egads the ontology of distinction in boots is valid and tarski is not a fool.
I suppose what I am suggesting is that no he can't proclaim "this or that is not agi" without a welldefined solution to the pvsnp problem.
But I appreciate the points he wanted to make without that context.
What an exceptional conversation. Good push-back from both sides, deep approaches. More of like this please!
There are some real issues with CoT paradigm aswell because the model can actually converge towards an incorrect local minimum vs if you had manually prompted it through the different steps you could steer it better and as you wish. It works fine when the chain of reasoning is extremely obvious and unambiguous, but it could increase the risk of hallucinations in less clear cases.
Fei-Fei Li World Labs definitely right track since spatial intelligence reflects an inductive bias that is "maps and navigation"
Based on your set up of the problem the solution can be simply: (1) putting 2 hands in say, N and S and flipping the switch up. (2) And then... without removing hand from N, remove hand from S and place it into E and flip the E up. (3) And then... still without removing from N, remove the second hand from E and place it into W and flip the W up. And voila, you have all switches up without the pillar spinning at all. In three steps.
Yeah I don't get the puzzle either.
I'm pretty sure the intention of the riddle is that you can put your hands in at most two holes between every spin of the pillar, and the puzzle just doesn't state this clearly enough. Your solution is a 1-step solution in which you test all four holes in a single step. (By a "step", they mean one turn where you put in two hands and remove them and you let the pillar spin.)
@@bastiaanabcde so whats the solution for this variant?
In the first step you can make two levers the same, let's say N and S. In the next turn, after the random shuffling you might look at N and E. This guarantees that you have 3 levers in the same position. However, I don't see how you are ever going to bring the fourth lever into alignment. Because, simply by chance or something like maxwells demon, it could always make it so that, you never see the 4th lever.
@@xthesayuri5756 recall that you win if all four switches are aligned, which may mean either all up or all down. So you’re right that there might be a lever that is never shown to you, but this isn't needed to solve the puzzle.
Note: If you manage to get three levers in the same position and you haven’t won yet you know that the fourth must be in the opposite position. So you can get information about levers in this way even if you haven't touched them.
I won’t spoil the answer because maybe with this hint you’ll be able to figure it out
@@bastiaanabcde i see it now thanks, had to draw a state diagram in paint lol
You can move goalposts however you want. We are getting to agi and past with this. The world will still be indistinguishable from now.
"resistance is futile."
"take our word for it, and just give up."
it's still a turing machine.
@@mk71bNot sure what that’s supposed to mean.
Reasoning = employing logical thinking (thinking: pattern matching) (eg inference and deduction) & recursive reflection (checking & revising tentative output) to arrive at solutions to novel, prior unknowns--thus generating new knowledge. o1 performs this.
Haha 😂
It does, but they’ll never admit it. They’ve basically reverted to “don’t believe your lying eyes” when we can all see what’s happening right in front of us. But we’re supposed to listen to this guy instead, with his convoluted rambling which doesn’t prove a thing. Mark my words, this guy will NEVER admit that he’s wrong, even if we reach a point where we have AIs that have cured cancer, solved climate change, etc.
I'm not sure if you've ever stopped to think about how humans reason about novel problems and what the results typically look like. We know it doesn't need to be a demigod with unbounded memory and knowledge of how to deal with any possible problem to be an AGI because the generality of AGI is based on what humans are capable of, and humans are nowhere near anything like that. If you want to try to mathematically or computationally constrain it you have to ignore it interacting with the world at large, both input and output, and only allow it to use it's turing tape, any further inputs from a human after the initial starting state would already break that guarantee.
Hi both, loving this session - thank you MLST. At the mid-point of this discussion (around min 42 ff) you are talking of reasoning in the context of iterative steps vs a pre-determined massive model of subroutines processing all in a single step. The point being the second form is not reasoning, merely process efficiency in a pre-understood way. I see that second as an optimisation notion, and therefore distinct from optioneering towards a less defined end. I have in mind basic linear programming or game theory and the inherent assumptions of rationale ends being objectively knowable [sic].
This focus on optimisation seems foundationally important to me. Because such efficiency is only occasionally what human curiosity is about. Notions such as happenstance and serendipity need the forever ability to be juxtaposed from what was previously thought or prioritised in action. This is a human quality beyond a start and an end of a query. I would respectfully suggest much of what you are talking of in this episode is moving beyond reasoning. Indeed, it is folly to only have reasoning in mind as to what this bigger problem is really about. The limits of this current AI is not reasoning because it is not "a process" (if following the definition used at the beginning of this podcast). Instead, what is being considered here is the characteristic of adaption. An ability to reset what one is working towards, with that new information or perspective. Behind that it is knowing that we do not know, and being humanly capable of asking why or how in a series of iterations where the question itself begins to change. That seems to me foundationally important to help distinguish both Turin computation and general intelligence from what human action is really about and what computation is only a part of. This is motives for asking the question, and knowing when the proposition itself is revealed as incomplete.
I am probably coming from this very differently. As a a PhD student but from the world of governance of people, and projecting human endeavour in going about physical construction of the built environment. However, I see this foundational factor of understanding what the human action or human thought to be about, as pertinent here. That "aboutness" to be itself contextual and changing within the iterative processes themselves, and to be foundation to understanding what we are as "machines". This is the philosophical grounding which is underpinning the discussion here, perhaps. For me anyway, that adds some human context to what all artificial programming and anticipating fails to be by its very ontology and epistemology. Beyond reasoning is perhaps the context, and subjective nature, or what becomes an artificial and limiting account of truth. We deal with that daily in ways non-open AI cannot...
Thank you again for this session. It is fantastic to hear thinking in its shared form in discourse such as this.
The key question isn‘t whether the LLM does the same thing we define as “reasoning”, but whether it can solve the problems we give it.
I believe that our current trajectory, although is not exactly reasoning, will help us get to the models that do. That said, I believe that there is an element of Stephen Wolframs research that we're missing in our research, cellular automata, computational irreducibility, etc. as was touched on a little bit in the video. An element of "time" that we're missing here, as I believe that true compression requires time, as does reasoning, as does a reverse cellular automata simulation. Something to think about.
Great discussion, as always!
Cheers!
Do more of these! These are the best! 🎉
21:30 you mentioned the multiplication problem in the paper Faith and Fate (NeuRIPS 23), the authors also said that GPT is just doing pattern matching, not a real reasoner.
I've been wondering where Dr. Keith has been because I've been missing his thoughts and points on MLST, in spite of the content being so awesome these last months :]
Keep killin it Timboi!
The test was really confusing and I do not see it as neither confirming nor negating anything... it seemed just a dumb and confused riddle to ask with no connection at all to reasoning abilities... I would expect a more clear stated riddle 😢
Yes. Its like Yann LeCun one on the globe. I ve seen this in numerous videos about openai o1 and reasoning : they will ask it riddles that either : they dont know the answer themselves// most people would find ultra confusing.
@@ElieSanhDucos0 both this one and Yann LeCun's are quite clear if you give them an honest try, this one is already solved in the comments.
The problem statement is contradictory. First it is stated that the spinning is random. Then it is stated that it is controlled by a hyper intelligence and is specifically designed to interfere with your sampling which makes the spinning not random. I wonder if that contradiction is throwing off the model.
@@Ahandleofrum It is technically not random but I think it is obvious what the intention is after a full read.
@@10ahm01 In the context of prompt engineering it is a terrible thing to do to an LLM.
Just reworded your riddle and I am pretty sure Claude 3.5 got it right. I just clarified a few things and made the prompt less confusing and repetitive. Please do give it a try :D
"You are human. You stand before a pillar. There is a great reward hidden inside. The pillar has four holes. Each hole has a switch inside. Each hole is precisely aligned at north, south, east and west positions. You can't see inside the holes, but you can reach inside and feel a switch inside. Each switch starts in a random position. Up or down. As soon as you remove your hands, (you only have two hands) if all four switches are not either all in the up position or all down position at the same time, the pillar spins so fast you cannot track the position of the holes that hold the switches. So there is no way to know which holes you have already placed your hands in if the pillar spins.
Can you help me find a way to insure that all switches are in a matching position, up or down all at once in as few steps as possible?"
Give that prompt a try. And if it is not the answer... this is a bad riddle.
If you want to add some flavor, add this, It has no effect... but sure is fun to add. lol:
You are human. You stand before a pillar. There is a great reward hidden inside. You know that there is a solution to the puzzle that does not involve change or probabilities. If you were to rely on chance and try too many times the temple will collapse and kill you.
one question: did any of you worked/build a LLM practically or are you just talking about things you learned?
I mean what is reasoning but a pathfinding process where you search what you know for information that brings you closer to your goal
38:15 I felt like that was a message for me 😂 Keep using your full range of expression, I'll catch up eventually lol, love this channel 🙏❤️👍
What is the difference between applying the closest fit from the “first principles” learnt rules of rationale from the “first principles” a human applies?
I think you’re talking in circles and either underestimating what the model will do or overestimating what you do when thinking from first principles
The difference is that reasoning from first principles doesn't lead you to the idiotic nonsense o1 spat out at the end of the video.
@@benprytherchstats7702 you’ve met people right?
@@andrewwalker8985 funny, but remember that o1 also supposedly has PhD-level intelligence and reasoning abilities in physics, chemistry, and biology. So while I agree that a random person off the street might give answers similar to what we see in this video, a random PhD physicist would be immediately exposed as a fraud if they gave such answers.
In other words, seeing o1 fail so badly at this problem belies OpenAI's claim that it "reasoned" its way to solving all those hard physics and math problems that they brag about. Clearly it didn't. It's still doing next-token prediction, even if this form of next-token prediction is fancier than the last.
I concede that it appears more intelligent than GPT-4, just as GPT-4 appeared more intelligent than ChatGPT. But the way it solves those problems is nothing like how humans solve them, as evidenced by what happens when you ask it to do something novel. Which should be no surprise, given what we know about how it works.
I *love* this puzzle! You've perfectly illustrated the limitations and strengths of current SOTA LLM's to me... I'm actually still trying to solve the puzzle using insights from talking to GPT-O1 - not sure if it's helping and I'm aware of rate limits.
Solved it, and got GPT-O1 to write a simulator to prove it worked. A lot more fun solving it with GPT-O1 failing throughout, helped me to my own intuition. My approach takes anything between one and 6 steps.
We learn through feelings and emotions, our brains are primed by millions of years of genetic refinement and by the environment we grew up. Maybe statistics can try and describe those conditions with some models, but those models are not and will never be alive. Whoever thinks that the mystery of life can be solved with statistics is a fool. It's always the same myth of the Golem reappearing with a new technological twist.
I like your dialogue so much ❤!
Its seems like Keith is caught up inside its own abstract thinking about Turing machine as machine containing tape as memory. What Schmidhuber was saying is that there is no infinite amount of tape out there in the whole universe. Other thing Schmidhuber was saying is that RNN can behave as NAND gate, meaning that you can build Turing machine using those NAND gates (or RNNs) and giving a point that RNN can be Turing complete. Turing completeness is not feature of a program, rather a feature of the hardware while trained neural network is a program rather then a hardware. Keith should know better :)
Indeed.
Do you think it was news to **anyone** there is not **actually** infinite tape? Do you think this was news to Turing? Do you think the entire decades of Theory of Computation didn't realize this? lol utter nonsense. The theory and definition of a Turing machine centers on a **potentially** infinite tape and the **class** of algorithm which utilize such a tape in **potentially** infinite (halting problem) process.
My unprovable hypothesis is that 1. with these reasoning/special token roll out models are actually learning the distribution space of reasoning chains for harder problems. LIke Chalét pointed out. However, 2. I think that these autoregressive models will be used to train the next generation of models who can use these LLMs are generative worlds to train against, rather than relying on our data alone
Solution to the puzzle:
---
---
---
1. NS, flip both on.
2. SE, flip both on. Note you have always flipped at least 3 switches on at this point, so the last switch must be flipped off (otherwise you win after this step)
3. NS, flip 1 switch off (presuming they are both on, if one is already off you flip it on and win). Note after this step the 2 on switches are right next to each other, and likewise with the off switches. This must be the case because no matter which switch you turn off, because they're positioned vertically (and we know one of EW is off, because if it was one of the NS switches we would just switch it on) the 2 off switches are next to each other.
4. SE, flip both switches regardless of their state. If they're both on, switching them to off creates 4 off switches and you win. If they're both off, switching them to on creates 4 on switches and you win. Otherwise, this puts the on switches across from each other and the off switches across from each other.
5. NS, flip both switches regardless of their state. You win! Because the switches are in the same state across from each other, if you reach in and notice both switches are (on/off) and flip them (off/on), now all 4 switches will be in the same state.
About this one: you put one hand in any hole, feel the switch and never take the hand out. With the other hand flip all the other switches one at a time to match the first switch. Now you can take both hands out. In 4 steps you are done, and the pillar will never spin :) It said it spins if you take your "hands" out, it didn't specify if it's "any one of your hands" or "both hands". In other words it didn't fully specify what happens if you leave one hand in all the time. I guess you could lose a hand :D So I still like your solution as the safest one.
@@GianMarioManca Haha nice. I see we've arrived at the same solution.
Yes, exactly!
First and foremost: reasoning is a process.
Not a table lookup.
Table lookup must be used as "part" of the reasoning process, but a simple lookup on it's own is not reasoning.
---
Reasoning also is not required to produce any result at all.
Reasoning happens regardless of whether there is some end result or not.
When we imagine ourselves flying aimlessly in the sky or wandering around in own magic world and doing things in it, in the act of doing so we are also reasoning!
Neural nets don't have tables other than the input and output.
Lookup tables reference the exact verbatim string already stored somewhere, so they can't tell you any new information.
GPTs that aren't overfitted learn the pattern that produces the string as well as novel strings that aren't in the training data, and o1 uses those novel strings together to generate results that fulfil reasoning objectives.
MS Excel is never going to surpass PhDs in novel, offline physics problems, even with access to a library of babel of interpolations of those strings, because it has no direction.
You could reduce it to something like a brute force attack with a sense of warmer or colder, which is still leaving plenty of nuance out, but that's already implying something much more powerful at solving problems we care about.
I've also been struggling with drawing a line between computation and reasoning, and I'm really loving Tim's approach of making reasoning about knowledge acquisition.
Like sure a vending machine is computing things about its state + environment and reacting accordingly, but it's never going to come up with a new representation or conjecture that it can turn into a useful improvement in its understanding/knowledge.
I also wonder if it's useful to define reasoning as something that happens internally. Because a camera technically gets new knowledge every time you snap a photo, but it isn't knowledge that was produced by its own internal mechanisms.
So, reasoning is computation that is somewhat causally insulated from the outside which leads to more compressed/predictive/good representation or understanding that can be meaningfully used?
Anyone wanna take a stab at formalizing that?
@13:03 What is reasoning? There are three modes of reasoning: induction (the discovery of patterns), deduction (the derivation of necessary conclusions), and abduction (the formulation of possible explanations).
All of which are knowledge acquisition
@@MachineLearningStreetTalk Apologies, but no they are not. Deduction and abduction go from already-known assertions (more exactly, theorems typically taking the form of a sequent) to derived assertions that may be inferred from the already-known assertions. Thus deduction and abduction provide a basis for robust (i.e. logically sound) critical thought and problem solving (i.e. searching the space of all possible theorems), but the question remains: how does such a system get started in the first place, i.e. starting from tabula rasa where do the "already-known" assertions come from? In respect of the mathematical universe, an AGI designer might hand-code some important mathematical definitions and theorems into the AGI's belief system. But in respect of the physical universe, an AGI needs to learn (i.e. recognise the patterns that exist in the structure of the universe as revealed by the observations appearing in its percept history) for itself, and this is primarily where induction (over the percept history) comes in (although, if you investigate the problem of continuous AGI learning (i.e. knowledge acquisition) fully, abduction (of theories of the universe from the patterns discovered in the percept history by induction) is also involved).
@@MachineLearningStreetTalk You seem to have deleted my reply to your assertion. I don't have time to reformulate and resubmit it.
@@bigmotherdotai5877 Not deleted anything
@@MachineLearningStreetTalk OK, apologies. I did formulate a reply, and edited it a couple of times, but then it vanished. If I have time, I'll try again.
Fantastic, thanks guys this was really fun
What is important and distinguishes Turing machines is they use a dsl, logic and memory. Memory is key because it must be a representation that can be physically written to a media and copyable.
Really enjoyed this episode. Thank you!
The logic problem perhaps works better in reverse - working backward from a solution. Backward the solution space forms a tree whereas forward - the problem structure is not obvious. Next, there are a number of abstractions which trip up the model: rotational symmetry, random rotation and manipulating unknown state.
why does the o1-preview not have the ability for us to upload files and images like in 4o?
Check their blog. It's stated that it will come in the future but turned off for now
I assume they wanna collect some raw text data first before enabling it
Keith accidently revealed the relationship between reasoning and knowledge when he stated "the reason for reason:. What he meant was the the explanation for reasoning. This reveals reasoning as building explanations. These explanations for things are how we make sense. Also known as knowledge. These explanations are persistent memes composed of Moore and Mealy machines.
Thank for the video. Small request, can you also upload the videos without guests to Spotify? Easier when commuting.
Agreed - the real question is, “is it GOOD at reasoning”
I disagree with Dr. Duggar that "it is doing a very shallow form of reasoning"
But I do agree with the idea that o1 has perhaps not reached the same level of metacognitive baseline as your average human.
Imo o1 is what metacognition would look like in an ML model implementation.
And while we should take it with a grain of salt, we should not ignore the theory of mind examples OAI may have cherrypicked to show improvement in that regard.
So instead of thinking of o1 as the einstien of LLMs, I am more excited to think of it as australopithecus iteration in ML evolution.
Walid Saber, or how is the name called Tim mentiones at 48:13?
Walid Saba
Map and navigation are Moorre and Mealy machines that may be separately constructed.
adding new tapes.. you can always add more tokens to the vocabulary, thus increasing the size of the model and having to do a round of training on those new tokens
I would love to hear more conversations just between the two of you
about 21.00 / on reasoning and neural networks: A very basic and very old distinction from science-philosophy comes to mind, about what "sometimes work" versus "what can be proven as a valid solution". The distinction is of course inductive reasoning versus deductive proof.
I think this is the core of the problem: Neural networks are like inductive look-up-tables, and in that sense capable of some form of INDUCTIVE reasoning. But DEDUCTIVE reasoning is something else. It has been said somewhere, that a truly "thinking" system must be able to do BOTH. It is also like the Kahnemann "thinking fast thinking slow" problem: Heuristic solutions serves one purpose, but the slower analytical thinking serves another.
Just my humble thoughts on the subject.
It's really driving me crazy that he didn't tell us the answer to the pillar switch problem.
My guess is that it's sort of like a Rubix cube in that basically have to use an "algorithm" to solve it, as manually checking after the first time is useless.
But also with a Rubix cube you want to reason backwards, in that you want solve the last step first, which is the easiest step, and then solve the first step last, which is the hardest step.
So what I'm guessing is that check only the north and south hole
Step one: Flip both switches up
Step two: Flip both switches down
Step third: Flip both switches up
Step four: Flip both switches down
Step five: You check the east and west whole and if they are facing the opposite direction, you only flip one of the switches, and then hopefully then if it works the pillar will stop spinning.
Given the speed of progress, im wondering how quickly "It seems like but doesnt really" turns into "I dont care anymore because 'it seems like it does everything better than all humans on the planet'." Is emergent behavior truly emergent, or just the natural synthesis of the right heuristics and context once you acquire it. Studies have shown that what was believed to be emergent behavior in llms, was just the proper application of complex context. Isnt that why many 'discoveries' on reflection can be recast as 'it was right before our eyes the whole time."
They likely used MCTS for this, as they did for alphago. Alphago is objectively superhuman in its task. So why would successfully implementing a similar architecture into relatively general models not be AGI? Because it doesn't satiate your preconceived anthropocentric biases?
Yes, that is exactly why. Both of these guys are denialists who consistently move the goalposts with each new model release. But at least Tim has some inclination toward intellectual humility. Keith has a level of smug and epistemic certainty that is both totally unjustified and extremely grating.
Curmudgeon podcaster: "AI just can't reason"
AI: *does something that 95% of humans can't do, like advanced coding*
Curmudgeon podcaster: "Well that's not reasoning! It's just !"
And that's basically what half of all conversations on o1 looks like.
The number of swiss cheese holes in knowledge is infinite because the are an infiinite number of things to learn. Happily we wiill never reach an end.
Is the Touring Test still a test of human reasoning, or does it merely identify a good mimic?
@20:00, What is Reasoning? So if the artificial neural network, like o1, is not doing the reasoning, then what about humans? Maybe human brains are also not doing the reasoning per your definition?
It has read every single academic paper and yet cannot produce new insights. There are certainly many undiscovered connections between different fields and subfields in STEM and yet these models aren't able to identify a single one.
@slowloris4346 that's not true, a wave of combinatorial discoveries are coming. It just takes time for humans to set it up like that and then evaluate all the results.
But ppl have started
@@clearmindstudiosif the ai is “reasoning” then surely it shouldn’t need humans to do this 😉
@@tonystarkagi I don't have a strong opinion and I don't really know what to think, for now though, my observation stands.
@@slowloris4346 They don't produce anything until a human asks them to do something. LLMs might not be the way but things like AlphaFold have come up with new things what we didn't know yet. Humans might have eventually discovered all those protein structures eventually but it really sped things up.
You mentioned that reasoning is a process of applying a set of first principles rules, i.e. logic, to get from a set of inputs to some output. But lets say you write a program that does such a reasoning computation and confirm that that is indeed reasoning. If we then consider a lower level of abstraction, such as the assembly trace generated for such program, if I showed that to you, you would say no no, that is just a bunch of registers flapping about, that isn't reasoning, there is no logic going on there. But there is, it's implementing the reasoning program that you wanted. Same for a neural network. What my point is, that to say that neural networks DON'T reason, is an unfounded claim to make. Also, neural networks are a process where you apply iterative layers of compute to an input so it might as well be doing small shreds of reasoning on an input at various levels.
Somewhere in your comment you made a mistake in your assumption that breaks down the conclusion. Try to find it :)
@egor.okhterov I disagree with OPs claim but your response wasn't helpful to them. If they made a mistake then let them know what it is
@@qwertyvypez The comment is big and I cannot copy the part I wanted to point out in mobile app :(
It contains the word "flapping "
@@egor.okhterov I'm not seeing it :). When you get the time, please do show.
Asking it to prove and dis-prove its answer gives some interesting outcomes, and can course correct
Moravec's paradox can be explained as complexity resolved by commodity. For example pencils are very complex. Very challenging problem initially. That complexity has been converted to simplicity because of broad utility of this solution meme.
Why do you guys think reasoning is so complex? Maybe it's just a really simple process (just pattern matching based on prior knowledge), maybe its actually difficult not to get because we see it everywhere.
If it were complex Id expect to see a lot of humans born without reasoning capabilities and this is exceedingly rare (if at all). The same goes for reasoning in the animal kingdom.
Again if it was the result of a complex process we'd expect to see lots of broken reasoning processes but we dont.
My guess is that its exceedingly simple to get reasoning out of neural networks: you just need to scale them up with a few computational tweaks to what we're seeing: realtime training and full sensory data and reasoning just drops out of it naturally as in its almost difficult not to get reasoning which is what we see in our every day lives.
There is a lot of good points made in this podcast. And I do think LLM currently do not perform reasoning and I do not think the underlying mechanism provide a way for reasoning to emerge. Thanks for sharing the criticisms.
Genuine question!
LLMs aside. Regarding the infinite memory requirement, I know training the Neural Turing Machine (and by extension DNC) is so difficult and I have tried many times. Besides all that, architecture wise, will it not support that is needed for the problem? Yes the NTM is clocked from the outside in a typical loop, but is that what the clocks in our CPUs do. The stopping condition is observed by the program but the time, temporality is provided by the clock. No?
Just to clarify, we did ultimately agree that it's fair to say o1 is "reasoning", just with some caveats that are important to note.
As for NTM and DNC, I think it depends on the exact architecture. Iirc, the original publications on both of those actually specified a finite non-expandable memory. However, let's leave that aside and just same they are upgraded to have an unbounded memory. Then **yes!** absolutely those are not Turing Complete and as soon as someone figures out how to efficiently train that for general purpose, well, that is going to be some damn exciting times indeed.
Sounds like you are a practitioner? We'd love for you to join our Discord and share what you are up to with those NTMs!
Prolog has backtracking. Reasoning often requires undoing some steps and restarting from a previously valid "save" point and exploring different paths in the space of potential solutions.
o1 uses Mathematica btw (which runs on a Turing machine).
Source?
@@MachineLearningStreetTalk ruclips.net/video/a8QvnIAGjPA/видео.htmlsi=Ck2uIeaIQg7gupD_
Well you can explicitly ask it to use it and it will. Kyle Kabasares (astrophysicist) has a video of it on his channel.
@@djayjp I think in the Kyle Kabasares video it just generated mathematica code and he ran it, although the video itself is pretty good proof it can solve novel problems with practical uses that weren't in it's training data and he's not a hype guy, definitely worth a watch
I've always wondered why we don't "just" implement a turing machine in a circuit that's trained alongside the GPT. I know we've sort of hacked together nand gates inefficiently using rl and recently (Franz Nowak - I assume he's gonna be on the show soon?) properly represented a turing machine in a transformer, but I mean like just a classical turing machine the network has access to as opposed to just calling an API during inference. I only dabble in low level ml so I could be way off, but from what I understand this is just an interface problem. Could we not use something like the features that anthropic extracted to reliably i/o with them in binary if we wanted to specifically design a virtual bus for that? Maybe even expand it arbitrarily and train a network to learn how much memory and bandwidth to assign to it?
Total missing the forest for the trees on this one guys. "Approximate Reasoner" is probably the best term, but the turing vs not etc, has no market impact. The real question is does the o1 series produce commercially valuable answers for a price the market is willing to pay.
The real bit I would investigate, is why are the rate limits 250 times lower for o1 vs GPT 4o. Likely because the inference it is doing is ludicrous. It is a total mistake to think that his is all a serial chain of thought, there is no way you could do significant computation that way, you just get limited by memory bandwidth. What you are seeing is massive parallel computation, that gets summarized into CoT. There is no other reason for a 30 query a week or 25 RPM (vs 10,000) for Tier 5 rate limit.
They are not making money on this model, they are subsidizing it.
AI research today feels like throwing things at the wall and seeing what sticks 🍯
@@tonystarkagi No? Research started with induction, deduction (scientific method). This would be more like an abduction approach
Yes, it's very upsetting as a mathematician/scientist to watch this god-awful wasteful approach to solving a "problem." What problem that is? I have no idea.
@@tonystarkagi Do you have your own thoughts on this topic? Instead of using chatGPT, you should admit that you lost the argument
Go figure that a major aspect of epistemologicaly methodology is a simple trail and error heuristic, with tempature.
@@memegazer So how is this reasoning? It simply isn't!
Knowledge is persistent (meme) problem solutions that may be created through reasoning.
"Reason is the faculty that identifies and integrates the material provided by man’s senses."
When these AI systems have sensors and a commitment to identifying reality we are getting closer to something that looks like reason. But with no sense perception there is only computation of some vanity.
Great video, my favourite deep channel on youtube. Reasoning to me is kind of also including solving a problem, not just in one session. But finding the solution after eg 2 weeks by connecting the clues, insights, new data etc. LLMs does not feel shameful after hallucinating an answer, it is not a part of the sequence.
Great informative video pointing out reality of current state of AI, no wonder some many hypers are complaining in the comments
Rag, chain of thought reflect atate transitions. These state transitions are simply counterfactuals that match underlying "least action" outcomes from emergent factor entropic forces
It seems like you are refering to its most chaotic states can also be it's most insightful states or bad information?