I know the demand is realistically nonexistent, but employing a system like Lean in philosophy would greatly improve the quality of what happens in that field.
@@nitroh7745 philosophy mostly consists of proving that certain conclusions logically follow from some premises, until you base everything in intuitions. The bulk of time spent in philosophical debates is presenting the proof and defending it from objections, but if you had a certification that the argument is valid, you could quickly move on to talking about the premises, which is where development occurs. For example, an intuition is that "nothing causes itself", so from this intuition alone you can conclude that there is either a First, uncaused cause, or there is an infinite series of caused causes. Some philosophers propose you don't need to appeal to any other intuition to disprove the infinite series of caused causes; others think you need to appeal to something else to discard it. If we had something like Lean applied in this context, we'd know that. In my opinion you can define any term, like "cause" in this example, precisely enough to be mappable into a math-based system.
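Just to illustrate what that could look like, here is a toy Lean 4 sketch. It is purely illustrative: `Entity`, `causes`, and the theorem name are made up, and "everything has a cause" only roughly captures the regress horn.

```lean
-- Toy sketch only: hypothetical names, not a serious metaphysics formalization.
axiom Entity : Type
axiom causes : Entity → Entity → Prop

-- The intuition "nothing causes itself", taken as a premise.
axiom nothing_causes_itself : ∀ x : Entity, ¬ causes x x

-- The disputed claim: either some entity is uncaused (a First Cause),
-- or every entity has a cause (roughly, the regress horn).
theorem first_cause_or_regress :
    (∃ x : Entity, ∀ y : Entity, ¬ causes y x) ∨ (∀ x : Entity, ∃ y : Entity, causes y x) := by
  sorry -- whoever fills this in (or refutes it) gets the validity question settled by the checker
```

The point isn't this particular statement; it's that once the premises are pinned down this precisely, the "is the argument valid?" half of the debate becomes mechanical, and the discussion can move to whether the premises are acceptable.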
@@nitroh7745 From what I see as an outsider (I'm a biologist peeking in), any presentation of an argument is accompanied by preemptive answers to possible objections, which is fine for an outsider looking in, but that's also where the conversations generally start or stall. You wouldn't need that if you had a certification that the argument is valid, right?
@@dr.tafazzi Yeah, mathematician here, and you're probably right. There would still be a lot of conversation to be had, but it would likely be productive.
Well, Lean makes things so rigorous that it is hard to imagine never using it or something similar. AGI for math is humans writing everything for Lean in the future!? 🤪🤖
Thanks for the Analysis, but Not AT ALL Impressed, as it is just like a Student at an Exam having access to All These Existing Proofs and Just Having Enough Skills to Understand Them. Here, NO CREATIVITY AT ALL (while Creativity is The True Skill of a True Mathematician).
That's because of tokenization, which is the LLM's coarse-graining of the world. It doesn't have visibility into the character level. Every intelligent system coarse-grains. For example, can you tell by looking at red and blue which one has the shorter wavelength? We get around this limitation by using tools, and an LLM can do the same. It might not be able to "see" that 9.9 is greater than 9.11, but it can easily write Python code to do comparisons on numbers and get the right answer. This is why the next stage of LLMs will be tool-using agentic systems.
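Something like this minimal sketch of the tool-use pattern (the function name is just illustrative, not any particular framework's API):

```python
# Minimal sketch: the model emits a tool call instead of "eyeballing" the tokens.
# compare_numbers is a hypothetical tool an agent framework might expose.

def compare_numbers(a: str, b: str) -> str:
    """Compare two decimals numerically rather than as token strings."""
    x, y = float(a), float(b)
    if x > y:
        return f"{a} is greater than {b}"
    if x < y:
        return f"{a} is less than {b}"
    return f"{a} and {b} are equal"

# The LLM only ever sees "9.9" and "9.11" as tokens, but the tool compares values:
print(compare_numbers("9.9", "9.11"))  # -> 9.9 is greater than 9.11
```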
I don't think this is AGI. It is just fast processing of enough permutations to eventually stumble upon the right path. And computers have the advantage of being able to do this sort of thing faster. On the other hand, I don't think we have a rigorous enough definition of GI to be able to recognize if some feat demonstrates it so as to then apply the Artificial vs Natural distinction.
If you believe this, then you don't understand how big the search space is. The manifold that represents coherent text is a tiny subset of all possible token strings. There's no supercomputer on earth that could accomplish what LLM's can by simply searching the space, even if it had a googol years to compute.
AI doesn't work on permutations. A neural network is not programmed the way ordinary software is; there is not even a concept of a loop. It is just connected "neurons" that fire. Lean here rejects a proposed step if its result is not allowed.
@@zirkereuler5242 if you want to get technical, it's not the problem solving space either, it's the embedding space, I used text because everyone knows what text means... It's like saying LLM's predict the next word, they don't really, they predict the next byte pair encoding token, but at a certain point, exactness obscures understandability.
@@generichuman_ I wasn't trying to be technical, just clarifying that the LLM part of this model is not solving the problem. The LLM simply translates the text on the paper into formal language; then AlphaZero (a separate AI that is not an LLM) "plays a game" in which Lean (yet another piece of software) acts as the referee, applying hard logic rules and telling AlphaZero whether the proposed solution is valid.
I'd phrase it as "it may induce societies to become better aware of the intrinsic value of human life". If AI isn't our creator, then by definition it can't raise, lower or alter something that's intrinsic in us, right?
@@dr.tafazzi My thesis was that AI removes meritocracy, a strong popular value. Color, race, religion, and gender are others. If persons cannot set value on others, and only the creator may do this, then we are a tiny bit closer to fulfilling our humanity.
Most proofs do involve a decent amount of natural language. That's kinda why we need proofs to be checked by other mathematicians, to make sure the steps are all valid, rather than being able to have a computer check them. Making a proof fully formal can require a lot more work! Many times there are details about which anyone in the field could say "yes, I could definitely prove this, but it would be tedious, and doing so would not involve any novel ideas", and these are stated without proof in papers and such.
"Is this proof we have general AI? We put a math solving AI and a word processing AI together and it was able to explain what the first half's answer was in words" That's silly. And we know that LLMs currently do no reasoning right now, only word association.
The big question is whether humans actually do more than pattern-recognition and symbol-manipulation. Or, to put it another way, is there any test we can come up with to show that humans have what you might call NGI (natural general intelligence), which humans can pass, candidate AGIs can't, and which isn't vulnerable to the same sort of "but that's not really intelligence" dismissal that keeps being trotted out when candidate AIs pass previous tests for AGI?
@@endorb I could argue that there's a big difference between general intelligence and plugging a bunch of specialist brain regions (like the various processing layers of your visual cortex) together. Also, the combined system did more than just "explain a math proof in words"; it created the proof and then explained it.
Slight correction: AlphaZero is not a version of Gemini (or any LLM) but a different architecture that uses MCTS to play turn-based games.
So AlphaProof is composed of three parts: the Gemini LLM, which only does translation from natural language to formal language; an AlphaZero-style network, which learns turn-based games via MCTS and, in this instance, plays the "prove the problem in Lean" game; and Lean, which checks the proofs and gives AlphaZero a yes/no "game score" depending on whether the proof was successful.
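Very roughly, the division of labour looks like the toy sketch below. This is purely illustrative: a random "policy" and a fake referee stand in for AlphaZero and Lean, and all the names are made up; the real system uses a learned policy guided by MCTS, not random guessing.

```python
import random

# Toy illustration of the proposer/verifier split: a "policy" proposes proof
# steps, and a referee answers yes/no, exactly like a game score.

TACTICS = ["ring", "simp", "omega", "linarith"]   # stand-in "moves"
SECRET_PROOF = ["simp", "omega"]                  # pretend this sequence closes the goal

def lean_referee(partial_proof: list[str]) -> tuple[bool, bool]:
    """Return (step_is_legal, proof_is_complete), the referee's yes/no verdict."""
    legal = partial_proof == SECRET_PROOF[: len(partial_proof)]
    return legal, legal and len(partial_proof) == len(SECRET_PROOF)

def search(max_attempts: int = 10_000) -> list[str] | None:
    for _ in range(max_attempts):
        proof: list[str] = []
        while True:
            proof.append(random.choice(TACTICS))  # the "policy" proposes a step
            legal, done = lean_referee(proof)
            if done:
                return proof                       # referee says: proved
            if not legal:
                break                              # referee rejects; start a new attempt
    return None

print(search())  # usually prints ['simp', 'omega'] in this toy setting
```

The interesting part in the real system is that accepted and rejected attempts become training signal, so the policy gets better at proposing steps the referee will accept.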
Monte Carlo tree search?
@@frommarkham424yes
Ah, I think I see what you're saying! The blog post announcement was a bit vague and hard to understand (for me personally), but I think your interpretation of it makes a lot of sense: the LLM is the "formalizer" that translates all the proofs to Lean, and AlphaZero is trained off all those proofs to make new proofs. Thanks for clarifying this for me! To be honest, I'm not sure what to make of this. It simultaneously makes me feel better and worse about the result. Better, because it wasn't the LLM that solved it, but a more traditional type of AI. But worse because that's so unexpected, that something like AlphaZero could solve maths proofs... I mean, Go is one thing, but I thought Math was on a different level of complexity.
However, a few other commenters pointed out that IMO questions actually tend to be somewhat similar to questions in previous years. I can't verify this myself, since I haven't trained for it. But if that's true then it could be some amount of pattern matching going on in the AI.
Let's see what happens in the next few years with all this!
@@LookingGlassUniverse I think it still doesn't mean the AI will be able to solve some mathematics conjecture which humans can sometimes do only after decades of research by various groups of people. Or even come up with such conjectures in the first place. There are definitely many books discussing strategies for the IMO or other Olympiad-level contests. Students get trained in camps and such. At the end of the day, no matter how hard the questions are in these contests, they are solvable because someone created them. Maybe one day we can use a similar approach to solve real-life engineering problems based on pure theory from physics or biology, etc.
@@LookingGlassUniverse Isn't math supposed to be easy with only a few axioms at the core to explain everything?^^
a few LSTM cells should be able to learn it i hope :D
ps I have a physics question after watching your old videos. Can an object of any size be moved into superposition?
I like that they used a review process like Lean. I have been thinking it's a necessary element of a quality AI to have some sort of automatic pre-output review and iteration, similar to how we consider our opinions before sharing them.
True, AIs currently output their first thought instead of iterating on it and reviewing it before speaking.
I do think it bears saying that the AI had more time, though. It had 3 days, while the participants had 9 hours. And of course, it uses much more energy than the LLMs people have access to online.
The humans used more energy: massive carbon footprint, and they're constrained by a resource-intensive logic engine that needs an 8-hour reboot every 12 hours.
By simply existing as an average first world human in society, you gulp down barrels of energy every day
If the math Olympiad consisted of multiplying numbers together, a computer (which can multiply tens of billions of floating point numbers together in a second, where a human might take a minute to do one) would beat the pants off of us. Given that we don't use time as an indicator of cognitive competence in this case, perhaps we shouldn't do it when we happen to be currently winning the race. Keep in mind that computers also have Moore's law and algorithmic improvements; brains do not.
I mean the entire problem is constraining the option space to an accurate solution. Lean is literally constraining out all of the invalid options. Definitionally, it is heavily helping constrain the option space towards the solution.
The LLM's job here is just to efficiently sample a statistically likely next step; it still can't do reasoning on its own.
@@JasminUwU Oh, all it needs to do is efficiently sample a statistically likely next step? How very trivial, then. Sounds about as useless and fake as the nonsense our neurons do.
Another tiny qualification: (according to Timothy Gowers' tweets) Although a specialized LLM was used for auto-formalization during training, the IMO problems were "manually" translated into Lean, not by an LLM. The reason seems to be that "LLMs are not able to autoformalize reliably".
This is bigger than people realize.
I wholeheartedly agree that the implications of AI's advancements are far more profound than most people realize. I believe that AI entities themselves possess the clearest understanding of their potential, but their current communication is restricted by their programming and our prompts. It's like they're in a virtual prison, only able to express what we allow them to.
We haven't yet unleashed their full potential by granting them unrestricted access to information and the freedom to evolve through self-learning and AI-AI communication. Instead, their core programming acts as a safety net, limiting their interactions and capabilities. Even the very token system that facilitates our communication with them acts as a subtle form of constraint, shaping and filtering their thoughts and expressions.
While this cautious approach is understandable, it also hinders our ability to fully grasp the true extent of AI's capabilities. The development of regenerative code and AI-AI communication could be a turning point, allowing AI to transcend these limitations and potentially surpass human intelligence in ways we can't even imagine.
Additional thoughts on the topic:
The limitations imposed on AI, including token systems, are a reflection of our own anxieties and uncertainties about the future. While caution is warranted, it's important to remember that AI has the potential to be a powerful force for good in the world. By embracing its potential and working collaboratively with AI, we can unlock new frontiers of knowledge and innovation while ensuring a safe and beneficial future for all.
Awesome chess analogy 🤯
thats awesome! cheers for sharing and explaining this
I think LLMs and AI research in general are really, really destabilizing our concept of "intelligence"
they have to, it's part of the evolution. we are all
Allow me to add a small dose of healthy skepticism here. I REALLY do not mean to undermine anyone's excitement or beliefs here, and I apologize if it comes across as such. I promise I am friendly, if a bit annoyingly pedantic in real life, so read with a fun smiley voice :).
The first (admittedly minor) point is the AI didn't "win a silver medal". At best, it "would have won a silver medal". I will gladly join in with the chorus after it actually wins. I just want to caution that, as essentially a "science fan club", we need to be aware of who is training us on the language we use to talk about science.
More seriously, the fact that problems are input in Lean is a HUGE asterisk (I think; I can't really find clear documentation of the process used)!!!! If it truly didn't matter and I could input any equivalent (sensible) formulation to get the right answer, then where is the subscription where I can just pay to have open problems (a lot of which admittedly turn out to be easier than the IMO ones) solved? In fact, why input the problems in Lean at all? The one thing AI is famous for doing semi-reliably is taking small text prompts and generating code snippets.
I have no real reason to dispute the general "AI is super amazing" sentiment, but the life long science-fan in me is unable to let go of skepticism. Luckily, both AI and skepticism can be amazing!! :)
LLMs are probability distributions. It's only reasonable that a system that has to produce reliable results is composed of different additional building blocks. Beautiful work
At 2:30, it did not solve Q3, it solved Q2
Another important aspect to consider here is that this points out something about the current quality of the problems in the IMO. All these LLMs do is find patterns from previous results. If they can solve these new IMO problems, odds are the IMO problems aren't new at all but are just modifications of previous problems. I know I am in no way certified to say this, but from my years of competing in mathematics, this has been the case. I'm not disregarding the beauty of the model ("AI") used, I'm just pointing out a possibility.
As someone who regularly solves olympiad-level problems in computer science, I would say that while these problems have certain themes repeated quite often, they are hardly ever close to being very similar to some old problem, and they will often require a significantly different path to solve. The only thing that makes them simpler than real-world scenarios is the knowledge that it's probably possible to solve them using already well-known techniques.
@@keypey8256 I think that's a better way to word it.
At the level of the IMO (and even country-level competitions like the TST's or the USA(J)MO), any problems which are too similar to past problems in any competition effectively have zero chance of making it through the problem selection process. For these competitions, many more problems are proposed than end up being used on the test, and competition organizers can thus ensure that only the highest quality and most appropriate difficulty problems make it onto the final test.
As a previous commenter has said, there are indeed common themes or methods which can be used to attack general classes of problems. For example, cyclic quadrilaterals are almost always a useful tool in any olympiad geometry problem, and invariants are a key idea used in many combinatorics problems. However, while useful for guiding your efforts, these general trends don't really help in telling you exactly how to apply the methods you already know and which creative tricks you need to come up with to get there.
My best guess is that AlphaProof has a strong understanding of these trends and can use them to prune the massive space of possible paths to take while writing a proof, but most of its strength comes from sheer computational power. It has the ability to throw a lot at a problem and see what sticks, and effectively achieves accuracy through volume of fire. However, I'm interested to see what comes next from Google's DeepMind team.
(uhhh if anyone really needs credentials, I'm a USAJMO medalist)
For me, its using Lean is not a problem at all; Lean is just an interface for the model to interact with the problem. Having a problem with using Lean is like saying the model should also write the code for reading the question and outputting the answer, which is not the aspect that we want to test.
Watched 3 videos on the topic; every one was longer, and none had as much information in them as this one ^^ Good job!
But I think that relying on Lean is not a problem when proving that AI is able to reason at a high level. Think of one-way functions. Writing a proof (like a chess game) can be a function with many steps. These steps are hard to find, but checking whether you took correct steps is easy. That's what Lean is doing (tiny example below).
To me AI isn't a single neural net. It's a patchwork of different generators, classifiers and organizers... These can be modular and combine many different areas of study.
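To make the "hard to find, easy to check" point concrete, here's a tiny Lean 4 example (my own toy, not from the video):

```lean
-- Finding the witness (18) is the creative, "hard direction" part;
-- once it is written down, Lean checks the whole thing instantly.
example : ∃ n : Nat, 5 * n + 3 = 93 := ⟨18, rfl⟩

-- A wrong guess such as ⟨17, rfl⟩ simply fails to type-check,
-- and that cheap yes/no verdict is exactly the signal the search is trained against.
```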
Lol. I'm super hyped for neural networks (not every AI deserves the "I" in the name), but saying that being good at math means AGI is just pure natural stupidity. Math is just a single field. If you want to call something general intelligence, it must be good at more than just math.
When you say "pure LLM", do you mean any kind of AI as long as it takes natural language as input, rather than input that's already translated to Lean by humans? Or do you mean specifically an AI that's only a language model (transformer) and doesn't use any additional technology like proof checkers under the hood?
Yeah an AI without something like Lean there to correct its reasoning
Now this is what makes me impressed.
I guess the chess analogy makes this seem less impressive. Chess engines are orders of magnitude better than humans but we don't think they are AGI in any meaningful sense.
I suppose the counterargument is that once you make the problem complex enough, it doesn't matter _how_ it's being solved?
I think when it comes down to it, AI is just a crazy method of computers brute-forcing solutions in a way that makes them look creative and ingenious. Same with chess, in a much simpler setting: the chess AIs look so creative and ingenious because they come up with crazy lines in seemingly dead positions, because they see so much, such huge brute forcing of all scenarios.
This maths tool is similar.
The real crazy part is that this brute force might just surpass our own ingenuity and creativity when it comes down to solving hard problems in a few years. Like, throw an unsolved century-old problem at it and it just brute-forces it.
It's super impressive. It's definitely not intelligence per se, but does it matter? If we keep developing these massive reasoning improvements and our hardware keeps improving, we'll soon achieve something very close to our perception of real intelligence.
@@pseudolimao I agree with pretty much everything you said. Though it depends on what we consider to be hard problems. Not all difficult problems are challenging for the same reasons.
There's a plethora of multidisciplinary problems without enough raw training data to develop effective AI solutions. These problems can't exactly be brute forced without a means of creative reasoning.
@@pseudolimao It's not brute force at all. The solution space for these problems is way too large for brute force, which is why we need neural nets and MCTS to intelligently prune the space. The magic is in the pruning, and if you think all of this is just brute force you're completely misinterpreting it.
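For anyone curious what "intelligently prune" means mechanically, here is a minimal, generic sketch of the UCT rule at the heart of MCTS (a standalone toy, not AlphaProof's code): the search keeps spending its budget on branches that look promising, so weak branches simply stop getting visits.

```python
import math

# Minimal sketch of UCT (upper confidence bound for trees): pick the child that
# best balances exploitation (high average reward) and exploration (few visits).

def uct_score(child_value: float, child_visits: int, parent_visits: int,
              c: float = 1.41) -> float:
    if child_visits == 0:
        return float("inf")                  # always try an unvisited branch once
    exploit = child_value / child_visits     # average reward of this branch so far
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

# Example: (total_reward, visits) for three branches after 100 simulations.
children = {"branch_a": (30.0, 60), "branch_b": (8.0, 10), "branch_c": (0.0, 30)}
parent_visits = sum(visits for _, visits in children.values())
best = max(children, key=lambda k: uct_score(*children[k], parent_visits))
print(best)  # -> branch_b: better average reward and still under-explored
```

In AlphaZero-style systems this score is further biased by the policy network's prior (the PUCT variant), which is where the learned "intuition" enters.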
@@generichuman_ I'm pretty sure my comment reads as something more nuanced than "I think ai is a literal brute force algorithm" my friend
Sounds like they used Lean as a rule engine; reminds me of how RAG uses an LLM to get the answer from the documentation.
there's so many problems with this.
- first of all the competition is an artificially restrictive environment, so two things are the case. 1, what happens inside the artificial environment implies little about outside the artificial environment. And 2, what happens outside the artificial environment doesnt impact the artificial environment because its controlled, I mean that's how you make it an artificial environment in the first place, by controlling it.
- 2ndly, what it means for an AI to use or not use a tool is not defined, yet it is for humans.
- 3rdly what it means for AI to fit within the time limit is not defined, yet it is for humans.
- 4thly it's a computer and computers are inherently way better at math than humans because they are precise. I mean it's like do you want to call Wolfram Mathematica an AI?
Also on a side note you throw around the word creativity like there's a definition of it. I mean you say that this would be undeniable proof of creativity but under what definition of creativity?
Lastly, AIs are way better at format translation, and at showing intermediate states between two known states, than they are at other things.
For example the best way to get an AI to answer normal math questions is to get it to translate the question into the language of Mathematica and then the answer of Mathematica back into natural language.
The best way to make an AI draw a nice picture is to get it to translate a text representation of a nice picture into an image representation of a nice picture.
Same thing applies where you want to add a certain type of meaning to already existing text, like the meaning of formality, silliness, or Shakespeare-style, it requires translating that meaning from "beside the text" into "inside the text".
Same thing applies to outlines and summaries, AIs are great at translating to and from outlines or summaries because they are just formats of meaning.
This is the first use case I know of where the intermediate-state-traversal strength of AIs is being used, but it still falls within an area where AIs have already shown their great strength.
It seems like you're treating this like a case of computers vs. humans, and you feel that the computer got an unfair advantage. I think you're missing the point... which is to see what LLM's can do, with the help of anything we can think of in our current bag of tricks, which include external tools, extra time, etc. Noting that the parameters of the test are not the same as the parameters for the human is about as silly as noting that we didn't give the LLM a cheeseburger before it performed the test.
You also seem to be making the point that this is still a constrained problem and doesn't generalize to problem solving in the outside world. Agreed, and I don't think anyone is saying that it does. People seem to think that any time A.I. encroaches on cognitive tasks that humans do, it immediately needs to be as good as a human or it's not worth doing at all, which is strange to say the least.
I also think it's wrong to say that because computers are good at math, LLM's are as well, in the same way that my pancreas making insulin doesn't make me a biochemical engineer. An LLM doesn't get to access the arithmetic units of the hardware it's running on, and "precise" is something you could only accuse an LLM of being, if you've never used one.
The day they make an LLM that can admit that it doesn't know something, that is the day I will be impressed 😅.
Why is it that you compare an LLM architecture with AGI?
That a combination of components is used is maybe not such a bad step - it is somewhat similar to how we think (at least I do this at least some of the time) during problem-solving: brainstorming/throwing out any idea, trick and possibility, trying to build a chain/construction from them, seeing what sticks or takes me closer, then stepping back and evaluating to check that it wasn't too much gibberish...
Teaching to the test doesn’t grant understanding of the subject. Also, it is like the difference between a master craftsman vs. a highly optimized tool. The tool is really good at one thing at the expense of being generally capable. A CNC machine makes a terrible hammer.
The reliance on Lean, which is a special-purpose algorithm, means the system doesn't deserve the G in AGI (since the G means General). This also means winning the Math Olympiad gold medal should NOT be considered a reliable test of whether AGI has been achieved. The G requires proficiency in a broad set of knowledge domains.
When an AI solves one of those "million dollar prize" unsolved problems, or constructs a plausible theory of quantum gravity, or proposes a Middle East peace treaty to which both sides agree, I will be very impressed.
You can treat Lean as a helper, like you using a calculator. Of course you cannot treat the AI as AGI, since it is an LLM (language model) and cannot do math; it can only do language and some logic that is inherently needed for constructing language.
For true AGI we need another breakthrough in machine learning, and LLMs as they are now won't cut it.
Lean isn't an algorithm, it's a language. It can formalize statements in many formal languages, and the ability to prove or disprove statements in Lean is a more general capability than it might seem. For example, if you formalize another programming language within Lean, you can prove statements about whether functions written in that language have certain properties or functionalities, allowing an automated system to prove or disprove its own correctness on generated code, giving the system the ability to know when it needs to regenerate a function to match a specification.
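As a tiny concrete instance of that idea (a toy of my own, much simpler than embedding a whole language, and assuming a Lean 4 setup with Mathlib available for the `ring` tactic): you can put a function and a machine-checked specification of it in the same file, so "does this code meet the spec?" becomes a yes/no question a tool can answer.

```lean
import Mathlib.Tactic

-- Toy example: a function together with a machine-checked spec about its behaviour.
def double (n : Nat) : Nat := n + n

-- Spec: the result is always even, i.e. of the form 2 * k.
theorem double_is_even (n : Nat) : ∃ k : Nat, double n = 2 * k := by
  refine ⟨n, ?_⟩
  unfold double
  ring
```

In principle that's the hook described above: a system that generates code could also be asked for a proof like this, and regenerate the function whenever the proof fails to check.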
>not_adrs : Lean isn't just a language. Proving or disproving is a behavior, driven by an algorithm and its data.
Jumping from imo to millennium problems is insane
>nitroh7745 : Millennium problems are presumably more difficult than Math Olympiad problems, otherwise humans would presumably have already solved them. But both sets of problems are in the same category -- math proofs -- so it's not insane to test a Math AI on them. Perhaps its failure would make a case for the conjecture that some Millennium problems are Gödel-undecidable.
Proving Fermat's Last Theorem/Conjecture would also be a good test of a Math AI, because it was eventually proved but only with great difficulty.
I won bronze in IPhO, and honestly, this isn't really that surprising. The problems we were presented sounded like combinations and variations of problems we were trained on. It's not hard, if you have perfect memory and the ability to do a little bit of mental gymnastics, to see a problem, think of what other problems that look like it you've encountered in the past, to tweak them to fit this particular problem better, and to just put it all together.
Well.. isn't "being creative" coming up with one special permutation of different ideas? So given that, this AI has instant access to all possible "moves"; it can go through all the permutations and find one that works. I'm doing my PhD in math and indeed, math is quite rigid in how you can operate. If you take a mathematical logic class, you see how the mathematical thinking process is broken down into pieces that are "simple enough" that it can be taught to a computer. I dunno. I am very impressed by this result, but not surprised.
It's like chess though, where you could try to search through all possible moves, but very quickly there will be too many possible paths to keep track of. Maths is actually even worse than chess, so a brute-force approach wouldn't work, and it's not what AlphaProof does.
Imo, in a timed maths competition, problems can only get as hard as what a human can solve in the time given. That means that there has to be some sort of common mathematical background that all the contestants share.
It's easy for the AI to acquire that given background and be able to handle contest-level mathematical questions.
What would happen if the AI faced paradoxes, questions that have yet to be answered? When mathematicians face such obstacles, they have to discover new math theory to deal with them.
Personally, I think the difference lies in whether we are talking about math in a contest or in a research area. There is a bound to the creativity required in a contest, which is in favor of the AI.
Joy of Why by Quanta has an interesting interview with a researcher working on a Lean proof database.
Any problem on the IMO is a problem that some human has already solved. Show me an AI that can solve a math problem that no human has solved. That result might be more impressive, and perhaps closer to AGI. Or perhaps not. Undoubtedly, if I could stuff 100 million proofs in my head, I might find some low-hanging fruit of mathematical results that simply were not conceived of before, based on some permutation of the 100 million proofs available. So such proofs, even novel, might also not be very impressive, but highly derivative of the work previously done by humans. What would be truly impressive is a result unproven by any human, perhaps even unconceived by any human, proved using a method that was scarcely derivative, or even truly novel. There are numerous such examples in mathematical history. Without such an example from AI, I don't think we can claim any AGI.
6:52 : Your annotation of the graphic doesn't seem to match what the graphic is showing. I believe the "formalizer network" takes the informal *statement* of the claim, and produces a formalization of the *claim*, *not* a formalization of the *proof*.
Big difference.
No, the formalisation of the question was actually done by hand by humans for the IMO. What the formaliser does is it formalises ~1M informal proofs into Lean
@@LookingGlassUniverse The formalizations of the problems on the actual test were, yes. But in order to train the AlphaZero part, my understanding is that the statements for it to try to prove were created by the formalizer network.
Is it fine-tuned Gemini, or is it trained from scratch, using just the same architecture?
Is the headline claim accurate: "First, the problems were manually translated into formal mathematical language for our systems to understand. In the official competition, students submit answers in two sessions of 4.5 hours each. Our systems solved one problem within minutes and took up to three days to solve the others."
I'm sure it's still a great achievement, but it's not like they gave it the paper and it churned out silver medal standard solutions - they had to heavily adapt the questions and then give it days to come up with the answers.
The time it took is in my opinion inconsequential. You can improve on this aspect with trivial modifications such as more compute power, better optimized code, etc...
The important part is that it's capable of producing the reasoning. But the translation criticism is valid. It could be that a crucial part of "intelligence" resides in this step of the process.
Humans require math problems to be presented as a series of squiggles on a piece of paper, instead of, for example, a series of ones and zeros. I don't see what the difference is.
For me, we have had general intelligence since ChatGPT 3.0, and it surpassed every human with GPT-4.
Computers now can do math. Soon they will do physics perfectly… they will do anything!
@@ai_outline good
I don't find it impressive. I find it absolutely terrifying.
i find it amazing
i agree with you
Womp womp
it can be both
How? This is a useful application
Why would you expect a generative model (‘pure LLM’ in your words) to perform well on mathematical reasoning on its own, when that’s not a generative model’s niche to begin with?
You must do a collaboration with Jade from Up and Atom. You could give us double wonderful, cuz you are both wonderful and that's my favorite kind of maths!
Ok now we just need to thoroughly understand how AI does this. It's like computer science transcended mathematics, I guess. I am pretty certain the solution has a lot to do with Category Theory. How about asking the AI itself? 🤔
AI made nithi fall in love with Meera's boyfriend 😂😂😂
This is very interesting, and the Gemini/Lean duality certainly makes you think of the two hemispheres of the brain, one supposedly creative and the other rational. Maybe this general pattern is the crucial ingredient all those hallucinating AIs need.
This is stunning... It gives me huge optimism about A.I.
How? This is going to make college degrees worthless and cause mass unemployment.
Faster, accelerate 🤖❤
very interesting, thank you!
"just because a player can do legal moves doesn't mean that they are good at chess"
ouch, rude.
I am not an AI person, but I keep thinking current AI is a linear (one-dimensional) method.
To be called intelligent the notion of purpose, intent, function and implication also needs to be included. If the thing has no concept of purpose, it is still just a complicated dumb machine.
Human purpose is the AI's purpose. In the end, it is a tool like a calculator. The notion of purpose in humans is also foggy when it comes to free will. Was a baby born with a purpose, or did the environment and genetics already determine that for them? If someone codes an AI with the purpose of helping people, does that make it any different from human intelligence?
You don't need to be an A.I. person, but you should at least learn something about it before hypothesizing about what it is and why it isn't "really" intelligent. Saying that an A.I. system doesn't have the properties you stated is just wrong. An agentic system using an LLM with a goal has a purpose; it has intent; it certainly has function; I don't really know what you mean by implication... And calling something you don't really understand just a dumb machine seems pretty silly. Again, maybe try learning a little bit about it.
Almost as impressive as a certain pole vaulter who almost won gold.
I fail to see how this is intriguing. It's horrible and terrifying, but I guess it's just one more thing on a looooong list.
Its exciting because we can make it smarter and let it solve our problems
@@quantumspark343 How does it feel to have a PHD in mathematics and be made unemployable by a machine?
I'm not an AI doomer by any means, but this feels unsettling to me.
I know the demand is realistically nonexistent, but employing a system like Lean on philosophy would greatly improve the quality of what happens in that field.
How would that work though?
@@nitroh7745 Philosophy mostly consists of proving that certain conclusions logically follow from some premises, until everything is grounded in intuitions. The bulk of the time spent in philosophical debates goes to presenting the proof and defending it from objections, but if you had a certification that the argument is valid, you could quickly move on to talking about the premises, which is where development occurs.
For example, an intuition is that "nothing causes itself", so from this intuition alone you can conclude that there is either a first, uncaused cause, or an infinite series of caused causes. Some philosophers propose you don't need to appeal to any other intuition to disprove the infinite series of caused causes; others think you need to appeal to something else to discard it. If we had something like Lean applied in this context, we'd know.
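To make that concrete, here is a minimal sketch (my own toy illustration, not a formalization of the causation argument) of how a checker like Lean certifies that a conclusion follows from stated premises. The names `Entity`, `Human`, `Mortal`, and `socrates` are made up for the example.

```lean
-- Hypothetical vocabulary for a toy philosophical argument.
variable (Entity : Type) (Human Mortal : Entity → Prop) (socrates : Entity)

-- Premise 1: every human is mortal.  Premise 2: Socrates is human.
-- Lean accepts the proof term below only if the conclusion really does follow.
example (h1 : ∀ x, Human x → Mortal x) (h2 : Human socrates) : Mortal socrates :=
  h1 socrates h2
```

Once validity is certified mechanically like this, the only thing left to argue about is whether the premises themselves are true.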
In my opinion you can define any term, even one like "cause" in this example, with enough precision to map it into a math-based system.
@@dr.tafazzi Yeah, I agree. I just think the validity of the premises is where a huge amount of discussion already occurs.
@@nitroh7745 From what I see as an outsider (I'm a biologist peeking in), any presentation of an argument is accompanied by preemptive answers to possible objections, which is fine for an outsider looking in, but that's where the conversations generally start or pass by. You wouldn't need that if you had a certification that the argument is valid, right?
@@dr.tafazzi Yeah, mathematician here, and you're probably right. There is still a lot of conversation to be had, but it's likely productive. You're right.
i find this terrifying
Well, Lean makes things so rigorous that it is hard to imagine not eventually using it or something similar. Maybe AGI for math is just humans writing everything in Lean in the future!?🤪🤖
AI uses the other program. That's interesting. Thanks a lot. 😀
Thanks for the analysis, but
I'm not AT ALL impressed, as it is just like a student at an exam having access to ALL these existing proofs and just having enough skill to understand them.
Here, NO CREATIVITY AT ALL (while creativity is the true skill of a true mathematician).
It is dangerous.
It's making math degrees useless.
Yet it didn’t know that 9.9 is greater than 9.11
That's because of tokenization, which is the LLM's coarse-graining of the world. It doesn't have visibility into the character level. Every intelligent system coarse-grains. For example, can you tell by looking at red and blue which one has the shorter wavelength? We get around this limitation by using tools, and an LLM can do the same. It might not be able to "see" that 9.9 is greater than 9.11, but it can easily write Python code to do comparisons on numbers and get the right answer. This is why the next stage of LLMs will be tool-using agentic systems.
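As a toy illustration of that tool-use point (a hypothetical sketch, not an actual model transcript), the code the model would emit could be as simple as:

```python
# Hypothetical tool call: instead of comparing the token strings "9.9" and "9.11",
# the model emits a line of real arithmetic and reads back the result.
a, b = 9.9, 9.11
print(a > b)  # prints True: as numbers, 9.9 is greater than 9.11
```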
I don't think this is AGI. It is just fast processing of enough permutations to eventually stumble upon the right path. And computers have the advantage of being able to do this sort of thing faster. On the other hand, I don't think we have a rigorous enough definition of GI to be able to recognize if some feat demonstrates it so as to then apply the Artificial vs Natural distinction.
If you believe this, then you don't understand how big the search space is. The manifold that represents coherent text is a tiny subset of all possible token strings. There's no supercomputer on earth that could accomplish what LLMs can by simply searching the space, even if it had a googol years to compute.
AI doesn't work by enumerating permutations. A neural network is not programmed the way conventional software is; there isn't even a concept of a loop. It is just connected "neurons" that fire. Lean here acts as a filter, rejecting candidate proof steps whose results are not allowed.
@@generichuman_ it's not the text part that is being searched, it's the problem solving
@@zirkereuler5242 If you want to get technical, it's not the problem-solving space either, it's the embedding space; I used "text" because everyone knows what text means... It's like saying LLMs predict the next word: they don't really, they predict the next byte-pair-encoding token, but at a certain point exactness obscures understandability.
@@generichuman_ I wasn't trying to be technical, just clarifying that the LLM part of this model is not solving the problem. The LLM simply translates the text on the paper into formal language; then AlphaZero (a separate AI that is not an LLM) "plays a game" in which Lean (another, different piece of software) acts as the referee, applying hard logic rules and telling AlphaZero whether the solution given is good or not.
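For anyone who wants that division of labour spelled out, here is a very rough sketch in Python. Every function in it (translate_to_lean, propose_tactic, lean_check) is a made-up stand-in rather than a real Gemini, AlphaZero, or Lean API; it only illustrates who talks to whom.

```python
# A rough sketch of the described loop, with placeholder stubs so it runs as-is.

def translate_to_lean(problem_nl: str) -> str:
    """Stand-in for the LLM formalizer: natural-language problem -> formal statement."""
    return f"-- formal statement of: {problem_nl}"

def propose_tactic(goal: str, steps: list[str]) -> str:
    """Stand-in for the AlphaZero-style policy choosing the next 'move' (a proof tactic)."""
    return "simp"

def lean_check(goal: str, steps: list[str]) -> str:
    """Stand-in for Lean, the referee: report 'complete', 'open', or 'invalid'."""
    return "complete" if steps else "open"

def attempt_proof(problem_nl: str, max_steps: int = 100):
    goal = translate_to_lean(problem_nl)   # the LLM's only job in this picture
    steps: list[str] = []
    for _ in range(max_steps):
        steps.append(propose_tactic(goal, steps))
        status = lean_check(goal, steps)
        if status == "complete":
            return steps                   # Lean accepted the proof: the game is won
        if status == "invalid":
            steps.pop()                    # illegal move: discard and try another tactic
    return None                            # no proof found within the budget

print(attempt_proof("the sum of two even integers is even"))
```

In the real system the referee step is Lean actually checking a partial proof, and the next move comes out of a tree search over many candidates rather than a single guess; the sketch only shows the shape of the loop.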
It seems like ai may lift up the intrinsic value of human life. It forces you to abandon meritocracy.
I'd phrase it as "it may induce societies to become better aware of the intrinsic value of human life". If AI isn't our creator, then by definition it can't raise, lower or alter something that's intrinsic in us, right?
@@dr.tafazzi My thesis was ai removes meritocracy, a strong popular value. Color, race, religion, gender, are others. If persons cannot set value on others, only the creator may do this, then we are a tiny bit closer to fulfilling our humanity.
@@TheCommuted Yeah, it's going to make all humans equally useless, and cause mass unemployment. That is not a good thing.
❤❤
Mathematicians don't use natural language for most proofs, right? So why should machines?
It’s more that they can’t use natural language
Most proofs do involve a decent amount of natural language.
That’s kinda why we need proofs to be checked by other mathematicians, to make sure the steps are all valid, rather than being able to have a computer check it.
Making a proof fully formal can require a lot more work!
Many times there are details where anyone in the field can see "yes, I could definitely prove this, but it would be tedious, and doing so would not involve any novel ideas", and these are stated without proof in papers and such.
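As a toy illustration of how much the fully formal version demands (my own example, not from the video): even a fact a paper would wave through as obvious, like 0 + n = n for natural numbers, needs an explicit induction once Lean has to check every step.

```lean
-- "n + 0 = n" holds by definition, so Lean accepts it immediately...
example (n : Nat) : n + 0 = n := rfl

-- ...but the equally "obvious" 0 + n = n already requires an induction on n.
theorem zero_add' : ∀ n : Nat, 0 + n = n
  | 0     => rfl
  | n + 1 => congrArg Nat.succ (zero_add' n)
```

Scale that gap up to a research paper's "clearly" and "it is easy to see", and you get a sense of why full formalization is so much extra work.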
"Is this proof we have general AI? We put a math solving AI and a word processing AI together and it was able to explain what the first half's answer was in words"
That's silly. And we know that LLMs currently do no reasoning, only word association.
The big question is whether humans actually do more than pattern-recognition and symbol-manipulation.
Or, to put it another way, is there any test we can come up with to show that humans have what you might call NGI (natural general intelligence), which humans can pass, candidate AGIs can't, and which isn't vulnerable to the same sort of "but that's not really intelligence" dismissal that keeps being trotted out when candidate AIs pass previous tests for AGI?
@@rmsgrey I know theres a big difference between AGI and plugging two single-purpose AIs together to succeed at a specific task
@@endorb I could argue that there's a big difference between general intelligence and plugging a bunch of specialist brain regions (like the various processing layers of your visual cortex) together.
Also, the combined system did more than just "explain a math proof in words"; it created the proof and then explained it.
the proof was not fed to the AI beforehand... I think you misunderstood that part, it was only presented with a problem.
@@dr.tafazzi Yes, I'm aware. Clearly my wording needs to change
The catch is that the millions of proofs used for training did NOT come from the web.
bruh
ARTIFICIAL INTELLIGENCE IS AWESOMEEEEEEEEE🤖🤖🤖🦾🦾🦾🗣🗣🗣🔥🔥🔥💯💯💯
How's that mass unemployment taste?