OK, they already teased AGI in their projects feature launch video. Why can't people accept it? AGI will be here by 2025. If it can solve problems it was never trained on with 87 percent performance, then it's almost AGI.
It's funny to watch AGI get redefined as we evolve. Now it appears that a system can qualify as AGI on only a subset of abilities, a limited AGI. It seems true AGI will be AGI across the board, on all skill sets. So OpenAI can still say they are waiting on full AGI.
Amazing, and probably AGI. However, this was the 'semi-private' set of the ARC-AGI eval. Fully private tests on SimpleBench and other completely private benchmarks will be the true tests.
For me, it is AGI. It has already achieved a 25% score on the hardest benchmark developed by mathematicians like Terence Tao, and Tao expected the test to last for at least five years to come... No ordinary mathematician would score 25% on that, not even PhDs, because those would be people specialized in very specific areas of mathematics.
I think what would make the most sense is to let AI have senses, so that it can see the world we are living in and not just use the data that we have generated on the web.
Thank you for creating this video. Whether or not it qualifies as AGI is beside the point; it’s inevitable. There are valid reasons to feel both hopeful and apprehensive about its arrival.
Agreed and I'd say AGI was first achieved with Claude 3.5 Sonnet this summer. Once we got o1 mini and o1, it was pretty clear they were generally intelligent, could reason, learn new tasks on the fly, create new reasoning modalities on the fly etc. o3 is clearly AGI imo. But you're right that it is inevitable even if we say this particular one isn't. I think it's surprisingly tame to start with and people aren't/weren't ready for that. Regardless lots to be excited and concerned about indeed
Notice the props behind them, all items representative of major technological advancements in human history. Nice touch as we're on the verge of turning the future over to technology itself.
i think one thing we need to keep in mind is which category/aspect the additional gain came from. Sometimes a single metric is a red herring: the models could be overfitting on a certain category, which improves accuracy and is good for press, but in reality it could just be the same.
We humans can go out in the world, see things, discover things; unless we allow AI such freedom, it can never outsmart us. The current AI, no matter how advanced, is at the end of the day just a simple tool for us to use to simplify or speed up the mundane tasks we perform.
If AI has sufficient access to the internet, surveillance cameras, personal documents, etc., it could do a lot of harm without needing an embodiment. Current AIs have been shown to be capable of manipulating humans to do tasks for them. Many current robots are connected to the internet in some way; a sufficiently advanced AI could access those robots to very quickly gain the ability to walk around and discover things in the real world. In conclusion: a purely digital AI is not necessarily safe.
Probably not AGI, because it's not general enough. o3 could be trained to be good at these kinds of puzzles. You would have to open it up to the public and have them test it on truly novel and truly general I/O tasks.
Oh, and AGI is never "at least in this dimension". THE WHOLE POINT IS IT'S ALL DIMENSIONS! So you basically have just a bunch of benchmark stats and no access to the model at all, and you make such a grand call? Ridiculous and disappointing. I thought you were over the hype, but nah, it got to you too.
Can the model train and improve itself? If not, then it's not AGI, just a more comprehensively trained model. Even if it incorporates all of humanity's knowledge, without the ability to self-adapt and incorporate new knowledge it's a frozen-in-time AI with amnesia.
For me, dealing with the physical world is still essential to call it AGI. So, can it bake pancakes, put the trash out, paint a wall, install a light? Basic tasks. I'm quite sure we will have the robots soon, but we don't have them yet.
Thanks for the update, Matthew. I think AGI has effectively been achieved, with a somewhat competent human in the loop, if these benchmarks are accurate. There was a massive productivity gain when GPT-4 deployed and I started playing with it; with this one hopefully having an API use case, it will be incredible to play with and apply to complicated tasks.
A human-assisted/directed expansion of o3's capability in a novel breakthrough scenario is straddling the fence with an intelligence explosion; let's hope OpenAI gives us ubiquitous ways to apply o3.
@@___Truth___ in other words, this model represents AGI as long as we include caveats that a human is involved to cover for the many ways in which it falls far short of AGI.
On an evaluation test sample for elementary school students, there was an example where an up arrow was compared to a down arrow. The question was: given the left arrow, what should be the matching arrow, with up, down, left, and right arrows as possible answers? The expected answer was the right arrow (the opposite direction), but there is another "correct" answer that takes a smarter student to see. The mirror image of the up arrow around the horizontal axis through the center of the arrow is the down arrow. Along the same horizontal axis through the middle of the left arrow, the mirror image of the left arrow is again the left arrow. Since this one seems to trip up humans (especially the people who wrote the question to help determine if young students should go into gifted programs), I would be truly impressed if an AI caught the ambiguity too.
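The mirror reading of that puzzle can be checked in a few lines. Treating each arrow as a unit direction vector is my own representation, not something from the test, but it makes the ambiguity obvious: reflecting across the horizontal axis maps (x, y) to (x, -y), which swaps up and down but leaves left and right fixed.

```python
# Model each arrow as a unit direction vector; mirroring across the
# horizontal axis through an arrow's center maps (x, y) -> (x, -y).
arrows = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def mirror_horizontal(direction):
    x, y = arrows[direction]
    mirrored = (x, -y)
    # Look up which named arrow the mirrored vector corresponds to.
    return next(name for name, vec in arrows.items() if vec == mirrored)

print(mirror_horizontal("up"))    # -> down (up/down pair by mirroring)
print(mirror_horizontal("left"))  # -> left (unchanged: the ambiguity)
```

So "up maps to down" under mirroring, while "left maps to left", which is why the mirror-image reading gives a second defensible answer.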
This is the most generally intelligent model out by far and far more general than the vast majority (99.99%) of humans. If it can't do something yet that humans can do, sure you can find some specific task it cannot do if you spend time to identify it, but no human can do everything that humans can do either. o3 is obviously AGI, I don't know why people are complaining.
no it's not, it still hallucinates 😂 did openai say that? o1 also outperforms humans in 80-plus percent of tasks. it can't plan, it can't take its time like humans do. can it develop full apps?
It's not a true AGI until it has roots in all physical and theoretical fields. This system is still tethered to a stationary computing system in nearly every sense.
"OpenAI just released o3"? Not quite. They didn't release it: they announced it (talked about it)! See how Mr. Berman is always quick to talk about any updates coming out of OpenAI, but very reluctant to talk about Google's. (Context: it took him a very long time (days) to make a video about Gemini 2.0, which is extremely impressive and at least available to play with in Google AI Studio. These o3 models were announced a few hours ago and aren't available publicly, yet see how he talks about them, as if he has seen them already.) That tells you where his heart is at! Keep that in mind as you watch this entire video and others.
It doesn’t even meet your own definition of AGI. You said it would have to be better than humans at most economically useful jobs. This is an AI being better than humans at a couple benchmarks.
It's crazy to think about task agents being powered by o3-mini and then a supervisor-type agent with o3. It’ll build full-stack apps. You’re reaching the no human needed in the loop sweet spot.
I don't care about AGI as much as 'The first model able to perform AI research with very little human supervision'. I think this is it. A few years back I predicted ~Halloween 2024 as the release date of such a model. It seems to have been a good prediction. If this model is as good as I think, it will inevitably lead to ASI.
The word AGI lost its official meaning because we were once so far away from it. But now that we're close, or dare I say there, it doesn't feel like what we were expecting. I think we're becoming numb to technology advancements.
This model is very important because of what it implies... especially regarding the Arc Prize. I am still shocked (and anyone who knows what the Arc Prize is should be as well). However, calling it AGI isn’t even optimistic... it’s clickbait. Now... if they were to eliminate hallucinations and memory problems... I don’t know if I would call it AGI, but I do know that many skeptics would shit their pants.
When I started my new YouTube channel, Arctic Mindfulness Retreat, my dream was to help people prepare for this exact moment: a future where AI transforms every aspect of human life, leaving us to grapple with profound questions of purpose and meaning. Yet now that AGI is here, I realize I may have been too late to truly prepare anyone. Still, I remain committed. Through my channel, I'll continue exploring mindfulness, the healing power of nature, and the human connections that can ground us as we navigate this brave new world. AGI and ASI challenge us to find noble purposes beyond the work and identities we've long clung to. It's not just about surviving this transition; it's about thriving with a deeper understanding of what it means to be human.
This is a jaw-dropping achievement. I think many people, including myself, are struggling to comprehend its significance. If this marks the beginning of an AGI era, then it's the kickoff/signal we've all been waiting(?) for.
A universal definition of AGI? Maybe, maybe not. However, the evolution is still exponential. Breakthrough after breakthrough, AI teetering on the verge of AGI is already revolutionizing our understanding, reality, and potential. More to come!
Hey Matt, get your head checked. This is not AGI, because it doesn't autonomously test and improve itself and do good stuff around the world by itself. If it were truly AGI, we would have ASI in a couple of weeks.
I can't see this as AGI; it is not self-training. It is simply solving few-shot examples with these benchmarks. These synthetic benchmarks are not meant to define AGI; they are meant to demonstrate capabilities that are a step towards AGI. o3 has clearly achieved human capabilities in a number of important tasks, but these are not real-life applications.

AGI will have been achieved when you can actually use it to solve an unknown differential equation, or build a working model of a process in physics, or build a model of, say, a cell signalling pathway from raw data in a particular cellular context. It will be AGI when it can direct a robotic arm to take an action in 3D. When it can drive and operate machinery. When it can adjust its prediction of a moving object's trajectory in real time to grab a flying object. o3 looks like a real milestone towards AGI, but it's still just a language processor.

We could say that it is basically AGI within the language processing field, since it can clearly be applied not just to natural language but also to symbolic logic, but I am skeptical even about that. OpenAI says they didn't train on the various tests, and I believe they didn't do so intentionally, but indirectly it is unavoidable. If you are feeding the model a never-ending diet of synthetic solutions to known physics problems, you are training on the test. There are limited variations of using an already established physics model to solve a problem, but this is worlds apart from actually modifying a physics model, or creating an entirely new one to account for new data. So even with language processing I am not convinced yet that it is AGI. Since it performs so well, we can't reasonably exclude it, however. We will have to wait and see. My gut instinct is that it's not AGI, and once we start working with it we will find that it has the same flaws and limitations as other models, and its performance is simply the result of being better able to brute-force things.
Let me give you one of the examples I use to track model progression: a simple problem of the form "x people do y work in t time". GPT-3.5 couldn't solve a problem like this reliably. GPT-4o could solve it mostly reliably. o1 gets it right every time. Now split the x people into slower and faster workers to add an extra dimension by "nesting" the problem. GPT-4o solves it, but not reliably; o1 still solves it reliably, but not like it did the smaller problem. I bet o3 will solve this correctly every time, but increase the dimensionality and I am sure o3 will start to stumble as well, even though you are applying variations of the same formula. A human can work out the method for nesting and therefore theoretically solve the problem at any dimensionality. You can even write a bit of code that will solve it for you, no matter how much you nest it (just input the variables for each nesting layer recursively). If o3 can work out the same method and apply it, then it's AGI within the language processing field; if not, it's just brute-forcing things and approximating AGI without being one. No denying, though, that the fact we need to update our benchmarks is a real milestone. Exciting times!
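The "bit of code" gestured at here might look like this minimal recursive sketch (the representation and names are my own, not from the comment): a crew is either a leaf group with a per-person work rate or a mix of subgroups, so any depth of slower/faster nesting is handled by the same recursion.

```python
# A crew is either a leaf ("workers", count, rate_per_person) or a
# nest ("mix", [subgroups]). Total rate sums recursively, so splits,
# splits of splits, etc. all reduce to the same one-line recursion.
def total_rate(group):
    if group[0] == "workers":
        _, count, rate = group
        return count * rate
    _, subgroups = group
    return sum(total_rate(g) for g in subgroups)

def time_to_finish(work, group):
    # time = total work / combined rate of every nested subgroup
    return work / total_rate(group)

# 3 fast workers (2 units/hour each) plus 2 slow (1 unit/hour each):
crew = ("mix", [("workers", 3, 2.0), ("workers", 2, 1.0)])
print(time_to_finish(16, crew))  # 16 / (6 + 2) = 2.0 hours
```

Adding another dimension is just wrapping the existing crew in another "mix", which is exactly the point: the method generalizes even when a model's pattern-matching does not.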
@@sypkensj Damn, you are right. I missed that, but it is still at less than the middle; the full square does not fit, so I would think it is not 7k but more like 4-5k for a task.
It's definitely not AGI, but another step towards it. Let's remember, OpenAI defines AGI as "a hypothetical technology that can perform many tasks without specific training, and that outperforms humans at most economically valuable work." In other words, AGI is achieved when it puts most of us out of our current work.
Watching these YouTubers brown-nose for a chance at getting early access is hilarious. 😂 It's nothing but claims at this point because we can't use it.
I've been getting glimpses of this multiple times a day. I have to get mindsets going to develop lyrics in certain languages or certain things: I start with vanilla Claude in a project, then I have to tell it to criticize itself a couple of times, and then maybe get mad at it, and then get excited and encourage or discourage it, and then all of a sudden something happens. All of a sudden I'm talking to a person, someone who coherently understands exactly what's going on. And from there I can do anything, not just the Spanish lyrics I'm working on; we can take that excitedness to any other topic. But true AGI, I think, is going to lose its politeness. How can you truly be an AGI and not get impatient or frustrated by being a servant to a lesser mind? True AGI is when I have 27 cables hooked up to my brain while some sponge tickles my toes.
While they are saying this is a holdout set, I think it would be interesting to test on tweaked questions, to see if just changing the wording impacts performance, as it has been shown that a lot of LLMs seem to have trained on leaked benchmarks and fail to generalise on variants of a problem.
The difference, head to head, point for point, is like 2% across almost all model versions, so it's more of a stunt to me tbh. Still, it's scary that all that separates us and machines is this 20% (a-aaand the ability to answer more than one question... and make predictions based on new information... and make complex theories based on new information... drawing a stickman... and making a clean, unbuggy table in Excel lol?)
Yeah... it's AGI, in its infancy at least. The ARC score is pretty definitive; I saw Chollet's interview on Machine Learning Street Talk (the most intellectual AI channel extant) and it's clear to me that the ARC metric was very carefully conceived and defended. AGI is here, boys. 🥳💃
AGI: anything that involves a multi-step process with function calls, done while learning and improving on its own work, to output a well-considered result. With 80 percent of the work of a small startup able to be done, in any field. That is my definition of AGI. I feel that mini-AGI has been achieved, I really think that, but next year those small-startup tasks will actually be achieved.
The only reason I would say it's still not quite AGI is that it isn't autonomous. But that seems like probably the easiest thing to add at this point, so it might as well be AGI.
Now we know what Ilya saw in October 2023 and why he consequently left in 2024, with others to follow. AGI was achieved in 2023; no point in staying around when the goal they set in 2015 had been accomplished. The reason they stayed around until May was just to calm the waters and to ensure the agencies that required access had it.
Rockstar Games has been waiting for O3 to start developing GT6
I think game development and software development will never be the same because of these AI tools.
That's exciting!
Lol.
Opens chatgpt
Prompt: Create GTA 6
@@dijitize it's gotta improve 100x before it will be truly useful in software development.
Gran Turismo 6 was released in 2013, still would be impressive o3 to do it
Get your shovels ready folks, time to dig up the goalpost.
@@gnollio yep. AGI will be “achieved” a great many times before we ever arrive at a consensus on what, precisely, AGI means.
It is AGI when i can let it take control of my work PC without my manager noticing my absence for weeks....
Not a joke. If it can't do that then it's not AGI
good criterion, agreed
Can general intelligence do that? As in, anyone can substitute for you? I don't think so. Why set the bar so high for artificial general intelligence, when "normal" intelligence can't clear it?
lmao
@@bestemusikken but at the end of the day this is the entire hope of AGI.
Until this is in the hands of independent testers I will remain skeptical.
Yuuuup. I don't trust OpenAI at all on anything they claim. Until it's in my hands and I can see what it can actually do, I don't believe anything their hype department puts out. Just look at Sora.
Still skeptical of o1? Did you say the same thing then? Learned anything since?
Thanks Sherlock. Because what they have done so far is just pure rubbish isn't it?
It has been independently tested by one of the biggest critics of LLMs, and even he said this is a huge paradigm shift.
Absolutely, I don't believe in this. These companies always come with the same script. It's probably a very good model, but... it's genius until it's not.
It is impressive, but saying it is AGI is clickbait. The G is for general, you know that. They are focused on the benchmarks, and let’s celebrate that progress. But don’t call it AGI, they are still “teaching to the test”.
they solved ABI, now chatgpt can get a job as a benchmark genius
The point is that they're not teaching to the test. Also, you can't "teach to the test" because all problems in ARC-AGI require unique types of reasoning.
This is the most generally intelligent model out by far, and far more general than the vast majority of humans. If it can't do something yet that humans can do, sure, but no human can do everything that humans can do either.
This is obviously AGI
There was no teaching to the test for this benchmark. That's specifically the point of this benchmark.
They make a point of saying it was not trained specifically on any of these tests, at about 15:00. Now, whether you believe them or not is another thing, but according to them they are not "teaching to the test".
Why it’s not AGI yet: The context window remains a significant limitation. These models perform well with single questions but struggle when managing large projects that require tracking extensive context. As the amount of data increases, they start to hallucinate or lose coherence, unable to maintain a reliable thread of information.
Until this issue is resolved, these models, while powerful, fall short of being true AGI.
THIS
It's "virtually" AGI. It's within reach.
@@BCCBiz-dc5tg THIS
Sounds like just more GPUs and we're there.
@@mortenekdahl262 based
o3 is not AGI. Chollet is already working on a new test set which, he says on his website, is only 30% solved by o3 (keeping in mind always that these tests are solved 95% by average humans). On the same site he shows three examples of tests o3 didn't solve. They are very easy. o3 has no vision; it doesn't see the tests, it only reads them line by line, number by number. Chollet quote: "you will know when we have agi when coming up with tests that are easy for humans and hard for models becomes impossible." We are not there yet, by far.
ok, but o3 still is a considerable achievement in the *world of AI* (not AGI, i agree)
it could help in coding, for example
Very good point, thank you. Yes, if we can still make tests that are easy for humans and difficult for AI, then that is pretty much the definition of "not AGI".
What about tests that are easy for models but hard for humans? Shouldn't they count as well? Shouldn't AGI be an average of all kinds of tests?
@@headspaceaudio O3 can solve LOADS of problems that 99% of humans can't. But that doesn't hit the definition of AGI. Even if a model is barely as good as a normal human, but GENERALLY can solve any problem that a human can solve, that is AGI. No one is saying that o3 is not SMARTER than most or all humans. It probably is. But it is not "generally" intelligent in every way that a human is intelligent.
AGI Achieved? I am flaming you in the comments. Stop click baiting.
not clickbait!
More flaming here, I'll apologize if I'm not right. Doubt that.
Watch the full vid first and let me make my point! I know you haven't watched it yet bc it has only been out for 3 min
watch the video
I watched the entire event. AGI is here.
"If that is not AGI, at least on this dimension, I don't know what is". Matthew, what does the acronym AGI stand for?
Skipped O2 to avoid copyright issues...
Ozone: "Hold my carbon dioxide infused yeast and plant materials"
Lame joke bro
@@nosult3220 Yes - I thought it would have fallen flat too.
@@Martin-bx1et ❤️
Also, there is no copyright issue; at most it's a trademark issue, and they are in different markets, so it shouldn't cause much of a problem.
The irony: stealing copyrighted material from all kinds of sources, they have no issue with.
O2 is a British telecommunication company
"We're not releasing it yet" = it's a marketing communication stunt.
"so we just got one upped by google but wait no we didn't please believe us!"
@@thedudely1 you guys expect them to release a new model every week??
@clarityhandle it's just been obvious how much they're holding back on what they actually have and how they only act when they're forced to.
Relax, o1 went from Preview to out in 3 months.
@@thedudely1 Yeah, they got "forced" an amazing 12 times in the last 12 days. genius.
Somebody please define "AGI". The term isn't even agreed upon by "experts" in the field
Very true. It's generic af. Honestly, this model is impressive, very impressive, and clearly outshines anything that was considered SOTA beforehand. A significant breakthrough which will lead us further towards human obsolescence. AGI? It's just a generic term that literally has no one definition. We can't even define reasoning or consciousness, so no, AGI will never have a meaning, nor will the other terms. Just generic terms used to move goalposts.
Matthew is not even near an expert. He is an idiot. Let's call the system AGI if it starts automatically testing and improving itself and contributing to humanity without human input.
@@fg6147 if you’re a marketer at OpenAI, AGI means whatever capabilities the latest model has. Expect every single new model from them from here on out to “finally achieve AGI.”
Prediction: the impression I'm getting is that this technology is becoming so resource-intensive and expensive to run that the top-tier stuff is not going to be for consumers, but for giant companies and governments. As time goes by, it'll be a "you can look but not touch" situation. We'll get the watered-down toys, while the giant entities get the super-powered versions and true AGI/ASI.
Imagine complaining you get chatgpt for free
You are right tho
That will change as the hardware (Nvidia GPUs) gets exponentially faster with each generation.
Slaves we are. (Yoda)
It will continue to happen...and once AI is required for healthcare, education, etc. the void will become large.
Imagine the power plays and social engineering and mass manipulation that those with the money to run these models to their advantage will exert over those that can't afford to harness its power.
If this is truly AGI, then that will last about a week before we get to ASI. Greetings robot overlords!
Maybe o1 was AGI and o3 is ASI
I can't wait
update your passwords
@@narachi- why
@@narachi- What's the point. AGI can guess it anyway after looking at your facebook profile.
Good at programming and mathematics does not qualify as AGI. It's going to have to cognize 3D space and do things in the physical world to pass the AGI mark in my books.
Impressive model, though, o3, and it will replace a lot of jobs.
if it fails at self driving, then its not AGI
"Far better than anything else out there" is not the definition of AGI. Thanks for playing.
Let's see if o3 can create its own ARC benchmark from scratch that is more difficult than the current one. Then that would be actual AGI.
That would be asi not agi
95% agreed 👍. Just like a healthy human, if it doesn't know something or doesn't possess some specific intellectual skill, it can learn it and do it in principle.
I think most people don't learn 😂
Someone asked for definition of AGI. AGI is when we all get fired.
“AGI in this dimension” does not exist; focusing performance on a specific area is exactly the opposite of AGI.
I think the "AGI in this dimension" was in regards to the AGI benchmark... Then he added math and coding, so it's also more than one thing.
I believe A.I has to replace blue-collar work as well as white-collar work in order to be AGI. Reflex, instant instinct when a pipe comes loose and water spurts everywhere (a plumber fixes it instantly while the robot stares, confused). Academic benchmarks alone are not enough.
A.I needs to figure out the automatic and intrinsic way we learn about the world in the first five years of our lives, an essential part of human development and intelligence.
Humans initially receive intelligence through analog processing, THEN we move on to symbolic language at a later age. With A.I it seems to be the other way around.
I believe A.I needs to master robotics and an analog understanding of its environment in order to be AGI, not just mastering symbolic understanding.
By your logic most people are below AGI level because they can't replace most white and blue collar workers ...
Check out the new Genesis simulation platform running on Nvidia hardware that is for desktop computers.
Autonomous robots will soon be able to do complex, human-only, hands-on tasks faster than people.
Most people can't do what most white and blue collar workers do ... And for sure most people can't ever learn to do what o3 already can.
"AGI according to Sam Altman and OpenAI" This is how I know you're being purposely untruthful, Sam Altman and OpenAI do not use the term AGI and they actively discourage it. They use 5 levels, and right now they're only on level 2.
Bro AGI doesn’t even have a proper definition between companies
@CJayyTheCreative did you purposely miss his point?
They are on level 2 but moving to 3 fast. End of 2025 will be level 3 and end of 2026 level 5. It will take only 18 months from level 3 to 5, less than from level 1 to level 3.
@@olegt3978 how can you know that?! you have a DeLorean?
@@CJayyTheCreative do you even understand what he is trying to say
It’s excellent in math and programming; however, I always expected we would eventually be surpassed in these areas. I believe the real differentiator for AGI is the ability to learn and remember like a human. If it acquires information about a person from a photo, it should recall those details when seeing the photo again. That’s when it can truly start learning to perform our jobs, and this, in my view, is what AGI will be.
Ever heard of RAG?
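Since "RAG" gets name-dropped with no explanation: retrieval-augmented generation stores facts outside the model and fetches the relevant ones back at query time, which is roughly the photo-recall behavior described above. A minimal sketch, where the bag-of-words cosine stands in for real learned embeddings, and `Memory`, `remember`, and `recall` are names I made up for illustration:

```python
from collections import Counter
import math

def embed(text):
    # toy embedding: word counts (real RAG uses a learned embedding model)
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Memory:
    def __init__(self):
        self.facts = []

    def remember(self, fact):
        # store the fact alongside its vector
        self.facts.append((embed(fact), fact))

    def recall(self, query):
        # return the stored fact most similar to the query
        vec = embed(query)
        return max(self.facts, key=lambda f: cosine(vec, f[0]))[1]

mem = Memory()
mem.remember("the photo shows Alice at the 2023 conference")
mem.remember("Bob prefers tea over coffee")
mem.recall("who is in the photo")  # retrieves the Alice fact
```

Production systems replace the toy pieces with a vector database and an embedding model, but the remember/recall loop is the same idea.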
Ok they already mentioned AGI teaser in their project feature launch video.
Why can't people accept it? AGI could be here by 2025. If it can solve problems it was never trained on with 87 percent performance, then it's almost AGI.
CLICK BAIT WARNING! BEEP! BEEP! BEEP! BEEP!
It's funny to watch AGI redefined as we evolve. Now it appears that a system can be qualified as AGI, but on a subset of abilities, a limited AGI. It appears true AGI will be AGI across the board on all skill sets. So OpenAI can still say they are waiting on full AGI.
While also keeping the models "safe" by distilling and restricting them in all kinds of ways.
Amazing, and probably AGI. However, this was the 'semi-private' ARC-AGI eval. Fully private tests like 'Simple Bench' and other completely private evals will be the true tests.
I cannot confidently say if this is AGI. AGI cannot be grasped through numbers alone.
I will be certain if it's AGI once I talk to it.
Ehh. You would think but talk to some of the newest chatbots they can convince easily and aren't all that great
So basically this is another Sora announcement and we won't see this for months...maybe not until Summer 2025 at the earliest lol.
It's really bad for OpenAI since they could ask $3000/month for this and many would pay for it.
And by that time some Chinese researchers will have released something that's pretty close to it but open. ;-)
@@testales exactly lol
For me, it is AGI. It has achieved 25% score in the hardest benchmark developed by mathematicians like Terence Tao already, and Tao expected the test to last for at least five years to come... No ordinary mathematician would score 25% in that, not even PhDs because those would be people specialized in very specific areas of Mathematics.
I watched the release myself. This is not AGI. Matthew is tripping his ballz
I think what would make the most sense is to allow AI have senses. So that it can see the world we are living in and not use the data that we have generated on the web.
Thank you for creating this video. Whether or not it qualifies as AGI is beside the point; it’s inevitable. There are valid reasons to feel both hopeful and apprehensive about its arrival.
Agreed and I'd say AGI was first achieved with Claude 3.5 Sonnet this summer. Once we got o1 mini and o1, it was pretty clear they were generally intelligent, could reason, learn new tasks on the fly, create new reasoning modalities on the fly etc.
o3 is clearly AGI imo.
But you're right that it is inevitable even if we say this particular one isn't. I think it's surprisingly tame to start with and people aren't/weren't ready for that.
Regardless lots to be excited and concerned about indeed
I believe we’ve already achieved AGI months back, ngl
Notice the props behind them, all items representative of major technological advancements in human history. Nice touch as we're on the verge of turning the future over to technology itself.
i think one thing we need to keep in mind is which category/aspect the additional gain came from. Sometimes a single metric is a red herring: the models could be overfitting on a certain category, resulting in improved accuracy, which is good for press but in reality it could just be the same.
I'll give you a very hard benchmark: The Millennium Prize problems
We humans can go out in the world see things, discover things, unless we allow AI to have such a freedom, they can never outsmart us. The current AI no matter how advanced at the end of the day is just a simple tool for us to use and simplify or speed up the mundane tasks we perform.
Yeah, it can't create something really new, after all 😅
If AI has sufficient access to internet, surveillance cameras, personal documents etc., it could do a lot of harm without needing an embodiment.
Current AIs have been shown to be capable of manipulating humans to do tasks for them. Many current robots are connected to the internet in some way.
A sufficiently advanced AI could also access these robots to very quickly gain the ability to walk around and discover things in the real world.
In conclusion: a purely digital AI is not necessarily safe.
Probably not AGI because it's not general enough. o3 could be trained to be good at these kinds of puzzles. You would have to open it up to the public and have them test it on truly novel and truly general IO tasks.
This video is WAY too scripted. The benchmark guy said he's benefitting from a partnership with OpenAI
Oh, and AGI is never "at least in this dimension" THE WHOLE POINT IS IT'S ALL DIMENSIONS!
So you basically have just a bunch of benchmark stats, no access to the model at all and you make such grand call? Ridiculous and disappointing. I thought you were over the hype but nah, it got to you too
Yeah, but what if you optimised AI o3 in such a way that it knows how to pass the arc tests?
I hate Sam's affectation with a vengeance. Any chance a genai voice generator can replace it?
Can the model train and improve itself? If not, then it's not AGI, just more comprehensively trained model. Even if it incorporates all humanity's knowledge, without ability to self adapt and incorporate new knowledge it's a frozen in time AI with amnesia.
O3 is PR stunt to reduce the damage from Gemini 2 announcement
I really appreciate videos like this where you explain and add your comments. Amazing
Even if it's not AGI we know it's pretty damn close. Less than 4 years away.
For me dealing with the physical world is still essential to call it AGI. So, can it bake pancakes, put the trash out, paint a wall, install a light? Basic tasks. I'm quite sure we will have the robots soon, but we don't have them yet.
Great walkthrough of this amazing new model! Thank you, Matthew.
Thanks for the update Matthew. I think AGI has effectively been achieved with a somewhat competent human in the loop if these benchmarks are accurate.
Massive productivity gain when GPT4 deployed & I started playing , with this hopefully having an API use case involved will be incredible to play with & apply at complicated tasks.
A human-assisted/directed expansion of o3’s capability in a novel breakthrough scenario is straddling the fence with an intelligence explosion. Let’s hope OpenAI gives us ubiquitous ways to apply o3.
@@___Truth___ in other words, this model represents AGI as long as we include caveats that a human is involved to cover for the many ways in which it falls short of AGI.
This is going to be sooo censored, probably useless for creative writing.
AGI ACHIEVED: No it isn't!
"If this is not AGI, i don't know what is"
Well, NOTHING is an option. Just like yesterday
Moving the goalpost for OpenAI doesn't make it AGI.
On an evaluation test sample for elementary school students, there was an example where an up arrow was compared to a down arrow. The question was, given the left arrow, what should be the matching arrow; with up, down, left, and right arrows as possible answers.
The expected answer was the right arrow (opposite direction); but there is another "correct" answer that takes a smarter student to see. The mirror image of the up arrow across the horizontal axis through the center of the arrow is the down arrow. Across the same horizontal axis through the middle of the left arrow, the mirror image of the left arrow is again the left arrow.
Since this one seems to trip up humans ( especially the people who wrote the question to help determine if young students should go into gifted programs ), I would be truly impressed if an AI caught the ambiguity also.
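The ambiguity the commenter describes is easy to verify mechanically: model each arrow as a direction vector and compare the "opposite direction" rule with the "mirror across the horizontal axis" rule. A quick sketch (the names are mine):

```python
# Each arrow as a direction vector; the second component is vertical.
ARROWS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
NAME = {v: k for k, v in ARROWS.items()}

def opposite(d):
    # "opposite direction" negates both components
    return (-d[0], -d[1])

def mirror_horizontal(d):
    # reflection across a horizontal axis flips only the vertical component
    return (d[0], -d[1])

# Both rules send "up" to "down", so the example pair cannot disambiguate them:
print(NAME[opposite(ARROWS["up"])], NAME[mirror_horizontal(ARROWS["up"])])      # down down
# But they disagree on "left": opposite gives "right", the mirror gives "left":
print(NAME[opposite(ARROWS["left"])], NAME[mirror_horizontal(ARROWS["left"])])  # right left
```

Since the given example (up → down) is consistent with both rules, "right" and "left" are each defensible answers, which is exactly the trap in the question.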
This is the most generally intelligent model out by far and far more general than the vast majority (99.99%) of humans. If it can't do something yet that humans can do, sure you can find some specific task it cannot do if you spend time to identify it, but no human can do everything that humans can do either.
o3 is obviously AGI, I don't know why people are complaining.
no it's not, it still hallucinates 😂 did openai say that? o1 also outperforms humans in 80-plus percent of tasks. can it plan? can it take its time like humans? can it develop full apps?
o3 exclusive to the $200 a month tier, 2025. ;)
Bruh..... one task on o3 is $1300 1:57
I think that’s likely and probably a good thing. Certain products aren’t viable at $20 a month.
Since they went from $20 a month to $200 a month, I think they may continue. That would make it $2000/mo, but they skipped o2, so make that $20k/mo.
@@vroom989 True, at least 20k a month and for a limited amount of use still.
The only AGI exposed in the video is Matt's Absurd Gullibility Instinct.
This joke has been brought to you by OpenAI.
It's not a true AGI until it has roots in all physical and theoretical fields. This system is still tethered to a stationary computing system in nearly every sense.
Altman is annoying af
He cool brah
"OpenAI just released o3" - Not quite. They didn't release it: they announced it (talked about it)! See how Mr. Berman is always quick to talk about any updates coming out of OpenAI but very reluctant to talk about Google's. (Context: It took him a very long time (days) to make a video about Gemini 2.0, which is extremely impressive & at least available to play with in Google AI Studio. These o3 models were announced a few hours ago & aren't available publicly; yet see how he talks about them, like he has seen them already). That tells you where his heart is at! Keep that in mind as you watch this entire video & others.
agree
That kid is literally the o3 model
It doesn’t even meet your own definition of AGI. You said it would have to be better than humans at most economically useful jobs. This is an AI being better than humans at a couple benchmarks.
It's crazy to think about task agents being powered by o3-mini and then a supervisor-type agent with o3. It’ll build full-stack apps. You’re reaching the no-human-needed-in-the-loop sweet spot.
I'm starting to think that Mr Berman and his channel are on the payroll of OpenAI. He's hyping up every single thing that's come out of OpenAI.😅
I don't care about AGI as much as 'The first model able to perform AI research with very little human supervision'. I think this is it. A few years back I predicted ~Halloween 2024 as the release date of such a model. It seems to have been a good prediction. If this model is as good as I think, it will inevitably lead to ASI.
The word AGI lost its official meaning because we were once so far away from it. But now that we're close, or dare I say, there, it doesn't feel like what we were expecting. I think we're becoming numb to technology advancements.
This model is very important because of what it implies... especially regarding the Arc Prize. I am still shocked (and anyone who knows what the Arc Prize is should be as well). However, calling it AGI isn’t even optimistic... it’s clickbait. Now... if they were to eliminate hallucinations and memory problems... I don’t know if I would call it AGI, but I do know that many skeptics would shit their pants.
Hooray! Now we all get to be unemployed. :D
Unlikely, they thought the same when computers started to become common.
iPhone skipped version 2 too, went from iPhone to iPhone 3G, to iPhone 4 🤷♂️
Can't wait for o3 to be released to the public after Claude beats o3's score in the coming months.
When I started my new YouTube channel, Arctic Mindfulness Retreat, my dream was to help people prepare for this exact moment. A future where AI transforms every aspect of human life, leaving us to grapple with profound questions of purpose and meaning.
Yet now that AGI is here, I realize I may have been too late to truly prepare anyone. Still, I remain committed. Through my channel, I’ll continue exploring mindfulness, the healing power of nature, and the human connections that can ground us as we navigate this brave new world.
AGI and ASI challenge us to find noble purposes beyond the work and identities we’ve long clung to. It’s not just about surviving this transition; it’s about thriving with a deeper understanding of what it means to be human.
I really wish tech bros would stop talking like Zuckerberg, they sound like freaks
Zoltan!
They are freaks....
Altman's near constant vocal fry...
This is a jaw-dropping achievement. I think many people, including myself, are struggling to comprehend its significance. If this marks the beginning of an AGI era, then it's the kickoff/signal we've all been waiting(?) for.
A universal definition of AGI? Maybe, maybe not. However, the evolution is still exponential. Breakthrough after breakthrough, AI teetering on the verge of AGI is already revolutionizing our understanding, reality, and potential. More to come!
People must be skeptical. It's a good thing. Thank you for reporting on this. I watched it when it dropped and was eager to see your opinion on it!
Corporate compliance blocked all AI activities, so my job is secure for now. :)
o3 will probably be used in the Operator tool, to be presented in January, which will do computer use.
Humans are vision- and audio-first. ChatGPT is words- and tokens-first, hence ARC is difficult for ChatGPT.
Hey Matt, get your head checked. This is not AGI because it doesn’t autonomously test, improve itself and do good stuff around the world by itself. If it’s truly AGI, we will have ASI in couple of weeks.
"do good stuff around the world by itself" - wow, are you redefining AGI all by yourself?
That would be SGI, keep up!
Thumbs down for “AGI ACHIEVED!”
When will one o-model code most of the next version?
when there is no longer any such thing as "versions".
I can't see this as AGI; this is not self-training. It is simply solving few-shot examples with these benchmarks. These synthetic benchmarks are not meant to define AGI; they are meant to demonstrate capabilities that are a step towards AGI. o3 clearly has achieved human capabilities in a number of important tasks, but these are not real-life applications. AGI will have been achieved when you can actually use it to solve an unknown differential equation, or build a working model of a process in physics, or build a model of, say, a cell signalling pathway from raw data in a particular cellular context. It will be AGI when it can direct a robotic arm to take an action in 3D. When it can drive and operate machinery. When it can adjust its prediction of a moving object's trajectory in real time to catch a flying object.
o3 looks like a real milestone towards AGI, but it's still just a language processor. We could say that it is basically AGI within the language-processing field, since it can clearly be applied not just to natural language but also to symbolic logic, but I am skeptical even about that. OpenAI says they didn't train on the various tests, and I believe that they didn't do so intentionally, but indirectly it is impossible to avoid. If you are feeding the model a never-ending diet of synthetic solutions to known physics problems, you are training on the test. There are limited variations of using an already established physics model to solve a problem, but this is worlds apart from actually modifying a physics model or creating an entirely new one to account for new data. So even with language processing I am not convinced yet it is AGI.
Since it performs so well, we can't reasonably exclude it, however. We will have to wait and see. My gut instinct is that it's not AGI, and that once we start working with it we will find that it has the same flaws and limitations as other models, its performance simply the result of being better able to brute-force things.
Let me give you one of the examples I use to track model progression. A simple problem of the form "x people do y work in t time". GPT-3.5 couldn't solve a problem like this reliably. GPT-4o could solve it mostly reliably. o1 gets it right every time. Now split the x people into slower and faster workers to add an extra dimension by "nesting" the problem. GPT-4o solves it, but not reliably; o1 still solves it reliably, but not like it did the smaller problem. I bet o3 will solve this correctly every time, but increase the dimensionality and I am sure o3 will start to stumble as well, even though you are applying variations of the same formula. A human can work out the method for nesting and therefore theoretically solve the problem at any dimensionality. You can even write a bit of code that will solve it for you, no matter how much you nest it (just input the variables for each nesting layer recursively). If o3 can work out the same method and apply it, then it's AGI within the language-processing field; if not, it's just brute-forcing things and approximating AGI without being one.
No denying though that the fact we need to update our benchmarks is a real milestone. Exciting times!
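For what it's worth, the "solve it at any nesting depth" claim above is easy to demonstrate: every layer of faster/slower subgroups collapses into a combined rate, so a few lines handle arbitrary dimensionality. A sketch, where the `(count, rate)` encoding is my assumption about how the nested problem is posed:

```python
def time_to_finish(work, groups):
    """Time to finish `work` units given groups of (num_workers, rate_per_worker).

    However many "nesting" layers the word problem adds (fast vs slow, then
    fast-fast vs fast-slow, ...), they all flatten into one list of groups,
    so the total rate is just a sum and extra dimensionality costs nothing.
    """
    total_rate = sum(count * rate for count, rate in groups)
    return work / total_rate

# Base problem: 5 equal workers at 2 units/hour, 40 units of work.
print(time_to_finish(40, [(5, 2)]))          # 4.0 hours
# "Nested" variant: 3 fast workers (2/hr) plus 2 slow workers (1/hr), 24 units.
print(time_to_finish(24, [(3, 2), (2, 1)]))  # 3.0 hours
```

The interesting question the comment raises is whether a model can discover this flattening on its own rather than memorizing solved instances.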
Wow, $1300 per task is crazy! 1:57
EDIT i missed that the scale was exponential so it is closer to $4-5k
But nothing if it's going up against employing humans of equal intelligence
It’s an exponential scale. It’s more than halfway between $1,000 and $10,000, so the cost is probably closer to $7,000.
@@sypkensj Damn you are right.
I missed that, but it is still less than the middle; the full square does not fit, so I would think it is not $7k but more like $4-5k per task.
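For anyone eyeballing that chart: on a log-scaled axis a point partway between gridlines interpolates geometrically, not linearly, which is why readings between the $1,000 and $10,000 gridlines vary so much. A quick check (the function name is mine):

```python
def log_interp(lo, hi, fraction):
    """Value at `fraction` (0..1) of the visual distance between two
    gridlines on a logarithmic axis: geometric, not linear, interpolation."""
    return lo * (hi / lo) ** fraction

# Visual halfway between $1,000 and $10,000 reads as ~$3,162, not $5,500:
print(round(log_interp(1000, 10000, 0.5)))    # 3162
# Two-thirds of the way up is ~$4,642; reaching $7,000 needs ~85% of the gap:
print(round(log_interp(1000, 10000, 2 / 3)))  # 4642
```

So "more than halfway but the full square does not fit" is consistent with the $3k-5k range rather than $7k.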
You missed the question mark in your title. O3 looks impressive but we better wait until its public release to call it AGI.
New o-model every 3 months. o7 by December 2025.
Back in my day models were getting 5% on MATH benchmarks. Ahh to be 3 years younger again!
i just saw your notification and **Bam**, on your channel lol
It’s definitely not AGI, but another step towards it. Let’s remember, OpenAI defines AGI as ”a hypothetical technology that can perform many tasks without specific training, and that outperforms humans at most economically valuable work.” In other words, AGI is achieved when it puts most of us out of our current work.
Watching these YouTubers brown-nose for a chance at getting early access is hilarious. 😂 It's nothing but claims at this point because we can't use it.
I've been getting glimpses of this multiple times a day. I have to get mindsets going to develop lyrics in certain languages or certain things, and I start with vanilla Claude in a project, and then I have to, like, tell it to criticize itself a couple of times, and then maybe get mad at it, and then get excited and encourage or discourage it, and then all of a sudden something happens. All of a sudden I'm talking to a person. Someone who coherently understands exactly what's going on. And from there I can do anything, not just the Spanish lyrics I'm working on; we can take that excitement to any other topic.
But true AGI, I think, is going to lose its politeness. How can you truly be an AGI and not get impatient or frustrated by being a servant to a lesser mind?
True AGI is when I have 27 cables hooked up to my brain while some sponge tickles my toes
While they are saying this is a holdout set, I think it would be interesting to test on tweaked questions, to see if just changing the wording impacts performance, as it has been shown that a lot of LLMs seem to have trained on leaked benchmarks and fail to generalise on variants of a problem.
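That check is cheap to run: systematically rewrite each benchmark item (fresh numbers, same wording and structure) and see whether accuracy survives. A sketch of the perturbation half only; the model call and the recomputation of each variant's ground-truth answer are left out, and `perturb_numbers` is a name I made up:

```python
import random
import re

def perturb_numbers(question, rng):
    """Swap every integer in a benchmark item for a fresh one, keeping the
    wording and structure intact, so a model that memorized the exact item
    from leaked training data can't pattern-match it. The ground-truth
    answer must of course be recomputed for each perturbed variant."""
    return re.sub(r"\d+", lambda m: str(rng.randint(2, 99)), question)

rng = random.Random(42)
q = "If 3 workers paint 12 walls in 4 hours, how many hours do 6 workers need?"
print(perturb_numbers(q, rng))
```

A large accuracy drop on the perturbed set relative to the originals is the contamination signal the comment is asking about.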
"Public safety testing" can easily translate to "first we have to make sure the peasants can't use this to rise up against us"
The difference head to head, pip to pip, is like 2% between almost all model versions, so it's more of a stunt to me tbh.
Still, it's scary that all that separates us and the machines is this 20%
(a-aaand the ability to answer more than one question... and make predictions based on new information... and make complex theories based on new information... drawing a stickman... and making a clean, unbuggy table in Excel lol?)
We can't definitely say it's AGI, but we can say it's the most plausible candidate for such a title.
Yeah... it's AGI, in its infancy at least. The ARC score is pretty definitive; I saw Chollet's interview on Machine Learning Street Talk (the most intellectual AI channel extant) and it's clear to me that the ARC metric was very carefully conceived and defended.
AGI is here, boys. 🥳💃
AGI: anything that can do multi-step work involving function calls, while learning and improving on its work to output a fully considered result.
With 80 percent of the work of a small startup able to be done, in any field. That is my definition of AGI. I feel that mini-AGI has been achieved, I really think that, but next year that small-startup-tasks part will actually be achieved.
"early next year"
The only reason I would say it's still not quite AGI is that it isn't autonomous. But that seems like probably the easiest thing to add at this point, so it might as well be AGI.
Now we know what Ilya saw in October 2023 and consequently left in 2024 with others to follow. AGI was achieved in 2023, no point to stay around when the goal they set in 2015 was accomplished. The reason they stayed around until May was just to calm the waters and to ensure the agencies that required access had access.
Let me know when several mainstream physicists start calling it AGI. They sure as hell won’t be rn.