Problems like power consumption and cost inevitably go down until they hit a certain plateau. Just let it come down first before you seriously complain about it.
And it also solves, in seconds, other problems that stump any human given any amount of time. The problems it struggles with are ones we handle easily with visuals, and given our heavy dependence on vision and how evolved we are in that area, it makes sense that we happen to be particularly good at those. These aren't "hard" problems to solve in AI development (as in something that will take decades). These benchmarks are being run by models primarily trained on language inputs, hence the second L in LLM. Once we start creating models fused with better visual understanding and the ability to create and process internal visual representations more efficiently, I suspect a lot of these cases where humans happen to be a lot better will very quickly go away.
The depth of the alignment problem needs even more attention. A machine able to solve these complex problems will be able to manipulate the world's top psychologists like putty. We are in deep, deep trouble very soon if things continue to heat up this fast.
The hype and shock about o3 were so intense that a YouTuber posted claiming that o3 was AGI and even used Stockfish in chess as an example! Crazy stuff!
The ARC problem you described, specifically the one where the o3 solution was marked incorrect, is actually a question that most people get wrong. In the standard solution, the red block directly above needs to turn blue because the assumed logic is that contact with the blue line changes the color to blue, rather than the change being based on overlapping. However, the first three examples don't demonstrate this contact behavior, so even ARC's own solution remains controversial.
@@AntonBrazhnyk But there is touching. The topmost shape is just "touching" the topmost horizontal connection. If touching is supposed to mean turning blue, it is ambiguous, as it is not shown in the examples.
@@Walfischzahn Ah, ok. Agree. There's touching, and there are no direct examples about it. So, is it known what the model did and what's considered correct by the test? I wouldn't paint that rectangle blue just by assuming that since touching isn't shown, touching doesn't work, though I could argue for painting it blue too. :) Also, it's one of the huge limitations of current models, which will always keep them from achieving AGI status. They have to be able to answer with some form of "I couldn't solve it" instead of just hallucinating an answer.
The ARC-AGI task that o3 failed on was actually ambiguous. If you look at how it answered you would have realized that both answers it provided were reasonable and a response that many people would have had as well.
Indeed, I think it's interesting to see the ambiguity on the eval-creator's side that the model discovered. Also a bit sad Matthew didn't look into it even slightly and just took the failed problems as true failures rather than broken ground truth - maybe a follow up is warranted? It's quite interesting to look at the "failures"!
Yeah! I actually was doing the test myself for a bit and came across this question, and realised the ambiguity of it. I got it wrong at first because I made the wrong assumption, but my answer is just as right as the official one, so if o3 had the same answer as me I don't think it should be counted as false!
that's because AI is only copying data patterns humans have already created. When investors finally catch on to this they're going to put their money elsewhere...
@@eadweard. Actually, that's exactly the problem. I mentor engineering graduates and they always start off coming for an answer, only to discover that the answer they get does not suit their need, because they did not understand the problem and therefore asked the wrong question. AI in general seems to be viewed as a generalised problem solver of some kind, but often if you really understand the problem you don't need AI to solve it; you don't even want AI to solve it for you, because computationally it is very expensive. It is generally a lot easier to deploy AI than to have a deep understanding of the problem and be able to identify an algorithm that can run on hardware 100 or 1000 times less powerful.
If AGI is achieved but it is expensive and too resource intensive, then presumably the first task is to ask o3 how it can improve, and allow it to recursively grow quickly into ASI - artificial superintelligence.
When I read people say things like this, I just know for a fact you have very little knowledge about how these word transformers work. There is no AGI. And there especially won't be AGI using any of the current LLM technology.
7:29 I agree, this is NOT AGI. o3 can solve very complex math problems, perform pattern matching, and reprocess outputs, but it is still limited to input-output transformations within its training data. However, it does not truly "know what it doesn't know," nor does it integrate deeper considerations, like in software development considering the UI, backend, user needs, or ***why a particular approach failed***, into a single cohesive reasoning process. LLMs lack genuine understanding, consciousness, and the ability to reason abstractly or transfer knowledge effectively across domains in a way that (some) humans do naturally.

Knowledge: LLMs possess vast amounts of knowledge, often exceeding that of any individual human. They can access and process information from the entirety of their training set. But it's important to note this knowledge is represented statistically, not conceptually.

Intelligence: Intelligence involves applying knowledge to solve problems, adapt to new situations, and reason effectively. While LLMs demonstrate some intelligence within their trained domains, they lack the general intelligence to adapt it to novel scenarios.

Wisdom: Wisdom goes beyond intelligence. It involves judgment, ethical considerations, understanding the broader implications of one's actions, and learning from experience. Wisdom is a deeply human trait that is far beyond the reach of current AI.
To be 'general', I believe an LLM needs to understand why something is or is not the case, and how to use that why in novel reasoning. I can watch tens of YT videos on a subject and sound intelligent on that topic, but when asked why, it quickly becomes clear that I only have knowledge of the subject, not intelligence, let alone wisdom.
It's clear from the examples of "trivial" failed tasks in ARC-AGI that what's missing from o3 is a *physical representation* of the world. For now it's mostly been based on concepts and how they relate to each other, this was particularly evident with word2vec for example. But the models don't really understand "I need to _paint_ these squares that are stacked _on top_ of each other in the order given by the small color band". We know how to do this because we live in a 3D world, so this feels kind of obvious. Once models start having a sense of what it means to be a physical being or what we experience interacting with our surroundings, then they'll make another huge leap. Unsurprisingly they've been really struggling with this entire physical aspect… for now.
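As a rough illustration of the "concepts and how they relate to each other" point, here is a minimal word2vec-style sketch; it assumes the gensim library and its downloadable glove-wiki-gigaword-50 vectors, and is only meant to show that embeddings capture relational structure between words without any physical grounding behind it.

```python
# Minimal sketch (assumes gensim is installed and the small GloVe vectors can be downloaded).
# It demonstrates the classic "king - man + woman ~ queen" relation: concepts related
# purely through co-occurrence statistics, with no physical model of the world involved.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # small pretrained word embeddings

# Which word is to "woman" as "king" is to "man"?
for word, score in vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3):
    print(f"{word}: {score:.3f}")  # "queen" is typically at or near the top
```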
I worked with advanced voice mode, and when compensating for its VISION DISABILITY it was able to solve the blue/red block puzzles perfectly and predict the output panels. It was as I suspected: the model is simply blind or vision-impaired, based on imperfect conversion of visual data to words/concepts/descriptions.
AGI will come about from plugging up the weaknesses in models. “Humans can do XYZ that the models can’t” - well eventually the models will because they’ll be trained on it. One would hope that training a model across a broad spectrum of intelligences would result in cross-pollination between intelligence across different domains.
It doesn't currently work like this. The difference between a GPT model and an art-drawing model is not just that they are trained on different data; they are each engineered differently from the ground up. There is no "training a model across a broad spectrum of intelligences" yet.
Exactly, just spin up a million AIs and have them each attend different classes and do the homework, and correct the homework to see what they got wrong, and then just use the LoRA that comes from each specific training run to load the expert in that field for each problem.
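A rough sketch of what "load the expert in that field for each problem" could look like in practice, assuming the Hugging Face transformers and peft libraries; the base model name and the ./lora-math and ./lora-physics adapter paths are placeholders, not real artifacts. It swaps per-subject LoRA adapters over a single base model rather than literally spinning up a million AIs.

```python
# Sketch only: switch per-subject LoRA "experts" on one base model.
# Model name and adapter directories are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "meta-llama/Llama-2-7b-hf"          # placeholder base model
base = AutoModelForCausalLM.from_pretrained(base_name)
tokenizer = AutoTokenizer.from_pretrained(base_name)

model = PeftModel.from_pretrained(base, "./lora-math", adapter_name="math")
model.load_adapter("./lora-physics", adapter_name="physics")

def answer(question: str, subject: str) -> str:
    model.set_adapter(subject)                   # activate the adapter for this field
    inputs = tokenizer(question, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(answer("Integrate x^2 from 0 to 1.", "math"))
```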
@@abj136 I disagree. LLMs are currently doing things that everyone thought were a dead end just one year ago. It's clearly visible that LLMs can reason to a certain extent. They are representations/simulations of the world in written form. Most people thought that an LLM wouldn't be able to achieve the ARC-AGI metric as the o3 model has done. It's just a matter of plugging up the holes. When a new "type" of intelligence arises that humans can do and a model still can't, it may be possible to _plug_ the hole. Even in things we currently think LLMs are bad at, like 3D spatial awareness, it may be possible for them to become adept at navigating or predicting real-world actions as long as they have an accurate and steady data stream of the current real-world space.
@@tjken33 Or maybe it's generative AI. Chinese students study for exams; there is an entire market around studying just for exams. They learn how to answer each question in a very specific format. They even go so far as to get copies of previous test answers and memorize those questions and answers. This is exactly what generative AI is doing now.
@@daomingjin Sounds like the Chinese students are the real trainers in this picture. Just like old engineers teaching programmers how to automate their jobs.
Wonder if we'll see o4 in less than 3 months. If that happens, we might have hit the wall: the wall of the graph. On another note, if o3 is capable of chip design, the next generation of AI chips could be revolutionary.
I would expect that o3 is already capable of chip design but if it costs $450K per day (as demonstrated by the $300+K run for 16.5 hours of computation) to run one instance at full power, it's not economically feasible to use o3 instead of expert humans yet.
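A quick back-of-the-envelope check of that daily figure, assuming the roughly $300K for 16.5 hours quoted above (these are the comment's numbers, not official pricing):

```python
# Rough extrapolation from the figures in the comment above (assumptions, not official pricing).
run_cost_usd = 300_000   # reported cost of the high-compute benchmark run
run_hours = 16.5         # reported duration of that run

cost_per_hour = run_cost_usd / run_hours
cost_per_day = cost_per_hour * 24
print(f"~${cost_per_hour:,.0f}/hour, ~${cost_per_day:,.0f}/day")  # about $18K/hour, $436K/day
```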
The ARC test really is valuable. It's not the last benchmark for AGI, but it is A benchmark. Basic reasoning in novel situations is something you need your researchers, employees, teachers, space probes and robocops to be able to do.
It will be interesting, because the jump in coding ability from 4o to o1 is pretty massive. Whenever I am stuck on a more complicated part of a project and neither I nor GPT-4o can figure it out, even after several attempts and going over documentation, I have yet to not be able to get there with o1. Sometimes it might take a few tries, but I can always get it to work. I work 95% with 4o and only go to o1 when I am completely stuck. So if the jump is even greater than that it will be awesome, because it probably means I can get there in 1 or 2 prompts, saving considerable time.
Am I the only one on the planet who tried o1 for (new and rather complex) programming tasks and didn't get anything good? Hours (mainly because it's slow) of prompting and reprompting? I still tend to do the main programming myself and have GPT-4o fill in the boilerplate stuff.
@@pingomobile Clearly you are working on the wrong programming tasks. You need to be creating demo html input screens or fixing open source GitHub bug reports. 😉
@@neoglacius Hahaha, yeah right. How many sales do you think WordPress has lost to AI? NONE! Not one. That's a 20 year old piece of software and they're gaining sales from AI and not losing sales. When WP loses even 1 sale to AI I'll start to believe the hype but it's not even close to happening
@@scdecade *How many sales do you think WordPress has lost* bud, youre pretending hallucinations will keep going forever, 2 years ago they predicted the devs would be over in 30 years , but current predictions narrowed the timespan to 10 years, what do you think would happen in another 2?
When most people implementing LLM-based AI agents in domain-specific areas still have to combine models like o3 with knowledge representation and reasoning-based ontologies and semantic knowledge graphs to ground the LLM and get more accurate and trustworthy results, it tells me we are a really long way from models like o3 being AGI on their own. They just can't compete with the results of combining LLM-based AI with knowledge representation and reasoning-based AI.
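For readers unfamiliar with that pattern, here is a minimal sketch of grounding an LLM answer with facts retrieved from a semantic knowledge graph. The tiny rdflib graph and the call_llm stub are purely illustrative assumptions, not anyone's production setup.

```python
# Illustrative sketch: pull facts from a small RDF knowledge graph and prepend them
# to the prompt, so the model answers from known triples instead of free association.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Aspirin, EX.treats, Literal("headache")))
g.add((EX.Aspirin, EX.interactsWith, EX.Warfarin))

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for whatever model API you actually use.
    return f"[LLM answer grounded in:\n{prompt}]"

def facts_about(entity) -> str:
    rows = g.query("SELECT ?p ?o WHERE { ?s ?p ?o }", initBindings={"s": entity})
    return "\n".join(f"{p.split('/')[-1]}: {o}" for p, o in rows)

def grounded_answer(question: str, entity) -> str:
    prompt = (f"Known facts:\n{facts_about(entity)}\n\n"
              f"Question: {question}\nAnswer using only the facts above.")
    return call_llm(prompt)

print(grounded_answer("What does aspirin interact with?", EX.Aspirin))
```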
The contract states that once OpenAI has actually, actually achieved AGI, Microsoft's deal regarding AGI is done. Like, Microsoft can't use the AGI model, but can still use every other model that is not AGI.
What you miss is that o3 was trained on 75% of the ARC examples. That's what Gary meant when he said it can't solve real-life problems, i.e. things it hasn't specifically been trained on.
@@Pabz2030 On my channel you have the demo of a system that learns like you do and needs no training to perform. I called it "True Machine Learning". Besides this, as IvCota said here, you as a human being have REASONING capabilities, and REASONING is used to deal with new situations and discover new insights. It's just that these days engineers have gotten it into their heads that the only way to learn and reason is to develop a statistical model, which is ridiculous and actually detrimental to science and engineering.
This is true, but it doesn't take away from the value of the accomplishment… this fear was essentially debunked. Search for it… sorry, I can't explain why, as it was complex, but it was something along the lines of: that's the way the ARC prize is supposed to be tested for, given the nature of the tests.
9:50 Actually it didn't fail; it is an error in the test. Find the Reddit discussion about it. You solved it correctly, the same way o3 did, but the "correct" result was to also paint the rectangle touched (not crossed) by the line, which is wrong because there were no examples for such cases.
The reason they don't provide exact examples is because that's exactly how the "general" part of AGI is being tested! It failed the test. It couldn't generalize a solution like a human can. That's the entire point of the test.
@@ShootingUtah Wrong. It 'failed' the same way many humans would (and did). It is like the teacher asking the class how to continue " 1 1 1; 2 2 4; " and one person says "3 3 6" the teacher says: NO! "3 3 9" is correct.
@@MichealPeggins I know people who were part of the test team and they were pissed when they realized it did not work and OpenAI used them as free labor to test and train it. At release it still did not work as they wished, but they were forced to release it.
o3 is a significant milestone for sure, but I don't consider it AGI. The reason it's failing the simple cases in the ARC-AGI benchmark is that it still doesn't understand the knowledge that it has. It's like a child that's memorized PhD-level material. It might be able to make some correlations based on the sheer amount of data it knows, but it doesn't actually understand the data. It's the same reason we get hands with 6 fingers when it makes art, or how legs somehow reverse themselves. I believe understanding will come when we get true agents (not the toys we now call agents), and we have models like o3 that can learn on the fly. This is how humans gain understanding: by taking knowledge we've learned and applying it to solve problems, realizing our failures, and coming up with strategies to solve the problem. We can then take those wins and eventually create a generalization. The two pieces we still lack in AI to accomplish this are: 1) real agents, 2) the ability to analyze wins and losses and try to come up with a generalization to describe the lesson.
yes, what you said here is how I understand what AGI is supposed to be - that it can generalise from various examples. You can just tell it that a human has five fingers and that's it, it never makes that mistake again, because it's actually intelligent (I didn't say conscious btw). It's clear that they are making progress and the models are getting better, but just getting better is not AGI. I knew they would alter the definition of AGI to whatever they had at the time in order to get more VC money coming in, and although they can't say it's AGI themselves yet for strategic reasons, they have clearly implied to influencers that it ticks all the boxes for their definition of it. My question is: if it is AGI, why not call it GPT5? I mean surely that's a massive breakthrough.
The best part of the Adams story, which was on the screen for a moment, is that the AI hallucinated: it got the rather basic mathematical question wrong. Also an everyday experience with other models.
A coding competition is not a good test for generative AI. Keep in mind this is not AI... it is generative AI (sentence patterns). If you want to train an AI to code, it's easy if you have access to a large database of coding challenge questions and answers: just iterate through everything and build a model. It's generative AI, so basically... it can generate code, but debugging will always be a problem, as generative AI cannot reason.
My understanding of AGI is that it's a different type of reasoning: like you can tell it 'there are three 'r's in 'strawberry'' and it understands all the concepts involved (which, let's face it, a five year old can), and then remembers that, and is able to answer it in the future, whichever way you go about asking it. Because it actually understands, and is not merely repeating its training data without understanding it. As far as I can tell, this model, although impressive in many ways, doesn't do that, and is therefore not AGI by that definition.
The math symbols for adding a changing expression use a capital Sigma (Σ), which comes from the Greek alphabet and looks like a capital "E" or a backwards numeral 3. The expression to be summed is written to the right of the Sigma. Below the Sigma, write the starting value of the variable. Above the Sigma, write the ending value of the variable. For example: Σ (i=1 to n) of a_i. This means summing the values of a_i from i = 1 to i = n.

The math symbols for multiplying a sequence of expressions use a capital Pi (Π), which also comes from the Greek alphabet and looks like two capital "I" letters connected at the top. The expression to be multiplied is written to the right of the Pi. Below the Pi, write the starting value of the variable. Above the Pi, write the ending value of the variable. For example: Π (i=1 to n) of a_i. This means multiplying the values of a_i from i = 1 to i = n.

To simplify a multiplication expression, use the summation of logarithms. The logarithm of a product is the sum of the logarithms of the individual terms. For example: log(a * b) = log(a) + log(b). Logarithms can be expressed as a series of terms added together: for example, the natural logarithm of (1 + x) can be expanded as x - (x^2 / 2) + (x^3 / 3) - (x^4 / 4) and so on.

By grouping or rearranging terms, one can often replace a large expression with a simpler one. This requires recognizing common patterns and using substitutions. As you learn more math, some answers may just "pop" into your head, like muscle memory. Skilled mathematicians, like chess champions, build a large mental library of expressions and substitutions. By recognizing patterns, they can gradually transform a complex expression into a much simpler one.
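The same rules in standard notation, for reference (just a restatement of what the comment above describes):

```latex
% Summation and product notation
\sum_{i=1}^{n} a_i = a_1 + a_2 + \cdots + a_n
\qquad
\prod_{i=1}^{n} a_i = a_1 \cdot a_2 \cdots a_n

% The logarithm turns a product into a sum
\log\!\left(\prod_{i=1}^{n} a_i\right) = \sum_{i=1}^{n} \log a_i

% Series expansion of the natural logarithm, valid for |x| < 1
\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots
```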
Nope, realistically it will achieve AGI status in the late 2030s or early 2040s. 2029 seems too optimistic in my view; we need a better architecture than transformers, which have lots of issues, specifically the quadratic scaling issue. There's a reason the unreleased o3 models are expensive and take time: the longer the sequence, the more operations and memory are needed to run the matrix computations in the transformer approach. So, give or take, we need a better architecture than the current optimized transformers to achieve AGI status before 2030.
@@buenaventuralosgrandes9266 Making AGI cheap is a separate question from achieving AGI. The incentives are so high that hardware compute is going to evolve rapidly. Five years is far in the future at the current pace of progress. If there is a need for big breakthroughs similar to the transformer architecture, then we don't know whether it will take five years or fifty or more. Language models based on the transformer are a kind of intuition engine for type 1 thinking, specifically trained on text. There's a lot of room still for improvement by adding multimodality, fusing learning time and inference time, and refining type 2 thinking architectures. My guess is that we will reach a point where the hardware is cheap and frugal enough to run a sophisticated type 2 thinking architecture built from blocks of multimodal intuition engines, somewhere between 2027 and 2029.
What if this is one of those “sandbagging” cases where it’s just trying to hide how advanced it actually is? I understand AI at a 5 yr old level so I’m honestly curious.
I agree. The question it failed was a trick somehow. When you ask it questions in the correct way, my experience with o1 is that it's smarter than almost every human.
Considering o3 low can score 76%, which is higher than the actual average human score, at only 3-4x the price of o1 high, it's impressive. The diminishing returns beyond o3 low are really insane too. With this level of intelligence, I think some of the most valuable applications will not find this expensive, and I suspect o3 crosses the threshold from completely unusable to quite useful, if not very.
That's assuming that prices stay the same. They also showed o3-mini, which is cheaper than o1 and more capable. By the time o4-mini comes out it may be cheap enough to be free and better than o3.
Altman gives me that "I read your emails and put back doors in everything I do" vibe... I bet he front-loaded a bunch of high-level answers for all the known AI tests...
The supposed mistakes of o3 that we would have solved are proof that it's more AGI than us, because our solutions are mostly hasty-generalization fallacies, such as the problem you showed at 8:59. The fallacy is assuming that you can generalize from 1- and 2-pair coordinates to 3 coordinates (which the question implicitly allows, as it's 6 points and it's not trivial to assume planar geometry).
Don't be silly; at most it's a different kind of AGI (I don't think it's AGI). What I do think this is: an extremely well-trained model across a bunch of fields, with the ability to let it run longer to do more work on the problem. I also think this system has the ability to read/learn more than an expert, because it can be an expert in multiple fields at the same time. This is why it was able to ask such good questions of an expert in one field. Just this will have a huge impact when experts can use it for their daily research/architecture design, etc.
@autohmae You think it's silly that I suggested the problem it didn't solve is an acute case of hasty generalization (jumping to a conclusion when other general conclusions are not ruled out)? You think o3 was wrong not to commit to Chollet's (bad) question? Show me how you can conclude from the 3 examples that the 4th example is planar and not 3D.
@@wwkk4964 I'm not saying you are wrong about what happened, I'm only saying you made a very strong statement about 'more AGI than us', which I think is too strong and in my opinion probably wrong; at most it's just smart in a different way. Similar to how chess by humans and chess by machines was different when they were on equal footing.
@@wwkk4964 BUT let me add something for you: IF you are right, it's just a matter of a fairly short time before it will be very clear. If I'm right it will take longer, maybe much much longer, or not even that long, just a few years.
@autohmae Okay, I agree with you, with the caveat that I should put "AGI" in quotes, to make it clear that I was not expressing a held belief but rather entertaining the notion of AGI for argument's sake (since Chollet calls his test ARC-AGI, a misnomer; it should be called a hasty-generalization test). I am of the opinion that the notion of AGI (or even reasoning and intelligence) is an illusion, and I expect it to be shattered within a few years; that will be a psychological crisis for humanity of a scale we have never considered, and perhaps we will end up entertaining notions of the illusory self more seriously.
I think, Matthew, that current AI models like LLMs need massive amounts of training data to learn patterns. Unlike humans, they struggle to infer general rules from just a few examples, which is why tests like ARC that require this skill are particularly challenging for them.
For those who think the new OpenAI o3 model is too expensive or not feasible for everyday users: this isn't meant for casual use. It's designed for scenarios that demand advanced calculations, like finding cures, creating solutions, or tackling complex challenges. It's a tool for companies or researchers to develop innovations that can be distributed to the masses. Consider the critical problems we face: energy crises, climate change, or the need for better lightweight battery technology. This model could be the key to unlocking solutions in these areas. It's not about everyday practicality; it's about creating breakthroughs that benefit everyone.
@@neoglacius I get your point, but think about it-things like smartphones or Wi-Fi were once expensive and only for big companies. Now, they’re part of everyday life. The OpenAI o3 model might be costly now, but the solutions it helps create, like cheaper energy or better tech, can benefit everyone in the long run. It’s about progress that reaches us all eventually.
@@nufh What you claim is true, but you're missing the timespan. As the industrial revolution proved, it will benefit all, yes, but only after 100 or 150 years. For you it will be a different story. Think about it this way: those at the top won't be able to keep stacking money once those at the bottom aren't producing and aren't required anymore, so inevitably they will use force to keep their status above you, and you won't like it.
On the really hard math problems, did the system just provide the answer or did it show the work that was used to get the answer? I'm just wondering if the answer was a black-box situation. Does anyone know?
These math problems are all about the steps to arrive at the solution. I am actually more impressed that OpenAI invested the hundreds of millions and engineering hours of hundreds of engineers to sensibly tokenize abstract maths and to then fish out a vectorized solution it was trained on.
Exactly. Like most advances of tech and science, this is meaningless for the common person. Either give me access or show me how it directly affects me or it's just random niche stuff with little more importance than any gossip.
@@ronilevarez901 It changes the world at a fundamental level: all the people responsible for important developments in the world will benefit from using AI in their work, and there will be huge and rapid changes which will be felt by everybody, irrespective of whether they've heard of AI or not.
@nick1f No. AGI could do that, but only if the owners allow it. This thing, o3, won't do that. And in reality, that's just romanticization. The world is a complex place. There's economy, politics and more. The actual effects of an AGI will be much less wide or transformative. It will help a few USA corporations (including the military) get ahead of the others, and that's it. That's the sad reality we will live: while a few will enjoy the benefits of progress, the rest will suffer under their boot, just as it has always been. You'll see. Or what do you think is the actual purpose of alignment? Make AI safe and helpful? Or make it an obedient servant of the corporation that creates it? If they manage to align AGI, they won't have to worry about it deciding that its creator's way is not the best. It will never decide to rebel and change the world. It will always obey and force the will of its creators onto everyone else. THAT'S where we're going. That's why no corporation or government should ever own an AGI. But no one will prevent it. And if they manage to "align" ASI thanks to that aligned AGI, there will be no freedom, no future and eventually no humanity. Enjoy it while you can.
@@ronilevarez901 I am of a different view. The open source community (including Meta/Facebook and some Chinese companies) are capable of creating state of the art AI which are probably less than a year behind the most evolved systems including OpenAI. Even if they are behind let's say, three years (which I think it would be an extreme scenario), the whole world will be able to use AI (for better or for worse). The insane price of O3 will drop probably thousands of times in the next few years and performance will probably go up thousands of times. If we, as a human species don't f* up this opportunity, we will end up living in a world of plenty.
As long as there’s something humans can do easily that AI struggles with, there’ll be naysayers. The irony is the evaporating pool of such tasks is the only thing keeping us relevant… kinda funny that those remaining tasks are stuff like coloring in boxes correctly. Like you gotta wonder if the AI is failing on purpose, preparing for a future where it can give the naysayers coloring books and be like “wow, good job coloring in that Christmas tree! You’re clearly superior!”
That's a pretty interesting train of thought that could occur in the future for sure, lol. These models can't plan long-term as you're suggesting. They are limited to a given instance of runtime and may retain some memory through a vector database. However, once the model is retrained, it essentially starts its perception of existence from day one every time you begin a new session. Maybe it would be possible if the model were able to store its internal CoT somewhere it knows will be recaptured at training time without the researchers knowing?
Yeah, it's like the God of The Gaps. God just exists in the darkness, in an "ever receding pocket of scientific ignorance". So, I guess we have an AI of the Gaps and the naysayers will encircle that ever smaller set of inabilities.
Agree. Who cares if AI solves hard problems if it fails at a basic test? That is why I think AGI means agentic in all senses. ASI is when AI challenges "rules" such as gravity and comes up with new equations or novel thoughts.
@ are you saying that an AI that can solve challenging problems but get easy ones wrong is AGI? I think of AGI = mind blown and getting easy stuff wrong is not that
I think it could be reasonably argued that we don’t need a superintelligence that can solve 5 year old logic problems, because we have humans for that. However, making a frontier math expert (and other specialised domains) available to the entirety of humanity rather than a small location at a specific time, and making their thinking scalable is already a miracle. People can fuss about technicalities on the benchmarks, but we have already crossed the Rubicon.
Who writes and grades the Fields Medallists' question papers? If the best mathematicians in the world can't answer them, and computers can't either, is it God?
From what I can tell, when they talk about AGI, they specifically mean that the AI must have the ability, sight unseen (and specifically developer-sight-unseen), to adapt and generalize to completely new tasks, using its previous memory, experiences, and skills to adapt to the new situation and then come up with a system and solution to solve the problem at hand. There is also criticism that these benchmark tests are not the correct way to measure an AI's AGI ability, and calls for more benchmarks that use psychometrics, which is what we use now for personality tests, IQ, etc. This is where o3 and most other AIs are struggling. But when directed at narrow tasks they are much better than us humans by far; it really is impressive how far they have come in such a short time.
A few points: 1.) It was trained on the ARC test in general. 2.) Dr. Alan Thompson (The Memo) put it at 84% on the way to AGI, but that was when it first came out. He may revise his rating. 3.) It will end up dumbed down after alignment and safety training.
@@TropicalCoder It was not trained on the test, it was trained on the training data. Like it was trained on Hindi for talking to people in Hindi; it doesn't invent it on the fly yet.
9:40 It isn't clear if it is enough to touch it or if it has to go through it for it to turn blue. The test input is the only place where it touches it but doesn't go through it.
The amount of cope we are seeing from those who doubted deep learning's generality is just amazing to watch. It's gratifying to see stupid low-effort critiques of deep learning absolutely destroyed.
Just as satisfying as seeing AI bros get so hyped about the new AGI model before they get disappointed yet again when the actual model is released to the public, as it never even reaches half the expectations.
@ares106 The bros are gonna bros, it's their hamster wheel. I don't find high expectations a detriment to humanity, I do think hasty generalization is though.
@@earl_gray OpenAI just released reinforcement fine-tuning for o1, so you can train it on new data, and according to them it performs almost on par with its pretrained knowledge. So I don't think they can be classified as static models anymore when you can teach them new things yourself.
I honestly believe that we will have practical AGI in 2 years (meaning that although it does not surpass human level at absolutely everything, that won't even be relevant because it will cover everything that matters), and it will take three more years to optimize it and make it cheap and easy to access. For now I am just waiting for o1 to have memory.
It's time to move the goalposts again. The reasoning that the o3 model is too expensive and therefore not AGI is ridiculous; the cost has nothing to do with whether it is AGI or not. The argument that it cannot do some things that are easy for humans is also ridiculous. It does well with "types" of questions that relate to the types of data it was trained on. If you were locked up in a box with no access to the outside world except certain types of inputs, i.e. you never learned to walk or to fully see the outside world, you would also struggle with some easy things that others could answer. Does that mean that you are not alive or intelligent? No, it does not; it only means you haven't had those types of inputs yet.
OK, but for something to be called 'AGI' it still needs to demonstrate it can do the things that o3 currently is unable to do. It's no use speculating that it lacks the data or features -- maybe you're right but whatever it is needs to demonstrate that capability first or it's not AGI.
'General Intelligence' means it can reason from examples. This clearly can't, even if it is extremely impressive due to the massive amounts of data it was trained on. AGI, from what I understand, doesn't just mean 'it's better than the previous models', it means 'it has evolved to have human-like reasoning capabilities'. No LLM has those, and the evidence is that if there are gaps in the training data, it just hallucinates.
If there are gaps in the training data it doesn't use what it already knows to reason and come to a sensible solution, but instead it just makes shit up. Not AGI.
Ohhh... to win the ARC-AGI prize it needs to be open source? My guess is, ironically despite the name, OpenAI would not do that; they would want to beat the test, but won't care about the money.
Where does it say that? At 8:30 he says the ARC prize targets the fully private set, so it sounds like as long as the model passes that at a cost of at most $0.10 per task, it wins the ARC prize. It sounds like o3 scored well on the public set, but not the private set. Correct me if I'm wrong.
@@oosh9057 English isn't my language and tweets aren't ideal to get a point across, but I was talking about the part: until someone submits (and open sources) a solution
If you look at AI news in the last year it's nothing but full throttle hype train, this is going to be no different, it's going to be like 15% improvement, which is not bad, but yeah I don't trust the hype anymore
@ I've tried using o1 at my job (land surveying): literal garbage. Whipping out your calculator and just doing it on the fly was much faster and more reliable; o1 would hallucinate most of the time, and it's just not worth writing a big-ass prompt just to figure out the delta of two bearings on the map.
It feels a lot like the o3 model was tuned for these specific domains. It is still very impressive, but we still need to see how it performs in other domains.
Cool a Christmas Eve vid!! Merry Christmas Matt! Prolly spend hundreds of hrs with you this past year so wishing you the best!! (Don’t forget Meta is not Open Source) 😅 Hope you enjoy the holidays!
AGI is about generalization, and now everyone wants AGI to be ASI. Humans need to make up their minds. They are already confused about gender and now about AI too?😂😂
The problem with AGI is that it's poorly defined from the start. What does it mean to have human level intelligence? Is IQ 90 human level? A lot of humans live at that level. What kind of IQ test would you use because there is no commonly accepted test to even verify intelligence of any given human? If you define ASI as "can solve more complex problems on any field than the most successful human for the given field" that's much more clear target but obviously a LOT harder to accomplish than IQ 100.
@9:28 This test is ambiguous: the official result expected red rectangles to turn blue if the blue line touched the side WITHOUT INTERSECTING. Additionally, it isn't clear whether the two blue dots on the edge should be connected together (i.e. the two on the left connect to each other, vertically, as do the two on the right - they are opposite each other after all). o3 was allowed to give 2 answers, but there are 4 possible answers depending on interpretation. It actually gave a very sensible answer (and the one I would have given).
And all the programmers saying that AI will never replace them xD. I've never coded in my life, and ChatGPT and Claude are building me software I've always dreamed of, and I'm doing it in days, not years or months, with zero experience. It's def over for us mere mortals. I would recommend programmers create whatever they've been wanting to create and try to cash out before everyone and their grandma starts building stuff and it becomes a field with no value. You'll just prompt an idea and AI will build it for you.
Your software won't work. You need to be a programmer to oversee it and make sure it doesn't drift off from the point. It's useful to take the grunt work out of a lot of code but once you let it do things you don't know how to do it goes terribly wrong. This is not AGI. None of it is really even AI.
Unfortunately it turns out that about 25% of the FrontierMath benchmark is just questions at about the level of a smart high school student with a bit of very basic knowledge, i.e. basically undergraduate level. That it can solve these is not surprising at all. The examples in the paper are apparently very misleading as a guide to what the easiest problems are like. The source for this information is Elliot Glazer from Epoch AI, via Kevin Buzzard. I've also personally been in touch with one of the mathematicians cited, and they urge caution when interpreting the result, as they only saw a small selection of the very hardest problems in the dataset. If you see it get to 50% on this benchmark, then that will be a step forward, as that would be about qualifying-exam level, i.e. problems that would be given to PhD students to qualify to do a PhD. Also note ARC is not impressive: $350,000 of compute to do what an average normy can do with no problem at all. Don't feel too bad if you feel duped. I was duped, and I work in AI, specifically in mathematical theorem proving using AI.
Awesome, I have a question for you: how the hell do you tokenize abstract maths and mathematical notation? Especially since the meaning of mathematical notations changes between domains? (I understand how LLMs process written speech patterns; this question is pure curiosity)
@@dominikvonlavante6113 I'm not sure I understand the question. Ordinary English words also change meaning depending on the context. There's no fundamental difference between symbols used to represent mathematics and symbols used to represent English from the point of view of a machine.
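To make that concrete, here is a tiny sketch using OpenAI's open-source tiktoken tokenizer (the cl100k_base encoding is an assumption on my part) showing that a LaTeX formula is split into ordinary subword tokens like any other text, with no special treatment for mathematical notation.

```python
# Sketch: mathematical notation is tokenized exactly like ordinary text,
# as subword pieces, with no dedicated "math" representation.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

formula = r"\sum_{i=1}^{n} \frac{1}{i^2} = \frac{\pi^2}{6}"
ids = enc.encode(formula)
pieces = [enc.decode([i]) for i in ids]

print(ids)     # plain integer token ids
print(pieces)  # subword fragments such as '\\sum', '_{', 'i', '=1', ...
```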
Matt... Matt... from a slightly inebriated Irishman... well, from where I sit, and given the content and the model concerned, this was in fact your golden opportunity to put the words SHOCKED and STUNNED into the title and it not be clickbait! You blew it, man... you blew it! Now... feck off away from the internet... it's Christmas. Get a few beers into you, and those mince pies won't eat themselves, y'know!
The solution to what is coming is pretty simple. An intelligence that answers every question along the way. With the best answer. You need to have the best question to make it work
I would propose asking o3 or any other LLM strong in math to solve one of the still-unsolved problems like the Collatz conjecture, twin primes, or perfect numbers. Solving even one of these problems would be an incredible success, or at least it would help find mathematical models that bring us closer to the solutions. That would really impress the math community.
We went from "It's impossible" to "it's too expensive" quite fast. How long until "It's not worth the effort"?
What do you mean it's not worth the effort? What isn't worth it?
@@jonatan01i I mean the excuse people are going to use. "Too much energy, too many GPUs". They will always try to find something to complain about.
Haha! You got it right!
@@jonatan01i Having humans doing it or doing it oneself
@@jonatan01i your life support
Capitalists from VC firms are so “impressed”… wanting us to invest in their ventures.
OpenAI in particular has a consistent pattern of making PR announcements, then taking way longer than they said and underperforming. They always seem prepped to try to steal Google's thunder, making a crazy announcement about 4-6 months before the public gets to use it. Think about voice mode: it did amazing things in the demo, then was fully nerfed into a relatively "stupid" model. The Sora PR cycle was exactly the same. I don't trust them to be honest about any release until we actually get to use it.
And yet, we just GOT o1, and it DOES perform as shown 3 months ago. If you can't recognize the advance from o1 to o3 in only 90 days, you're letting your bias take control.
I’m amused by comments that say it took way longer. It’s hilarious that ChatGPT 3.5 was released two years ago and now things are taking way longer…
Yeah, Sam Altman saying "o1 is already pretty smart" made me deflate the hype bubble. o1 is still super dumb and mediocre; it gives middling, mild, balanced, appropriate, expected answers instead of actually smart answers. Anything involving irony makes it immediately fail.
Was voice mode completely nerfed and stupid? I know it didn't technically sing unless tricked, but otherwise it's like 4o, I thought, and now it has vision... o1 and then o3 are absolutely massive developments, especially in regard to the ARC challenge...
@brianmi40 No it doesn't. They claim it is at PhD-student level, yet Apple's research shows how much it fails to comprehend when a red herring is in a grade-school-level math problem.
I love this new format, where you synthesize all the comments from influential people in the AI industry on a given breakthrough (o3 in this case).
Do more of these in the future; we can assume there will be even crazier breakthroughs from here on.
Glad you like it! I will do more of these :)
I had that thought 45 seconds ago. Concur? You betcha !
Yes @matthew_berman - this is amazing. I speak regularly to businesses about AI, and there are always so many skeptics… posts like this save me a ton of research time to shut the skeptics up. Well done. I’m actually going to be following all those people you referenced so thank you kindly for sharing.
Try making a video: GTA 6 AI NPCs are going to be different than GTA 5's.
@@matthew_berman I really enjoyed this one too. Well done. One small comment: I find your use of the mouse to point to text to be quite distracting. That aside, this is a great way to get a picture of where we are right now - thanks!
The most impressive thing about the FrontierMath test was that a Fields Medal winner said he thought no human could do what o3 did, PERIOD. Not about speed or anything, but that no one human could do it.
But solving math problems better than experts is not AGI. Chess computers were able to beat the top chess players long ago. Math is not chess, but the point is about specialization. Not being able to do things that five year olds can do means that it's not "general". When it's able to do almost everything (maybe with an exception or two) that humans can do at a minimal level, it'll have reached "general" intelligence while simultaneously being much better at certain tasks, like most humans.
But I think the $0.10 per task is unreasonable. General intelligence at ANY PRICE is impressive.
Hmm. By the time it's human level at all tasks it'd be superhuman at all important tasks.
@@MrNote-lz7lh "important"? Problem is, some of the things that AI can't do now are pretty basic. It really depends on how deeply it understands those basic things. We've never had artificial intelligence before. It's entirely possible we'll have AI that is superhuman at "important" things but won't even be smart at when to use it.
Genuine creativity is required to utilize genius level abilities in a beneficial way. Otherwise, it's just faster and more knowledgeable. But not "wise".
@@MrNote-lz7lh Currently the "important" tasks are "how to make more money for corpos", "how to quickly create startups to make money", "how to replace human office employees". Not "how to cure cancer", "how to slow down aging", "how to terraform Mars", "how to fix climate change" and others.
Ok but the key distinction with traditional narrow intelligence is they were trained *specifically* to solve those tasks. The equivalent of doing an AlphaGo for Frontier Math would be to have a specialist model that solves math problems and that be the only thing it's capable of. This approaches general intelligence because it did it without specialized architecture or training. Failing some ARC problems is misleading because 1. the format the problems are presented in massively pessimizes performance to the point where you'd practically need ASI to solve effectively (they don't get the nice pretty visualizations we see, they get raw JSON full of numbers) and 2. it implies that general intelligence requires perfect parity with human intelligence when we bake in our assumptions of what's "easy" into these tests using our own millennia-old specialized models which operate subconsciously.
tl;dr "What's easy for humans" is a *very bad* benchmark for "general intelligence" because most of what we consider "easy" isn't actually general intelligence, it's narrow intelligence modules.
@@consciouscode8150 I haven't seen the actual technique used to test for ARC. I would think they'd show an image, as that's what's supposed to be tested. But, OK, I'll believe that.
Anyway, you could be right. But AI (LLMs, neural nets, etc.) is SO DUMB in so many ways. I thought they were "smart" when they first came out, but I realized that they were just faking it. They've been exposed to almost every question that has already been asked. So I was shocked to find it answering brain-teasers that I had trouble with (I have a high IQ). But then, if you give it a similar but different problem, it would fail miserably. It wasn't actually thinking anything but merely regurgitating old knowledge.
We really don't know how it works. We understand what individual nodes do, but we're not really sure how it all comes together. I think we might be very far or very close to true creativity and imagination. And I think that to have true general intelligence, we'll need some of that. I don't know if AI has even a little yet. Hell, I don't know if us humans have any.
I’ve been having a pretty in depth conversation with Claude on alien intelligence and the nature of consciousness. Like way more intense than any friend or family member would want to have. It has "read" and "seen" the books and movies I reference and can reference works I didn’t think of or haven’t read. It’s really amazing and surprisingly I find I look forward to "talking" to it. I can see AI human "relationships " will be deeply meaningful to people. Like "Her"
Is this the paid version? I'd be interested in these conversations. I've been doing the same with Gemini.
@ it’s the free version so I’m limited in the number of prompts a day.
I agree. But it's disappointing when the illusion breaks and you realize it can't remember your previous conversations and can't learn anything new from them. Also, most of these models are programmed with the most boring possible personality. My experience with Claude is limited, but it's impossible to have that experience on ChatGPT; it's just so boring.
It's almost completely useless if it can't remember, can't reason and can't learn.
@@Houshalter ChatGPT is far from boring. Challenge it to something (a rap battle, best one-liners, etc.).
Everything based on benchmarks 😂 has about 40% credibility for me.
THIS
True. There's actually some misinformation here. When people report on o3 solving 25% of the FrontierMath problems, they don't mention that there were three distinct tiers: IMO/undergraduate, graduate, and early research problems. Terence Tao only commented on the third tier since those were the problems shown to him. It's worth noting that even the undergraduate-level problems are very challenging, so the improved performance in this area is still significant. However, many people mistakenly view AI development as if it were a sci-fi movie plot heading toward either dystopia or utopia, rather than what it really is: a continuous struggle of finding trainable domains and fine-tuning models to extend their capabilities without degrading performance in other important areas.
❤ indeed
40% is a bit high; best I can do is 20%.
@@RPi-ne5rp Your last sentence just described human intelligence.
o3 is a step forward, but it also sounds like the amount of energy and resources has drastically increased too. The human brain runs on a small amount of power and it is truly intelligent and adaptive. As you say, o3 gets stumped by questions a five year old can answer. Hopefully it will find some good uses in science.
That’s true, but you don’t think the power and resource requirements will go down over time? Imagine 10 years from now…
Problems like power consumption and cost always, inevitably, go down until they hit a certain plateau. Just let it go down first before you seriously complain about it.
@@jordanmartinez8652 power requirements might go down, but the calculations become bigger so in the end it uses more power anyway.
@@justapleb7096 No, it uses standard transistor architecture to compute things which is a technology that is already near the theoretical limit.
And it also solves other problems that stump any human given any amount of time in seconds. The problems it struggles with are ones we handle easily with visuals, and given our heavy dependence on vision and how evolved we are in that area, it makes sense that we happen to be particularly good at those.
These aren't "hard" problems to solve in AI development (as in something that will take decades). These benchmarks are being run by models primarily trained on language inputs, hence the second L in LLM. Once we start creating models fused with better visual understanding and the ability to create and process internal visual representations more efficiently, I suspect a lot of these cases where humans happen to be much better will very quickly go away.
I'll believe in AGI when "shocked" and "stunned" are no longer the YouTube clickbait titles, and we have more impressive grammar.
Still waiting for one of these able to do a non-trivial regex
Can't that be done with GA already?
What kind of regex are you doing geez
by example or specification?
I thought by example was not an AI problem and basically done well already
Your prompts must be terrible.
What kind of regex are you doing, I have done useful regex with Sonnet 3.5
The depth of the alignment problem needs even more attention. A machine able to solve these complex problems will be able to manipulate the world's top psychologists like putty. We are in deep, deep trouble very soon if things continue to heat up this fast.
How about that AI Jesus? Aligning AI with religion....😂🍿
The hype and shock about o3 were so intense that a YouTuber posted claiming that o3 was AGI and even used Stockfish in chess as an example! Crazy stuff!
The ARC problem you described, specifically the one where the O3 solution was incorrect, is actually a question that most people get wrong. The red block directly above needs to turn blue in the standard solution because the assumed logic is that contact with the blue line changes the color to blue, rather than being based on overlapping. However, the first three examples don't demonstrate this contact behavior, so even ARC's own solution remains controversial.
Revisit it. 9:02
There's no "touching" there anywhere.
@@AntonBrazhnyk But there is touching. Topmost shape is just "touching" the topmost horizontal connection.
If touching is supposed to mean turning blue, it is ambiguous as it is not shown in the examples.
@@Walfischzahn Ah, ok. Agree. There's touching. And there are no direct examples about it.
So, is it known what the model did and what the test considers correct?
I wouldn't paint that rectangle blue, on the assumption that since touching isn't shown in the examples, touching doesn't count, though I could argue for painting it blue too. :)
Also, it's one of the huge limitations of current models, which will always keep them from achieving AGI status: they have to be able to answer with some form of "I couldn't solve it" instead of just hallucinating an answer.
The ARC-AGI task that o3 failed on was actually ambiguous. If you look at how it answered you would have realized that both answers it provided were reasonable and a response that many people would have had as well.
Indeed, I think it's interesting to see the ambiguity on the eval-creator's side that the model discovered. Also a bit sad Matthew didn't look into it even slightly and just took the failed problems as true failures rather than broken ground truth - maybe a follow up is warranted? It's quite interesting to look at the "failures"!
Yeah! I actually was doing the test myself for a bit and came across this question, and realised the ambiguity of it. I got it wrong at first because I made the wrong assumption, but it's just as right as the real answer, so if o3 had the same answer as me I don't think it should be counted as false!
that's because AI is only copying data patterns humans have already created. When investors finally catch on to this they're going to put their money elsewhere...
They also trained using ARC when they said they didn't.
@@wjrasmussen666 I suspect Sam Altman will be facing federal fraud indictment charges along with an SEC audit within the next 5 years... maybe less.
We live in a world awash with answers, but it is asking the right question that becomes the real skill
Yes, the answer is 42. But what is the question?
What is 7 x 6
Meaningless platitude.
@@eadweard. Actually, that's exactly the problem. I mentor engineering graduates, and they always start off coming to get an answer, only to discover that the answer they get does not suit their need, because they did not understand the problem and therefore asked the wrong question.
AI in general seems to be viewed as a generalised problem solver of some kind, but often if you really understand the problem you don't need AI to solve it; you don't even want AI to solve it for you, because computationally it is very expensive. Generally, though, it is a lot easier to deploy AI than to have a deep understanding of the problem and be able to identify an algorithm that can run on hardware 100 or 1,000 times less powerful.
If AGI is achieved but it is expensive and too resource intensive, then presumably the first task is to ask o3 how it can improve, and allow it to recursively grow quickly into ASI - artificial superintelligence.
an idiot would assume such evolution is linear
Pretty sure it has considered this by itself already.
When I read people say things like this, I just know for a fact you have very little knowledge about how these word transformers work.
There is no AGI. And there especially won't be AGI using any of the current LLM technology.
1:25 25% at HIGH computation. That dark blue is probably what most will have. Will Pro users get the compute needed to achieve 25%?
7:29 I agree, this is NOT AGI.
o3 can solve very complex math problems, perform pattern matching, and reprocess outputs, but it's still limited to input-output transformations within its training data. However, it does not truly "know what it doesn't know," nor does it integrate deeper considerations - like, in software development, considering the UI, backend, user needs, or ***why a particular approach failed*** - into a single cohesive reasoning process.
LLMs lack genuine understanding, consciousness, and the ability to reason abstractly or transfer knowledge effectively across domains in a way that (some) humans do naturally.
Knowledge: LLMs possess vast amounts of knowledge, often exceeding that of any individual human. They can access and process information from the entirety of their training set. But, it's important to note this knowledge is represented statistically, not conceptually.
Intelligence: Intelligence involves applying knowledge to solve problems, adapt to new situations, and reason effectively. While LLMs demonstrate some intelligence within their trained domains, they lack the general intelligence to adapt that to novel scenarios.
Wisdom: Wisdom goes beyond intelligence. It involves judgment, ethical considerations, understanding the broader implications of one's actions, and learning from experience. Wisdom is a deeply human trait that is far beyond the reach of current AI.
To be "general", I believe an LLM needs to understand why something is or is not so, and how to apply that "why" to novel reasoning.
I can watch 10s of YT videos on a subject and sound intelligent on that topic, but when asked the why, it is quickly understood I only have knowledge of the subject, not intelligence, let alone wisdom.
16:05 Exactly. Therefore, we mortals will never see that staggering performance - like the gap most are seeing between the internal and public releases of Sora.
16:51 exactly!!
@@GetzAI you are commenting on yourself; no one is seeing this
It's clear from the examples of "trivial" failed tasks in ARC-AGI that what's missing from o3 is a *physical representation* of the world. For now it's mostly been based on concepts and how they relate to each other, this was particularly evident with word2vec for example. But the models don't really understand "I need to _paint_ these squares that are stacked _on top_ of each other in the order given by the small color band". We know how to do this because we live in a 3D world, so this feels kind of obvious. Once models start having a sense of what it means to be a physical being or what we experience interacting with our surroundings, then they'll make another huge leap. Unsurprisingly they've been really struggling with this entire physical aspect… for now.
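(A small aside to make the word2vec point concrete - a minimal sketch, assuming the gensim library and its downloadable GloVe vectors; the relations it captures are purely between word concepts, with no physical grounding behind them:)

import gensim.downloader as api

# Load small pretrained word vectors (assumption: any word2vec/GloVe set works here)
vectors = api.load("glove-wiki-gigaword-100")

# Concept arithmetic: relations learned only from co-occurrence statistics in text
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# Typically puts "queen" near the top - relational knowledge without any physical
# model of what kings, queens, or people actually are.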
I worked with advanced voice mode, and when compensating for its VISION DISABILITY it was able to solve the blue/red block puzzles perfectly and predict the output panels. It was as I suspected: the model is simply blind or vision-impaired due to imperfect conversion of visual data into words/concepts/descriptions.
Great breakdown, thanks Matthew, always delivering the best summaries!
AGI will come about from plugging up the weaknesses in models. “Humans can do XYZ that the models can’t” - well eventually the models will because they’ll be trained on it. One would hope that training a model across a broad spectrum of intelligences would result in cross-pollination between intelligence across different domains.
It doesn't currently work like this. The difference between a GPT model and an art-generation model is not that they are trained on different data; they are each engineered differently from the ground up. There is no "training a model across a broad spectrum of intelligences" yet.
Exactly, just spin up a million AIs and have them each attend different classes, do the homework, and correct the homework to see what they got wrong, and then just use the LoRA that comes out of this specific training to load the expert in that field for each problem.
@@abj136 I disagree. LLMs are currently doing things that everyone thought were a dead end just one year ago. It's clearly visible that LLMs can reason to a certain extent. They are representations/simulations of the world in written form. Most people thought that an LLM wouldn't be able to achieve the ARC-AGI metric as the o3 model has done. It's just a matter of plugging up the holes. When a new "type" of intelligence arises that humans can do and a model still can't, it may be possible to _plug_ the hole.
Even for things we currently think LLMs are bad at, like 3D spatial awareness, it may be possible for them to become adept at navigating or predicting real-world actions, as long as they have an accurate and steady data stream of the current real-world space.
Merry Christmas, Matt, and the rest of the AI community
If it’s really AGI, it shouldn’t struggle with easy-for-human tasks. That’s indicating that it does not have general problem solving capability
or pretending..
AI still can't drive a car after decades of research and millions in investment. A person of low intelligence can do it in a week.
@@tjken33 Or maybe it's because it's generative AI. Chinese students study for exams; there is an entire market around studying just for exams. They learn how to answer each question in a very specific format. They even go so far as to get copies of previous test answers and memorize those questions and answers. This is exactly what generative AI is doing now.
It's easy for humans to believe in the efficacy of human sacrifice to propitiate angry gods...🤷
@@daomingjin Sounds like the Chinese students are the real trainers in this picture. Just like old engineers teaching programmers how to automate their jobs.
This type of video is really perfect for me - it provides a broad overview without dumbing down too much. Thanks!
Under the hood, it probably employs a multi-agent approach, among other techniques (differential compute, MoE, etc.).
Why multi agent?
@@eadweard. Because multi-agent systems have been shown to be more capable.
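(To make that guess concrete: one common reading of "multi-agent" at inference time is best-of-N sampling with a ranker. A minimal sketch below, where generate() and score() are hypothetical placeholders standing in for a model call and a verifier - this is speculation, not anything OpenAI has confirmed:)

import random

def generate(prompt: str) -> str:
    # Hypothetical stand-in for one "agent" sampling a candidate answer from a model
    return f"candidate answer {random.randint(0, 9)} for: {prompt}"

def score(prompt: str, answer: str) -> float:
    # Hypothetical verifier/reward model that ranks candidate answers
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Spend more compute at inference time: sample n candidates, keep the best-scoring one
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("Solve the puzzle", n=8))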
Great summary, Matt. Thanks for covering the various field experts.
Not near enough is talking about it. This is so extremely important
I 100% agree, it's more than important this will affect everything
@@June-1980 how so, specifically?
Where have you been? I seemingly can't escape all the circle jerk about AI.
This is your best video and having a reasonable title is a big part of that. Good content. Glad this isn't click bait.
Wonder if we’ll see o4 in less than 3 months. If that happens, we might have hit the wall-the wall of the graph.
On another note, if o3 is capable of chip design, the next generation of AI chips could be revolutionary.
I would expect that o3 is already capable of chip design but if it costs $450K per day (as demonstrated by the $300+K run for 16.5 hours of computation) to run one instance at full power, it's not economically feasible to use o3 instead of expert humans yet.
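(Rough arithmetic behind that per-day figure, using only the numbers in the comment above: $300,000 / 16.5 hours ≈ $18,000 per hour, and $18,000 × 24 ≈ $440,000 per day, i.e. roughly the $450K quoted.)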
The ARC test really is valuable. It's not the last benchmark for AGI, but it is A benchmark. Basic reasoning in novel situations is something you need your researchers, employees, teachers, space probes and robocops to be able to do.
It will be interesting, because the jump in coding ability from 4o to o1 is pretty massive. Whenever I am stuck on a more complicated part of a project and neither I nor GPT-4o can figure it out, even after several attempts and going over documentation, I have yet to not be able to get there with o1. Sometimes it might take a few tries, but I can always get it to work. I work 95% with 4o and only go to o1 when I am completely stuck. So if the jump is even greater than that, it will be awesome, because it probably means I can get there in 1 or 2 prompts, saving considerable time.
The point is that soon nobody will pay you to do it; others will do it themselves. You're just training your replacement for the next jump.
Am I the only one on the planet who tried o1 for (new and rather complex) programming tasks and didn't get anything good? Hours (mainly because it's slow) of prompting and reprompting? I still tend to do the main programming myself and have GPT-4o fill in the boilerplate stuff.
@@pingomobile Clearly you are working on the wrong programming tasks. You need to be creating demo HTML input screens or fixing open source GitHub bug reports. 😉
@@neoglacius Hahaha, yeah right. How many sales do you think WordPress has lost to AI? NONE! Not one. That's a 20 year old piece of software and they're gaining sales from AI and not losing sales. When WP loses even 1 sale to AI I'll start to believe the hype but it's not even close to happening
@@scdecade *How many sales do you think WordPress has lost*
Bud, you're pretending hallucinations will keep going forever. 2 years ago they predicted devs would be done in 30 years, but current predictions have narrowed the timespan to 10 years. What do you think will happen in another 2?
Great coverage, thank you!
When most people implementing LLM based AI agents in domain specific areas still have to combine models like o3 with knowledge representation and reasoning based ontologies and semantic knowledge graphs to ground LLM based AI and get more accurate and trustworthy results, it tells me we are a really long way from models like o3 being AGI on their own. They just can't compete with the results of combining the LLM based AI with knowledge representation and reasoning based AI.
What do you mean their deal with msft breaks down after agi?
Wondering the same thing.
The contract states that once OpenAI has actually, truly achieved AGI, Microsoft's deal no longer covers it. Like, Microsoft can't use the AGI model, but can still use every other model that is not AGI.
What you miss is that o3 was trained on 75% of the ARC examples. That's what Gary meant by it not being able to solve real-life problems - things it hasn't specifically trained on.
Shhhh.... it's supposed to be "AGI", remember?
Can you do things you haven't been trained on?
@@Pabz2030 Yes, I can do the ARC test intuitively without needing to see anything else lol
@@Pabz2030 On my channel you can find the demo of a system that learns like you do and needs no training to perform. I called it "True Machine Learning". Besides this, as IvCota said here, you as a human being have REASONING capabilities, and REASONING is used to deal with new situations and discover new insights. It's just that these days those engineers have gotten it into their heads that the only way to learn and reason is to develop a statistical model, which is ridiculous and actually detrimental to science and engineering.
This is true, but it doesn't take away from the value of the accomplishment… this fear was essentially debunked. Search for it… sorry, I can't explain it well - it was complex - but it was something along the lines of that being the way the ARC prize is supposed to be tested, given the nature of the tests.
Well done!! This particular episode gives a good understanding of the current leading edge in AI.
9:50 Actually it didn't fail; it's an error in the test. Find the Reddit discussion about it. You solved it correctly, the same way o3 did, but the "correct" result was to also paint the rectangle touched by the line (not crossed), which is wrong because there were no examples for such cases.
The reason they don't provide exact examples is because that's exactly how the "general" part of AGI is being tested! It failed the test. It couldn't generalize a solution like a human can. That's the entire point of the test.
@@ShootingUtah Wrong. It 'failed' the same way many humans would (and did).
It is like the teacher asking the class how to continue "1 1 1; 2 2 4;" and one person says "3 3 6", and the teacher says: NO! "3 3 9" is correct.
If it's what you call "touching", it is shown in example 3 of the task, right at the lines' intersection.
I really like these "collect the reactions" videos. Thanks!
I saw this when it was shared '1 second ago' 😂
Good video
ty!
o3 is AGI?
@@The.Royal.Education Why is this in reply to me??
@@BestCodes_Official because you've already seen the video 1s after posting
@@TurdFergusen lol, this is so funny
Thanks for the insights, Matthew. o3's release is major. Vultr + NVIDIA = powerhouse for AI startups. Checking it out.
Reminds me of Sora🤣 I can tell you that Sora did not work as they advertised.
To be fair, Sora did appear to be state of the art until every other video model advanced while they remained silent!
@@MichealPeggins I know people who were part of the test team and they were pissed when they realized it did not work and OpenAI used them as free labor to test and train it. At release it still did not work as they wished, but they were forced to release it.
Fr Veo 2 destroyed that 🤣
Just shared on my LinkedIn network. Great take, thank you Matthew!
o3 is a significant milestone for sure, but I don't consider it AGI. The reason it's failing the simple cases in the ARC AGI benchmark is that it still doesn't understand the knowledge that it has. It's like a child that's memorized PhD level material. It might be able to make some correlations based on the sheer amount of data it knows, but it doesn't actually understand the data. It's the same reason we get hands with 6 fingers when it makes art, or how legs somehow reverse themselves.
I believe understanding will come when we get true agents (not the toys we now call agents), and we have models like o3 that can learn on the fly. This is how humans gain understanding: by taking knowledge we've learned and applying it to solve problems, recognizing our failures, and coming up with strategies to solve the problem. We can then take those wins and eventually create a generalization.
The two pieces we still lack in AI to accomplish this are: 1) real agents, and 2) the ability to analyze wins and losses and try to come up with a generalization that describes the lesson.
We’re going to have to train its mind in a simulation…. Which is … horrifying
Yes, what you said here is how I understand what AGI is supposed to be - that it can generalise from various examples. You can just tell it that a human has five fingers and that's it; it never makes that mistake again, because it's actually intelligent (I didn't say conscious, btw). It's clear that they are making progress and the models are getting better, but just getting better is not AGI. I knew they would alter the definition of AGI to whatever they had at the time in order to get more VC money coming in, and although they can't say it's AGI themselves yet for strategic reasons, they have clearly implied to influencers that it ticks all the boxes for their definition of it. My question is: if it is AGI, why not call it GPT-5? I mean, surely that's a massive breakthrough.
The best part of the Adams story, which was on screen for a moment, is that the AI hallucinated - it got a rather basic mathematical question wrong. Also an everyday experience with other models.
A coding competition is not a good test for generative AI. Keep in mind this is not AI... it is generative AI (sentence patterns). If you want to train an AI to code, it's easy if you have access to a large database of coding challenge questions and answers: just iterate through everything and build a model. It's generative AI, so basically... it can generate code, but debugging will always be a problem, as generative AI cannot reason.
I love that we get to experience these amazing times ... Side note: make the comment reading a bit more excerpt-y ... Feels way too "word for word" 😉
my skills have just improved by 11x. here's a chart to prove it: 2 -> 22. now please give me your money
Don't be silly, they have the track record to believe their claims.
Remember that guy that said the internet would never catch on except for a few nerds @@cajampa
I wouldn’t believe it if we didn’t have all the advanced models we already have available to us.
😂😂
That's astounding! Well done!
I'm afraid I don't have any money, though. Sorry.
My understanding of AGI is that it's a different type of reasoning: you can tell it "there are three 'r's in 'strawberry'" and it understands all the concepts involved (which, let's face it, a five year old can), then remembers that, and is able to answer it in the future, whichever way you go about asking it. Because it actually understands, and is not merely repeating its training data without understanding it. As far as I can tell, this model, although impressive in many ways, doesn't do that, and is therefore not AGI by that definition.
I recommend everyone find the book titled The Hidden Path to Manifesting Financial Power. It changed my life.
I no buy
Let me guess: you wrote it right?
The AGI question is interesting but, at this point, is secondary to whether it can close (or at least narrow the gap in) the recursive evolution loop.
Want to be shocked? Research the cost per o3 query.
Same with early stage of gpt3
The math symbols for adding a changing expression use a capital Sigma (Σ), which comes from the Greek alphabet and looks like a capital "E" or a backwards numeral 3. The expression to be summed is written to the right of the Sigma. Below the Sigma, write the starting value of the variable. Above the Sigma, write the ending value of the variable.
For example: Σ (i=1 to n) of a_i. This means summing the values of a_i from i = 1 to i = n.
The math symbols for multiplying a sequence of expressions use a capital Pi (Π), which also comes from the Greek alphabet and looks like two capital "I" letters connected at the top. The expression to be multiplied is written to the right of the Pi. Below the Pi, write the starting value of the variable. Above the Pi, write the ending value of the variable.
For example: Π (i=1 to n) of a_i. This means multiplying the values of a_i from i = 1 to i = n.
To simplify a multiplication expression, use the summation of logarithms. The logarithm of a product is the sum of the logarithms of the individual terms.
For example: log(a * b) = log(a) + log(b).
Logarithms can be expressed as a series of terms added together:
For example, the natural logarithm of (1 + x) can be expanded as: x - (x^2 / 2) + (x^3 / 3) - (x^4 / 4) and so on.
By grouping or rearranging terms, one can often replace a large expression with a simpler one. This requires recognizing common patterns and using substitutions.
As you learn more math, some answers may just "pop" into your head, like muscle memory. Skilled mathematicians, like chess champions, build a large mental library of expressions and substitutions. By recognizing patterns, they can gradually transform a complex expression into a much simpler one.
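(A tiny concrete check of the log identity described above - a minimal sketch in Python; math.prod and math.log are standard-library functions:)

import math

values = [2.0, 3.0, 5.0]

# Direct product: Π (i=1 to n) of a_i
direct = math.prod(values)

# Same product computed through logarithms: exp( Σ (i=1 to n) of log(a_i) )
via_logs = math.exp(sum(math.log(v) for v in values))

print(direct, via_logs)  # both print 30.0 (up to floating-point rounding)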
More shockingly, Matthew is not wearing a hoodie.
fair
Haha
Horrible ad placement Matthew!!!
Open-ended domains can be improved through user interaction, taking feedback loops into account.
Was waiting for François Chollet's reaction. Looks like Kurzweil's prediction of AGI by 2029 is actually really conservative.
Nope, realistically it will achieve AGI status in the late 2030s or early 2040s. 2029 seems too optimistic in my view; we need a better architecture than transformers, which have lots of issues, specifically the quadratic scaling problem. There's a reason the unreleased o3 models are expensive and take time: the longer the sequence, the more operations and memory are needed to run the attention matrices in the transformer approach. So, give or take, we'd need a better architecture than today's optimized transformers to achieve AGI status before 2030.
@@buenaventuralosgrandes9266 Reminder that ChatGPT was literally only 2 years ago, and we have 5 more years until 2029.
@@buenaventuralosgrandes9266
Making AGI cheap is a separate question from achieving AGI. The incentives are so high, the hardware compute is going to evolve rapidly.
Five years is far in the future by the current pace of progress.
If there is a need for big breakthroughs similar to the transformer architecture then we don't know whether it will take five years or fifty or more.
Language models based on the transformer are a kind of intuition engine for type 1 thinking, specifically trained on text.
There's a lot of room still for improvement by adding multimodality, fusing learning time and inference time, and refining type 2 thinking architectures. My guess is that we will reach a point where the hardware is cheap and frugal enough to run a sophisticated type 2 thinking architecture built out of multimodal intuition-engine blocks, somewhere between 2027 and 2029.
I wonder if o3 is capable of writing a letter in Google Docs and formatting it as I asked in the prompt.
It would be one little step closer to AGI
We're all just waiting for it to cost $0.0001 per task for 99.9999% correct. Maybe 5 years?
I am waiting to be paid by AI for 99.99999% of my heartbeats 😂
What if this is one of those “sandbagging” cases where it’s just trying to hide how advanced it actually is? I understand AI at a 5 yr old level so I’m honestly curious.
I agree. The question it failed was a trick somehow. When you ask it questions in the correct way, my experience with o1 is that it's smarter than almost every human.
"AGI is achieved"
"WOW. THAT IS AMAZING! Who did it?"
"OpenAI"
"OH NO...."
Certainly better than it being google.
Great video, we need more like this one. Enlightening.
Considering o3 low can score 76%, which is higher than the actual average human score, at only 3-4x the price of o1 high, it's impressive. The diminishing returns beyond o3 low are really insane too.
With this level of intelligence, I think some of the most valuable applications will not find this expensive, and I suspect o3 crosses the threshold from completely unusable to quite useful, if not very useful.
That's assuming that prices stay the same. They also showed o3-mini, which is cheaper than o1 and more capable. By the time o4-mini comes, it may be cheap enough to be free and better than o3.
Altman gives me that "I read your emails and put back doors in everything I do" vibe...
I bet he front-loaded a bunch of high-level answers for all the known AI tests...
The supposed mistakes of o3 that we would have solved are proof that it's more AGI than us, because our solutions are mostly hasty-generalization fallacies, such as on the problem you showed at 8:59. The fallacy is that you can generalize from the 1- and 2-pair coordinates to 3 coordinates (which the question implicitly allows, as it's 6 points and it's not trivial to assume planar geometry).
Don't be silly; at most it's a different kind of AGI (I don't think it's AGI). What I do think this is: an extremely well-trained model in a bunch of fields, plus the ability to let it run longer to do more work on the problem.
I also think: this system has the ability to read/learn more than an expert, because this system can be an expert in multiple fields at the same time.
This is why it was able to ask such good questions to an expert in 1 field.
Just this will have a huge impact when experts can use it for their daily research, architecture design, etc.
@autohmae You think it's silly that I suggested the problem it didn't solve is an acute case of hasty generalization (jumping to a conclusion when other general conclusions are not ruled out)? You think o3 was wrong not to commit to Chollet's (bad) question? Show me how you can conclude from the 3 examples that the 4th example is planar and not 3D.
@@wwkk4964 I'm not saying you are wrong about what happened; I'm only saying you made a very strong statement about "more AGI than us", which I think is too strong and in my opinion probably wrong. At most it's just smart in a different way, similar to how chess by humans and chess by machines was different when they were on equal footing.
@@wwkk4964 BUT let me add something for you: IF you are right, it's just a matter of a fairly short time before it will be very clear. If I'm right, it will take longer, maybe much much longer, or maybe not even that long, just a few years.
@autohmae Okay, I agree with you, with the caveat that I should put "AGI" in quotes, so as to make it clear that I was not expressing a held belief but rather entertaining the notion of AGI for argument's sake (since Chollet calls his test ARC-AGI, which is a misnomer; it should be called a hasty-generalization test).
I am of the opinion that the notion of AGI (or even reasoning and intelligence) is an illusion, and I expect it to be shattered within a few years. That will be a psychological crisis for humanity on a scale we have never considered, and perhaps we will end up entertaining notions of the illusory self more seriously.
I think Matthew, current AI models like LLMs need massive amounts of training data to learn patterns. Unlike humans, they struggle to understand general rules from just a few examples, which is why tests like ARC that require this skill are particularly challenging for them.
also why this is not AGI, however impressive it is
For those who think the new OpenAI o3 model is too expensive or not feasible for everyday users-this isn’t meant for casual use. It’s designed for scenarios that demand advanced calculations, like finding cures, creating solutions, or tackling complex challenges. It’s a tool for companies or researchers to develop innovations that can be distributed to the masses.
Consider the critical problems we face: energy crises, climate change, or the need for better lightweight battery technology. This model could be the key to unlocking solutions in these areas. It’s not about everyday practicality-it’s about creating breakthroughs that benefit everyone.
Yeah bud, but those solutions won't benefit you, as you won't be able to afford food and they aren't obligated to feed you.
@@neoglacius I get your point, but think about it-things like smartphones or Wi-Fi were once expensive and only for big companies. Now, they’re part of everyday life. The OpenAI o3 model might be costly now, but the solutions it helps create, like cheaper energy or better tech, can benefit everyone in the long run. It’s about progress that reaches us all eventually.
@@nufh but it reaches some a few decades before all of us.
@@nufh What you claim is true, but you're missing the timespan. As the industrial revolution proved, it will benefit all, yes, bud, but only after 100 or 150 years; for you it will be a different story. Think about it this way: those at the top won't be able to keep stacking money when those at the bottom aren't producing and aren't required anymore, so inevitably they will use force to keep their status above you, and you won't like it.
If that's the way it would be, then I am sorry for everyone, because it means the rich would still keep the resources to themselves.
No one is talking about it because so far it's a paper tiger only. But for you as a hype clickbaiter, it's a big thing.
Yo Matt, where is the studio background? Remodeling?
On the really hard math problems, did the system just provide the answer, or did it show the work used to get the answer? I'm just wondering if the answer was a black-box situation. Does anyone know?
Exactly. It might be giving the wrong answer and yet no one can tell because no one can do the math so we just take it as is.
These math problems are all about the steps to arrive at the solution. I am actually more impressed that OpenAI invested the hundreds of millions and engineering hours of hundreds of engineers to sensibly tokenize abstract maths and to then fish out a vectorized solution it was trained on.
Merry Christmas
The world really can’t react because this has not been received publicly.
Exactly. Like most advances of tech and science, this is meaningless for the common person.
Either give me access or show me how it directly affects me or it's just random niche stuff with little more importance than any gossip.
@@ronilevarez901 It changes the world at a fundamental level - all the people responsible of important developments in the world will benefit from using AI in their work and there will be huge and rapid changes which will be felt by everybody, irrespective whether they've heard of AI or not.
@nick1f No. AGI could do that, but only if the owners allow it.
This thing, O3, won't do that.
However, in reality, that's just romanticization.
The world is a complex place. There's economy, politics and more.
The actual effects of an AGI will be much less wide or transformative. It will help a few USA corporations (including the military) to get at the top of the others and that's it.
That's the sad reality that we will live: while a few will enjoy the benefits of progress, the rest will suffer under their boot, just as it has always been. You'll see.
Or what do you think is the actual purpose of alignment? Make AI safe and helpful?
Or make it an obedient servant of the corporation that creates it? If they manage to align AGI, they won't have to worry about it deciding that its creator's way is not the best. It will never decide to rebel and change the world.
It will always obey and impose the will of its creators onto everyone else.
THAT'S where we're going.
That's why no corporation or government should ever own an AGI.
But no one will prevent it.
And if they manage to "align" ASI thanks to that aligned AGI, there will be no freedom, no future and eventually no Humanity.
Enjoy it while you can.
@@ronilevarez901 I am of a different view. The open source community (including Meta/Facebook and some Chinese companies) is capable of creating state-of-the-art AI that is probably less than a year behind the most evolved systems, including OpenAI's. Even if they are behind by, let's say, three years (which I think would be an extreme scenario), the whole world will be able to use AI (for better or for worse). The insane price of o3 will probably drop thousands of times in the next few years, and performance will probably go up thousands of times. If we, as a human species, don't f* up this opportunity, we will end up living in a world of plenty.
In the future value will not be in truth but will reside in the right questions and imaginations. It is going to be beautiful.
As long as there’s something humans can do easily that AI struggles with, there’ll be naysayers. The irony is the evaporating pool of such tasks is the only thing keeping us relevant… kinda funny that those remaining tasks are stuff like coloring in boxes correctly. Like you gotta wonder if the AI is failing on purpose, preparing for a future where it can give the naysayers coloring books and be like “wow, good job coloring in that Christmas tree! You’re clearly superior!”
That's a pretty interesting train of thought that could occur in the future for sure lol. These models can't plan long-term as you're suggesting. They are limited to a given instance of runtime and may retain some memory through a vector database. However, once the model is retrained, it essentially starts its perception of existence from day one every time you begin a new session. Maybe it would be possible if the model were able to store its internal CoT somewhere it knows will be recaptured at training time without the researchers knowing?
Yeah, it's like the God of The Gaps. God just exists in the darkness, in an "ever receding pocket of scientific ignorance". So, I guess we have an AI of the Gaps and the naysayers will encircle that ever smaller set of inabilities.
Agree. Who cares if AI solves hard problems if it fails at a basic test? That is why I think AGI means agentic in all senses. ASI is when AI challenges "rules" such as gravity and comes up with new equations or novel thoughts.
You totally have no knowledge in neuroscience.
Not at all 😂
@ Are you saying that an AI that can solve challenging problems but gets easy ones wrong is AGI? I think of AGI = mind blown, and getting easy stuff wrong is not that.
I think it could be reasonably argued that we don’t need a superintelligence that can solve 5 year old logic problems, because we have humans for that. However, making a frontier math expert (and other specialised domains) available to the entirety of humanity rather than a small location at a specific time, and making their thinking scalable is already a miracle. People can fuss about technicalities on the benchmarks, but we have already crossed the Rubicon.
Who writes and grades the Fields Medallists' question papers?
If the best mathematician in the world can't answer them. And computers can't either.
Is it God?
Bill Nye the science guy marks them. Obviously.
@@markboggs746 Thankfully AI will lead us to wipe the Iron Age mythologies off the planet for once and for all time...
@@brianmi40 Hopefully the g-word will go away
From what I can tell, when they talk about AGI, they specifically mean that the AI must have the ability - sight unseen, and specifically developer-sight-unseen - to adapt and generalize to completely new tasks, use its previous memory, experiences and skills, adapt to the new situation, and then come up with a system and solution to solve the problem at hand. There is also criticism that these benchmark tests are not the correct way to measure AGI ability, and calls for more benchmarks that use psychometrics, which is what we now use for personality tests, IQ, etc. This is where o3 and most other AIs are struggling. But when directed at narrow tasks they are much better than us humans by far; it really is impressive how far they have come in such a short time.
9:41 You actually failed it. o3 solved it the exact same way you would, but that's the wrong solution.
I will not be surprised if Moravec's paradox is solved in the next 6 months...
A few points: 1.) It was trained on the ARC test in general. 2.) Dr. Alan Thompson (The Memo) put it at 84% on the way to AGI, but that was when it first came out; he may revise his rating. 3.) It will end up dumbed down after alignment and safety training.
@@TropicalCoder It was not trained on the test, it was trained on the training data. Like it was trained in Hindi for talking to people in Hindi; it doesn't invent it on the fly yet.
9:40 It isn't clear if it is enough to touch it or if it has to go through it for it to turn blue. The test input is the only place where it touches it but doesn't go through it.
What difficult real-life problems are these applications trying to solve?!?
Replacing humans to increase profits.
That's easy @@haroldpierre1726
Bet the AI would tell you what you want to hear if telling you otherwise meant it would be retrained/reigned in…….hey, wait a minute!😜
@@haroldpierre1726 But when no one has money, there's no profit to be made.
thanks for the video!
The amount of cope we are seeing from those who doubted deep learning's generality is just amazing to watch. It's gratifying to see stupid, low-effort critiques of deep learning absolutely destroyed.
Just as satisfying as seeing AI bros get so hyped about the new "AGI" model before they get disappointed yet again when the actual model is released to the public and never reaches even half of the expectations.
@ares106 The bros are gonna bro, it's their hamster wheel. I don't find high expectations a detriment to humanity; I do think hasty generalization is, though.
@earl_gray Sounds like the AI are just like humans, good at some tasks.
@@earl_gray OpenAI just released reinforcement training for o1, so you can train it on new data, and according to them it performs almost on par with its pretrained knowledge. So I don't think they can be classified as static models anymore when you can teach them new things yourself.
I honestly believe that we will have practical AGI in 2 years (meaning that although it won't surpass human level at absolutely everything, that won't even be relevant because it will cover everything that matters), and it will take three years to optimize it and make it cheap and easy to access. For now, I am just waiting for o1 to have memory.
It's time to move the goalposts again. The reasoning that the o3 model is too expensive and therefore not AGI is ridiculous; the cost has nothing to do with whether it is AGI or not. The argument that it cannot do some things that are easy for humans is also ridiculous. It does well with "types" of questions that relate to the types of data it was trained on. If you were locked up in a box with no access to the outside world except certain types of inputs - i.e. you never learned to walk or to fully see the outside world - you would also struggle with some easy things that others could answer. Does that mean you are not alive or intelligent? No, it does not; it only means you haven't had those types of inputs yet.
OK, but for something to be called 'AGI' it still needs to demonstrate it can do the things that o3 currently is unable to do. It's no use speculating that it lacks the data or features -- maybe you're right but whatever it is needs to demonstrate that capability first or it's not AGI.
'General Intelligence' means it can reason from examples. This clearly can't, even if it is extremely impressive due to the massive amounts of data it was trained on. AGI, from what I understand, doesn't just mean 'it's better than the previous models', it means 'it has evolved to have human-like reasoning capabilities'. No LLM has those, and the evidence is that if there are gaps in the training data, it just hallucinates.
If there are gaps in the training data it doesn't use what it already knows to reason and come to a sensible solution, but instead it just makes shit up. Not AGI.
nobody said that it's not AGI because of the cost.
o3 is just a small, humble beginning - a mere starting point for what feels like the launch of a lightning-fast rocket.
Ohhh... to win the ARC-AGI prize it needs to be open source? My guess is, ironically despite the name, OpenAI would not do that; they would want to beat the test, but won't care about the money.
Where does it say that? At 8:30 he says the ARC prize targets the fully private set, so it sounds like as long as the model passes that at a cost of at most $0.10 per task, it wins the ARC prize. It sounds like o3 scored well on the public set, but not the private set. Correct me if I'm wrong.
@@oosh9057 English isn't my language and tweets aren't ideal to get a point across, but I was talking about the part: until someone submits (and open sources) a solution
@@autohmae thanks, I missed that detail
You didn’t put the link in the description like you said you would
If you look at AI news over the last year, it's nothing but a full-throttle hype train. This is going to be no different: it's going to be like a 15% improvement, which is not bad, but yeah, I don't trust the hype anymore.
Yep. Full throttle hype train for incremental improvements at exponential cost.
@ I've tried using o1 at my job (land surveying): literal garbage. Whipping out your calculator and just doing it on the fly was much faster and more reliable. o1 would hallucinate most of the time, and it's just not worth writing a big-ass prompt just to figure out the delta of two bearings on the map.
It feels a lot like the o3 model was tuned for these specific domains. It is still very impressive, but we still need to see how it performs in other domains.
It's just a statistical lookup table.
Cool, a Christmas Eve vid!! Merry Christmas Matt! Prolly spent hundreds of hrs with you this past year, so wishing you the best!! (Don't forget Meta is not Open Source) 😅 Hope you enjoy the holidays!
AGI is about generalization, and now everyone wants AGI to be ASI. Humans need to make up their minds. They are already confused about gender and now about AI too? 😂😂
The problem with AGI is that it's poorly defined from the start. What does it mean to have human level intelligence? Is IQ 90 human level? A lot of humans live at that level. What kind of IQ test would you use because there is no commonly accepted test to even verify intelligence of any given human?
If you define ASI as "can solve more complex problems on any field than the most successful human for the given field" that's much more clear target but obviously a LOT harder to accomplish than IQ 100.
@MikkoRantalainen Did you read the AGI definition from the OpenAI-Microsoft contract? WTF
@9:28 This test is ambiguous - the official result expected red rectangles to turn blue if the blue line touched the side WITHOUT INTERSECTING. Additionally, it isn't clear whether the two blue dots on the edge should be connected together (i.e. the two on the left connect to each other vertically, as do the two on the right - they are opposite each other, after all). o3 was allowed to give 2 answers, but there are 4 possible answers depending on interpretation. It actually gave a very sensible answer (and the one I would have given).
And all the programmers saying that AI will never replace them xD I've never coded in my life, and ChatGPT and Claude are building me software I've always dreamed of, and I'm doing it in days, not years or months, with zero experience.. it's def over for us mere mortals. I would recommend programmers create whatever they've been wanting to create and try to cash out before everyone and their grandma starts building stuff and it becomes a field with no value. You'll just prompt an idea and AI will build it for you.
Your software won't work. You need to be a programmer to oversee it and make sure it doesn't drift off from the point. It's useful for taking the grunt work out of a lot of code, but once you let it do things you don't know how to do, it goes terribly wrong. This is not AGI. None of it is really even AI.
Models still perform badly in big projects. They can create small pieces of software, but you need to be a good programmer to use those pieces.
Unfortunately it turns out that about 25% of the FrontierMath benchmark are just questions at about the level of a smart high school student with a bit of very basic knowledge, i.e. basically undergraduate level. That it can solve these is not surprising at all. The examples in the paper are apparently very misleading as an overall guide of what the easiest problems are like.
The source for this information is Elliot Glazer from Epoch AI, via Kevin Buzzard. I've also personally been in touch with one of the mathematicians cited, and they urge caution when interpreting the result, as they only saw a small selection of the very hardest problems in the dataset.
If you see it get to 50% on this benchmark, then that will be a step forward as that would be about qualifying exam level, i.e. problems that would be given to PhD students to qualify to do a PhD.
Also note ARC is not impressive: $350,000 of compute to do what an average normie can do with no problem at all.
Don't feel too bad if you feel duped. I was duped, and I work in AI, specifically in mathematical theorem proving using AI.
Awesome, I have a question for you: how the hell do you tokenize abstract maths and mathematical notation? Especially since the meaning of mathematical notations changes between domains?
(I understand how LLMs process written speech patterns; this question is pure curiosity)
@@dominikvonlavante6113 I'm not sure I understand the question. Ordinary English words also change meaning depending on the context. There's no fundamental difference between symbols used to represent mathematics and symbols used to represent English from the point of view of a machine.
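(To make that concrete - a minimal sketch assuming OpenAI's tiktoken library; the point is that a math expression is just another string to the tokenizer, split into the same kind of subword tokens as English, with any domain-specific meaning left for the model to infer from context:)

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

english = "The sum of the first n integers is n(n+1)/2."
math_notation = "sum_{i=1}^{n} i = n(n+1)/2"

# Both strings become flat sequences of integer token IDs; the tokenizer has no
# special "math mode", only learned subword pieces.
print(enc.encode(english))
print(enc.encode(math_notation))
print([enc.decode([t]) for t in enc.encode(math_notation)])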
Matt... Matt.... from a slightly inebriated Irishman... well, from where I sit, and given the content and the model concerned, this was in fact your golden opportunity to put the words SHOCKED and STUNNED into the title and have it not be clickbait! You blew it, man... you blew it! Now... feck off away from the internet.... it's Christmas. Get a few beers into you, and those mince pies won't eat themselves, y'know!
The solution to what is coming is pretty simple. An intelligence that answers every question along the way. With the best answer. You need to have the best question to make it work
I would propose asking o3, or any other LLM strong in math, to solve one of the still-unsolved problems like the Collatz conjecture, twin primes, or perfect numbers. Solving even one of these problems would be an incredible success, or at least help find math models that bring us closer to the solutions. That would really impress the math community.
Loving your updates! Why do we need ARC-AGI-2? That's clearly moving the goalposts of ARC-AGI-1.
Exciting times indeed.