I'm the furthest thing from an AI fanatic, have been saying for ages that people are way over-hyping LLMs, have a solid technical background, etc. Even so, this is without question AGI in its infancy.
People can put whatever spin or use whatever justification they wish to shrug this off, but when a computer system can solve highly complex, novel PhD-level physics problems that take skilled humans weeks if not months, and can solve them within seconds, that's AGI.
This is the equivalent of when the cell phone was a large, expensive, heavy brick tethered to a 40 lb backpack... it wasn't overly useful, but nonetheless, the technological breakthrough had been made that allowed people to make phone calls from the middle of the street.
This new o1 model is the exact same concept, but for AGI. Now it's just a matter of refinement, miniaturization, efficiency increases, integration, and all the other things in any standard tech cycle. The fact remains, though, that the technical breakthrough to enable AGI has been made.
Thanks for the comment, Seth. I don't agree with your definition of AGI, but it is a significant accomplishment for sure. Per Strawberry's own definition of AGI, there are still quite a few pieces missing. However, it's such a vague thing that it's more of an "I'll know it when I see it" kind of thing.
"its not reasoning" i think humans are about to realize we arent that special. the GPTs are on par with a human just currently restricted as to not alarm the general public. "let them ease into Ai"
Something in that ballpark has existed for a long time. The Q* leaks were very forthcoming, and Sam himself didn't quash them; he confirmed them. The full model is supposed to have cracked AES-192 with a ciphertext-only attack (COA), which is no joke, obviously. The leak details the model wanting to port itself to a new architecture of its own design, and this all falls vaguely in line with the "shown" foundations of o1-preview. It's the synthetic data, the unmonitored self-training on that synthetic data, and the new avenues of retrieving data that led to this. I'm 100% confident they are training a model right now with a different architecture, but also with a different training regime, one that uses multimodality to synthesize visual training data, kind of like dreaming. There are many reasons humans dream, lol, but one is reinforcement learning and synthesizing data. We dream about things we haven't done yet and create a viable path forward with no points of reference other than our imagination. This is my suspicion anyway. I think o1-preview and o1-mini are tiny models with insane guardrails.
Thanks for the comment Adam.
I agree and disagree. I think we're not as special as we think, and GPTs are replicating some of what our brains do, but I don't think they are yet on par with us. I believe there are several more ingredients yet to be added to the system that will require new tech developments beyond transformers.
David, LLMs have not shown the ability to create new ingredients; they can only remix the ingredients they've been trained on. Until that changes, I'm very skeptical of any claim that a model is actually building its own "new" architecture.
@@practical-ai-engineering These latest models are bad for me; my work redefines the basis of maths and science, which these highly censored models flip out at me over. It feels like the future may be one where innovative thinkers have no place. What a bleak future.
@@6AxisSage Yeah, that's where all of the progress comes from... the fringe. It's never from the centre, where everything is peer reviewed and gets discounted or discredited by people who read 15 words of a 96-page paper.
Seeing a fair take on AGI's ups and downs is great. Really makes you think! Thank you for this.
Thanks Gretchen!
I was trying to think about what I do in my job that an LLM could not.
I have to make many decisions about how to progress a task. I have to consider what I think other people’s reactions will be to my decision and take those into account.
In effect, I'm making projections of many possible futures and weighing probability, time, effort, and potential returns based on my experience and knowledge. Then I choose the best option.
Does Strawberry do that kind of thinking?
BBDB, thanks for the comment.
I would say it does some of that kind of "thinking" well, but certainly not all of it. I also see other parts of your process/system that it doesn't do at all. It is an in-out chatbot with serious limitations, so that's expected. For example, it doesn't do much of the work you do in pulling together the deep knowledge, assumptions, expectations, etc. that you bring when considering all the key factors. Sometimes you have to stop and gather information to come back to later in ways that LLMs can't. Sometimes you have to make a big set of connected, non-obvious decisions, and I would expect an LLM to get progressively more wrong the more of them you chain together.
Some of these things can be enhanced or even solved by using LLMs within software as tools rather than directly as chatbots, but that requires AI engineering work that few have experience in. Anything close to general full automation isn't here and doesn't seem close. Full automation of just what you do would be incredibly challenging as well. Every additional percentage point of automation gets exponentially more challenging. That last 1%-10% is probably not worth the effort at this time.
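To make the "LLMs within software as tools" idea a bit more concrete, here's a minimal sketch. The `call_llm` helper and the ticket-routing task are hypothetical stand-ins, not anything from the video or OpenAI's materials: classic code gathers the context, asks the model one narrow question, validates the answer, and the code, not the model, decides what happens next.

```python
# Minimal sketch: the LLM is a narrow tool inside ordinary software,
# not a chatbot driving the whole process. `call_llm` is a hypothetical
# stand-in for whatever model client you use; a small model is fine here.

ALLOWED_ROUTES = {"billing", "bug_report", "feature_request", "other"}

def call_llm(prompt: str) -> str:
    """Hypothetical helper that sends `prompt` to an LLM and returns its text reply."""
    raise NotImplementedError("wire this up to your model provider")

def route_ticket(ticket_text: str) -> str:
    # Classic code assembles the context and constrains the question.
    prompt = (
        "Classify this support ticket into exactly one of: "
        f"{', '.join(sorted(ALLOWED_ROUTES))}. Reply with the label only.\n\n"
        f"Ticket: {ticket_text}"
    )
    for _ in range(3):  # retries live in code, not in the model
        answer = call_llm(prompt).strip().lower()
        if answer in ALLOWED_ROUTES:
            return answer  # validated before anything downstream depends on it
    return "other"  # deterministic fallback so one bad answer can't derail the chain
```

Because the surrounding code validates every model answer before the next step runs, errors don't compound the way they do when you chain raw LLM outputs together.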
If I had a stupendously good memory and could read every book on quantum physics ever written and recall every example at will, I could probably answer most questions about quantum physics quite easily. But would that mean I understand it, that I can reason about it? No; if I were asked some novel question I wouldn't know where to start, and that's where I think we're at with current models and systems.
Sort of yes and sort of no, Karl.
Literally, yes: you wouldn't understand it, and you're correct about how the models and the Strawberry process work.
Effectively, though, in many cases, it doesn't matter. Simulation of intelligence may as well be intelligence. It can be argued that this is sort of how our brains work in a lot of cases anyway. Essentially, pattern recognition and recombination. I don't think it is a strong argument, but it is reasonable to a degree.
o1 is still having a hard time with mathematical calculations, from my experience, but it's pretty good with far more complex concepts now.
Yeah, thanks Mitchell
The "reasoning" capabilities of o1 will enable better AI agents. The next "level" on OpenAI's ladder after "reasoners" is "agents".
Excellent agents could already be built without o1. It's just that everyone builds them too LLM-centric, and so they are brittle and unreliable. Certainly o1 is an interesting new tool to add to agents, but I don't think it really makes that much of a difference in most cases. Many times o1 gives me almost exactly the same response as GPT-4o.
Agents built correctly are just SOFTWARE where LLMs are used to make decisions, perhaps chat, and other things. Much of an agent's use of LLMs ends up being very narrow and simple, where models like GPT-3.5 work just fine.
Proper AI-powered software is centered around classic code, with LLMs used as tools by that code. Bad AI-powered software is LLM-centric, where classic code and other "tools" and techniques are used by the LLMs, yet that is how almost everyone is trying to do it nowadays.
Treating agents as an "LLM on a loop" (as one very big name in the agents space called them) is a terrible way to go about it.
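As a rough illustration of that code-centric shape (every function name below is a hypothetical placeholder, not anyone's actual agent), the workflow is plain code that calls the model only for small, well-scoped sub-tasks, instead of an "LLM on a loop" deciding which tool to call next.

```python
# Sketch of a code-centric "agent": the sequence of steps, the branching, and
# the error handling are ordinary software; the LLM is called only for narrow
# sub-tasks. Every function here is a hypothetical placeholder.

def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for your model call")

def fetch_invoice(order_id: str) -> dict:
    raise NotImplementedError("stand-in for a database or API lookup")

def handle_refund_request(email_body: str) -> str:
    # Step 1: a narrow extraction task for the LLM.
    order_id = llm(
        "Extract the order ID from this email. Reply with the ID only.\n\n" + email_body
    ).strip()

    # Step 2: the business logic stays deterministic, in code.
    invoice = fetch_invoice(order_id)
    if invoice["days_since_purchase"] > 30:
        return "Refund window has closed."

    # Step 3: another narrow LLM task, purely for wording the reply.
    return llm(
        f"Write a short, polite reply confirming a refund of ${invoice['amount']:.2f} "
        f"for order {order_id}."
    )

# The LLM-centric alternative -- a model in a loop choosing tools each turn --
# puts the control flow inside the model, and that's where the brittleness comes from.
```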
Yes, but it wasn't required to make good agents. Almost everyone is viewing agents from an LLM-centric viewpoint, which dramatically limits their capabilities.
Excellent agents have been possible to create since GPT-4 by thinking of them as AI-powered software where LLMs are used as tools, instead of thinking of them as LLMs that use software as tools.
@@practical-ai-engineering - stipulated. However, the use of LLMs in DeepMind's AlphaProteo and AlphaProof, and in Salesforce's Agentforce, suggests to me that it is an important avenue for research and development.
Dave the ASI
Haha, thanks Matt
Copium is my new favorite word. "It looks like it's reasoning, and it can answer PhD level questions in all domains of human knowledge, but it's not ACTUALLY reasoning..." Meanwhile no one knows where a human's thoughts come from or how they are formed, really and truly. People really think because they came out a birth canal they are magical.
It can answer PhD-level questions, but it fails simple reasoning problems a 9-year-old gets right. How do you make sense of that?
@@Dababs8294 Lack of robustness.
There are PhDs who believe in god, yet most 9-year-olds know magic isn't real. How do you make sense of that?
@@Dababs8294 Everyone fails simple reasoning problems. That is part of using reason; we make mistakes based on heuristics. Explain to me why you might misspell a word, misplace your keys, or forget something like a name or an address. Does that mean you aren't intelligent?
I agree William, there's some copium going on, haha.
Reducing a complex system to its base parts can always allow it to be dismissed easily and that is certainly going on.
On the other hand, there's definitely some unreasonable hype happening.
Re: Alan -- it's because the system can only create new recipes, not new ingredients that aren't already in its training set. So if a problem requires ingredients that aren't in its dataset, it fails.
Great to hear your thoughts on this. Cleared some things up for me. Cheers!
Thanks Casey!
Sam is a businessman, and OpenAI is playing the long game to stay in business. I do not trust human emotions, only practical fact-checking. I am a retired newspaper cartoonist in Utah, so I understand the dangers of believing too much. 🤖🖖🤖👍
Haha, thanks for the comment Calvin
We don't prompt the model. If you use the API and ask it to show you 'message.0', it should respond with its system prompt, and you will see that all we tell it is that its knowledge is up until Oct 23.
Hm. Not sure what your point is here. Thanks for the comment Mr. Not Existing
You make some good points, especially that there's still much more to build out around the models. I came to a different viewpoint, and at the moment I do feel this is AGI, but I don't completely disagree with you either. It does depend how it's defined, and that's about as clear as mud. I feel the raw knowledge and reasoning is on par with an average human overall, though. While there is definitely work to be done, things like memory, agency, and continuous learning all have a pretty clear path forward. We won't need any breakthroughs in those spaces, just a bit of grunt work IMO.
Thanks for the view and comments DG.
Haha, yeah, clear as mud is right.
TBD on whether we need breakthroughs or just more grunt (engineering) work to navigate the tools we have. Sometimes it is very easy to get an 85% solution, but then impossible with the current toolset to get to a 100% solution without a major breakthrough. Just ask Tesla about FSD.
“It’s not reasoning but it is reasoning.” I stopped watching the video 4 minutes in.
If you're unable to handle nuance and multiple seemingly conflicting perspectives on the same thing, then you'll have trouble with any cutting edge technology, not just AI.
You're wrong, it's not RLHF AT ALL.
Take it up with OpenAI, Charlie. They said this in their own materials. If it is wrong, then they lied.