ngl I didn't expect OpenAI's "strawberry" to actually be this cool
oh and u can join my newsletter here! mail.bycloud.ai
correction: they are still gonna continue the GPT series, so they now have 2 different model series in development
btw you got one thing wrong: it's the model that is being used to generate training data for gpt5.
This video released faster than my o1-preview prompt finished generating..
bruh we beastmode this shit. made the entire video in 4.5 hours
Dude o1 isn't supposed to be used for prompts like "how do i make a sandwich?". Once you come up with a prompt that requires PhD-level intelligence, you'll be fine with waiting a minute for it to generate an answer that's actually correct, unlike the objectively wrong nonsense GPT-4o would likely put out when faced with a question of such difficulty...
Probably was generated by o1-preview
You don't know if this video also hasn't been generated.
Imagine letting it think for like a year about curing cancer or something and it just responds "42"
Oh my days lol 😂
Or "Network error. Please try again."
OpenAI has the inference logs, so they can know the thought process behind the answer '42'.
Either 42 is actually the answer, then we get to know why.
Or the model is just bs-ing and OpenAI wasted time.
being the 42nd like made my day
This has 42 likes, and I'm not touching that like button
I just tested it on some pretty difficult multivariate calculus and measure theory proofs and it, more or less, got all of them correct. So it always got the correct solution. My sample size was only like 10 questions, but it's still much better in this regard than any model we have seen thus far.
What was the performance like compared to GPT-4o, if you already compared them? Also, I'm not that high level at math. Where would you rank those questions on difficulty, on a scale of 1-10 from college student to PhD level?
Test it against Gemini 1.5 Pro. It has the highest score on the math benchmark on LiveBench
GPT-o1 is proof that LLMs are dead; the same fundamental problems persist and have done since the first model came out many years ago. They can never become AGI, and it is very clear if you try it. Ask the same question and it will sometimes be correct and sometimes wrong... this is the reason it can never become an AGI, and OpenAI cannot solve this problem. It probably requires the machine to be conscious like homo sapiens are. We are conscious FIRST and then we are able to learn about the world, not the other way around.
It seems it might actually be at PhD level if that's true
Did you notice any patterns in the ones it got wrong?
End of OpenAI's blog post: "We also plan to continue developing and releasing models in our GPT series, in addition to the new OpenAI o1 series."
So no, this is not GPT-5, nor are they abandoning that naming scheme. It's a new series of models specifically focusing on reasoning and coding capabilities, running in parallel to the more generalist GPT series of models.
Good catch
Yes, OpenAI needs to build models for different niches. A well-rounded coding model is absolutely needed by developers.
@@aeonsleo2 What would actually be powerful is an actually powerful AI code writer that can understand all the multi-field science gobbledygook, since there are few humans who have 40+ years of experience mastering both, and both professions are notoriously bad at communicating needs.
Currently useless for coding if you can only use it 30 times a week, that's like 4 times a day
@@aeonsleo2 modern AI models like GPT-4 already work like that. Search for "AI MoE"
Me: GPT o1, what is the meaning of life, the universe and everything?
GPT o1: Give me 7.5million years and I'll get back to you.
They said that it’ll actually soon do things like that. First questions for hours then weeks etc
That Sounds more like a GROK-response
What doth life?
This isnt gpt 💀
From what I tried, it's really nice; it's a lot less verbose than GPT-4o or even GPT-4. I tested some of my old bugs I gave to GPT-4o previously. The answers o1 gives are more accurate across the board, but also way less flexible. For programming, it gives an OK, still wrong answer, like arguing with GPT-4o for 3-4 messages. But it seems to get lost in its own arguing with itself. Even when I tried nudging it in the right direction, it internally evaluated the suggestion and just answered that it's not possible, even after encouraging it repeatedly.
It seems interesting for 0-shot, but I am unsure how much better it is generally, and the long time between answers makes it a bit frustrating to converse with. So maybe take it with a grain of salt. It feels to me very similar to 1-3 agent constructs of models; when conversing with those, they can similarly have some kind of runaway effect from arguing with themselves.
(I am a machine learning researcher, I normally work on low level kernel implementations and experimental training time architectures, so take my views with a grain of salt)
I think it can be effective for people who don't have knowledge of a given topic and so can't steer the model in the right direction; the model arguing with itself can be a great source of knowledge and more accurate answers for the average Joe
well hopefully these'll be kinks that OpenAI works out before they release their full o1 model. I expect in the future it'll be able to decide whether it needs CoT or not, making conversations more natural
@@markmuller7962 Yeah, I didn't think about it this way. Methods like this generally remove the risk of the big, rough errors in the answer you normally get from the unidirectional decoding in decoder-only transformers.
Based on that, I still might be better off with Sonnet for coding snippets. Especially given the hard cap on messages.
So 2 grains of salt right? 🤔
13.4 to 83.3 is a 521%-ish increase, not an increase of 70%
It's an accuracy scale, not a score.
If anyone wants to ask what the difference is: the accuracy increased by ~521%, while the score increased by 70 percentage points (quick check below).
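For anyone who wants to double-check the arithmetic, a quick sanity check in plain Python (13.4 and 83.3 are the numbers quoted above):

```python
# "13.4 -> 83.3" read two ways: percentage points vs. relative increase
old, new = 13.4, 83.3
print(f"gain in percentage points: {new - old:.1f}")          # 69.9 (~70)
print(f"relative increase: {(new - old) / old * 100:.1f} %")  # 521.6 (~521%)
```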
shouldve asked o1
Accuracy is scored out of a whole. If they were measuring its raw solving ability then yes, you'd be right, but it's an exam: a predetermined evaluation where the 'intended' result is capped at 100%. If I score 50% and you score 99%, I'm not necessarily half as smart and you 5000% smarter; you performed roughly twice as well in a specified area. I can train GPT-2 within a day to 100% the test; that doesn't mean my model improved exponentially. It's subjective. If the test weren't static then you could have an argument
If it were the same exact exam that it was tested on previously then you'd be right, but since the new model wasn't tested prior there is no baseline; it's a ranking. 1st place isn't twice as talented as 2nd place
70 percentage points
You know exactly what he meant but still decided to point out the obvious.
Have you been corrected by your parents/teachers much?
You deserve more followers. With high quality content like this, the channel is going to blow up fast.
The channel is 4 years old 😢😢😢
This channel is not meant to blow up, it is too refined and complex for the average AI bro, but that's okay, he's doing God's work 🤗🤗
3:45 The missing detail on the x-axis may explain the 30 messages per WEEK limit
LOG SCALE!?! HOLY SHIT, THAT'S LITERALLY EXPONENTIAL RESOURCE USE! O(2^n) IS CRAZY!
oh I didn't even realize, good catch, you are right.
that's crazy
this itself is probably not gonna be a very scalable way of improving LLMs
To be clear - this isn’t a new architecture? Right?
I don’t even see an explicit recognition of an increased base model scale?
This seems a lot like Q* attached to another tuned version of 4o, rather than GPT-5 (which I’m assuming is Orion). The knowledge cutoff of 2023 seems to support this.
definitely Q* attached to 4o. The real Q* on 5 is for the Government only
It seems more like a CoT or MCTS layer
It feels like they just hard-coded chain-of-thought reasoning and are selling it as a new model..?
Same thought.
I suppose so. prompting for CoT usually improves model performance, so they improved CoT to make those improved results even more improved.
It's said that o1's chain of thought is not censored by their company policies, which means the model was not brainwashed and lobotomized
No, it seems they worked this type of reasoning directly into the training to make the model better at it, and while I have no idea how tf they did that, the results would seem to speak for themselves.
@@unlomtrash Said by whom?
Without even testing the model, this is right now the best explanation of o1
I tried shooting it through some creativity tests, and it ended up being far superior to 4 or 4o. It was good at storywriting, surprisingly didn't fall into the trap where other models would just kinda go "this happened and then this happened and then this happened and then this-". Then, when I asked it to write rap, it actually came up with some pretty intricate rhyme schemes. When I asked it to be comedic, it wasn't as completely over-the-top but still bland as other models, it actually had some sense of subtlety and comedic "timing", caused me to chuckle a few times, though perhaps it was just because I expected it to be utter garbage. Overall, very good. Still nowhere near human creativity, but honestly, I really hope it doesn't quite get to that point. This is good enough, please and thanks.
It feels like they just made the token size 1 character and hardcoded chain of thought (like previously done by other devs). It makes sense that a token size of 1 improves math, as you need to look at individual characters in math (see the toy demo below)
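A toy illustration of that claim, in plain Python (the sub-word split is illustrative, not a real tokenizer's output): with character-level "tokens" the letter count is a trivial loop, while a sub-word model only ever sees opaque chunk IDs.

```python
# character-level view: counting letters is a trivial scan
word = "strawberry"
print(sum(ch == "r" for ch in word))  # 3

# typical sub-word view (illustrative split, not a real tokenizer's output)
tokens = ["straw", "berry"]
print(sum(tok.count("r") for tok in tokens))  # 3 here, but only because we can
# still read the characters; a model receives opaque token IDs, not letters
```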
I'm never hyped about "Open" AI's models. so much for being "Open".
According to the o1 API docs, they have new "reasoning" tokens that are a part of the output, but you can't see them.
I just used it to help me study for my Discrete math. Holy, it’s WAYYY better at understanding, it’s so much better than my professor’s own damn slides and walks me through so well with proofs.
Try this. Previous versions and other LLMs cannot solve it at all. (A near-PhD-level problem in physics.) "Solve ANALYTICALLY the time-independent Schrodinger equation for a harmonic potential using Wolfram Language. (DO NOT use the known solution from the literature.)"
or detailed version
"Write a code in Wolfram Language that find analytical solution to time-independent Schrodinger equation for harmonic potential v(x)=1/2*x^2. Solve that Schrodinger equation using this code. Find the solution (wavefunction and energy of the ground and first excited state) using that code, DO NOT use the known solution from the literature. Solve the equation analytically, not numerically. Start form Schrodinger equation and find the solution (wave functions, energy, quantize conditions, ets) from the Wolfram Language code ONLY! Let's think the solution step by step."
Here's what it spit out :
To solve the time-independent Schrödinger equation analytically for a quantum harmonic oscillator using Wolfram Language, we'll follow these steps:
1. **Set Up the Schrödinger Equation**:
The time-independent Schrödinger equation for a particle of mass \( m \) in a harmonic potential \( V(x) = \frac{1}{2} m \omega^2 x^2 \) is:
\[
-\frac{\hbar^2}{2m} \frac{d^2 \psi(x)}{dx^2} + \frac{1}{2} m \omega^2 x^2 \psi(x) = E \psi(x)
\]
2. **Non-Dimensionalize the Equation**:
Introduce dimensionless variables to simplify the equation:
- Define \( \xi = \sqrt{\frac{m \omega}{\hbar}} x \)
- Define \( \varepsilon = \frac{2E}{\hbar \omega} \)
The equation becomes:
\[
\frac{d^2 \psi(\xi)}{d\xi^2} = (\xi^2 - \varepsilon) \psi(\xi)
\]
3. **Analyze Asymptotic Behavior**:
For large \( \xi \), the equation simplifies to:
\[
\frac{d^2 \psi(\xi)}{d\xi^2} \approx \xi^2 \psi(\xi)
\]
The solutions are:
\[
\psi(\xi) \sim e^{\pm \frac{\xi^2}{2}}
\]
To ensure normalizability, we choose the decaying solution:
\[
\psi(\xi) = e^{-\frac{\xi^2}{2}} \phi(\xi)
\]
4. **Transform the Equation**:
Substitute \( \psi(\xi) = e^{-\frac{\xi^2}{2}} \phi(\xi) \) into the Schrödinger equation:
\[
\phi''(\xi) - 2\xi \phi'(\xi) + (\varepsilon - 1) \phi(\xi) = 0
\]
5. **Use a Power Series Solution**:
Assume \( \phi(\xi) \) can be expressed as a power series:
\[
\phi(\xi) = \sum_{n=0}^\infty a_n \xi^n
\]
6. **Derive the Recurrence Relation**:
Substituting the power series into the transformed equation yields a recurrence relation for the coefficients \( a_n \):
\[
a_{n+2} = \frac{2n - (\varepsilon - 1)}{(n+2)(n+1)} a_n
\]
7. **Determine the Energy Eigenvalues**:
For the series to terminate (ensuring a normalizable solution), the numerator must be zero at some \( n = N \):
\[
2N - (\varepsilon - 1) = 0 \implies \varepsilon = 2N + 1
\]
The quantized energy levels are:
\[
E_N = \left( N + \frac{1}{2} \right) \hbar \omega
\]
8. **Compute the Coefficients**:
Using the recurrence relation, compute the coefficients \( a_n \) up to \( n = N \).
9. **Construct the Wavefunction**:
The normalized wavefunction is:
\[
\psi_N(\xi) = \frac{e^{-\frac{\xi^2}{2}}}{\sqrt{2^N N! \sqrt{\pi}}} H_N(\xi)
\]
Where \( H_N(\xi) \) is the Hermite polynomial of degree \( N \). However, since we are not to use known solutions, we'll compute \( H_N(\xi) \) using our series expansion.
10. **Implement in Wolfram Language**:
Below is the complete Wolfram Language code that follows the steps above:
```wolfram
(* Set the quantum number; we call it nQ because N is a protected built-in symbol in Wolfram Language *)
nQ = 2; (* Change nQ to compute different energy levels *)
(* Compute epsilon *)
epsilon = 2 nQ + 1;
(* Initialize the coefficients a_n *)
a[0] = If[EvenQ[nQ], 1, 0];
a[1] = If[EvenQ[nQ], 0, 1];
(* Recurrence relation *)
Do[
 a[n + 2] = (2 n - (epsilon - 1))/((n + 2) (n + 1)) a[n],
 {n, 0, nQ - 2}
];
(* Construct the polynomial phi(xi) *)
phi[\[Xi]_] := Sum[a[n] \[Xi]^n, {n, 0, nQ}];
(* Unnormalized wavefunction *)
psi[\[Xi]_] := Exp[-\[Xi]^2/2] phi[\[Xi]];
(* Normalize the wavefunction *)
norm = Sqrt[Integrate[psi[\[Xi]]^2, {\[Xi], -Infinity, Infinity}]];
psiNormalized[\[Xi]_] := psi[\[Xi]]/norm;
(* Display the coefficients *)
Print["Coefficients a_n:"];
TableForm[Table[{n, a[n]}, {n, 0, nQ}]]
(* Plot the normalized wavefunction *)
Plot[psiNormalized[\[Xi]], {\[Xi], -5, 5}, PlotLabel -> "Normalized Wavefunction for n = " <> ToString[nQ], AxesLabel -> {"\[Xi]", "\[Psi](\[Xi])"}]
```
**Explanation of the Code:**
- **Setting the quantum number**: We choose the quantum number \( N \) (code variable `nQ`, since `N` is protected in Wolfram Language) for which we want to solve the equation.
- **Initializing Coefficients**: Based on the parity of \( N \), we set the initial coefficients \( a_0 \) and \( a_1 \).
- **Recurrence Relation**: We use a `Do` loop to compute the coefficients \( a_n \) up to \( n = N \) using the recurrence relation derived earlier.
- **Constructing phi(xi)**: We define \( \phi(\xi) \) as the sum over the computed coefficients.
- **Wavefunction**: The unnormalized wavefunction \( \psi(\xi) \) is constructed by multiplying \( \phi(\xi) \) with the exponential factor.
- **Normalization**: We compute the normalization constant and define the normalized wavefunction \( \psi_{\text{Normalized}}(\xi) \).
- **Displaying Coefficients**: The coefficients are displayed in a table for reference.
- **Plotting**: Finally, we plot the normalized wavefunction over a range of \( \xi \).
**Example Output for N = 2:**
- **Coefficients**:
| n | \( a_n \) |
|---|----------------|
| 0 | 1 |
| 1 | 0 |
| 2 | \(-2\) |
- **Wavefunction**:
The normalized wavefunction for \( N = 2 \) is plotted, showing the characteristic shape of the second excited state of the harmonic oscillator.
**Notes:**
- The code avoids using known solutions from literature, such as Hermite polynomials, by computing the coefficients directly from the recurrence relation.
- The normalization ensures that the total probability is one.
- You can change the value of \( N \) to compute and plot wavefunctions for different energy levels.
**Conclusion:**
This analytical solution demonstrates how to solve the time-independent Schrödinger equation for a harmonic oscillator by deriving the energy eigenvalues and corresponding wavefunctions using series expansion and recursion, all implemented step-by-step in Wolfram Language.
@@good-gpt2-chatbot thanks! All the steps in the solution are great, but these steps were just written out (from the literature), and the Wolfram Language code did not calculate them. For example these: "(* Compute epsilon *)
epsilon = 2 nQ + 1;
.....
(* Recurrence relation *)
Do[
 a[n + 2] = (2 n - (epsilon - 1))/((n + 2) (n + 1)) a[n],
 {n, 0, nQ - 2}
];"
Sad..
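For what it's worth, the step being complained about (deriving the recurrence from the ODE instead of writing it down) is mechanical computer algebra. A minimal sketch, in Python/SymPy purely as an illustration of the idea rather than anything o1 produced: substitute a truncated power series into phi'' - 2*xi*phi' + (epsilon - 1)*phi = 0 and read off the coefficient of each power of xi.

```python
import sympy as sp

xi, eps = sp.symbols("xi epsilon")
K = 6                                  # truncation order of the series
a = sp.symbols(f"a0:{K + 3}")          # series coefficients a0 .. a_(K+2)
phi = sum(a[k] * xi**k for k in range(K + 3))

# substitute the series into the transformed ODE and expand
ode = sp.expand(sp.diff(phi, xi, 2) - 2 * xi * sp.diff(phi, xi) + (eps - 1) * phi)

# the coefficient of each xi^n must vanish; solving each gives the recurrence
for n in range(K + 1):
    rec = sp.solve(sp.Eq(ode.coeff(xi, n), 0), a[n + 2])[0]
    print(f"a_{n + 2} =", sp.factor(rec))
# e.g. a_2 = a0*(1 - epsilon)/2, i.e. a_(n+2) = (2n - (eps - 1)) a_n / ((n+2)(n+1))
```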
Honestly, this is mildly cool, but a nice sigh of relief for anyone thinking AGI was around the corner. Also for Anthropic who can just CoT Sonnet and replicate this.
If adding a bit of inference time + chain of thought improved gpt-4o this much, it is a very bullish sign for AGI. I’m really excited for gpt-5/6.
@@lilgarbagedisposal9141 you do know langchain has been around for some time right? This is very easy to use, but not some new breakthrough.
Dr. Alan Thompson's conservative AGI chart jumped from 76% to 81%, when it had normally been going up 1% monthly on average. This is a pretty big development; even if it goes back to the 1% monthly gains we could still see AGI in 2026. Might be sooner than that but yeah, lots of stuff to look forward to 👍 🔥
Just one or two more breakthroughs xD 🤣 and I think it should go faster if this thing or their best locked-up version is really at a PhD human level in research. They would technically have an almost infinite number of AI researchers at the same level as their best researchers 😅
I'd say 5 more years until we get to AGI
@@phen-themoogle7651 I don't get it. OpenAI made CoT easy for the masses (or rather, those with a subscription), but this is nothing more. Sure it's an easier way to burn $0.25/prompt, but am I missing something?
The main reason they've given for the chain of thought being private is that it lets them be internally transparent about any model alignment steps. The internal logic is not clouded by censorship of the final model output; only the user-facing result is.
Did you read the announcement?
They explicitly state they haven't retired their GPT-xx line and will continue releasing new iterations of their flagship model.
Great graphics & live perfect annotation on top. Great mix & pace of image changes along with the audio.
Been waiting for someone to implement something like this. I think there is still a lot of room for improvement and the scaling has not even started.
Bro its a log scale wdym it has not even started
The inference "discovery" is huge 😄 Seems as if everything needs to be in balance, similar to humans.
If that chart they showed is true, then OpenAI will go bankrupt with this model. I suspect the inference cost is still a small percentage of the training costs, even if it is a lot more than before.
@@toadlguy You know how many people want to throw money into OpenAI? Were they not valued at like 150 billion pretty recently? Money will not be a problem for them.
@@toadlguy this is why you can only make 30 requests per WEEK
also, I don't think it is. it's just a sketch
I mean this is awesome news, but I assume that since it's "o1" this is also an "omni" or multimodal model?
So I look forward to seeing how visual reasoning has improved
Love the take sir. Made me realise that OpenAI hype does not match its deliverables.
An 87% PhD brain 30 times a week is a lot of accuracy
Got access to both of the models. They’re pretty crazy
Just for context, as a CS master's student in the USA: there was a coding problem that 4o could not solve, but o1-preview solved it
We are fucked
1:10 That's not how percentages work. The increase is 323% from 4o to o1 preview
great work from ClosedAI
I wish they weren't so closed sourced. I really want to read about their RL techniques.
Looks like it'll be strawberry preview, strawberry, orion within a year at the latest. I mean they already seem to have some kind of orion, so under 6 months would probably be not that unrealistic too.
After the election I think.
Anyone else think o1 is like Reflection-70B but polished to perfection and actually working?
"ioi" is happy excited person raising both hands up!
it's a person crying rivers
I thought of this before and I’ve always wanted to see what would happen if the thinking was done by a way smaller model. The bigger model would then consolidate that into a response. In my mind, it would generate way more tokens and take a fraction of the compute.
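That idea is easy to prototype at a toy level. A minimal sketch with stand-in functions (both "models" below are hypothetical placeholders, not real APIs):

```python
# a small, cheap "thinker" drafts many reasoning traces; the large model only
# reads the drafts and writes the final answer. Both calls are stand-ins.
def small_model_think(prompt: str, n_drafts: int = 4) -> list[str]:
    return [f"draft {i}: rough reasoning about {prompt!r}" for i in range(n_drafts)]

def big_model_consolidate(prompt: str, drafts: list[str]) -> str:
    notes = "\n".join(drafts)
    return f"final answer to {prompt!r}, synthesized from:\n{notes}"

prompt = "how many primes are below 100?"
print(big_model_consolidate(prompt, small_model_think(prompt)))
```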
These demos are really weak... I tried them with Claude and it aced every one except the video game (it worked after one additional prompt) and the logic puzzle. Also, comparing CoT with a pure LLM like GPT-4 doesn't make sense. The most worrying point is the rate limit though, because it seems almost impossible to run any benchmark with such a tiny number of requests...
new benchmark exploit found...make inference so long nobody can demonstrate it secretly sucks :D
@@WoolyCow exactly. They report that for IOI problems they used 50 submissions per problem so you'd have to spend 2.5 weeks PER PROBLEM without tier 5 API :D
eventually more people will get API access, in which case it's not that expensive. same as gpt-4-0314
I think the breakthrough with this is that the yap is under the hood. One thing you see in more advanced models is that they write out a whole freaking essay when all you need is the little paragraph at the end; that's because, obviously, it's kind of thinking out loud. And obviously, just for the user experience if nothing else, this is cut short from what it *could* be.
But this appears to be the best of both worlds. It'll use a lot of tokens "thinking" about the output before making it, and then only output what the user absolutely needs. THEN the final output will only eat up that much of the context window, so further prompts have more context than otherwise.
Nice explanation. Thanks!
Thanks for sharing!
Yeah, text prediction is not thinking, but I'm curious to test it and break it, just for fun :)
A 100%. I think the only people "mindblown" by this are the ones with naturally weak human "CoT" in real life.
Ciao ChristianIce 😄
You are correct, but the slightly worrying realization is that now we have a pretend-thinker that performs better at human tasks than a number of humans...
@@4.0.4 because most humans are really dumb.
text prediction is not thinking, because...?
I'd love to hear why you take such a stance, as at least to me, the concept of "thinking" has lately become harder to describe.
4:25 That's a logarithmic-scale timeline...
Didn’t catch that, good one! Still impressive but a huge difference from the linear graph I thought it was.
the quality restored!!
Legit thought it was that fire dude but it's not even him 😂
Now GPT-5's claim of being smarter than a PhD seems more plausible. Throw more compute and energy at it, and you can have it think for hours to solve complex issues
One problem more thinking time won't solve is when it makes a simple mistake that it can't recognize.
It would just get stuck in a loop being incorrect
Not to mention that this model is probably a fine-tuned/modded GPT-4 or 4o (and 4o mini?), so an even larger/better model with next-gen reasoning might be the real GPT-5
@@ccash3290 Yeah, but people do the same
@@andreilucasgoncalves1416 True, but unlike with biological people, they're constantly trying to improve AI intelligence
@@andreilucasgoncalves1416 btw your comment is only visible by selecting "newest comments" for whatever reason
So it isn't using GFlowNets for chain-of-thought optimization in the latent space at inference time yet? Cause I've wanted to explore that myself for a long time
How cool, we have a slow-thinking machine again after we had a fast-thinking one.
There may still be a GPT-5, with o1 being a separate thing. Also, is o1 multimodal like 4o is? And is there a way to pay more to get more messages?
They actually have CoT examples in their blog
As they implemented CoT in LLMs, is it plausible to implement Tree of Thoughts with RL in a similar way?
Maybe, but there is no good reason to do it. You can prove that any iterative process can simulate any graph, so assuming ToT is the correct way to go, RL will make the model learn it.
How many r’s in “straberry”
It thought about it for 8 seconds and answered 2 😭
That shouldn't have been an example. GPT-4 correctly answers that question already, but 4o and 4o mini don't. There's a meme going around about the strawberry r character count, so I guess it was to address that.
I wanna know how much of these gains are based on training on those benchmark questions specifically
Thanks for being on top of the ships brotha
I'm curious what people that say AGI is still a ways away are expecting.
100k tokens for an answer? The energy requirements will skyrocket. AI is definitely going to plateau if the only way you get better results is by putting more energy into it
New attention is all you need, "Let the model cook"
So is this AlphaZero/LeelaChessZero in LLM form? Up until now, LLMs could only do what is the equivalent of depth-1 inference for the chess bots; if what they say is true, ClosedAI might have figured out a way to do MCTS-style inference for LLMs (a toy sketch of the idea is below). Also, keep in mind that the hidden thinking might actually be a sequence of tokens that only the model itself understands...
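To make "MCTS-style inference" concrete, here is a toy UCB1-guided tree search over reasoning steps. Heavily hedged: `expand` and `rollout_value` are stand-ins for a proposal model and a learned value model, and nobody outside OpenAI knows whether o1 actually works this way.

```python
import math, random

def expand(state):
    """Stand-in: propose 3 candidate next reasoning steps."""
    return [state + (f"s{len(state)}.{k}",) for k in range(3)]

def rollout_value(state):
    """Stand-in for a learned value model scoring a finished chain."""
    return random.random()

def mcts(root=(), iters=200, depth=4, c=1.4):
    N, W = {root: 0}, {root: 0.0}          # visit counts, accumulated value
    for _ in range(iters):
        path, node = [root], root
        while len(node) < depth:           # selection / expansion
            parent = node
            kids = expand(node)
            for k in kids:
                N.setdefault(k, 0)
                W.setdefault(k, 0.0)
            fresh = [k for k in kids if N[k] == 0]
            node = random.choice(fresh) if fresh else max(
                kids,
                key=lambda k: W[k] / N[k] + c * math.sqrt(math.log(N[parent] + 1) / N[k]),
            )
            path.append(node)
        value = rollout_value(node)        # simulation
        for s in path:                     # backpropagation
            N[s] += 1
            W[s] += value
    return max(expand(root), key=lambda k: N.get(k, 0))  # most-visited first step

print(mcts())
```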
I was diagnosed by an AI chatbot at work for sick leave. The doctor just signed the document. I feel like these will take our jobs, OR their "evolving" might just stop one day (compute limits, government bans, etc.). Who knows. The only thing I know is that I think no one should support this financially; the outcome is unknown, yet the possible risks and benefits are known. I would say if there is a risk like AI has, it's not worth it...
The best way to test this kind of model is to ask it to write a Python or C# algorithm to convert a Cartesian coordinate command into angular commands on a 6-axis robot, with the correct inputs and outputs. No model even comes close to basic math like that
Can you sketch the approach for that?
@@crackwitz it would have to take as inputs the x, y, z, roll, pitch, yaw command; it would also need to ask for the lengths of the joints and their locations. The answer should be a series of 6 angular commands, one for each axis. Often ChatGPT, Claude and other models give you an empty function without any transformation matrix or Jacobian solution. The best case I have seen was with a 3-axis robot. If one day a model is able to do that, then that is the model I would use
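For reference, this is roughly the skeleton such an answer needs: forward kinematics from a Denavit-Hartenberg table plus a numeric damped-least-squares Jacobian loop. A hedged sketch only: the DH parameters below are made up, it solves position only (extending the error vector to 6D would cover roll/pitch/yaw), and it is nowhere near a production solver.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Standard Denavit-Hartenberg link transform (4x4 homogeneous matrix)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [ 0,       sa,       ca,      d],
        [ 0,        0,        0,      1],
    ])

# hypothetical (d, a, alpha) per joint; a real robot needs its actual DH table
DH = [(0.30, 0.00,  np.pi / 2), (0.00, 0.40, 0.0), (0.00, 0.05,  np.pi / 2),
      (0.35, 0.00, -np.pi / 2), (0.00, 0.00, np.pi / 2), (0.10, 0.00, 0.0)]

def forward(q):
    """End-effector pose (4x4) for the six joint angles in q."""
    T = np.eye(4)
    for qi, (d, a, alpha) in zip(q, DH):
        T = T @ dh_transform(qi, d, a, alpha)
    return T

def ik_position(target_xyz, q0, iters=200, damping=1e-2):
    """Numeric IK (position only): finite-difference Jacobian + damped least squares."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        err = np.asarray(target_xyz) - forward(q)[:3, 3]
        if np.linalg.norm(err) < 1e-6:
            break
        J = np.zeros((3, 6))
        for j in range(6):                 # numeric Jacobian, column by column
            dq = np.zeros(6)
            dq[j] = 1e-6
            J[:, j] = (forward(q + dq)[:3, 3] - forward(q)[:3, 3]) / 1e-6
        # damped least squares step: dq = J^T (J J^T + lambda I)^(-1) err
        q += J.T @ np.linalg.solve(J @ J.T + damping * np.eye(3), err)
    return q

q = ik_position([0.3, 0.2, 0.5], q0=[0.1] * 6)
print(np.round(q, 3), np.round(forward(q)[:3, 3], 3))
```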
I tried it out and it's incredible and costly
So that means that ChatGPT can now compete with Devin AI?
Give o1 a few months and it would be huge particularly in science and coding.
What if the usage limit is actually there to limit jailbreaking iterations more than compute capacity?
Can you use it to help your speech?
Wait! I actually saw somewhere that they plan to keep updating the frontier of the GPT series! I don't think they are dropping the GPT series!
At the meeting of the OpenAI company:
"Listen! What if we let ChatGPT think for a longer time? It might give more precise answers."
Me:
You don't say?! If I think about a math problem for 10 minutes instead of 10 seconds, then of course I will give a more precise answer...
💀💀💀
Congratulations. It can tell you that your request violates content policy 40% faster
Worth every cent.
How much was sunk into it again?
The funny thing is that during thinking the model actually tells you what constraints openai put on it... I guess this makes jailbreaking easier 😂
ClosedAI does it again
42!
After reasoning for 100 years
I tried the preview with this question: How many R's are in the plural of strawberry?
It gave me this: There are two letter "R"s in the plural of "strawberry," which is "strawberries."
so basically this is the same as reflectionAI. got it.
Just tried it on Lunar Link AI, pretty crazy stuff.
Of course this comes out the day my subscription ends
The o in o1 stands for Claude
OMG I’m so glad it’s not called GPT-5! Hope everyone with chill a bit with those stupid naming conventions as if everything was a smartphone!
3:46 I hate that figure, they made it look exponential by just using an exponential timescale for no good reason.
It's not even exponential, it's linear with a log(x) scale! Get a 10% boost in output for 10x the cost!
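To spell that out with made-up numbers: if accuracy is linear in log10(compute), then every 10x in compute buys the same fixed bump.

```python
import math

a, b = 20.0, 10.0  # made-up intercept/slope: accuracy% = a + b * log10(compute)
for compute in (1, 10, 100, 1000):
    print(f"{compute:>5}x compute -> {a + b * math.log10(compute):.0f}% accuracy")
# 1x -> 20%, 10x -> 30%, 100x -> 40%, 1000x -> 50%: a constant gain per 10x cost
```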
The day you release a video and you don't mention Deegan and Ledoux, we'll know something is up.
Soon I will test it on my master's physics assignments; we will see about the "PhD level"
This is cool. But can't I just tell the model to talk to itself, or list its chain of thought until it has an answer?
All they did was make it output more tokens and increase the time it takes to get the answer
True, but this model was specifically trained to output these kinds of reasoning tokens that actually help solve problems. Just because you can ask your model to think step by step doesn't mean it will do it productively. The key difference here is that they must have gotten a lot of data of reasoning steps and had a way to measure whether the reasoning was good or bad at each step (a toy sketch of that idea is below)
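Nobody outside OpenAI knows the actual recipe, but the shape of the idea (score each reasoning step, keep the chains that score well) looks something like this toy sketch; `generate_chain` and `score_step` are stand-ins, not real APIs.

```python
import random

def generate_chain(prompt, steps=4):
    """Stand-in for sampling one chain of reasoning steps from a model."""
    return [f"step {i} toward: {prompt}" for i in range(steps)]

def score_step(step):
    """Stand-in for a learned process reward model judging a single step."""
    return random.random()

def best_of_n(prompt, n=8):
    chains = [generate_chain(prompt) for _ in range(n)]
    # rank whole chains by mean per-step reward (one plausible aggregation)
    return max(chains, key=lambda ch: sum(map(score_step, ch)) / len(ch))

print(best_of_n("prove that sqrt(2) is irrational"))
```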
so can this model solve the famous strawberry problem, namely telling how many "r"s there are in "strawberry"?
the real reflection
ok good, but as always many people (specifically the poor or average) get left out.. we can't use this model.. we need to wait for an open-source alternative OR a local one
all I need to know ... can it say there are 3 r's in strawberry?
This is not GPT-5; it's a model that is literally a year old that they've been holding back from the public.
New immortal anatomy. Micro-robot-composite parts. Transfer of conscious to new immortal anatomy. Full-conscious immersion as a functionality of the new immortal anatomy.
AI Agents App 3D Simulations extrapolation - R&D automated about the aforementioned objectives.
SuperAI System.
This should be done in a timely manner; it should've already been done.
They wrote that they will also continue the GPT series
I feel like I live in some kind of a different reality, because I absolutely cannot see this "PhD level reasoning" everyone is raving about. I specialize in computational astronomy and tried throwing some advanced but common problems (such as predicting solar eclipses), and the AI either has no idea what to do, or just gives up and tells me to use an existing solution. For simpler problems it will usually pick the right equations, but still mess up the calculations. From how the things are progressing right now, I feel like we're still many years away until it actually becomes "smart".
Sir, is it illegal or legal? Recently ChatGPT introduced the o1 version and it solves PhD problems; some people are generating research papers through it, original research papers with all the calculations required, and trying to publish them. Is that not unethical?
@@RYu-b7d why are you asking me this
Years is an overestimation at this point. This seems like a stopgap solution considering how resource-inefficient it is just to get something that is marginally correct. It also takes longer. 1 step forward, 2 steps back.
Why choose O as in o1 (maybe for open?) .. I think they should use C as in c1 for corporate .. that's more accurate
one of their devs just confirmed the o in o1 stands for OpenAI
they aim to be less confusing, but now it's even more confusing with their multimodal GPT-4 Omni
I wanna see Anthropics response to this.
👍🏻👍🏻
o1 being so closed down and un-transparent makes it even more sad that Reflection was fake
open source will catch up very soon. even if Reflection was fake, realistically it should be possible to make reasoner models like o1 within the open-source community, especially considering o1-mini is so good at reasoning despite being "small" (to OpenAI standards, at least)
@@a_soulspark I am optimistic about the potential for open-source algorithms, training data and weights, but computing resources are a big deal too. Regular people in the foreseeable future won't have the money or computing resources to run a specialized o1-based model in their homes to get an advantage on the stock market, which some other companies can definitely do with their capital.
i still haven't got their new voice features from the previous keynote, they better release what they showed first and then move on to a new model lol
If you ask how many 'r's are in "strowberry" it fails and says 2, as of 9/12/2024. Same with raspberry.
Hmmm looks like soon AI will replace programmers and mathematicians
my chatgpt still says 3.11 is bigger than 3.3
So sad it's just a preview for paying customers, I wish I could test it 😭
Feels like when GPT-4 released and I had to wait a long time to use it.
Hello Skynet
they're saying that future versions will think for fcking weeks???????☠☠☠
thx