Is This GPT-5? OpenAI o1 Full Breakdown

  • Published: 5 Nov 2024

Comments • 414

  • @bycloudAI
    @bycloudAI  1 month ago +27

    ngl I didn't expect OpenAI's "strawberry" to actually be this cool
    oh and u can join my newsletter here! mail.bycloud.ai
    correction: they are still gonna continue the GPT series, so they now have 2 different model series in development

    • @robertputneydrake
      @robertputneydrake 1 month ago

      btw you got one thing wrong: it's the model that is being used to generate training data for gpt5.

  • @seeibe
    @seeibe 1 month ago +426

    This video released faster than my o1-preview prompt finished generating..

    • @Askejm
      @Askejm 1 month ago +24

      bruh we beastmode this shit. made the entire video in 4.5 hours

    • @mAny_oThERSs
      @mAny_oThERSs 1 month ago +21

      Dude, o1 isn't supposed to be used for prompts like "how do I make a sandwich?". Once you come up with a prompt that requires PhD-level intelligence, you'll be fine with waiting a minute for it to generate an answer that's actually correct, unlike the objectively wrong nonsense GPT-4o would likely put out when faced with a question of such difficulty...

    • @JamesBrookT
      @JamesBrookT 1 month ago +2

      Probably was generated by o1-preview

    • @jendabekCZ
      @jendabekCZ 1 month ago

      For all you know, this video was generated too.

    • @RYu-b7d
      @RYu-b7d 1 month ago

      Sir, is it legal or illegal? ChatGPT recently introduced the o1 version and it solves PhD-level problems. Some people are generating original research papers through it, with all the calculations a paper requires, and trying to publish them. Is that ethical?

  • @1cristianBEST
    @1cristianBEST 1 month ago +583

    Imagine letting it think for like a year about curing cancer or something and it just responds "42"

    • @mylittleheartscar
      @mylittleheartscar 1 month ago +11

      Oh my days lol 😂

    • @BackTiVi
      @BackTiVi 1 month ago +57

      Or "Network error. Please try again."

    • @dojjy5798
      @dojjy5798 1 month ago +25

      OpenAI has the inference logs, so they can see the thought process behind the answer '42'.
      Either 42 actually is the answer, in which case we get to learn why,
      or the model is just BS-ing and OpenAI wasted the time.

    • @magickitty5475
      @magickitty5475 1 month ago +5

      being the 42nd like made my day

    • @PuneetChhikara
      @PuneetChhikara 1 month ago +7

      This has 42 likes, and I'm not touching that like button

  • @TheTriforcemaster01
    @TheTriforcemaster01 1 month ago +151

    I just tested it on some pretty difficult multivariate calculus and measure theory proofs, and it got all of them more or less correct: it always reached the correct solution. My sample size was only about 10 questions, but it's still much better in this regard than any model we have seen thus far.

    • @mAny_oThERSs
      @mAny_oThERSs 1 month ago +5

      What was the performance like compared to GPT-4o, if you compared them? Also, I'm not that high-level at math: how would you rank those questions' difficulty on a 1-10 scale from college student to PhD level?

    • @SliverHell
      @SliverHell 1 month ago

      Test it against Gemini 1.5 Pro. It has the highest score on the math benchmark on LiveBench.

    • @Danuxsy
      @Danuxsy 1 month ago

      GPT o1 is proof that LLMs are dead; the same fundamental problems persist, and have since the first model came out many years ago. They can never become AGI, and it is very clear if you try it. Ask the same question and it will sometimes be correct and sometimes wrong... this is the reason it can never become an AGI, and OpenAI cannot solve this problem. It probably requires the machine to be conscious like Homo sapiens are. We are conscious FIRST and then we are able to learn about the world, not the other way around.

    • @Weirdgeek83
      @Weirdgeek83 1 month ago +9

      It seems it might actually be at PhD level if that's true

    • @lilgarbagedisposal9141
      @lilgarbagedisposal9141 1 month ago +2

      Did you notice any patterns in the ones it got wrong?

  • @nineflames2863
    @nineflames2863 1 month ago +84

    End of OpenAI's blog post: "We also plan to continue developing and releasing models in our GPT series, in addition to the new OpenAI o1 series."
    So no, this is not GPT-5, nor are they abandoning that naming scheme. It's a new series of models specifically focusing on reasoning and coding capabilities, running in parallel to the more generalist GPT series of models.

    • @dericksozo1135
      @dericksozo1135 1 month ago

      Good catch

    • @aeonsleo2
      @aeonsleo2 1 month ago +6

      Yes, OpenAI needs to build models for different niches. A well-rounded coding model is absolutely needed by developers.

    • @FasutonemuMyoji
      @FasutonemuMyoji 1 month ago +2

      @@aeonsleo2 What would actually be powerful is an AI code writer that can understand all the multi-field science gobbledygook, since there are few humans with 40+ years of experience mastering both, and both professions are notoriously bad at communicating their needs.

    • @drgoldenpants
      @drgoldenpants 1 month ago

      Currently useless for coding if you can only use it 30 times a week, that's like 4 times a day

    • @PokeRuto898
      @PokeRuto898 1 month ago

      @@aeonsleo2 modern AI models like GPT-4 already work like that. Search for "AI MoE" (mixture of experts).

  • @Remystia
    @Remystia 1 month ago +51

    Me: GPT o1, what is the meaning of life, the universe and everything?
    GPT o1: Give me 7.5 million years and I'll get back to you.

    • @Sheeshening
      @Sheeshening 1 month ago +4

      They said it'll actually do things like that soon. First thinking about questions for hours, then weeks, etc.

    • @MindcraftMC
      @MindcraftMC 1 month ago +2

      That sounds more like a Grok response

    • @AKU666
      @AKU666 1 month ago

      What doth life?

    • @TheBoeingCompany-h9z
      @TheBoeingCompany-h9z 21 days ago

      This isn't GPT 💀

  • @darlokt51
    @darlokt51 1 month ago +96

    From what I tried, it's really nice; it's a lot less verbose than GPT-4o or even GPT-4. I tested some of my old bugs I previously gave to GPT-4o. The answers o1 gives are more accurate across the board, but also way less flexible. For programming, it gives an OK, still wrong answer, like arguing with GPT-4o for 3-4 messages. But it seems to get lost in its own arguing with itself. Even when I tried nudging it in the right direction, it internally evaluated the nudge and just answered that it's not possible, even after encouraging it repeatedly.
    It seems interesting for 0-shot, but I am unsure how much better it is generally, and the long time between answers makes it a bit frustrating to converse with. So maybe take it with a grain of salt. It feels to me very similar to 1-3-agent constructs of models; when conversing with those, they can similarly have some kind of runaway effect from arguing with themselves.
    (I am a machine learning researcher; I normally work on low-level kernel implementations and experimental training-time architectures, so take my views with a grain of salt.)

    • @markmuller7962
      @markmuller7962 1 month ago +9

      I think it can be effective for people who don't have knowledge of a given topic and so can't steer the model in the right direction; the model arguing with itself can be a great source of knowledge and more accurate answers for the average Joe.

    • @Askejm
      @Askejm 1 month ago

      Well, hopefully these are kinks that OpenAI works out before they release their full o1 model. I expect in the future it'll be able to decide whether it needs CoT or not, making conversations more natural.

    • @darlokt51
      @darlokt51 1 month ago +6

      @@markmuller7962 Yeah, I didn't think about it this way. Methods like this generally remove the risk of the big, rough errors in the answer you normally get from the unidirectional decoding in decoder-only transformers.

    • @denisquarte7177
      @denisquarte7177 1 month ago +1

      Based on that, I still might be better off with Sonnet for coding snippets. Especially given the hard cap on messages.

    • @PurpleBird-mh7vb
      @PurpleBird-mh7vb 1 month ago +3

      So 2 grains of salt right? 🤔

  • @rakly3473
    @rakly3473 1 month ago +142

    13.4 to 83.3 is a 521%-ish increase, not an increase of 70%.
    It's an accuracy scale, not a score.
    In case anyone wants to ask what the difference is: the accuracy increased by 521% (relative), while the score went up by 70 percentage points (absolute).
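    Concretely, the two readings of the same pair of numbers:

    ```latex
    \frac{83.3 - 13.4}{13.4} \approx 5.216 \;\Rightarrow\; \text{a } 521.6\% \text{ relative increase}, \qquad 83.3 - 13.4 = 69.9 \text{ percentage points}
    ```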

    • @Askejm
      @Askejm 1 month ago +13

      should've asked o1

    • @knawdawg
      @knawdawg 1 month ago +4

      Accuracy is scored out of a whole. If they were measuring its solving ability then yes, you'd be right, but it's an exam: a predetermined evaluation where the 'intended' result is confined to 100%. If I score 50% and you score 99%, I'm not necessarily 50% as smart and 5000% as dumb; you performed twice as well in a specified area. I can train GPT-2 within a day to 100% the test; that doesn't mean my model improved exponentially. It's subjective. If the test weren't static, then you could have an argument.

    • @knawdawg
      @knawdawg 1 month ago

      If it were the same exact exam that the old model was tested on previously, then you'd be right, but since the new model wasn't tested prior, there is no baseline; it's a ranking. 1st place isn't twice as talented as 2nd place.

    • @ryzikx
      @ryzikx 1 month ago +18

      70 percentage points

    • @freeottis
      @freeottis 1 month ago +1

      You know exactly what he meant but still decided to point out the obvious.
      Have you been corrected by your parents/teachers much?

  • @matiasbadino2925
    @matiasbadino2925 1 month ago +33

    You deserve more followers. With high-quality content like this, the channel is going to blow up fast.

    • @ritpop
      @ritpop 1 month ago

      The channel is 4 years old 😢😢😢

    • @setop123
      @setop123 1 month ago +1

      This channel is not meant to blow up; it is too refined and complex for the average AI bro. But that's okay, he's doing God's work 🤗🤗

  • @0xdeadbeef444
    @0xdeadbeef444 1 month ago +40

    3:45 The missing detail on the x-axis may explain the 30 messages per WEEK limit

    • @adissentingopinion848
      @adissentingopinion848 Месяц назад +9

      LOG SCALE!?! HOLY SHIT, THATS LITERALLY EXPONENTIAL RESOURCE USE! O(2^n) IS CRAZY!

    • @que3no085
      @que3no085 1 month ago +3

      Oh, I didn't even realize that, good catch, you are right.
      That's crazy.
      This itself is probably not gonna be a very scalable way of improving LLMs.

  • @libertyafterdark6439
    @libertyafterdark6439 1 month ago +31

    To be clear - this isn’t a new architecture? Right?
    I don’t even see an explicit recognition of an increased base model scale?
    This seems a lot like Q* attached to another tuned version of 4o, rather than GPT-5 (which I’m assuming is Orion). The knowledge cutoff of 2023 seems to support this.

    • @etc834
      @etc834 1 month ago

      definitely Q* attached to 4o. The real Q* on 5 is for the Government only

    • @cybertruck2008
      @cybertruck2008 1 month ago +1

      It seems more like a CoT or MCTS layer

  • @socaity-ai
    @socaity-ai 1 month ago +60

    It feels like they just hard-coded chain-of-thought reasoning and are selling it as a new model..?

    • @ayoubfr8660
      @ayoubfr8660 1 month ago +14

      Same thought.

    • @a_soulspark
      @a_soulspark 1 month ago +7

      I suppose so. Prompting for CoT usually improves model performance, so they improved the CoT itself to push those already-improved results even further.

    • @unlomtrash
      @unlomtrash 1 month ago +14

      It's said that o1's chain of thought is not censored by their company policies, which means the model was not brainwashed and lobotomized

    • @alansmithee419
      @alansmithee419 1 month ago +6

      No, it seems they worked making the most of this type of reasoning directly into the training to make it better at it, and while I have no idea how tf they did that, the results would seem to speak for themselves.

    • @alansmithee419
      @alansmithee419 1 month ago +1

      @@unlomtrash Said by whom?

  • @andrescmarin
    @andrescmarin 1 month ago +2

    Without even testing the model, this is right now the best explanation of o1

  • @casual_speedrunner1482
    @casual_speedrunner1482 1 month ago +4

    I tried shooting it through some creativity tests, and it ended up being far superior to 4 or 4o. It was good at story writing; it surprisingly didn't fall into the trap where other models would just kinda go "this happened and then this happened and then this happened and then this-". Then, when I asked it to write rap, it actually came up with some pretty intricate rhyme schemes. When I asked it to be comedic, it wasn't as completely over-the-top-yet-bland as other models; it actually had some sense of subtlety and comedic "timing", and caused me to chuckle a few times, though perhaps that was just because I expected it to be utter garbage. Overall, very good. Still nowhere near human creativity, but honestly, I really hope it doesn't quite get to that point. This is good enough, please and thanks.

  • @youtube_gaming
    @youtube_gaming 1 month ago +4

    It feels like they just made the token size 1 character and hard-coded chain of thought (as previously done by other devs). It makes sense that a token size of 1 improves math, as you need to look at individual characters in math

  • @thegameboyshow
    @thegameboyshow 1 month ago +7

    I'm never hyped about "Open" AI's models. So much for being "Open".

    • @dericksozo1135
      @dericksozo1135 1 month ago +1

      According to the o1 API docs, they have new "reasoning" tokens that are a part of the output, but you can't see them.

  • @mta1567
    @mta1567 1 month ago +2

    I just used it to help me study for my discrete math course. Holy, it's WAYYY better for understanding; it's so much better than my professor's own damn slides and walks me through proofs so well.

  • @ИванИванов-л9щ6ч
    @ИванИванов-л9щ6ч 1 month ago +8

    Try this. Previous versions and other LLMs cannot solve it at all (a near-PhD-level problem in physics): "Solve ANALYTICALLY the time-independent Schrodinger equation for a harmonic potential using Wolfram Language. (DO NOT use the known solution from the literature.)"
    Or the detailed version:
    "Write code in Wolfram Language that finds the analytical solution to the time-independent Schrodinger equation for the harmonic potential v(x)=1/2*x^2. Solve that Schrodinger equation using this code. Find the solution (wavefunction and energy of the ground and first excited state) using that code; DO NOT use the known solution from the literature. Solve the equation analytically, not numerically. Start from the Schrodinger equation and find the solution (wave functions, energy, quantization conditions, etc.) from the Wolfram Language code ONLY! Let's think through the solution step by step."

    • @good-gpt2-chatbot
      @good-gpt2-chatbot 1 month ago +4

      Here's what it spit out :
      To solve the time-independent Schrödinger equation analytically for a quantum harmonic oscillator using Wolfram Language, we'll follow these steps:
      1. **Set Up the Schrödinger Equation**:
      The time-independent Schrödinger equation for a particle of mass \( m \) in a harmonic potential \( V(x) = \frac{1}{2} m \omega^2 x^2 \) is:
      \[
      -\frac{\hbar^2}{2m} \frac{d^2 \psi(x)}{dx^2} + \frac{1}{2} m \omega^2 x^2 \psi(x) = E \psi(x)
      \]
      2. **Non-Dimensionalize the Equation**:
      Introduce dimensionless variables to simplify the equation:
      - Define \( \xi = \sqrt{\frac{m \omega}{\hbar}} x \)
      - Define \( \varepsilon = \frac{2E}{\hbar \omega} \)
      The equation becomes:
      \[
      \frac{d^2 \psi(\xi)}{d\xi^2} = (\xi^2 - \varepsilon) \psi(\xi)
      \]
      3. **Analyze Asymptotic Behavior**:
      For large \( \xi \), the equation simplifies to:
      \[
      \frac{d^2 \psi(\xi)}{d\xi^2} \approx \xi^2 \psi(\xi)
      \]
      The solutions are:
      \[
      \psi(\xi) \sim e^{\pm \frac{\xi^2}{2}}
      \]
      To ensure normalizability, we choose the decaying solution:
      \[
      \psi(\xi) = e^{-\frac{\xi^2}{2}} \phi(\xi)
      \]
      4. **Transform the Equation**:
      Substitute \( \psi(\xi) = e^{-\frac{\xi^2}{2}} \phi(\xi) \) into the Schrödinger equation:
      \[
      \phi''(\xi) - 2\xi \phi'(\xi) + (\varepsilon - 1) \phi(\xi) = 0
      \]
      5. **Use a Power Series Solution**:
      Assume \( \phi(\xi) \) can be expressed as a power series:
      \[
      \phi(\xi) = \sum_{n=0}^\infty a_n \xi^n
      \]
      6. **Derive the Recurrence Relation**:
      Substituting the power series into the transformed equation yields a recurrence relation for the coefficients \( a_n \):
      \[
      a_{n+2} = \frac{2n - (\varepsilon - 1)}{(n+2)(n+1)} a_n
      \]
      7. **Determine the Energy Eigenvalues**:
      For the series to terminate (ensuring a normalizable solution), the numerator must be zero at some \( n = N \):
      \[
      2N - (\varepsilon - 1) = 0 \implies \varepsilon = 2N + 1
      \]
      The quantized energy levels are:
      \[
      E_N = \left( N + \frac{1}{2} \right) \hbar \omega
      \]
      8. **Compute the Coefficients**:
      Using the recurrence relation, compute the coefficients \( a_n \) up to \( n = N \).
      9. **Construct the Wavefunction**:
      The normalized wavefunction is:
      \[
      \psi_N(\xi) = \frac{e^{-\frac{\xi^2}{2}}}{\sqrt{2^N N! \sqrt{\pi}}} H_N(\xi)
      \]
      Where \( H_N(\xi) \) is the Hermite polynomial of degree \( N \). However, since we are not to use known solutions, we'll compute \( H_N(\xi) \) using our series expansion.
      10. **Implement in Wolfram Language**:
      Below is the complete Wolfram Language code that follows the steps above:
      ```wolfram
      (* Set the quantum number; named nQ because N is a protected built-in symbol *)
      nQ = 2; (* Change nQ to compute different energy levels *)
      (* Compute epsilon *)
      epsilon = 2 nQ + 1;
      (* Initialize the coefficients a_n *)
      a[0] = If[EvenQ[nQ], 1, 0];
      a[1] = If[EvenQ[nQ], 0, 1];
      (* Recurrence relation *)
      Do[
      a[n + 2] = (2 n - (epsilon - 1))/((n + 2)(n + 1)) a[n],
      {n, 0, nQ - 2}
      ];
      (* Construct the polynomial phi(xi) *)
      phi[\[Xi]_] := Sum[a[n] \[Xi]^n, {n, 0, nQ}];
      (* Unnormalized wavefunction *)
      psi[\[Xi]_] := Exp[-\[Xi]^2/2] phi[\[Xi]];
      (* Normalize the wavefunction *)
      norm = Sqrt[Integrate[psi[\[Xi]]^2, {\[Xi], -Infinity, Infinity}]];
      psiNormalized[\[Xi]_] := psi[\[Xi]]/norm;
      (* Display the coefficients *)
      Print["Coefficients a_n:"];
      TableForm[Table[{n, a[n]}, {n, 0, nQ}]]
      (* Plot the normalized wavefunction; <> joins the label strings *)
      Plot[psiNormalized[\[Xi]], {\[Xi], -5, 5}, PlotLabel -> "Normalized Wavefunction for N = " <> ToString[nQ], AxesLabel -> {"\[Xi]", "\[Psi](\[Xi])"}]
      ```
      **Explanation of the Code:**
      - **Setting N**: We choose the quantum number \( N \) for which we want to solve the equation.
      - **Initializing Coefficients**: Based on the parity of \( N \), we set the initial coefficients \( a_0 \) and \( a_1 \).
      - **Recurrence Relation**: We use a `Do` loop to compute the coefficients \( a_n \) up to \( n = N \) using the recurrence relation derived earlier.
      - **Constructing phi(xi)**: We define \( \phi(\xi) \) as the sum over the computed coefficients.
      - **Wavefunction**: The unnormalized wavefunction \( \psi(\xi) \) is constructed by multiplying \( \phi(\xi) \) with the exponential factor.
      - **Normalization**: We compute the normalization constant and define the normalized wavefunction \( \psi_{\text{Normalized}}(\xi) \).
      - **Displaying Coefficients**: The coefficients are displayed in a table for reference.
      - **Plotting**: Finally, we plot the normalized wavefunction over a range of \( \xi \).
      **Example Output for N = 2:**
      - **Coefficients**:
      | n | \( a_n \) |
      |---|----------------|
      | 0 | 1 |
      | 1 | 0 |
      | 2 | \(-1\) |
      - **Wavefunction**:
      The normalized wavefunction for \( N = 2 \) is plotted, showing the characteristic shape of the second excited state of the harmonic oscillator.
      **Notes:**
      - The code avoids using known solutions from literature, such as Hermite polynomials, by computing the coefficients directly from the recurrence relation.
      - The normalization ensures that the total probability is one.
      - You can change the value of \( N \) to compute and plot wavefunctions for different energy levels.
      **Conclusion:**
      This analytical solution demonstrates how to solve the time-independent Schrödinger equation for a harmonic oscillator by deriving the energy eigenvalues and corresponding wavefunctions using series expansion and recursion, all implemented step-by-step in Wolfram Language.

    • @ИванИванов-л9щ6ч
      @ИванИванов-л9щ6ч 1 month ago +2

      @@good-gpt2-chatbot thanks! All the steps in the solution are great, but these steps were just written down (from the literature); the Wolfram Language code did not actually derive them. For example these: "(* Compute epsilon *)
      epsilon = 2 nQ + 1;
      .....
      (* Recurrence relation *)
      Do[
      a[n + 2] = (2 n - (epsilon - 1))/((n + 2)(n + 1)) a[n],
      {n, 0, nQ - 2}
      ];"
      Sad..

  • @4.0.4
    @4.0.4 1 month ago +9

    Honestly, this is mildly cool, but a nice sigh of relief for anyone thinking AGI was around the corner. Also for Anthropic who can just CoT Sonnet and replicate this.

    • @lilgarbagedisposal9141
      @lilgarbagedisposal9141 1 month ago +2

      If adding a bit of inference time + chain of thought improved gpt-4o this much, it is a very bullish sign for AGI. I’m really excited for gpt-5/6.

    • @4.0.4
      @4.0.4 1 month ago

      @@lilgarbagedisposal9141 you do know langchain has been around for some time right? This is very easy to use, but not some new breakthrough.

    • @phen-themoogle7651
      @phen-themoogle7651 1 month ago +2

      Dr. Alan Thompson's conservative AGI chart jumped to 81% from 76%, when it was normally going up just 1% monthly on average. This is a pretty big development; even if it goes back to 1% monthly gains, we could still see AGI in 2026. Might be sooner than that, but yeah, lots of stuff to look forward to 👍 🔥
      Just one or two more breakthroughs xD 🤣 And I think it should go faster if this thing, or their best locked-up version, is really at a PhD human level in research. They would technically have an almost infinite number of AI researchers at the same level as their best researchers 😅

    • @SNP2082
      @SNP2082 1 month ago +1

      I'd say 5 more years until we get to AGI

    • @4.0.4
      @4.0.4 1 month ago

      @@phen-themoogle7651 I don't get it. OpenAI made CoT easy for the masses (or rather, those with a subscription), but this is nothing more. Sure it's an easier way to burn $0.25/prompt, but am I missing something?

  • @maboesanman
    @maboesanman 1 month ago +1

    The main reason they've given for the chain of thought being private is that it lets them be internally transparent about any model alignment steps. The internal logic is not clouded by censorship of the final model output; only the user-facing result is.

  • @guanxinated
    @guanxinated 1 month ago +4

    Did you read the announcement?
    They explicitly state they haven't retired their GPT-xx line and will continue releasing new iterations of their flagship model.

  • @ScriptureFirst
    @ScriptureFirst 1 month ago

    Great graphics, and spot-on live annotation on top. Great mix and pace of image changes along with the audio.

  • @user-pt1kj5uw3b
    @user-pt1kj5uw3b 1 month ago +6

    Been waiting for someone to implement something like this. I think there is still a lot of room for improvement and the scaling has not even started.

    • @RomeTWguy
      @RomeTWguy 1 month ago

      Bro, it's a log scale, wdym it has not even started

  • @RedOneM
    @RedOneM 1 month ago +16

    The inference "discovery" is huge 😄 Seems as if everything needs to be in balance, similar to humans.

    • @toadlguy
      @toadlguy 1 month ago +3

      If that chart they showed is true, then OpenAI will go bankrupt with this model. I suspect the inference cost is still a small percentage of the training costs, even if it is a lot higher than before.

    • @lilgarbagedisposal9141
      @lilgarbagedisposal9141 1 month ago +3

      @@toadlguy You know how many people want to throw money into OpenAI? Were they not valued at like 150 billion pretty recently? Money will not be a problem for them.

    • @Askejm
      @Askejm 1 month ago +1

      @@toadlguy this is why you can only make 30 requests per WEEK
      Also, I don't think it is. It's just a sketch

  • @witext
    @witext 1 month ago +5

    I mean, this is awesome news, but I assume that since it's "o1", this is also an "omni", i.e. multimodal, model?
    So I look forward to seeing how visual reasoning has improved

  • @KuyaAndreV
    @KuyaAndreV 1 month ago

    Love the take sir. Made me realise that OpenAI hype does not match its deliverables.

  • @MrValgard
    @MrValgard 1 month ago +8

    87% PhD brain 30 times a week is a lot of accuracy

  • @leeishere7448
    @leeishere7448 26 days ago

    Got access to both of the models. They’re pretty crazy

  • @laughoutmeow
    @laughoutmeow 1 month ago +1

    Just for context, as a CS master's student in the USA: there is a coding problem that 4o could not solve, but o1-preview solved it
    We are fucked

  • @IIlllllIIIllllIIIlIllI
    @IIlllllIIIllllIIIlIllI 1 month ago +2

    1:10 That's not how percentages work. The increase is 323% from 4o to o1 preview

  • @krishp1104
    @krishp1104 1 month ago +5

    great work from ClosedAI

  • @user-pt1kj5uw3b
    @user-pt1kj5uw3b 1 month ago +3

    I wish they weren't so closed source. I really want to read about their RL techniques.

  • @mAny_oThERSs
    @mAny_oThERSs 1 month ago +2

    Looks like it'll be strawberry preview, strawberry, Orion within a year at the latest. I mean, they already seem to have some kind of Orion, so under 6 months would probably not be that unrealistic either.

  • @jonathanozik5442
    @jonathanozik5442 1 month ago

    Anyone else think o1 is like Reflection-70B, but polished to perfection and actually working?

  • @GrzegorzWierzowiecki
    @GrzegorzWierzowiecki 1 month ago

    "ioi" is a happy, excited person raising both hands up!

  • @smellthel
    @smellthel 1 month ago

    I thought of this before and I’ve always wanted to see what would happen if the thinking was done by a way smaller model. The bigger model would then consolidate that into a response. In my mind, it would generate way more tokens and take a fraction of the compute.

  • @jakubbartczuk3956
    @jakubbartczuk3956 1 month ago +20

    These demos are really weak... I tried them with Claude and it aced every one except the video game (which worked after one additional prompt) and the logic puzzle. Also, comparing CoT with a pure LLM like GPT-4 doesn't make sense. The most worrying point is the rate limit, though, because it seems almost impossible to run any benchmark with such a tiny number of requests...

    • @WoolyCow
      @WoolyCow 1 month ago +7

      new benchmark exploit found...make inference so long nobody can demonstrate it secretly sucks :D

    • @jakubbartczuk3956
      @jakubbartczuk3956 1 month ago +1

      @@WoolyCow exactly. They report that for IOI problems they used 50 submissions per problem so you'd have to spend 2.5 weeks PER PROBLEM without tier 5 API :D

    • @Askejm
      @Askejm 1 month ago

      Eventually more people will get API access, in which case it's not that expensive. Same as gpt-4-0314

  • @Yipper64
    @Yipper64 1 month ago

    I think the breakthrough with this is that the yap is under the hood. One thing you see in more advanced models is that they write out a whole freaking essay when all you need is the little paragraph at the end; that's because, obviously, they're kind of thinking out loud. And obviously, just for the user experience if nothing else, this is cut short from what it *could* be.
    But this appears to be the best of both worlds. It'll use a lot of tokens "thinking" about the output before making it, and then only output what the user absolutely needs. THEN the final output will only eat up that much of the context window, so further prompts have more context than otherwise.
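    A toy illustration of that budgeting point, with invented token counts (the 5000/200 split is just an assumption for the example):

    ```python
    # Hidden reasoning tokens are spent each turn but then discarded;
    # only the short visible answer accumulates in the context window.
    history_tokens = 0
    for turn in range(3):
        reasoning_tokens = 5000  # hidden chain of thought (assumed size)
        visible_tokens = 200     # final answer shown to the user (assumed size)
        history_tokens += visible_tokens
        print(f"turn {turn}: spent {reasoning_tokens + visible_tokens} tokens, "
              f"context carries only {history_tokens}")
    ```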

  • @dericksozo1135
    @dericksozo1135 1 month ago

    Nice explanation. Thanks!

  • @tahirk9
    @tahirk9 1 month ago

    Thanks for sharing!

  • @ChristianIce
    @ChristianIce 1 month ago +8

    Yeah, text prediction is not thinking, but I'm curious to test it and break it, just for fun :)

    • @ayoubfr8660
      @ayoubfr8660 1 month ago +3

      100%. I think the only people "mindblown" by this are the ones with naturally weak human "CoT" in real life.

    • @markmuller7962
      @markmuller7962 1 month ago +1

      Ciao ChristianIce 😄

    • @4.0.4
      @4.0.4 1 month ago +5

      You are correct, but the slightly worrying realization is that now we have a pretend-thinker that performs better at human tasks than a number of humans...

    • @ayoubfr8660
      @ayoubfr8660 1 month ago

      @@4.0.4 because most humans are really dumb.

    • @a_soulspark
      @a_soulspark 1 month ago +4

      text prediction is not thinking, because...?
      I'd love to hear why you take such a stance, as at least to me, the concept of "thinking" has lately become harder to describe.

  • @Sedokun
    @Sedokun 1 month ago +6

    4:25 That's a logarithmic-scale timeline...

    • @Milark
      @Milark 1 month ago +1

      Didn’t catch that, good one! Still impressive but a huge difference from the linear graph I thought it was.

  • @cdkw2
    @cdkw2 1 month ago

    the quality restored!!

  • @GhostlyOnion
    @GhostlyOnion 1 month ago +3

    Legit thought it was that fire dude but it's not even him 😂

  • @mehdihassan8316
    @mehdihassan8316 1 month ago +18

    Now GPT-5's promise of being smarter than a PhD seems more plausible. Throw more compute and energy at it, and you can have it think for hours to solve complex issues

    • @ccash3290
      @ccash3290 1 month ago +7

      One problem more thinking time won't solve is when it makes a simple mistake that it can't recognize.
      It would just get stuck in a loop being incorrect

    • @markmuller7962
      @markmuller7962 1 month ago +1

      Not to mention that this model is probably a fine-tuned/modded GPT-4 or 4o (and 4o mini?), so an even larger/better model with next-gen reasoning might be the real GPT-5

    • @andreilucasgoncalves1416
      @andreilucasgoncalves1416 1 month ago

      @@ccash3290 Yeah, but people do the same

    • @markmuller7962
      @markmuller7962 1 month ago +2

      ​@@andreilucasgoncalves1416True but unlike with biological people they're trying to constantly improve AI intelligence

    • @markmuller7962
      @markmuller7962 1 month ago +1

      ​@@andreilucasgoncalves1416btw your comment is only visible by selecting "newest comments" for whatever reason

  • @_XoR_
    @_XoR_ 1 month ago

    So it isn't using GFlowNets for chain-of-thought optimization in the latent space at inference time yet? Because I've wanted to explore that myself for a long time

  • @ilyab.5127
    @ilyab.5127 1 month ago

    How cool: we have a slow-thinking machine again, after we had a fast-thinking one.

  • @KeinNiemand
    @KeinNiemand 1 month ago

    There may still be a GPT-5, with o1 being a separate thing. Also, is o1 multimodal like 4o? And is there a way to pay more to get more messages?

  • @mrtats6590
    @mrtats6590 1 month ago +1

    They actually have CoT examples in their blog

  • @ivo22mun
    @ivo22mun 1 month ago +2

    Now that they've implemented CoT in LLMs, is it plausible to implement Tree of Thoughts with RL in a similar way?

    • @keypey8256
      @keypey8256 1 month ago +3

      Maybe, but there is no good reason to do it. You can prove that any iterative process can simulate any graph, so assuming ToT is the correct way to go, RL will make the model learn it.

  • @SeoFernando
    @SeoFernando 1 month ago +3

    How many r’s in “straberry”
    It thought about it for 8 seconds and answered 2 😭

    • @dericksozo1135
      @dericksozo1135 1 month ago

      That shouldn't have been an example. GPT-4 correctly answers that question already, but 4o and 4o mini don't. There's a meme going around about the strawberry r character count, so I guess it was to address that.

  • @KayButtonJay
    @KayButtonJay 1 month ago

    I wanna know how much of these gains are based on training on those benchmark questions specifically

  • @home1250
    @home1250 1 month ago +1

    Thanks for being on top of the ships brotha

  • @lowmax4431
    @lowmax4431 1 month ago

    I'm curious what people who say AGI is still a ways away are expecting.

  • @zellator
    @zellator 1 month ago +1

    100k tokens for an answer? The energy requirements will skyrocket. AI is definitely going to plateau if the only way you get better results is by putting more energy into it

  • @ritpop
    @ritpop 1 month ago

    The new "Attention Is All You Need": "Let the model cook"

  • @vladimirtchuiev2218
    @vladimirtchuiev2218 1 month ago

    So is this AlphaZero/Leela Chess Zero in LLM form? Up until now, LLMs could only do what is the equivalent of depth-1 inference for the chess bots; if what they say is true, ClosedAI might have figured out a way to do MCTS-style inference for LLMs. Also, keep in mind that the hidden thinking might actually be a sequence of tokens that only the model itself understands...

  • @Lee_-Wzonsss5222-ky2hd
    @Lee_-Wzonsss5222-ky2hd 1 month ago

    I was diagnosed by an AI chatbot at work for sick leave; the doctor just signed the document. I feel like these will take our jobs, OR their "evolving" might just stop one day: compute limits, government bans, etc. Who knows. The only thing I know is that I think no one should support this financially; the outcome is unknown, yet the possible risks and benefits are known. I would say that with a risk like AI carries, it's not worth it...

  • @dragonmares59110
    @dragonmares59110 1 month ago +4

    The best way to test this kind of model is to ask it to write a Python or C# algorithm to convert a Cartesian coordinate command into angular commands on a 6-axis robot, with the correct inputs and outputs. No model even comes close to basic math like that

    • @crackwitz
      @crackwitz 1 month ago

      Can you sketch the approach for that?

    • @dragonmares59110
      @dragonmares59110 1 month ago

      @@crackwitz It would have to take as inputs the x, y, z, roll, pitch, yaw command, and it would also need to ask for the lengths of the joints and their locations. The answer should be a series of 6 angular commands, one for each axis. Often ChatGPT, Claude and other models give you an empty function without any transformation matrix or Jacobian solution. The best case I have seen was with a 3-axis robot. If one day a model is able to do that, then that is the model I would use
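      As a rough sketch of what such an answer should contain, here is the standard Jacobian-based approach on a planar 3-joint arm (link lengths, target pose, and damping are all made-up values; a real 6-axis solver is the same idea with a 6-DoF Jacobian built from the robot's DH parameters):

      ```python
      import numpy as np

      L = np.array([1.0, 0.8, 0.5])  # link lengths (assumed)

      def fk(q):
          """Forward kinematics: joint angles -> Cartesian pose (x, y, phi)."""
          s = np.cumsum(q)  # absolute angle of each link
          return np.array([np.sum(L * np.cos(s)), np.sum(L * np.sin(s)), s[-1]])

      def jacobian(q):
          """Partial derivatives of the pose w.r.t. each joint angle."""
          s = np.cumsum(q)
          J = np.zeros((3, 3))
          for j in range(3):
              J[0, j] = -np.sum(L[j:] * np.sin(s[j:]))
              J[1, j] = np.sum(L[j:] * np.cos(s[j:]))
              J[2, j] = 1.0
          return J

      def ik(target, q=np.array([0.3, 0.3, 0.3]), iters=200, lam=0.1):
          """Iterate joint angles toward a Cartesian target pose."""
          for _ in range(iters):
              err = target - fk(q)
              if np.linalg.norm(err) < 1e-9:
                  break
              J = jacobian(q)
              # damped least squares: stays stable near singular configurations
              q = q + J.T @ np.linalg.solve(J @ J.T + lam**2 * np.eye(3), err)
          return q

      q = ik(np.array([1.2, 0.9, 0.5]))
      print(q, fk(q))  # fk(q) should land on the target pose
      ```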

  • @MoiCestVinci
    @MoiCestVinci 1 month ago

    I tried it out and it's incredible, and costly

  • @mvd_01
    @mvd_01 1 month ago +1

    So that means that ChatGPT can now compete with Devin AI?

  • @aeonsleo2
    @aeonsleo2 1 month ago

    Give o1 a few months and it will be huge, particularly in science and coding.

  • @KuyaAndreV
    @KuyaAndreV 1 month ago

    What if the usage limit is actually there to limit jailbreaking iterations more than compute capacity?

  • @TheGuitarist36
    @TheGuitarist36 1 month ago

    Can you use it to help with your speech?

  • @MrLeo000
    @MrLeo000 1 month ago +1

    Wait! I actually saw somewhere that they plan to keep advancing the frontier of the GPT series! I don't think they are dropping the GPT series!

  • @IceGamer7810
    @IceGamer7810 1 month ago

    At an OpenAI company meeting:
    "Listen! What if we let ChatGPT think for a longer time? It might give more precise answers."
    Me:
    You don't say?! If I think about a math problem for 10 minutes instead of 10 seconds, then of course I will give a more precise answer...
    💀💀💀

  • @themalcontent100
    @themalcontent100 1 month ago

    Congratulations. It can tell you that your request violates content policy 40% faster
    Worth every cent.
    How much was sunk into it again?

  • @Tracing0029
    @Tracing0029 1 month ago

    The funny thing is that during thinking the model actually tells you what constraints OpenAI put on it... I guess this makes jailbreaking easier 😂

  • @yagoa
    @yagoa 1 month ago

    ClosedAI does it again

  • @joao2011770
    @joao2011770 1 month ago

    42!
    After reasoning for 100 years

  • @AhmEd-ft4fz
    @AhmEd-ft4fz 1 month ago

    I tried the preview with this question: "How many R's are in the plural of strawberry?"
    It gave me this: "There are two letter "R"s in the plural of "strawberry," which is "strawberries.""
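    For reference, the ground truth is a one-liner; "strawberries" contains three r's:

    ```python
    print("strawberries".count("r"))  # -> 3
    ```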

  • @GraveUypo
    @GraveUypo 1 month ago +1

    So basically this is the same as Reflection AI. Got it.

  • @yaoke259
    @yaoke259 1 month ago

    Just tried it on Lunar Link AI, pretty crazy stuff.

  • @Person5385-w4s
    @Person5385-w4s 1 month ago

    Of course this comes out the day my subscription ends

  • @Gilotopia
    @Gilotopia 1 month ago

    The o in o1 stands for Claude

  • @jeanchindeko5477
    @jeanchindeko5477 1 month ago +1

    OMG, I'm so glad it's not called GPT-5! Hope everyone will chill a bit with those stupid naming conventions, as if everything were a smartphone!

  • @PhilfreezeCH
    @PhilfreezeCH 1 month ago +3

    3:46 I hate that figure, they made it look exponential by just using an exponential timescale for no good reason.

    • @adissentingopinion848
      @adissentingopinion848 1 month ago +1

      It's not even exponential, it's linear with a log(x) scale! Get a 10% boost in output for 10x the cost!

  • @simeonnnnn
    @simeonnnnn 1 month ago

    The day you release a video and you don't mention Deegan and Ledoux, we'll know something is up.

  • @igorsawicki4905
    @igorsawicki4905 1 month ago

    Soon I will test it on my master's physics assignments; we will see about that "PhD level"

  • @redthunder6183
    @redthunder6183 1 month ago

    This is cool. But can't I just tell the model to talk to itself, or list its chain of thought, until it has an answer?
    All they did was make it output more tokens and increase the time it takes to get the answer

    • @Alex-fh4my
      @Alex-fh4my 1 month ago +4

      True, but this model was specifically trained to output the kinds of reasoning tokens that actually help solve problems. Just because you can ask your model to think step by step doesn't mean it will do it productively. The key difference here is that they must have gotten a lot of data on reasoning steps and had a way to measure whether the reasoning was good or bad at each step
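      A toy sketch of that step-level search idea, with placeholder functions (sample_step and score_step stand in for an LLM sampling call and a trained step-level verifier; this is an illustration, not OpenAI's actual method):

      ```python
      import random

      def sample_step(context):
          # placeholder for sampling one candidate reasoning step from an LLM
          return context + f" step-{random.randint(0, 9)}"

      def score_step(candidate):
          # placeholder for a process reward model rating a partial chain
          return random.random()

      def search(prompt, n_steps=3, n_candidates=4):
          """Greedy step-level search: keep the best-scored candidate each step."""
          context = prompt
          for _ in range(n_steps):
              candidates = [sample_step(context) for _ in range(n_candidates)]
              context = max(candidates, key=score_step)
          return context

      print(search("Q: ..."))
      ```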

  • @commonwombat-h6r
    @commonwombat-h6r 1 month ago

    so can this model solve the famous strawberry problem, namely telling how many "r"s there are in "strawberry"?

  • @6little6fang6
    @6little6fang6 1 month ago

    the real reflection

  • @Likemea
    @Likemea 1 month ago

    OK, good, but as always many people (specifically the poor or average) get left out... we can't use this model... we need to wait for an open-source alternative OR a local one

  • @ayushmanbt
    @ayushmanbt 1 month ago

    All I need to know... can it say there are 3 r's in strawberry?

  • @orangehatmusic225
    @orangehatmusic225 1 month ago

    This is not GPT-5; it's a model that is literally a year old that they've been holding back from the public.

  • @richardcraddock1965
    @richardcraddock1965 1 month ago

    New immortal anatomy. Micro-robot composite parts. Transfer of consciousness to the new immortal anatomy. Full-conscious immersion as a functionality of the new immortal anatomy.
    AI Agents App 3D Simulations extrapolation - R&D automated about the aforementioned objectives.
    SuperAI System.
    This should be done in a timely manner; it should've already been done.

  • @rady7273
    @rady7273 1 month ago

    They wrote that they will also continue the GPT series

  • @ArtUniverse
    @ArtUniverse 1 month ago

    I feel like I live in some kind of different reality, because I absolutely cannot see this "PhD-level reasoning" everyone is raving about. I specialize in computational astronomy and tried throwing some advanced but common problems at it (such as predicting solar eclipses), and the AI either has no idea what to do, or just gives up and tells me to use an existing solution. For simpler problems it will usually pick the right equations, but still mess up the calculations. From how things are progressing right now, I feel like we're still many years away from it actually becoming "smart".

    • @RYu-b7d
      @RYu-b7d 1 month ago

      Sir, is it legal or illegal? ChatGPT recently introduced the o1 version and it solves PhD-level problems. Some people are generating original research papers through it, with all the calculations a paper requires, and trying to publish them. Is that ethical?

    • @ArtUniverse
      @ArtUniverse 1 month ago

      @@RYu-b7d why are you asking me this

    • @Boopy357
      @Boopy357 1 month ago

      Years is an overestimation at this point. This seems like a stopgap solution considering how resource-inefficient it is just to get something that is marginally correct. It also takes longer. 1 step forward, 2 steps back.

  • @Walid-Lamraoui
    @Walid-Lamraoui 1 month ago

    Why choose "o" as in o1 (maybe for "open"?)... I think they should use "c" as in c1, for "corporate"... that's more accurate

    • @bycloudAI
      @bycloudAI  1 month ago +1

      One of their devs just confirmed the "o" in o1 stands for OpenAI.
      They aim to be less confusing, but now it's even more confusing with their multimodal GPT-4o (omni)

  • @mrd6869
    @mrd6869 1 month ago

    I wanna see Anthropic's response to this.

  • @DLSSelfGamer
    @DLSSelfGamer 1 month ago

    👍🏻👍🏻

  • @telotawa
    @telotawa 1 month ago +1

    o1 being so closed down and opaque makes it even more sad that Reflection was fake

    • @a_soulspark
      @a_soulspark 1 month ago

      Open source will catch up very soon. Even if Reflection was fake, realistically it should be possible to make reasoner models like o1 within the open-source community, especially considering o1-mini is so good at reasoning despite being "small" (by OpenAI standards, at least)

    • @shivanandvp
      @shivanandvp 1 month ago

      @@a_soulspark I am optimistic about the potential for open-source algorithms, training data and weights, but computing resources are a big deal too. For the foreseeable future, regular people won't have the money or computing resources to run a specialized o1-based model in their homes to get an advantage on the stock market, which some companies can definitely do with their capital.

  • @TheMehranKhan
    @TheMehranKhan 1 month ago

    I still haven't gotten the new voice features from their previous keynote; they'd better release what they showed first and then move on to a new model lol

  • @nigel-uno
    @nigel-uno 1 month ago +1

    If you ask how many 'r's are in "strowberry", it fails and says 2, as of 9/12/2024. Same with raspberry.

  • @mndtr0
    @mndtr0 1 month ago

    Hmmm looks like soon AI will replace programmers and mathematicians

  • @jiinkC
    @jiinkC 1 month ago +1

    my chatgpt still says 3.11 is bigger than 3.3

  • @SrIgort
    @SrIgort 1 month ago

    So sad it's just a preview for paying customers, I wish I could test it 😭
    It feels like when GPT-4 released and I had to wait a long time to use it.

  • @dadballers
    @dadballers 1 month ago

    Hello Skynet

  • @nobodygonnaknow8771
    @nobodygonnaknow8771 1 month ago +1

    They're saying that future versions will think for fcking weeks???????☠☠☠

  • @watcher8582
    @watcher8582 1 month ago

    thx