Can ChatGPT o1-preview Solve PhD-level Physics Textbook Problems?

  • Published: 24 Nov 2024
  • Science

Comments • 1K

  • @KyleKabasares_PhD
    @KyleKabasares_PhD  2 months ago +87

    Hi everyone, thank you so much for the feedback! I couldn't have expected this kind of attention on my video in the first 48 hours. I've taken some of your suggestions in the comments and have created a Part 2: ruclips.net/video/a8QvnIAGjPA/видео.html
    Please consider watching!

    • @AlfarrisiMuammar
      @AlfarrisiMuammar 2 months ago +1

      OpenAI says the real o1 version of the AI will be out before the end of 2024.

    • @User.70793
      @User.70793 2 months ago

      UNIVERSAL BASIC INCOME 2025

    • @xspydazx
      @xspydazx 2 months ago

      it was funny!
      The thing is to keep up with the technologies and current innovations being deployed, as it should not be hard to emulate these neural networks with the open-sourced models! The aim is to train the local models as best you can, at the highest point of your capability, but be aware that technology needs to advance to handle these heavy tensor calculations; then local models will be able to perform these tasks without the need for outside intervention, so get an early start!
      Or it will be a nightmare of study to catch up: it has taken me a full year of constant Python etc., doing this training and implementation, to keep up and get ahead! That gap is widening.
      Just expect to be able to host a 24B or 70B model locally within the next two years, a full generative model! (So you could host multiple mini 7B agents at the same time, hence a powerful, agentic system!)

    • @debragotenks
      @debragotenks 1 month ago +1

      How much did OpenAI pay you to make this ad?

    • @User.70793
      @User.70793 1 month ago

      @@AlfarrisiMuammar I can't wait, I'm still hyped for GPT-5

  • @An_Attempt
    @An_Attempt 1 month ago +347

    It is worth noting that GPT has probably 'read' thousands of answer books on Jackson, as well as all of Stack Exchange and several study guides on Jackson in particular. So if you want to really test GPT's ability, you probably need to create novel questions that will not be found in any textbook or online forum.

    • @gabrielbarrantes6946
      @gabrielbarrantes6946 1 month ago +27

      Exactly, the problems students solve are already done somewhere on the internet; it's just about googling them and copy-pasting the solution.

    • @taragnor
      @taragnor 1 month ago +42

      It's the same issue with AI being "great at programming" because it's extensively trained on leetcode problems.

    • @gabrielbarrantes6946
      @gabrielbarrantes6946 1 month ago +8

      @@taragnor being good at leetcode is not even being good at programming.

    • @khiemgom
      @khiemgom 1 month ago +5

      @@gabrielbarrantes6946 it doesn't have access to the internet

    • @ShayPatrickCormacTHEHUNTER
      @ShayPatrickCormacTHEHUNTER 1 month ago +3

      @@gabrielbarrantes6946 said the web dev.

  • @kylechickering5890
    @kylechickering5890 1 month ago +41

    Note 2: I haven’t looked through the answers, but in cases where GPT knows what the answer should be, it will make up crap to fill in the middle. I’ve asked it many PhD level math questions where it has hallucinated its way to a “plausible” answer.

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  1 month ago +8

      I'm planning on making a follow up video on comparing my approach to solving this problem with ChatGPT's! Thanks for pointing that out

    • @omatbaydaui722
      @omatbaydaui722 1 month ago +7

      @@KyleKabasares_PhD That's not what he was saying. You were providing GPT o1 the answers, so of course it would give you the right answers since you provided them for it. To know if it truly solves PhD questions, you shouldn't give it questions like "prove that this formula is verified" but rather "what is the formula for ...?"

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  1 month ago +6

      @@omatbaydaui722 I understood what he was saying. I’ve verified it doing problems correctly from start to finish (non-Jackson) without knowing the answer in advance! But in those cases I actually did the problem unlike here, so I’m planning on revisiting the problems in this video.

  • @nexicturbo
    @nexicturbo 2 months ago +39

    Crazy part is that this isn’t even the full model, which is even better

    • @CommentGuard717
      @CommentGuard717 2 months ago +5

      Yeah, it's not even a beta, it's a preview. And it's still using the last gen model. They're coming out with a new model pretty soon.

    • @CrazyAi166
      @CrazyAi166 2 months ago

      To those of us who are math students 😂😂❤

    • @alvaroluffy1
      @alvaroluffy1 2 months ago +1

      @@CommentGuard717 yeah, imagine GPT-5 fully implemented, not the preview version. That's gonna be fucking wild, and it's not that far from now

    • @Ken-vy7zu
      @Ken-vy7zu 2 months ago

      @alvaroluffy1 well, they are now working on chatgpt 6.0

    • @alvaroluffy1
      @alvaroluffy1 2 months ago

      @@Ken-vy7zu shut up, you know nothing, stop making things up. They are still working on GPT-5, you realize that, right?

  • @JohnSmall314
    @JohnSmall314 2 months ago +50

    I just tried o1 with some fairly simple integrals, which it got badly wrong and I had to guide it to the correct answer. So I'd advise checking every step in the derivation.

  • @amcmillion3
    @amcmillion3 2 months ago +81

    The issue with this "test" is that the solutions to one of the most famous physics books are certainly in its training data. Give it a problem for which there is no known solution, or at a minimum give it a problem from an unknown textbook. Find an old physics or math book from 100 years ago that has no digital copies, then ask it questions from that and see how it does.

    • @akarshghale2932
      @akarshghale2932 2 months ago +1

      Exactly

    • @Lucid.28
      @Lucid.28 2 months ago +2

      @@amcmillion3 Yes, I did that and it's not very accurate. I fed in 3 JEE Advanced questions, of which it could only answer 1 correctly; 1 it got wrong even with hints, and 1 it had to get wrong first, and then with hints it was able to solve it

    • @apache937
      @apache937 2 months ago

      best of all, make your own

    • @pratikpaharia
      @pratikpaharia 1 month ago +2

      If they were JEE Mains-level questions, then solving 1/3 would put it at the same level as those qualifying in the top 1000 for the exams. FYI, the highest marks in JEE Mains were usually around 33-35%. I'd wager those would be folks with an IQ of ~130+, which is pretty damn good for GenAI. On the normal distribution of IQ, where 100 is the population average, 130 should lend at least 1 or 2 sigma of confidence to the statement "GenAI has definitively exceeded the average human intelligence level"

    • @Lucid.28
      @Lucid.28 1 month ago

      @@pratikpaharia nah

  • @chudchadanstud
    @chudchadanstud 2 months ago +42

    They told me AI would replace hard labourers and fast food workers first, leaving us more time to think, so I went to college. Now I'm in college and I'm the first one being replaced.

    • @phen-themoogle7651
      @phen-themoogle7651 2 months ago +5

      Don't worry, everyone will be replaced in 3-5 years💀

    • @DeathDealerX07
      @DeathDealerX07 2 months ago

      ​@@phen-themoogle7651 💯

    • @avijit849
      @avijit849 2 months ago +4

      Yeah, it's everyone: from labourers to physicists, AI could do everything much more effectively.
      The biggest surprise was creativity, that AI could create art.

    • @meanmachine99999
      @meanmachine99999 1 month ago

      Just don't be a data analyst, and if you want to be a computer scientist, get out of school, get into the languages, and start building; there are plenty of budding industries right now

    • @glub1381
      @glub1381 1 month ago

      @@avijit849 AI is not creative, and I don't believe it ever will be

  • @dimitriskliros
    @dimitriskliros 1 month ago +54

    To be fair, you don't seem to have actually checked the model responses; there could have been mistakes or hallucinations throughout

    • @luissantiagolopezperez4938
      @luissantiagolopezperez4938 1 month ago +1

      Can you point out any specific hallucinations in this video?

    • @particle.garden
      @particle.garden 1 month ago +9

      100% this. It's given the answer to work towards. I do not have enough knowledge in this area to prove that it came to its conclusions incorrectly, but it's a well-known quirk.

  • @SomeRandomdUde14
    @SomeRandomdUde14 1 month ago +47

    Testing it with a book that is "infamous" probably isn't a great benchmark, considering that it would mean there is a considerable database related to that specific book it could read from. If you could test it on a novel problem, that would be better

  • @JurankoNomo
    @JurankoNomo 1 month ago +27

    You have to remember that this book was probably directly in ChatGPT's training data, so this may not be a valid measure of novel problem-solving ability

  • @delxinogaming6046
    @delxinogaming6046 2 months ago +688

    This is the worst this technology will ever be….

    • @armwrestlerjeff
      @armwrestlerjeff 2 months ago +63

      That's an incredible truth

    • @MaxWinner
      @MaxWinner 2 months ago +46

      That's a terrifying truth

    • @thegeeeeeeeeee
      @thegeeeeeeeeee 2 months ago +23

      Eh it might hit a wall though.

    • @wbay3848
      @wbay3848 2 months ago +57

      @@thegeeeeeeeeee I'm here from the future; your comment aged poorly

    • @igorbessa3563
      @igorbessa3563 2 months ago +4

      It might stagnate tho

  • @masterfall27
    @masterfall27 1 month ago +52

    If it's a famous problem, isn't there a good chance the solution was already in the training data?

    • @Stepbro126
      @Stepbro126 1 month ago +17

      In general, ML models shouldn’t memorize the training data. A lot of effort is put into ensuring the model learns how to do the problem rather than memorizing.

  • @kylechickering5890
    @kylechickering5890 1 month ago +24

    I haven't watched the solving yet, but immediately I would like to point out that choosing problems which have known solutions may mean that the model has already seen (and simply memorized) the correct answer!
    A better test is to ask it an impossible problem, or one that solutions don't exist for, and then see whether its generated solution is correct.

    • @pripyaat
      @pripyaat 1 month ago +1

      Absolutely. If you simply Google the first 15 words of Problem 1, the very first result is a pdf document with a detailed, step-by-step solution. If anything, assuming the steps provided by o1 are correct, it just demonstrates it's decent at summarising search results...
      The same goes for programming. A lot of people get easily impressed when GPT "writes" a 50-line script that's basically within the first 3-4 StackOverflow posts after a Google search. I mean, yeah, I won't deny it's really convenient that the tool can save you a few clicks, but saying that it has an understanding of the answer it's giving you is (as of today) still a stretch.

    • @o_o9039
      @o_o9039 1 month ago

      If you know how AI works: the way these models are trained is lossy. They don't have word-for-word access to every bit of their training data; if they did, these models would be terabytes upon terabytes in size and would be extremely slow.

    • @pripyaat
      @pripyaat 1 month ago +1

      @@o_o9039 I know how they work, and I'm not saying the model has all the information stored in its parameters, but it's no secret GPT can indeed search the web and summarize its findings. Copilot (based on GPT4) openly provides sources for almost everything it spits out.

    • @KRYPTOS_K5
      @KRYPTOS_K5 1 month ago

      @@pripyaat How do we know if it cheated?

    • @98danielray
      @98danielray 1 month ago

      @@pripyaat Jeez, even worse than I thought

  • @paulojcavalcanti
    @paulojcavalcanti 2 months ago +9

    I asked it to calculate some stuff (quantum mechanics) for me and it did some difficult step without explanation. I asked it to prove that step and it gave me a proof containing 1 mistake, but I wasn't sure and asked about that step; then it realized it was wrong, explained exactly why it was wrong, fixed it, and redid the calculation correctly

  • @warsin8641
    @warsin8641 2 months ago +14

    Just a few years ago no one ever imagined bots thinking... 😭

  • @1.4142
    @1.4142 21 days ago +12

    The key is having the answer beforehand so it can guess from both ends and connect them. Ask it to evaluate a parameterized surface integral, even with Wolfram plugins, and it will make mistakes.
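
One way to sanity-check such an answer is to redo the integral in a computer algebra system; a minimal sympy sketch (the sphere parameterization here is my own illustration, not the commenter's example):

```python
import sympy as sp

R, u, v = sp.symbols('R u v', positive=True)
# Parameterize a sphere of radius R: u in [0, pi], v in [0, 2*pi]
r = sp.Matrix([R*sp.sin(u)*sp.cos(v), R*sp.sin(u)*sp.sin(v), R*sp.cos(u)])
cross = r.diff(u).cross(r.diff(v))
dS2 = sp.simplify(cross.dot(cross))   # |r_u x r_v|^2 = R**4*sin(u)**2
dS = R**2 * sp.sin(u)                 # its square root, since sin(u) >= 0 on [0, pi]
area = sp.integrate(dS, (u, 0, sp.pi), (v, 0, 2*sp.pi))
print(area)                           # 4*pi*R**2, the known sphere area
```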

  • @olivetree9920
    @olivetree9920 2 months ago +17

    And remember, this is a truncated version of the model. Its full version is much better at problems like this

  • @rishikhan2
    @rishikhan2 2 months ago +12

    Instead of telling it the answers, try asking it to find them. When I did this, it got the first one to an infinite sum but didn't reduce the infinite sum to the final answer: pretty good! For the second one, it had an extra factor of 1/pi that it made up. For the third it completely disregarded the angular dependence of the scattering and failed.

  • @Nordobali
    @Nordobali 2 months ago +5

    I don't understand anything about physics and advanced mathematics, but this video just made me excited for the future again!

  • @ominousplatypus380
    @ominousplatypus380 2 months ago +12

    The model has most likely been trained on these problems and their solutions, since they've been around on the internet for a long time. So it isn't really a good test of its abilities since it has just memorized the solutions. That being said, I also tried it with some problems from the 2024 International Math Olympiad, and it was able to get at least two (out of six) of them correct consistently. I only tried problems where the answer was a single number/expression, going through proofs would be much more work. The model's knowledge cutoff should be in October 2023 so it shouldn't be possible for it to have seen these problems before. It's still hard to tell since OpenAI isn't being very transparent with their methodology, but if the model is actually able to solve novel IMO level problems it has never seen before, color me impressed.

    • @contentfreeGPT5-py6uv
      @contentfreeGPT5-py6uv 2 months ago +1

      I tested it, and o1 gave me the correct answer for the 2024 final, with alternatives

    • @mirek190
      @mirek190 2 months ago

      GPT-4o has the same training data and cannot solve it, so...

  • @cluelessMuslims-ed5js
    @cluelessMuslims-ed5js 2 months ago +59

    This channel is the reason why I'm not reading fluid mechanics rn

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago +30

      Can’t tell if I should say thank you or I’m sorry lol

    • @dreamscapes2727
      @dreamscapes2727 2 months ago +2

      Fluid mechanics is extremely fun🤲

  • @militiamc
    @militiamc 1 month ago +25

    o1 was trained on the whole Internet, including that book

    • @HedgeFundCIO
      @HedgeFundCIO 1 month ago +7

      So were all of us.

    • @roro-v3z
      @roro-v3z 1 month ago +2

      @@HedgeFundCIO The difference is we can think, but it can only answer. It's a great tool!! But it doesn't think on its own

    • @casaruto
      @casaruto 1 month ago +7

      @@roro-v3z Actually, we don't know if it thinks, because we don't know how we think. This has been a philosophical debate in the AI community over the years.

    • @Hosea405
      @Hosea405 1 month ago

      @@roro-v3z almost like you didn't see it go through problems step by step to get to an answer..... It can indeed reason on its own now

    • @roro-v3z
      @roro-v3z 1 month ago +1

      @@Hosea405 Yes it did, but on training data; it won't have new ideas that it has not been trained on

  • @knotratul5106
    @knotratul5106 5 days ago +21

    It was trained on this data lol

  • @anthonypenaflor
    @anthonypenaflor 1 month ago +28

    The model's performance is undoubtedly impressive, but if it was trained on this book (which seems likely), it's not truly generalizing to new data. For a fair assessment of its capabilities, models should be tested on novel, unforeseen problems, rather than those for which the answers are already available. In practice, models are typically evaluated on fresh data to gauge how well they can generalize. To accurately measure performance at this level, problems should be novel and manually verified, even if that takes considerable time (1.5 weeks or more).

    • @pentazucar
      @pentazucar 1 month ago +5

      I believe the book does not have the answers to the problems, so even if it was trained on the book, that shouldn't help it solve them. Still, it is possible that it just took the answers from some physics subreddit post and pasted them

    • @velcrawlvfx
      @velcrawlvfx 1 month ago

      It backtracked on its own answers, double-checking, so I doubt it already knew the answer from being trained on the book

    • @Daniel__-
      @Daniel__- 1 month ago

      IT BACKTRACKED ITSELF THOUGH????

    • @Daniel__-
      @Daniel__- 1 month ago

      Not to mention, universities have and still run research where they create brand new tests solely for having AI take them

  • @samsonabanni9562
    @samsonabanni9562 2 months ago +8

    " OpenAI's new AI model, "o1," has achieved a significant milestone by scoring around 120 on the Norway Mensa IQ test, far surpassing previous models. In a recent test, it got 25 out of 35 questions right, which is notably better than most humans. A critical factor in these tests is ensuring the AI doesn't benefit from pre-existing training data. To address this, custom questions were created that had never been publicly available, and o1 still performed impressively"

    • @marcianoforst6311
      @marcianoforst6311 1 month ago

      So it’s already smarter than 90% of the global human population, and it knows everything on the internet.

  • @marul688
    @marul688 1 month ago +37

    There is a problem with the test:
    Since the answer ("show that ...") is given, the AI will always arrive at the correct answer, but the reasoning might be flawed. It would be better to cut the correct answer out of the problem and see what the AI answers then.

    • @maxaposteriori
      @maxaposteriori 1 month ago

      This applies to humans completing the problem as well, and there was an effort made to check the steps.
      I agree, it might be interesting to see if it could, though (although if it succeeds, it will likely express the result in a different form, which may be hard to verify).

    • @pentazucar
      @pentazucar 1 month ago +3

      I agree with you, especially taking into account that it may just be bluffing and we would have no idea

    • @briansauk6837
      @briansauk6837 1 month ago +3

      Prior versions had bogus steps that didn’t really follow legitimate steps, and units were often fouled up. Definitely deserves to be looked at deeper to see if that has improved.

  • @Draco-jk3rb
    @Draco-jk3rb 26 days ago +5

    If you want to know: the steps are simply contextual changes. It is essentially a GPT that sets its own instructions, and the output of its thinking steps is the instructions it provides itself at each step. It works because, by shifting context at each step rather than using only the single context of the original message and response, it is able to approach problems iteratively from different 'perspectives'
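
A minimal sketch of that idea (purely illustrative; OpenAI has not published o1's actual mechanism, and `ask_llm` here is a hypothetical helper that calls a model):

```python
def iterative_solve(problem: str, ask_llm, max_steps: int = 5) -> str:
    """Toy self-prompting loop: each step's output becomes the next step's instructions."""
    instructions = "Outline an approach to the problem."
    notes = []
    for _ in range(max_steps):
        # Answer under the current self-set instructions (one 'perspective')...
        step = ask_llm(f"Problem: {problem}\nNotes so far: {notes}\nInstructions: {instructions}")
        notes.append(step)
        # ...then rewrite the instructions, shifting the context for the next pass.
        instructions = ask_llm(f"Given these notes: {notes}\nWrite instructions for the next step, or say DONE.")
        if "DONE" in instructions:
            break
    return ask_llm(f"Problem: {problem}\nNotes: {notes}\nWrite the final answer.")
```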

  • @Linshark
    @Linshark 2 months ago +7

    It might hallucinate since it knows the answers. So one would need to check all the calculations.

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago

      I just filmed a part 2 where it involves some problems where the answer is not known in advance, and problems that I'm confident it did not have access to previously: ruclips.net/video/a8QvnIAGjPA/видео.html

  • @themonsterintheattic
    @themonsterintheattic 2 months ago +4

    I've been watching lots of videos on o1 and I'd not had a wow moment yet... but this was it

  • @a.b3203
    @a.b3203 1 month ago +12

    As a person doing a bachelor's in EE, this will be very useful for me. Like many, I only wonder what'll happen in the future when it gets even more advanced?
    Maybe take my reduced earnings and live off the land somewhere. Away from this.

  • @pancakesauces
    @pancakesauces 1 month ago +64

    Now ask it if 9.11 is bigger than 9.9

    • @goldenspoon8203
      @goldenspoon8203 1 month ago +2

      it worked for me 👍

    • @anhucphan3828
      @anhucphan3828 1 month ago

      bruh it said 9.11 is bigger than 9.9

    • @thatonebruneian
      @thatonebruneian 1 month ago +6

      *is 9.11 bigger than 9.9*
      [Thought for 4 seconds]
      "Yes, 9.11 is greater than 9.9."
      *Are you sure?*
      [Thought for 17 seconds]
      "You're right, and I apologize for the mistake in my previous response. Actually, 9.9 is greater than 9.11. Here's why:
      9.9 can be expressed as 9.90 or 9.900.
      9.11 remains 9.11 or 9.110.
      When comparing:
      9.900 (which is 9.9) is greater than 9.110 (which is 9.11).
      Therefore, 9.9 is greater than 9.11."
      I suppose we're going to have to make sure ChatGPT re-evaluates its answer every single time

    • @craycody
      @craycody 1 month ago

      I just did it and it got it right first try:
      "No, 9.11 is not bigger than 9.9; in fact, 9.11 is less than 9.9. Here's the explanation:
      9.11 equals 9 + 0.11.
      9.9 equals 9 + 0.9.
      Since 0.11 is less than 0.9, it follows that 9 + 0.11 (which is 9.11) is less than 9 + 0.9 (which is 9.9).
      Therefore: 9.11 < 9.9"

    • @trucid2
      @trucid2 1 month ago +1

      9.11 is bigger than 9.9 when it comes to version numbers.
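
For what it's worth, both readings the thread is arguing about can be made precise in a few lines (a minimal sketch):

```python
from decimal import Decimal

# As decimal numbers: 9.11 < 9.9
print(Decimal("9.11") < Decimal("9.9"))         # True

# As version numbers: 9.11 > 9.9 (components compare as integers)
v1 = tuple(int(p) for p in "9.11".split("."))   # (9, 11)
v2 = tuple(int(p) for p in "9.9".split("."))    # (9, 9)
print(v1 > v2)                                  # True
```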

  • @decouple
    @decouple 1 month ago +12

    It's funny how good it is at some things and how terrible it is at other things still; its abilities seem heavily dependent on whether examples of the problem were included in its training data. I asked it to create a 32-bit CRC algorithm and it did it perfectly; however, when asked to create a considerably more trivial 3-bit CRC algorithm (which is uncommon and quite useless), it failed miserably, and in fact produced multiple wrong results that got worse and worse as I pointed out the flaws.
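
For reference, the same bit-serial routine handles a CRC of any width, which makes the asymmetry the commenter describes easy to probe; a minimal sketch (the 32-bit parameters below roughly follow CRC-32/MPEG-2, and the 3-bit polynomial is x^3 + x + 1):

```python
def crc(data: bytes, width: int, poly: int, init: int = 0) -> int:
    """Bit-at-a-time CRC of arbitrary width (no reflection, no final XOR)."""
    mask = (1 << width) - 1
    reg = init & mask
    for byte in data:
        for i in range(7, -1, -1):            # feed message bits MSB-first
            bit = (byte >> i) & 1
            msb = (reg >> (width - 1)) & 1    # register bit about to shift out
            reg = (reg << 1) & mask
            if msb ^ bit:
                reg ^= poly
    return reg

print(hex(crc(b"hello", 32, 0x04C11DB7, init=0xFFFFFFFF)))  # 32-bit CRC
print(bin(crc(b"hello", 3, 0b011)))                          # toy 3-bit CRC
```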

  • @foubani3673
    @foubani3673 2 months ago +21

    This is scary. But you have to try it with novel problems that the AI has never seen before; ChatGPT has surely been trained on the Jackson book!
    Nevertheless, the reasoning capabilities are astonishing.
    A new era has begun.

    • @sid.h
      @sid.h 2 months ago +10

      "ChatGPT has surely been trained on the Jackson book!"
      This is such an oft-repeated nonsense statement, though. Just because a problem might be in its training set, the model will not be significantly (or at all) better at answering that exact problem than any other problem in the same category.
      It's like: do you remember every homework math equation you have solved in your life?
      Would you be any better at solving a problem you already encountered once 10 years before vs. a similar novel one? No, of course not, unless you have superhuman memory where you keep an exact copy of everything you've ever done.
      Similarly, these models don't memorize. They synthesize. They are learning models, not search engines or "neurally indexed databases" or whatever.

    • @denysivanov3364
      @denysivanov3364 2 months ago +2

      @@sid.h AI remembers patterns, not particular problems. And indeed, if some pattern is missing, the AI will miss it; if a pattern is well represented, the AI will solve it well. A better neural-network architecture remembers more and solves corner cases better. This is what we see in chess networks such as Leela Chess Zero.

  • @xorqwerty8276
    @xorqwerty8276 2 months ago +11

    Imagine 10 years from now

    • @phen-themoogle7651
      @phen-themoogle7651 2 months ago

      @@xorqwerty8276 Star Wars Universe, but more humanoid bots on our planet, and billions of them being like gods, building anything and everything they imagine. Earth is surrounded by a giant dome that extracts/enhances light from the sun, combined with technology that speeds up how fast plants or trees grow; we have a combo of biological machines that have become human too and are interbreeding, half human half machine. The sun is all we need to survive now. Millions of unique new species emerge.
      (10 years is like millions of years if true ASI comes a year from now.)
      Even 2 years could be very wtf lol 😂

    • @MrSchweppes
      @MrSchweppes 2 months ago

      In less than 3 years lots of knowledge workers will be displaced by AI.

  • @otty4000
    @otty4000 1 month ago +10

    I am doing a PhD in an ML-related field.
    Setting fair benchmarks and tests these days is quite hard, considering the sheer scale of data top models are trained on.
    And using a famous physics textbook isn't really a good attempt.
    Model o1's reasoning is a massive step up though, for sure; I think it could pass a similar blind test like this very soon.

  • @ibzilla007
    @ibzilla007 2 months ago +8

    If it is on the internet, it's in its training data. You would need to find questions that it has not been trained on. This is why benchmarking is so hard

    • @maalikserebryakov
      @maalikserebryakov 2 months ago +5

      It's still impressive that the model can accurately comprehend which part of its training data deals with the problem in question.
      There are human beings who haven't mastered this skill lmao

    • @Weirdgeek83
      @Weirdgeek83 2 months ago +2

      Stop the downplaying. These types of problems are impossible to solve without reasoning. Simple pattern recognition doesn't make this possible.
      This cope needs to stop

  • @hidroman1993
    @hidroman1993 2 months ago +7

    You show up in 2005 with this tool and they'd call it AGI

  • @rwi6760
    @rwi6760 2 months ago +4

    As a high schooler who has taken part in AIME, o1 is really impressive. AIME problems get so much harder in the latter half, so 83% (o1) compared to 13% (GPT-4o) is huge; the latter could possibly only solve the first two, which are not challenging at all

  • @stevedavey9435
    @stevedavey9435 2 months ago +5

    God, if only I had this back in 2003 when I completed my physics degree. I would have saved myself so much pain and suffering.

  • @diophantine1598
    @diophantine1598 2 months ago +13

    Since that book is older than the model, I wonder if it appeared in its training data.

    • @Analyse_US
      @Analyse_US 2 months ago

      100%. Perplexity pointed me to at least 6 pdf versions available for free online. There are also lots of study notes
      online available for this text. Although I have no idea if it is memorizing answers.

    • @lolilollolilol7773
      @lolilollolilol7773 2 months ago

      @@Analyse_US it looks like it actually tries to solve the problems.

    • @Analyse_US
      @Analyse_US 2 months ago +1

      @@lolilollolilol7773 I agree, it's definitely not just remembering the answer. But is it remembering steps to solving the problems that it found in online study examples? I don't know. But my own testing makes me think it is a big step up in capability.

    • @denysivanov3364
      @denysivanov3364 2 months ago

      @@Analyse_US AI memorizes patterns. If the pattern is similar but the exercise is different, the AI will solve it.

  • @bgill7475
    @bgill7475 1 month ago +14

    This was an interesting test. I still think it's funny when people say these models don't understand.
    Anyone who's used them enough understands that they do understand.
    One nice thing is that you can ask follow up questions as well and ask why something is like that, or ask it to try things in a slightly different way if you want it done differently.

    • @woosterjeeves
      @woosterjeeves 1 month ago +5

      I dunno about the latest models, but ChatGPT 3.5 does NOT "understand" anything. It feeds you fake references, and when you repeatedly tell it it is doing so, it will say "sorry" and continue to feed you fake references. That is not its fault; it is not "replying" or "responding" or doing anything a living being is doing. If you give it a training set containing PhD-level physics problems, sure, it can solve those problems. That is just predicting output from training data.

    • @人人人人人人人人
      @人人人人人人人人 1 month ago +1

      @@woosterjeeves This isn't GPT-3.5 though, and that specific model you mentioned was released back in November of 2022, the first public release of ChatGPT. In the video, you can see its process of reasoning. ChatGPT doesn't use fake references if it's able to break things down and express why and how it conducts its problem solving and reasoning. Also, to "That is just predicting output from training data": one, how is that different from learning? Isn't that the point of teachers, to help you predict and reason the output from the input of questions and data? Two, this is just a preview, not the full model, and it is able to do extremely difficult problems like these, explain the reasoning and the process, and give the right answer. We are slowly gravitating towards a world where such an excuse about prediction of data will no longer be a viable argument. The model is able to understand. The model is able to think with its data. It's putting formulas and answers together from its data, to reason and to form intelligent answers and responses, when in contrast the same problems make the most qualified PhDs scratch their heads. Reminder: as said, these questions take around 1.5 weeks to solve ONE problem; GPT-o1 does it in less than 2 minutes.

    • @woosterjeeves
      @woosterjeeves 1 month ago +3

      @@人人人人人人人人 Sure. I am still flummoxed why someone would attribute "understanding" to a prediction model. If you think prediction (from training data) is equal to understanding, then algorithms are already "understanding". Why hype this one? OTOH, if you think there is something qualitatively different, then we can talk about that. But you cannot claim both.
      Are chess computers "understanding" because they can make moves that leave super GMs scratching their heads? If so, then the argument is already over. I am only cautioning against the use of common-term words ("understanding") which make one think in terms of sentience. A language model has no such thing.
      Does this mean AI will never reach sentience? I never said that; just that the video does not do it for me. I am totally clueless why others are impressed by this model's "understanding", the same way I would be if someone said AlphaZero (the chess AI) understands chess. That is all.

    • @Smrda1312
      @Smrda1312 1 month ago +1

      Please refer to the Chinese room problem.

    • @fr5229
      @fr5229 1 month ago

      @@woosterjeeves If you’ve only used 3.5 then I’m not surprised that’s your opinion 😂

  • @luisalfonsohernandez9239
    @luisalfonsohernandez9239 2 months ago +6

    Maybe it was in its training dataset, would be interesting for you to test something it could not have seen during training

    • @iFastee
      @iFastee 2 months ago +1

      Not maybe; for sure. I know people don't all have to be experts in exactly what the black box of deep learning is doing, but holy, people are so dumb... I wonder if they realize that IF what they think were true, meaning the models are this great, then within a month we would be getting new discoveries in all science fields...
      which will not come, because current AI is 100% data-capped. It's just memorization of PDFs and manifold recall

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago

      This is a fair point! I have gone ahead and uploaded a Part 2 using problems I'm confident it had not seen before and that I have detailed answers to! ruclips.net/video/a8QvnIAGjPA/видео.html

  • @AlfarrisiMuammar
    @AlfarrisiMuammar 2 months ago +9

    OpenAI says the real o1 version of the AI will be out before the end of 2024.

    • @Romathefirst
      @Romathefirst 2 months ago +1

      really? where?

    • @achille5509
      @achille5509 2 months ago +4

      They said about 1 month, but it will probably be the end of 2024 as you say. o1-preview is not the full version; there is the "full" o1, which is better, yeah

  • @Patrick-vv3ig
    @Patrick-vv3ig 1 month ago +35

    "PhD-level". Our undergraduate theoretical physics course in electrodynamics used Jackson lol

    • @mohammadfahrurrozy8082
      @mohammadfahrurrozy8082 1 month ago +16

      smells like a clickbait title you know

    • @trent2043
      @trent2043 1 month ago +1

      Definitely non-standard in the US.

    • @andreaaurigemma2782
      @andreaaurigemma2782 1 month ago

      You used it as a vague reference book but you never really read through it.

    • @Patrick-vv3ig
      @Patrick-vv3ig 1 month ago

      @@andreaaurigemma2782 Of course I did.

    • @andreaaurigemma2782
      @andreaaurigemma2782 1 month ago

      @@Patrick-vv3ig No you didn't, and if I had a penny for every shitty undergrad bragging about how they went through hard books without understanding a single thing, I'd be rich

  • @JJ-fr2ki
    @JJ-fr2ki 2 months ago +9

    I suspect this was trained on the Jackson book.

  • @rickandelon9374
    @rickandelon9374 2 months ago +4

    The first time I watched a video like this was from Sixty Symbols, where they also tried to solve physics problems using the original vanilla ChatGPT 3.5. They didn't get anywhere close to this level. I think the progress is really accelerating. I also think that inference-time compute is a very real thing, and the guys at OpenAI have solved it with this new model in a fundamental way for sure. I think there will be other ways to implement System 2 thinking, but using reasoning tokens to accomplish this is maybe the best and most coherent way to go forward. I truly think that with o1, we have the first complete architecture for AGI.

  • @mradford10
    @mradford10 2 months ago +9

    Great video and interesting commentary. It's interesting you think this might be a good study aid or a tool... however, I just watched you take longer to check the answers than the model took to solve them, and you're an actual subject-matter expert. And as you correctly pointed out, this is just a preview of the full model's capabilities. This new type of model will not help experts, but replace them. They will eclipse not only human-level knowledge, but human-level speed. This is not a tool. It's disruption personified. With something this good (and as the saying goes, this is as bad as they will ever be, as they will only improve from this time onwards), what purpose will it serve to complete university study for 3 years, only to try and find employment in a career that no longer requires humans? Amazing.

    • @msromike123
      @msromike123 2 months ago +3

      It's a machine, like the cotton gin, the steam engine, the locomotive, etc. Every advance of technology has displaced people from some jobs into others, and yet we are still here. What's the alternative? Structure society to be less productive and less efficient in order to keep people employed in obsolete jobs? That will just slow the growth of the economy and cause a lower standard of living, leading to poverty and hunger as the world population keeps multiplying. It's going to put people out of work; we will be OK. Becoming a Luddite is not going to change anything.

    • @AlfarrisiMuammar
      @AlfarrisiMuammar 2 months ago +2

      @@msromike123 Cars replaced horses. So will humans suffer the same fate as horses?

    • @msromike123
      @msromike123 2 months ago +1

      @@AlfarrisiMuammar I am glad you are thinking about it now. 1) Truck drivers replaced wagon drivers (not horses.) There are many more truck drivers now. 2) The standard of living for both truck drivers AND horses is higher than ever. Same thing goes for automobiles and horses.

  • @FlavioSantos-uw1mr
    @FlavioSantos-uw1mr 2 months ago +6

    Not bad for a model smaller than o1 and based on GPT-4; to be honest, I don't know how I'll be able to test upcoming versions like the ones based on GPT-5.
    I can't wait to use this on university projects; there are so many relatively "easy" tasks I currently have to go looking for experts for.

    • @danielbrown001
      @danielbrown001 2 months ago

      There’s so much potential in the pipeline. Imagine the o1 techniques applied to image/video generation. Bye-bye obviously fake images, and hello “indiscernible from reality” images.
      Also, once o1 is layered on top of GPT-5, we’re likely talking “competing with or beating best-in-the-world level scientists/thought leaders” in different fields. This will fuel more investment into compute farms to create even MORE powerful AI, and multiple instances can run simultaneously to solve problems that would take humanity millennia to solve otherwise. Including AI researching how to improve AI in a self-improving recursive loop that will only stop upon reaching the physical boundaries of the universe.

  • @h-e-acc
    @h-e-acc 2 months ago +1

    I mean, it showed you step by step how it was able to solve those problems and gives you insights into how it's thinking. That is just wild beyond imagination.

  • @debasishraychawdhuri
    @debasishraychawdhuri 2 months ago +36

    You have to give it your own problem. The book is part of its training data. That is why it just knew the sum.

    • @lolilollolilol7773
      @lolilollolilol7773 2 months ago +5

      Even if that was the case, the simple fact that it worked out the path to the solution is impressive. But you are likely wrong.

    • @lewie8136
      @lewie8136 2 months ago +1

      @lolilollolilol7773
      LLMs literally predict the next word based on probability. If the answer isn't in the training data, it can't answer the question. It doesn't have reasoning skills.
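
For reference, the "predicts the next word" mechanism looks roughly like this toy sketch (the vocabulary and scores are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["9.9", "9.11", "equals", "is", "greater"]   # toy vocabulary
logits = np.array([2.0, 1.5, 0.2, 1.0, 0.5])         # model's scores for the next token

probs = np.exp(logits - logits.max())
probs /= probs.sum()                                  # softmax -> probability distribution
next_token = rng.choice(vocab, p=probs)               # sample the next token
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```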

    • @Lucid.28
      @Lucid.28 2 months ago

      But they do have reasoning skills

    • @lewie8136
      @lewie8136 2 months ago

      @@Lucid.28 No, they don't.

    • @Tverse3
      @Tverse3 2 months ago +3

      @@lewie8136 They recognize patterns like we do... We don't really think; we also predict things based on the patterns we see... We just named it thinking.

  • @boredofeducation-sb6kr
    @boredofeducation-sb6kr 2 months ago +2

    The way this model was trained: it took physics problems just like that and used a model like GPT-4 to create reasoning chains until it could actually derive the correct answer. So it's not surprising. It can already solve textbooks that are well solved already, because the answer is very objective, and once you get a solid reasoning chain to the answer, you can simply train the model on that
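
That recipe (sample reasoning chains, keep the ones that reach the known answer, train on those) can be sketched roughly as follows; illustrative only, since OpenAI's actual pipeline is unpublished, and `ask_llm` and `fine_tune` are hypothetical helpers:

```python
def build_reasoning_dataset(problems, ask_llm, tries_per_problem=8):
    """Rejection sampling: keep only chains whose final line matches the known answer."""
    kept = []
    for question, known_answer in problems:
        for _ in range(tries_per_problem):
            chain = ask_llm(f"Solve step by step:\n{question}")
            final_line = chain.strip().splitlines()[-1]   # crude answer extraction
            if known_answer in final_line:
                kept.append({"prompt": question, "completion": chain})
                break                                     # one verified chain is enough
    return kept

# dataset = build_reasoning_dataset(physics_problems, ask_llm)
# fine_tune(base_model, dataset)   # then train on the verified chains
```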

  • @akaalkripal5724
    @akaalkripal5724 2 months ago +39

    We need AI to replace politicians, ASAP. The 'presidential debate' was a travesty.

    • @jamesbench8040
      @jamesbench8040 2 months ago +6

      best realization I've heard in weeks

    • @CubeStarMaster
      @CubeStarMaster 2 months ago +4

      An "AI president", as long as there isn't a person telling it how to think, could be the best thing for any country. I would still give it a few years before doing so, though, and make sure its main objective is to do what's best for the country.

    • @tchadcarby8439
      @tchadcarby8439 2 months ago

      I support this idea 1000%

    • @alvaroluffy1
      @alvaroluffy1 2 months ago

      I think the current o1-preview is far more capable of governing than any human. Of course, it would need some adjustments, like a more continuous existence, without resetting itself, and a virtually infinite context window so it can always take into account everything that has ever happened in the past

    • @gurpreet4912
      @gurpreet4912 1 month ago

      You have no clue how AI works 😂

  • @bbamboo3
    @bbamboo3 2 months ago +4

    I asked it to find how much the earth would have to be compressed to become opaque to neutrinos: it took 39 seconds to say 26 km diameter. Totally fascinating how it got there... (o1-preview)

    • @Diamonddavej
      @Diamonddavej 2 months ago

      The correct answer is ~300 meters. It told me 360 meters.

  • @Sheeshening
    @Sheeshening 2 months ago +4

    And they wrote that this was just one step of many like it to come. In 5-10 years the world may be changed fundamentally; in 20 years it'll be hard to recognize

  • @albertoalfonso7835
    @albertoalfonso7835 1 month ago +64

    If the solutions exist on the internet, is it really solving it, or just analyzing and printing the answers? A true test would be creating a unique problem with no known solution

    • @dieg9054
      @dieg9054 1 month ago +9

      how would you know if it was correct or not if there was no known solution?

    • @d1nrup
      @d1nrup 1 month ago +8

      @@dieg9054 Maybe he means a problem that isn't posted on the internet since ChatGPT gets its solutions from the downloaded internet data.

    • @TrueOracle
      @TrueOracle 1 month ago +3

      That isn't how LLMs work; unless it is a wildly popular problem, the small details it learns from the internet get lost in the neural web

  • @andydataguy
    @andydataguy 2 months ago +1

    Your reaction clips bout to go viral bro 🚀

  • @sergiosierra2422
    @sergiosierra2422 2 months ago +10

    Lol, that was my reaction last year with GPT-4, but with programming

  • @cyrilvonwillingh5523
    @cyrilvonwillingh5523 2 months ago +1

    This is a great way to see the model's real ability. Thank you for the demonstration.

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago

      You're welcome! I have made a part 2 using new questions that I'm confident it didn't have access to beforehand: ruclips.net/video/a8QvnIAGjPA/видео.html

  • @cross4326
    @cross4326 2 months ago +15

    GPT is most probably trained on the answers since it is a well known book

    • @sCiphre
      @sCiphre 2 months ago +2

      Maybe, but it showed its work

  • @cheisterkamp
    @cheisterkamp 2 months ago +9

    Since it is an infamous book, how do we know that it really solved the problems by reasoning and is not just trained on the existing solutions?

    • @hxlbac
      @hxlbac 2 months ago

      Are the answers in the back of this book?

    • @cheisterkamp
      @cheisterkamp 2 months ago +1

      @@hxlbac No, but an Instructor's Solutions Manual is online as a PDF, along with several other sample solutions.

  • @trejohnson7677
    @trejohnson7677 2 months ago +5

    the changing my approach part was kinda scary ngl

  • @plonkermaster5244
    @plonkermaster5244 24 days ago +29

    The problems are known by the LLM already; it has been trained on them. It didn't come to a conclusion through reasoning

    • @tomyao7884
      @tomyao7884 21 days ago +1

      To my knowledge, its data only goes up to October 2023, and it can solve problems created after that data cutoff just as well (for example, o1-mini was able to solve some Advent of Code programming problems published in December 2023)

    • @matiasluna514
      @matiasluna514 20 days ago

      @plonkermaster5244 Your statement is half true: LLMs need existing information to work properly. However, unless the problem presented needs an actually new theory, with prior research and a never-seen formula, LLMs can recognize the formulas needed to solve the problem. Good observation.

    • @I_Blue_Rose
      @I_Blue_Rose 14 days ago +1

      Lol, it's not the case whatsoever, keep coping though.

    • @ZONEA0official
      @ZONEA0official 11 days ago

      @@matiasluna514 To be fair, we as humans need to do that as well haha

  • @yzhishko
    @yzhishko 25 days ago +20

    Solutions are publicly available and most probably in training datasets already. LLMs are good at what they have already learned, but not even 100% accurate there.

    • @I_Blue_Rose
      @I_Blue_Rose 14 days ago +1

      "To my knowledge, its data only goes up to October 2023, and it can solve problems created after that data cutoff just as well (for example, o1-mini was able to solve Advent of Code programming problems published in December 2023)"

  • @JJ.R-xs8rf
    @JJ.R-xs8rf 1 month ago +12

    The first one is the easy one? Yet at the same time you're amazed that it solved it in 122 seconds, while you mention that it generally takes others 1.5 weeks.

  • @tanner9956
    @tanner9956 1 month ago +5

    ChatGPT is truly amazing. I wonder what this technology will be like in 10 years. I think schools should really use this technology and allow it, because it's not like it's going away tomorrow. I also think this technology makes it impossible to be ignorant

  • @Mayeverycreaturefindhappiness
    @Mayeverycreaturefindhappiness 2 months ago +25

    This book is probably in its training data

    • @japiye
      @japiye 2 months ago +7

      so why did it try different approaches and not just the correct one?

    • @Mayeverycreaturefindhappiness
      @Mayeverycreaturefindhappiness 2 months ago +1

      @@japiye I am not sure, but I do know it was trained on those types of problems, so it's not truly deriving them cold. Did you notice it would pull numbers out of nowhere? It's still really impressive and a very useful model, but I think we should be skeptical that it's really the equivalent of a physics grad student; if you watch AI Explained's video, it gets common-sense problems wrong

    • @trueuniverse690
      @trueuniverse690 2 months ago +2

      @@Mayeverycreaturefindhappiness Still impressive

    • @Mayeverycreaturefindhappiness
      @Mayeverycreaturefindhappiness 2 months ago +2

      @@trueuniverse690 yes

    • @deror007
      @deror007 2 months ago +3

      @@japiye As it probabilistically selects the next word, it will select different words compared to what it has seen. This is what makes the model generate new sentences, but it is able to evaluate its chain of thought, which leads it to the correct one or a better result. As the problems are found online, and the Jackson problems have been well known in the field for many years, they must be in its training set.

  • @OmicronChannel
    @OmicronChannel 2 months ago +7

    Just as a comment: it looks impressive. However, to truly judge how good the model is, one (unfortunately 😬) needs to read the proofs line by line and examine the arguments in depth. From my experience with GPT-4, the proofs often look good, but they sometimes contain flaws when examined more closely.

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago +5

      Just finished recording a video where I do that more or less with some problems I have the answer to and am pretty sure the problem didn't exist on the internet!

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago +4

      Here is part 2 if you are interested: ruclips.net/video/a8QvnIAGjPA/видео.html

  • @lolilollolilol7773
    @lolilollolilol7773 2 months ago +6

    Incredible. It would be interesting to see what happens if you give it an incorrect result to prove. Will it show that your result is incorrect and instead give the correct one?

  • @BAAPUBhendi-dv4ho
    @BAAPUBhendi-dv4ho 1 month ago +10

    The real question is can it solve Indian entrance exam questions or not?

  • @netscrooge
    @netscrooge 2 months ago +3

    Thank you. I find this more interesting than Dr. Alan D. Thompson's obscure-information tests.

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago +1

      I appreciate the view! I am working on creating a Part 2 that will involve open-ended questions that I was given as a graduate student in school that I don't believe come from any textbooks, so stay tuned for that!

    • @netscrooge
      @netscrooge 2 months ago

      @@KyleKabasares_PhD I subscribed. Don't want to miss Part 2.

  • @Ikbeneengeit
    @Ikbeneengeit 2 months ago +7

    But surely the model has already been trained on that textbook?

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago

      It's a fair point, I've gone ahead and filmed and recorded a part 2 that involves problems I'm confident it hadn't seen before: ruclips.net/video/a8QvnIAGjPA/видео.html

  • @MrErick1160
    @MrErick1160 2 months ago +6

    Hey man! You should do a video with scores: do 5 tests, allow 5 shots per problem for each model, and then see the score out of 5. Do this for GPT-4o vs o1-preview; you could also do o1 vs Claude Sonnet!
    Like an "LLM Face-Off"

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago

      I actually did a stream like that last night! Gave o1, 4o, Gemini Advanced, Claude Sonnet 3.5, Grok 2, and LLama 3.1 a college math exam! ruclips.net/user/liveGdN4MFxLQUU?si=flPSFIxx85Uqyoz7

  • @robclements4957
    @robclements4957 1 month ago +6

    Tip for pasting questions in: ask ChatGPT-4o to transcribe the picture

  • @scgisouvik7992
    @scgisouvik7992 1 month ago +8

    Everything Is Easy Until o1 Faces Keshab Chandra Nag (Only Bangalis Will Understand) 😂

  • @Drone256
    @Drone256 2 months ago +11

    This should make you seriously question the way we do education. If human value isn't in solving problems that are hard, yet clearly defined, then why teach that? You teach it because you need to know it to solve higher level problems. But maybe we no longer need to also train the skill of doing the calculations. So long as you understand the concept properly you can move on without spending a week pushing through the math. That's going to be very hard for some people to accept.

    • @JaredCooney
      @JaredCooney 2 months ago

      Understanding the concept, unfortunately, typically requires dozens of practical experiences. This is why teaching math starting with a calculator leads to less learning than introducing a calculator after basic practice

    • @Drone256
      @Drone256 2 months ago

      @@JaredCooney very true. But I think students will be doing less of it and learning more. We’ve seen this pattern before.

  • @tama-gq2xv
    @tama-gq2xv 2 months ago +9

    OMFG, another year... everyone's going to have a PhD.

    • @hipotures
      @hipotures 2 months ago +1

      Or no one, because why do something that a machine does better?

    • @tama-gq2xv
      @tama-gq2xv 2 months ago +3

      @@hipotures No, you don't get it. The standards have been raised. The hyper-intelligent are going to be on steroids. I know I am.
      Imagine someone at 18 with an IQ of 145+ with AI tools at their disposal. Now imagine a decade of this progress and the new generation coming in.
      We're going to see hyper-geniuses.

  • @AlexBerish
    @AlexBerish 2 months ago +4

    FYI you should put new problems in new chats to avoid polluting the context window

  • @GodGuy8
    @GodGuy8 2 months ago +2

    It was messing up on Symbolab-generated Maclaurin & Taylor series problems for me last night, but it's a massive improvement from the last time I tried to get it to do math a couple of years ago

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago

      Oh interesting, I'm working on another video that involves problems that shouldn't exist on the internet, which my professors created themselves.

  • @Hardcore10
    @Hardcore10 2 months ago +15

    I will admit, even though this is cool and impressive, it likely was trained on this. I recommend trying to create some novel problem yourself and then testing it. But the GPQA benchmark they used is completely Google-proof and is not on the Internet; it was created by PhDs in physics, chemistry, and biology, and it outperformed them in answering the tests. That doesn't mean it's as good as them; it just means it's good at answering questions that PhDs in those fields would struggle with. I know nothing about physics; I came here for the AI stuff

    • @samsonabanni9562
      @samsonabanni9562 2 months ago +11

      The fact that it tried different approaches, failed, and tried others does not prove your point. I know how hard it is for humans to accept that a machine can match their intelligence, but I guess this time there is no escaping it..

    • @mint-o5497
      @mint-o5497 2 months ago +1

      @@samsonabanni9562 Doesn't rule out the possibility that it still could've helped guide it to the answer.

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago

      This is a good point! I've gone ahead and made a part 2 using newer problems that I don't believe it had access to in its training set and that I do have the answers to! ruclips.net/video/a8QvnIAGjPA/видео.html

    • @CubeStarMaster
      @CubeStarMaster 2 months ago

      Realistically, it probably does have that math problem in its data. However, unless they specifically overfitted their data with that one question hundreds of times, it's not likely the AI model simply remembered the question or something.

    • @Hardcore10
      @Hardcore10 2 months ago

      @@mint-o5497 Yeah, that's my point. I'm an AI nerd; I'm not anti-AI, just being cautious

  • @bobsalita3417
    @bobsalita3417 2 months ago +2

    Great content idea. Love your reaction. Genuine.

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago

      Thank you so much! I've gone ahead and created a part 2 based on the feedback to this video, I hope you will consider watching it! ruclips.net/video/a8QvnIAGjPA/видео.html

  • @mootytootyfrooty
    @mootytootyfrooty 1 month ago +18

    So much cope in the comments

  • @AAjax
    @AAjax 2 months ago +1

    As Andrej Karpathy recently said in an interview, ideal training data for a reasoning model would include step-by-step reasoning (like how we teach children in school). It's a bit amazing that bulk internet data has enough of this embedded reasoning to get us the current results.
    OpenAI is using Q* to refine their synthetic data, no doubt with successful step-by-step reasoning in that data. This will take a couple of years to reach the next model (that's how long it takes to train a new model), but it's the start of a virtuous cycle, where ever more capable models refine future synthetic data.

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago

      Thank you for your comment, this step-by-step reasoning approach is definitely a game-changer! I have also just uploaded a part 2 to my channel if you are interested: ruclips.net/video/a8QvnIAGjPA/видео.html

  • @sephypantsu
    @sephypantsu 2 months ago +6

    I tested it on solving a sudoku and it failed. It either gives wrong results or changes up the original question.
    Still, it did much better than 4o when I tested that a few months ago

  • @nts9
    @nts9 2 months ago +6

    o1 will get better in the coming months, and these problems will perhaps be easy for it.

    • @user-jm8fj7ez8s
      @user-jm8fj7ez8s 2 months ago +2

      I wouldn't be surprised. The true limit of LLMs is problems with no real known solutions. The advancements still do not change the (oversimplified here) model of fitting a curve.
      OpenAI can do another round of RL and CoT on these specific problems, but all it takes is another set of problems that it really hasn't encountered that much. It still suffers from the generic failure of flipping an image of a dog and having the AI shit itself.

    • @akhilsharma2712
      @akhilsharma2712 2 months ago +3

      @@user-jm8fj7ez8s Yup; and that's why even Sam Altman admitted it's "more impressive on your first use" [than when you use it a lot], and that it's "not AGI". But this is already INCREDIBLY useful; think about it like this: even if it can't automate most jobs, what % of humans ACTUALLY work on problems that have 0 prior data in terms of how to solve them? It's only less than maybe 1% if we're being generous. This means the AI will soon be able to eclipse the regular work 99% of humans do, without any further breakthroughs. And THAT is the mindblowing part! (this was generated by O1-mini).

    • @Komaruluten
      @Komaruluten 2 months ago

      @@user-jm8fj7ez8s Yeah, it does not reason from first principles. Unlike humans, it doesn't explicitly operate through spatial and relational reasoning from the ground up.
      They just had o1 trained by asking it millions of questions, letting it think, and then reinforcing the reasoning that led to the right answers. So basically o1 will know the most accurate and efficient chain of reasoning for familiar questions.
      Will this eventually turn into a superintelligent reasoning engine when scaled up? Nobody really knows, but I personally doubt it.
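      To make that training loop concrete, here is a toy sketch in Python of the general idea: sample several reasoning chains per question, keep only the ones that reach the known answer, and fine-tune on those. Everything here is a stand-in for illustration; sample_chain is a fake model, not any real API:

          import random

          # Toy stand-in for a model that "thinks step by step"; illustrative only.
          def sample_chain(question):
              answer = random.choice(["2.0 s", "4.0 s"])  # sometimes wrong on purpose
              reasoning = f"Worked through the kinematics and got {answer}."
              return reasoning, answer

          def collect_good_chains(dataset, samples_per_question=8):
              kept = []
              for question, known_answer in dataset:
                  for _ in range(samples_per_question):
                      reasoning, answer = sample_chain(question)
                      # Keep only chains whose final answer matches the known
                      # solution, so training later sees reasoning that worked.
                      if answer == known_answer:
                          kept.append((question, reasoning, answer))
              return kept  # in a real pipeline these records would go to fine-tuning

          data = [("Drop a ball from 20 m; time to fall?", "2.0 s")]
          print(collect_good_chains(data))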

  • @lorenzocl6002
    @lorenzocl6002 2 months ago +15

    All I see is that in a few years AI will be able to do everything, and most of us will be obsolete.

    • @marcusrosales3344
      @marcusrosales3344 2 months ago +2

      Keep in mind these companies lie A LOT! Take the bar exam: it actually tests in the 60th percentile once the initially hidden caveats are accounted for.

    • @danielbrown001
      @danielbrown001 2 months ago +1

      @@marcusrosales3344 True, but the money doesn't lie. Until the bubble bursts, very smart people have bet tens of billions of dollars on it being game-changing. And notice how the goalposts keep moving back? "It's ONLY getting 60% on the bar" is a far cry from 5 years ago, when AI could only put out gibberish.

  • @michaelrogers4834
    @michaelrogers4834 2 months ago +1

    Wow. That's amazing. What amazes me is that it apparently knows all the tricks.

    • @Ken-vy7zu
      @Ken-vy7zu 2 months ago +3

      I am amazed that this is only the o1-preview model. OpenAI is probably already working on o2, o3, o4, and o5 models 😊

  • @nexys1225
    @nexys1225 2 months ago +4

    Well, you said it yourself: this book is *very* well known for its problems, and by students.
    I.e., it was in the training set.
    You need to prompt it with your own problems and watch how it fails miserably.
    Take something like Yann LeCun's (infamous) geodesics problem, for instance.
    Also, when it has the solutions, it will try to reach the given solution at the end of the proof no matter what, and will sneak in errors in order to get there, so be very careful.

    • @hypersonicmonkeybrains3418
      @hypersonicmonkeybrains3418 2 months ago

      Yeah, but you could set up a special prompt that synthesizes new, original problems and automate it; then they would have millions of original problems to work with for training a new model. I'm sure they will find a way to make it work, with all the smart people they have at OpenAI.

    • @nexys1225
      @nexys1225 2 months ago

      @@hypersonicmonkeybrains3418 For this to even work, you would need to generate the correct solutions to train on, alongside the generated "original problems", which is impossible, since the whole point of training it is to be able to generate those solutions.

  • @SteveWilsonNinja
    @SteveWilsonNinja 2 months ago +2

    Well done! Excellent video! 😮😮😮

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago

      Thanks so much for watching! Please consider watching part 2: ruclips.net/video/a8QvnIAGjPA/видео.html

  • @fcampos10
    @fcampos10 28 days ago +12

    I think giving it problems where it is asked to arrive at a specific solution (shown in the problem itself) is not a good way to evaluate it.
    I bet the results would be very different if you just asked it to solve each problem on its own.

  • @KCM25NJL
    @KCM25NJL 2 months ago +7

    Now give it the Millennium problems and see how it does with those

    • @greenboi5632
      @greenboi5632 2 months ago +1

      LOL

    • @evangelion045
      @evangelion045 2 months ago

      It still produces wrong answers in topics as basic as finite automata.

  • @Dante-uw1ge
    @Dante-uw1ge 1 month ago +16

    Chat we're cooked

  • @PSpace-j4r
    @PSpace-j4r 2 months ago

    We have to remember to use it as a really smart helper and guide it at times

  • @ISK_VAGR
    @ISK_VAGR 2 months ago +5

    But wait a moment: how can you verify that GPT did not know about this problem beforehand and is merely reproducing it from its own knowledge base? You need to give it something that you are 100% sure it doesn't know. The best check is to ask GPT-4o directly whether it knows the solution; if GPT-4o knows it, then o1 likely knows it too.

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago +2

      This is a good point! I just recorded a Part 2 with some new problems that I believe it didn't have in its knowledge-base. Will be uploading shortly!

    • @EGarrett01
      @EGarrett01 2 months ago

      I don't know if it's seen these problems before, but it was tested repeatedly on newly made-up logic and reasoning problems and it solved them. I showed it some work that was unpublished (but actually valid and verifiable), so I knew it hadn't seen it, and its response was the same as I would expect from someone very experienced in the field seeing it for the first time. So it definitely can reason (in its own way) without already knowing the answer. I highly recommend the "Sparks of AGI" paper or lecture, which goes into this in detail.

  • @Benette2
    @Benette2 1 month ago +2

    So crazy! That level is insane

  • @jekyll366
    @jekyll366 29 days ago +19

    I gave ChatGPT a couple of master's-level computer science problems; both solutions were wrong. I had to tell her they were wrong; she apologized and corrected herself. It wasn't reliable.

    • @magicalgibus3006
      @magicalgibus3006 29 days ago

      The free model or paid model?

    • @jekyll366
      @jekyll366 28 days ago +1

      @@magicalgibus3006 I used models 4 and 4o, the free tier though.

    • @oedihamijok6504
      @oedihamijok6504 28 days ago +35

      @@jekyll366 Habibi, he is obviously testing the state-of-the-art model, o1.

    • @timonbubnic322
      @timonbubnic322 28 days ago +1

      I gave it multiple undergrad problems from algorithms and data structures. It's certainly useful, but it fails the first try about 90% of the time; in roughly half of those cases you can then instruct it how to fix the solution. I'm talking about the 4o model. I find it useful for catching dumb mistakes, like missing a boundary condition or forgetting an i++, things like that.

    • @I_Blue_Rose
      @I_Blue_Rose 14 days ago +1

      @@jekyll366 Then your assessment was useless; we are talking about o1-preview.

  • @err0rz633
    @err0rz633 2 months ago +3

    It would be cool if one of the creators of these problems were paid to make original ones on the spot and feed them to o1.

  • @konoha4
    @konoha4 2 months ago +8

    I would have tested it by giving the stated answer with a deliberate error, for example an extra factor of 2, or an arctan instead of an arcsin, and seeing whether it reaches the true answer anyway and flags the incorrect input. That would make a very convincing test.

  • @p-k98
    @p-k98 2 months ago +6

    Well, as long as you yourself don't know whether what it did was correct, we can't say for sure. It is surprising nonetheless; however, if you had given these problems to the earlier version, I think it would also have arrived at the required conclusion. It would just have done some mumbo jumbo and forced its way to that conclusion, no matter what it got wrong in the process. This time around, though, it looks like it actually did everything correctly in its "reasoning" process.

  • @Alice_Fumo
    @Alice_Fumo 2 months ago +4

    The model can do some amazing things, but its results are so far removed, thought-wise, from the input question (as opposed to the regular GPT models, which just start answering immediately), and it can still get weird elementary things wrong, so I'm not sure I can ever use it for anything where I am unable to verify that its solution either works or is better than one I came up with.
    I think we're at a stage where it is becoming increasingly impossible for people to even evaluate the capabilities of these models. Their ceiling is immense, but when you have one thinking for a few minutes, it can make subtle mistakes at any step. OpenAI is also not exposing the thoughts to users, so it becomes impossible to read through all of it and decide whether it checks out.
    It's going to take me forever to develop intuition for what sorts of things this model can and can't do reliably, and thus what it can be trusted with.

    • @MrBillythefisherman
      @MrBillythefisherman 2 months ago +2

      This gives you a skeleton from which to create the solution and has probably halved the amount of work you needed to do.

    • @andydataguy
      @andydataguy 2 months ago

      I'd be more concerned about the people who use this model and figure out how to get past the quirks.

    • @KyleKabasares_PhD
      @KyleKabasares_PhD  2 months ago

      Thank you for your feedback, I've gone ahead and recorded and uploaded a Part 2 where I test o1 with questions I have the answers to and questions that I'm confident it didn't have access to in its training. ruclips.net/video/a8QvnIAGjPA/видео.html

  • @akarshghale2932
    @akarshghale2932 2 months ago +5

    If you want to test the model's actual ability, use textbooks whose questions were created after the model's knowledge cutoff. Otherwise the test reflects the model's prior knowledge, not its reasoning.