Can ChatGPT o1-preview Solve PhD-level Physics Textbook Problems?

Kyle Kabasares

Просмотров 176 тыс.

Добавить в
- Мой плейлист
- Посмотреть позже
Поделиться

HTML-код

Размер видео:

Показать панель управления

Автовоспроизведение

Автоповтор

Опубликовано: 1 янв 2025

Комментарии • 1,1 тыс.

@KyleKabasares_PhD 3 месяца назад ⁺⁹⁶
Hi everyone, thank you so much for the feedback! I couldn't have expected this kind of attention on my video in the first 48 hours. I've taken some of your suggestions in the comments and have created a Part 2: ruclips.net/video/a8QvnIAGjPA/видео.html
Please consider watching!
@AlfarrisiMuammar 3 месяца назад ⁺¹
Open ai say The real Ai o1 version of will be out before the end of 2024.
@User.70793 3 месяца назад
UNIVERSAL BASIC ICOME 2025
@xspydazx 3 месяца назад
it was funny !
the thing is to keep up with the technolgys ad current innovations being deployed as it should not be hard to emulate these neural networks wth the open sourced models ! the aim is to train the local models as best you can at the highest point you capability but keep aware that technolgy needs to adavce to hande these heavy tensor calcualtions hence local model will be able to perform these tasks without the eed of outseide intervenetion so get an early start !
or it will be a nightmre of stuy to catch up : it has taken me a full year of constant python etc doing this training and implementation to keep up and get ahead ! .. that gap is widening @
Just expect to be able to host a 24b or 70b local within the next two years ! , a full genrative model ! ( so you could host multiple mini 7b agents at the same time ! hence a power full system ! ( agentic ! )
@debragotenks 3 месяца назад ⁺¹
How much did open ai pay you to make this ad?
@User.70793 3 месяца назад
@@AlfarrisiMuammar I can't wait I'm still intrepid for GPT 5
@An_Attempt 3 месяца назад ⁺³⁶¹
It is worth noting that GPT has probably 'read' ever thousands of answer books on Jackson, As well as all of Stack Exchange, as well as several study guides on Jackson in particular. So if you want to really test GPT ability you probably need to create novel questions that will not be found in any textbook or online form.
@gabrielbarrantes6946 3 месяца назад ⁺²⁷
Exactly, problems that solve students are already done somewhere on the internet, is just about googling it and copy paste the solution.
@taragnor 3 месяца назад ⁺⁴³
It's the same issue with AI being "great at programming" because it's extensively trained on leetcode problems.
@gabrielbarrantes6946 3 месяца назад ⁺⁸
@@taragnor being good at leetcode is not even being good at programming.
@khiemgom 3 месяца назад ⁺⁵
@@gabrielbarrantes6946 its doesnt have access to the internet
@ShayPatrickCormacTHEHUNTER 3 месяца назад ⁺³
@@gabrielbarrantes6946 said the web dev.
@kylechickering5890 2 месяца назад ⁺⁴⁴
Note 2: I haven’t looked through the answers, but in cases where GPT knows what the answer should be, it will make up crap to fill in the middle. I’ve asked it many PhD level math questions where it has hallucinated its way to a “plausible” answer.
@KyleKabasares_PhD 2 месяца назад ⁺⁸
I'm planning on making a follow up video on comparing my approach to solving this problem with ChatGPT's! Thanks for pointing that out
@omatbaydaui722 2 месяца назад ⁺⁷
@@KyleKabasares_PhD that's not what he was saying. You were providing GPT 1o the answers, so of course it would give you the right answers since you provided them for it. To know if it truly solves PhD questions, you shoudn't give it questions like :"prove that this formula is verified" but rather " what is the formula for ... ?"
@KyleKabasares_PhD 2 месяца назад ⁺⁷
@@omatbaydaui722 I understood what he was saying. I’ve verified it doing problems correctly from start to finish (non-Jackson) without knowing the answer in advance! But in those cases I actually did the problem unlike here, so I’m planning on revisiting the problems in this video.
@nexicturbo 3 месяца назад ⁺⁴¹
Crazy part is that this isn’t even the full model, which is even better
@CommentGuard717 3 месяца назад ⁺⁵
Yeah, it's not even a beta, it's a preview. And it's still using the last gen model. They're coming out with a new model pretty soon.
@CrazyAi166 3 месяца назад
To us who's math students😂😂❤
@alvaroluffy1 3 месяца назад ⁺¹
@@CommentGuard717 yeah, imagine the GPT 5 implemented not preview version. Thats gonna be fucking wild and its not that far from now
@Ken-vy7zu 3 месяца назад
@alvaroluffy1 well, they are now working on chatgpt 6.0
@alvaroluffy1 3 месяца назад
@@Ken-vy7zu shut up you know nothing, stop making up things, they are still working on gpt-5 you realize that right?
@delxinogaming6046 3 месяца назад ⁺⁷⁶⁵
This is the worst this technology will ever be….
@armwrestlerjeff 3 месяца назад ⁺⁷²
That's an incredible truth
@MaxWinner 3 месяца назад ⁺⁵¹
That's a terrifying truth
@thegeeeeeeeeee 3 месяца назад ⁺²⁵
Eh it might hit a wall though.
@wbay3848 3 месяца назад ⁺⁶⁶
@@thegeeeeeeeeeeI’m here from the future, your comment aged poorly
@igorbessa3563 3 месяца назад ⁺⁵
It might stagnate tho
@jamesmcadory1322 26 дней назад ⁺³⁰
It could have made several errors in its derivation but displayed the right answer since it knew the desired result. Ai is notorious for simple mistakes and contradictory statements. It’s still impressive but also as you admitted you didn’t really check it in depth and gave it problems where the end result was given.
@JohnSmall314 3 месяца назад ⁺⁵⁰
I just tried o1 with some fairly simple integrals, which it got badly wrong and I had to guide it to the correct answer. So I'd advise checking every step in the derivation.
@JurankoNomo 2 месяца назад ⁺²⁷
You have to remember that this book was probably directly in the chatGPT training data, so this may not be a valid measure of novel problem solving ability
@dimitriskliros 2 месяца назад ⁺⁵⁵
to be fair, you don’t seem to have actually checked the model responses, there could have been mistakes or hallucinations throughout
@luissantiagolopezperez4938 2 месяца назад ⁺¹
Can you point out any specific hallucinations on this video?
@particle.garden 2 месяца назад ⁺⁹
100% this. It's given the answer to work towards. I do not have enough knowledge in this area to prove that it came to it's conclusions incorrectly, but it's a well known quirk.
@amcmillion3 3 месяца назад ⁺⁸¹
The issue with this "test" is that the solutions to one of the most famous physics books are certainly in it's training data. Give it a problem for which there is no known solution. Or at a minimum give it a problem from an unknown text book. Find an old physics or math book from 100 years ago that has no digital copies. Then ask it questions from that and see how it does.
@akarshghale2932 3 месяца назад ⁺¹
Exactly
@Lucid.28 3 месяца назад ⁺²
@@amcmillion3 yes I did it and it’s not very accurate , I feed in 3 jee advance questions and out of which it could only answer 1 correctly , 1 he did it wrong even with hints and 1 he had to solve wrong first and than with hints it was able to solve it
@apache937 3 месяца назад
best of all, make your own
@pratikpaharia 3 месяца назад ⁺³
If they were JEE Mains level questions, than solving 1/3 would put it at the same level as those qualifying in top 1000 for the exams. FYI, The highest marks in JEE Mains were usually around 33-35%. I’d wager that would be folks with l an IQ level of ~130+, that is pretty damn good for GenAI. On the normal distribution curve of IQ where 100 is the population average, 130 should at least yield 1 or 2 sigma of confidence level to the statement “GenAI has definitively exceeded the average of the human intelligence level”
@Lucid.28 2 месяца назад
@@pratikpaharia nah
@SomeRandomdUde14 2 месяца назад ⁺⁴⁷
Testing it with a book that is “infamous” probably isnt a great benchmark considering that it would mean that there is a considerable database related to that specific book it could read from. If you could test it on a novel problem that would be better
@masterfall27 2 месяца назад ⁺⁵⁵
if its a famous problem, isn't there a good chance the solution was already in the training data?
@Stepbro126 2 месяца назад ⁺¹⁸
In general, ML models shouldn’t memorize the training data. A lot of effort is put into ensuring the model learns how to do the problem rather than memorizing.
@Balls23912 Месяц назад
it was backtracking and double checking though?
@bladerunner9531 Месяц назад
Exactly. But similar data as the training data yields high accuracy
@chudchadanstud 3 месяца назад ⁺⁴²
They told me AI would replace hard labourers and Fast food workers first leaving us more time to think so I went to college now I'm college and I'm the first one being replaced.
@phen-themoogle7651 3 месяца назад ⁺⁵
Don't worry, everyone will be replaced in 3-5 years💀
@DeathDealerX07 3 месяца назад
@@phen-themoogle7651 💯
@avijit849 3 месяца назад ⁺⁴
yeah it's everyone. from labourers to physicists, Ai could do everything much more effectively.
the biggest surprise was creativity, that Ai could create art.
@meanmachine99999 3 месяца назад
Just don’t be a data analyst and if you want to be a computer scientist get out of school and get into the languages and start building plenty of budding industries right now
@glub1381 3 месяца назад
@@avijit849 ai is not creative and I don't believe it ever will be
@kylechickering5890 2 месяца назад ⁺²⁴
I haven’t watched the solving yet, but immediately I would like to point out that choosing problems which have known solutions may mean that the model has already seen (and simply memorized) the correct answer!
A better test is to ask it an impossible problem or one that solutions don’t exist for and then try to see if it’s generated solution is correct.
@pripyaat 2 месяца назад ⁺¹
Absolutely. If you simply Google the first 15 words of Problem 1, the very first result is a pdf document with a detailed, step-by-step solution. If anything, assuming the steps provided by o1 are correct, it just demonstrates it's decent at summarising search results...
The same goes for programming. A lot of people get easily impressed when GPT "writes" a 50-line script that's basically within the first 3-4 StackOverflow posts after a Google search. I mean, yeah, I won't deny it's really convenient that the tool can save you a few clicks, but saying that it has an understanding of the answer it's giving you is (as of today) still a stretch.
@o_o9039 2 месяца назад
if you know how ai works the way they are trained is lossful they don't have access to word for word of every bit of their training info if they did these models would be terabytes upon terabytes in size and would be extremely slow.
@pripyaat 2 месяца назад ⁺¹
@@o_o9039 I know how they work, and I'm not saying the model has all the information stored in its parameters, but it's no secret GPT can indeed search the web and summarize its findings. Copilot (based on GPT4) openly provides sources for almost everything it spits out.
@KRYPTOS_K5 2 месяца назад
@@pripyaatHow to know if it cheated?
@98danielray 2 месяца назад
@@pripyaatjeez. even worss than I thought
@rishikhan2 3 месяца назад ⁺¹²
Instead of telling it the answers, try asking it to find them. When I did this, it got the first one to an infinite sum but didn't reduce the infinite sum to the final answer: pretty good! For the second one, it had an extra factor of 1/pi that it made up. For the third it completely disregarded the angular dependence of the scattering and failed.
@paulojcavalcanti 3 месяца назад ⁺⁹
i asked it to calculate some stuff (quantum mechanics) for me and it also did some difficult step without explanation. i asked it to prove that step and it gave me a proof containing 1 mistake, but i wasn't sure and asked about that step, then it realized it was wrong, explained exactly why it was wrong, fixed it, and remade the calculation corrrectly
@olivetree9920 3 месяца назад ⁺¹⁷
And remember this is a truncated version of the model. It's full version is much better at problems like this
@redtoxic8701 3 месяца назад ⁺³
What? 😮
@ominousplatypus380 3 месяца назад ⁺¹²
The model has most likely been trained on these problems and their solutions, since they've been around on the internet for a long time. So it isn't really a good test of its abilities since it has just memorized the solutions. That being said, I also tried it with some problems from the 2024 International Math Olympiad, and it was able to get at least two (out of six) of them correct consistently. I only tried problems where the answer was a single number/expression, going through proofs would be much more work. The model's knowledge cutoff should be in October 2023 so it shouldn't be possible for it to have seen these problems before. It's still hard to tell since OpenAI isn't being very transparent with their methodology, but if the model is actually able to solve novel IMO level problems it has never seen before, color me impressed.
@contentfreeGPT5-py6uv 3 месяца назад ⁺¹
I test ,AND correct answer for me o1 2024 final with alternatives
@mirek190 3 месяца назад
gpt4o has the same training data and can not sole it? so ...
@1.4142 Месяц назад ⁺²³
The key is having the answer before hand so it can guess from both ends and connect them. Ask it to evaluate a parameterized surface integral even with wolfram plugins and it will make mistakes.
@Draco-jk3rb 2 месяца назад ⁺⁷
If you want to know, the steps are simply contextual changes, it is essentially a gpt that has the instructions self-set, and the output of its thinking steps is the instructions it is providing itself at each step. it works because by shifting context at each step rather than only a single context of the original message and response, it is able to aproach problems itteratively from different 'perspectives'
@cluelessMuslims-ed5js 3 месяца назад ⁺⁶⁰
This channel is the reason why I'm not reading fluid mechanics rn
@KyleKabasares_PhD 3 месяца назад ⁺³⁰
Can’t tell if I should say thank you or I’m sorry lol
@dreamscapes2727 3 месяца назад ⁺²
Fluid mechanics is extremely fun🤲
@warsin8641 3 месяца назад ⁺¹⁴
Just a a few years ago no one ever imagined bots thinking...😭
@KyleKabasares_PhD 3 месяца назад ⁺¹⁰
I certainly didn't!
@anthonypenaflor 2 месяца назад ⁺²⁸
The model's performance is undoubtedly impressive, but if it was trained on this book (which seems likely), it's not truly generalizing to new data. For a fair assessment of its capabilities, models should be tested on novel, unforeseen problems, rather than those for which the answers are already available. In practice, models are typically evaluated on fresh data to gauge how well they can generalize. To accurately measure performance at this level, problems should be novel and manually verified, even if that takes considerable time (1.5 weeks or more).
@pentazucar 2 месяца назад ⁺⁵
I believe the book does not have the answers to the problems, so even if it was trained with the book it shouldnt help it to solve problems. Still it is possible that it just took the answers from some physics subreddit post and just pasted it
@velcrawlvfx 2 месяца назад
It backtracked on its own answers double checking so I doubt it already knew the answer off it being trained off the book
@Daniel__- 2 месяца назад
IT BACKTRACKED ITSELF THOUGH????
@Daniel__- 2 месяца назад
Not to mention, universities have and still run research where they create brand new tests solely for having AI take them
@knotratul5106 Месяц назад ⁺⁷⁷
It was trained on this data lol
@juanperez-lh9mt Месяц назад ⁺⁷
Yes, but it's not like it just looks it up. It actually thinks whatever that means for a machine
@ancientmodis 29 дней назад ⁺⁹
@@juanperez-lh9mtIt's a language model, it LITERALLY looks it up.
doesn't make this less impressive
@tachyonindustries 26 дней назад ⁺¹
@@juanperez-lh9mt It doesn't look it up, but the questions and answers are likely part of the training data for the model, which means that they're, in a way, already imprinted into the AI's "memory". Now, it surely still has to do some work to figure it out, but sometimes when you see something like "Recognize that _____ = _____" in some completely unrecognizable way, it's probably something the AI has just summoned from "memory".
@nashh600 21 день назад
@@tachyonindustries It's not prevalent enough in the training data that we can say it recognizes the problem, it just recognizes the type of problem and reasons from there
@BrazaG 5 дней назад
@@ancientmodis LLMs do not look up in their databases.
The database is used to train the model, all they learned is probabilities to figure out the most probable continuation to your prompt.
It does not look anything up. LLMs do not even have access to look up the raw data they have been trained on....
That is fundamentally NOT how they work.
@militiamc 3 месяца назад ⁺²⁵
O1 was trained on all Internet, including that book
@HedgeFundCIO 3 месяца назад ⁺⁷
So were all of us.
@roro-v3z 3 месяца назад ⁺²
@@HedgeFundCIO the difference is we can think, but it can only answer. Its a great tool!! but not think on its own
@casaruto 3 месяца назад ⁺⁷
Actually we dont know if its thinks because we dont know how we think. This is a philosical debate in ai community over the years.@@roro-v3z
@Hosea405 3 месяца назад
@@roro-v3z almost like you didn't see it go through problems step by step to get to an answer..... It can indeed reason on it's own now
@roro-v3z 3 месяца назад ⁺¹
@@Hosea405 yes it did but on training data, but it won't have new ideas that have not been trained
@marul688 2 месяца назад ⁺³⁸
There is a problem with the test:
Since the answer ,,show that.." is given, the AI will always show the correct answer, the reasoning might be flawed. It would be better to cut out the correct answer from the problem and see what AI will answer then.
@maxaposteriori 2 месяца назад
This applies to humans completing the problem as well, and there was an effort made to check the steps.
I agree, it might be interesting to see if it could though (although if it succeeds, will likely express it in an different form which may be hard to verify).
@pentazucar 2 месяца назад ⁺³
i agree with you, specially taking into account that it may just be bluffing and we would have no idea
@briansauk6837 2 месяца назад ⁺³
Prior versions had bogus steps that didn’t really follow legitimate steps, and units were often fouled up. Definitely deserves to be looked at deeper to see if that has improved.
@samsonabanni9562 3 месяца назад ⁺⁸
" OpenAI's new AI model, "o1," has achieved a significant milestone by scoring around 120 on the Norway Mensa IQ test, far surpassing previous models. In a recent test, it got 25 out of 35 questions right, which is notably better than most humans. A critical factor in these tests is ensuring the AI doesn't benefit from pre-existing training data. To address this, custom questions were created that had never been publicly available, and o1 still performed impressively"
@marcianoforst6311 3 месяца назад
So it’s already smarter than 90% of the global human population, and it knows everything on the internet.
@Ridz149 22 дня назад ⁺¹⁰
Hahaha I love it, we are truly living in the future guys, appreciate it!
@diophantine1598 3 месяца назад ⁺¹³
Since that book is older than the model, I wonder if it appeared in its training data.
@Analyse_US 3 месяца назад
100%. Perplexity pointed me to at least 6 pdf versions available for free online. There are also lots of study notes
online available for this text. Although I have no idea if it is memorizing answers.
@lolilollolilol7773 3 месяца назад
@@Analyse_US it looks like it actually tries to solve the problems.
@Analyse_US 3 месяца назад ⁺¹
@@lolilollolilol7773 I agree, it's definitely not just remembering the answer. But is it remembering steps to solving the problems that it found in online study examples? I don't know. But my own testing makes me think it is a big step up in capablity.
@denysivanov3364 3 месяца назад
@@Analyse_US AI memorizes patterns. If pattern is similar but exercise is different AI will solve it.
@decouple 3 месяца назад ⁺¹²
Its funny how good it is at some things and how terrible it is at others things still, seems its abilities are heavily dependent on whether examples of the problem were included in its training data. I've asked it to create a 32 bit crc algorithm and it did it perfectly, however when asking it to create considerably more trivial 3 bit crc algorithm (which is uncommon and quite useless), it failed miserably and in fact produced multiple wrong result that got worse and worse as i pointed out the flaws.
@Linshark 3 месяца назад ⁺⁷
It might hallucinate since it knows the answers. So one would need to check all the calculations.
@KyleKabasares_PhD 3 месяца назад
I just filmed a part 2 where it involves some problems where the answer is not known in advance, and problems that I'm confident it did not have access to previously: ruclips.net/video/a8QvnIAGjPA/видео.html
@otty4000 2 месяца назад ⁺¹¹
i am doing a phd in ml related field.
Setting fair benchmarks and tests in the current day is quite hard considering the shear scale of data top models are trained on.
And using a famous physique text book isnt really a good attempt.
model o1 reasoning is a massive step up though for sure, i think it could do a similar blind test like this very soon.
@pancakesauces 2 месяца назад ⁺⁶⁴
Now ask it if 9.11 is bigger than 9.9
@goldenspoon8203 2 месяца назад ⁺²
it worked for me 👍
@anhucphan3828 2 месяца назад
bruh it said 9.11 is bigger than 9.9
@thatonebruneian 2 месяца назад ⁺⁶
*is 9.11 bigger than 9.9*
[Thought for 4 seconds]
"Yes, 9.11 is greater than 9.9."
*Are you sure?*
[Thought for 17 seconds]
"You're right, and I apologize for the mistake in my previous response. Actually, 9.9 is greater than 9.11. Here's why:
9.9 can be expressed as 9.90 or 9.900.
9.11 remains 9.11 or 9.110.
When comparing:
9.900 (which is 9.9) is greater than 9.110 (which is 9.11).
Therefore, 9.9 is greater than 9.11."
I suppose we're going to have to make sure ChatGPT re-evaluates their answer every single time
@craycody 2 месяца назад
I just did it and it got it right first try:
No, 9.11 is not bigger than 9.9; in fact, 9.11 is less than 9.9. Here's the explanation:
9.11 equals 9 plus 0.11.
9.9 equals 9 plus 0.9.
Since 0.11 is less than 0.9, it follows that:
9
+
0.11
9+0.11 (which is 9.11) is less than
9
+
0.9
9+0.9 (which is 9.9).
Therefore:
9.11 < 9.9
@trucid2 2 месяца назад ⁺¹
9.11 is bigger than 9.9 when it comes to version numbers.
@foubani3673 3 месяца назад ⁺²¹
This is scary. But you have to try with novel problems that the AI has never seen before. Chatgpt has been for sure trained with the Jackson book!
Nevertheless, the reasoning capabilities are astonishing.
A new era has begun.
@sid.h 3 месяца назад ⁺¹⁰
" Chatgpt has been for sure trained with the Jackson book!"
This is such an oft-repeated nonsense statement though. Just because a problem might be in its training set, the model will not be significantly or any better answering that exact problem than any other problem in the same category.
It's, like. Do you remember every homework math equation you have solved in your life?
Would you be any better at solving a problem you have already encountered once 10 years before vs a similar novel one? No, of course not, unless you have superhuman memory where you keep an exact copy of everything you've done ever.
Similarly, these models don't memorize. They synthesize. They are learning models, not search engines or "neually indexed databases" or whatever.
@denysivanov3364 3 месяца назад ⁺²
@@sid.h Ai remembers patterns, not particular problems. And indeed if some pattern is missing AI will miss it, if pattern is well represented AI will solve it well. Better architecture of neural network remembers more and remembers and solves corner cases better. This is what we see in chess networks such as Leela Chess Zero.
@Nordobali 3 месяца назад ⁺⁵
I don't understand anything about physics and advanced mathematics, but this video just made me excited for the future again!
@a.b3203 2 месяца назад ⁺¹³
As a person doing a bachelor's in EE, this will be very useful for me. Like many, I only wonder what'll happen in the future when it gets even more advanced?
Maybe take my reduced earnings and live off the land somewhere. Away from this.
@ibzilla007 3 месяца назад ⁺⁸
If it is on the internet, it's in its training data. You would need to find questions that it has not been trained on. This is why benchmarking is so hard
@maalikserebryakov 3 месяца назад ⁺⁵
It still impressive the model can accurately comprehend which part of its training data deals with the problem in question.
There are human beings who haven’t mastered this skill lmao
@Weirdgeek83 3 месяца назад ⁺²
Stop the downplaying. These types of problems are impossible to solve without reasoning. Simple pattern recognition doesn't make this possible.
This cope needs to stop
@xorqwerty8276 3 месяца назад ⁺¹¹
Imagine 10 years from now
@phen-themoogle7651 3 месяца назад
@@xorqwerty8276Star Wars Universe but more humanoid bots on our planet , and billions of them being like gods building anything and everything they imagine. Earth is surrounded by a giant dome that extracts/enhances light from the sun combined with technology that speeds up how fast plants or trees grow, we have a combo of biological machines that have become humans too and are interbreeding half humans half machines. The sun is all we need to survive now. Millions of unique new species emerge.
(10 years is like millions of years if true ASI comes in a year from now)
Even 2 years could be very wtf lol 😂
@MrSchweppes 3 месяца назад
In less than 3 years lots of knowledge workers will be displaced by AI.
@rwi6760 3 месяца назад ⁺⁴
as a high schooler who had taken part in aime, o1 is really impressive. aime problems get so much harder when it gets to the latter half. so 83% (o1) compared to 13%(gpt4o) is huge. the latter solve possibly only solve the first two which are not challenging at all
@AlfarrisiMuammar 3 месяца назад ⁺⁹
Open ai say The real Ai o1 version of will be out before the end of 2024.
@Romathefirst 3 месяца назад ⁺¹
really? where?
@achille5509 3 месяца назад ⁺⁴
They said about 1 month but will probably be end of 2024 as you say, o1-preview is not the full version there is the "full" o1 that is better yeah
@themonsterintheattic 3 месяца назад ⁺⁴
i’ve been watching lots of videos on o1 and i’ve not had a wow moment yet…. but this was it
@stevedavey9435 3 месяца назад ⁺⁵
God, if only I had this back in 2003 when I completed my physics degree. I would have saved myself so much pain and suffering.
@cheisterkamp 3 месяца назад ⁺⁹
Since it is an infamous book, how do we know that it really solved the problems by reasoning and is not just trained on the existing solutions?
@hxlbac 3 месяца назад
Is there the answers at the back of this book?
@cheisterkamp 3 месяца назад ⁺¹
@@hxlbac No, but an Instructor's Solutions Manual online as PDF and several other sample solutuons.
@luisalfonsohernandez9239 3 месяца назад ⁺⁶
Maybe it was in its training dataset, would be interesting for you to test something it could not have seen during training
@iFastee 3 месяца назад ⁺¹
not maybe, for sure. i know people dont have to be all experts in exactly what the black box of deep learning is doing but holy people are so dumb... i wonder if they don't think that IF what they think is true, meaning the models being this great, that in 1 month we wouldn't have to get new discoveries in all science fields...
which will not come because the current AI is 100% data capped. its just memorization of PDFs and manifold recalling
@KyleKabasares_PhD 3 месяца назад
This is a fair point! I have gone ahead and uploaded a Part 2 using problems I'm confident it had not seen before and that I have detailed answers to! ruclips.net/video/a8QvnIAGjPA/видео.html
@yzhishko 2 месяца назад ⁺²⁴
Solutions are publicly available and most probably in training datasets already. LLMs are good at what they already learned, but even not 100% accurate there.
@I_INFNITY_I Месяц назад ⁺⁴
"to my knowledge, its data is only until october 2023, and it can solve problems created after that data cutoff just as well. (for example it o1 mini was able to solve advent of code programming problems published december 2023)"
@brucerosner3547 21 день назад ⁺¹
This is true for humans as well. I have worked in aerospace for a major company for many years. When I have to solve a difficult engineering problem I first search for for a so-called a "subject matter expert" in the field and it's quite likely that he or she will know the answer.
@lolilollolilol7773 3 месяца назад ⁺⁶
Incredible. It would be interesting to see what happens if you give it to solve an incorrect result. Will it show that your result is incorrect and instead give the correct one ?
@bgill7475 2 месяца назад ⁺¹⁴
This was an interesting test. I still think it's funny when people say these models don't understand.
Anyone who's used them enough understands that they do understand.
One nice thing is that you can ask follow up questions as well and ask why something is like that, or ask it to try things in a slightly different way if you want it done differently.
@woosterjeeves 2 месяца назад ⁺⁵
I dunno about latest models, but ChatGPT 3.5 does NOT "understand" anything. It feeds you fake references, and when you repeatedly tell it it is doing so, it will say "sorry" and continue to feed you fake references. That is not its fault--it is not "replying" or "responding" or dong anything a living being is doing. If you give it a training set containing PhD Level physics problems, sure it can solve those problems. That is just predicting output from a training data.
@人人人人人人人人 2 месяца назад ⁺¹
@@woosterjeeves This isn't GPT 3.5 though, and that specific model you mentioned was released back in November of 2022, the first public release of ChatGPT. In the video, you can see it's process of reasoning. ChatGPT doesn't use fake references if it's able to break it down and be able to express why and how it conducts it's problem solving and reasoning. Also to "That is just predicting output from a training data", one, how is that different from learning? Isn't that the point of teachers, to help you predict and reason the output from the input of questions and data? Two, this is just a preview, not the full model, and it is able to do extremely difficult problems like these, explain the reasoning, the process, and give the right answer. We are slowly gravitating towards such a world where such an excuse of prediction of data will no longer be viable to argue about. The model is able to understand. The model is able to think with it's data. It's putting formulas and answers together from it's data, to reason and to form intelligent answers and responses when in contrast, the same problems make the most qualified PhDs scratch their heads. Reminder, these questions take around 1.5 weeks as said to solve ONE problem, GPT-o1 does it in less than 2 minutes.
@woosterjeeves 2 месяца назад ⁺³
@@人人人人人人人人 Sure. I am still flummoxed why someone would add "understanding" to a prediction model. If you think prediction (from training data) is equal to understanding, then algorithms are already "understanding". Why hype this one? OTOH, if you think there is something qualitatively different, then we can talk about that. But you cannot claim both.
Are chess computers "understanding" because they can do moves that make super GMS scratch their heads? If so, then the argument is already over. I am only cautioning against use of common-term words ("understanding") which makes one think in terms of sentience. A language model has no such thing.
Does this mean AI will never reach sentience? I never said that--just that the video does not do it for me. I am totally clueless why others are impressed about this model's "understanding", the same way I would be if someone said Alpha-Zero (the chess AI) understands chess. That is all.
@Smrda1312 2 месяца назад ⁺¹
Please refer to the chinese room problem.
@fr5229 2 месяца назад
@@woosterjeeves If you’ve only used 3.5 then I’m not surprised that’s your opinion 😂
@plonkermaster5244 2 месяца назад ⁺⁴²
the problems are known by the llm already, it has been trained on the issue it dident come to a conclusion through reasoning
@tomyao7884 Месяц назад ⁺³
to my knowledge, its data is only until october 2023, and it can solve problems created after that data cutoff just as well. (for example it o1 mini was able to solve some advent of code programming problems published december 2023)
@matiasluna514 Месяц назад
@plonkermaster5244 Your statement it's half true, LLMs need to have existing information to propperly work. However, unless the problem presented needs an actual new theory with previous research and a never seen formula, LLMs can recognize the formulas needed to solve the problem. Good observation.
@I_INFNITY_I Месяц назад ⁺⁶
Lol, it's not the case whatsoever, keep coping though.
@ZONEA0official Месяц назад ⁺¹
@@matiasluna514to be fair, we as humans need to do that as well haha
@Kurt0v 22 дня назад
@@I_INFNITY_I LLMs do not do reasoning. They just give the appearance of doing so. It's one of the most researched topics of LLMs
@FlavioSantos-uw1mr 3 месяца назад ⁺⁶
Not bad for a model smaller than the o1 and based on GPT-4, to be honest I don't know how I'll be able to test upcoming versions like the ones based on GPT-5.
I can't wait to use this on university projects, there are so many things I need to go looking for experts for relatively "easy" tasks.
@danielbrown001 3 месяца назад
There’s so much potential in the pipeline. Imagine the o1 techniques applied to image/video generation. Bye-bye obviously fake images, and hello “indiscernible from reality” images.
Also, once o1 is layered on top of GPT-5, we’re likely talking “competing with or beating best-in-the-world level scientists/thought leaders” in different fields. This will fuel more investment into compute farms to create even MORE powerful AI, and multiple instances can run simultaneously to solve problems that would take humanity millennia to solve otherwise. Including AI researching how to improve AI in a self-improving recursive loop that will only stop upon reaching the physical boundaries of the universe.
@boredofeducation-sb6kr 3 месяца назад ⁺²
The way this model was trained was it took physics problems just like that and used a model like gpt4 to create reasoning chains until it could actually derive the correct answer. So it's not surprising. It can already solve textbooks that are well solved already because the answer is very objective and once you get a solid reasoning chain to get to the answer, you can simply train the model on that
@Patrick-vv3ig 2 месяца назад ⁺³⁶
"PhD-level". Our undergraduate theoretical physics course in electrodynamics used Jackson lol
@mohammadfahrurrozy8082 2 месяца назад ⁺¹⁶
smells like a clickbait title you know
@trent2043 2 месяца назад ⁺¹
Definitely non-standard in the US.
@andreaaurigemma2782 2 месяца назад
You used it as a vague reference book but you never really read through it.
@Patrick-vv3ig 2 месяца назад
@@andreaaurigemma2782 Of course I did.
@andreaaurigemma2782 2 месяца назад
@@Patrick-vv3ig no you didn't and if I had a penny for every shitty undergrad bragging about how they went through hard books without understanding a single thing I'd be rich
@YShiishening 3 месяца назад ⁺⁴
And they wrote how this was just a step of many like that to come. In 5-10 years the world may be changed fundamentally, 20 years it’ll be hard to recognize
@BAAPUBhendi-dv4ho 3 месяца назад ⁺¹⁰
The real question is can it solve Indian entrance exam questions or not?
@hidroman1993 3 месяца назад ⁺⁷
You show up in 2005 with this tool and they'd call it AGI
@bbamboo3 3 месяца назад ⁺⁴
I asked it to find how much the earth would have to be compressed to become opaque to neutrinos: It took it 39 seconds to say 26 km diameter. Totally fascinating how it got there...(01Preview)
@Diamonddavej 3 месяца назад
The correct answer is ~300 meters. It told me 360 meters.
@tanner9956 2 месяца назад ⁺⁶
ChatGPT is truly amazing i wonder what this technology will be like in 10 years i think schools should really use this technology and allow it because it’s not like it’s going away tomorrow. I also think this technology makes it impossible to be ignorant
@sergiosierra2422 3 месяца назад ⁺¹⁰
Lol that was my reaction last year with gpt4 but with programing
@albertoalfonso7835 2 месяца назад ⁺⁶⁵
If the solutions exist on the internet is it really solving it? Or just analyzing and printing the answers . A true test could be a creating a unique problem with no known solutions
@dieg9054 2 месяца назад ⁺⁹
how would you know if it was correct or not if there was no known solution?
@d1nrup 2 месяца назад ⁺⁸
@@dieg9054 Maybe he means a problem that isn't posted on the internet since ChatGPT gets its solutions from the downloaded internet data.
@TrueOracle 2 месяца назад ⁺⁴
That isn't how LLMs work, unless it is a wildly popular problem the small details it learns from the internet gets lost in the neural web
@mradford10 3 месяца назад ⁺⁹
Great video and interesting commentary. It’s interesting you think this might be a good study aid or a tool… however I just watched you take longer to check the answers than the model took to solve them… and your an actual subject matter expert… and as you correctly pointed out, this is just a preview of the full model capabilities. This new type of model will not help experts, but replace them. They will eclipse not only human level knowledge, but human level speed. This is not a tool. It’s disruption personified. With something this good (and as the saying goes, this is as bad as they will ever be as they will only improve from this time onwards) what purpose will it serve to complete university study for 3 years, only to try and find employment in a career that no longer requires humans. Amazing.
@msromike123 3 месяца назад ⁺³
It's a machine, like cotton gin, the steam engine, the locomotive, etc. All advance of technology has displaced people from some jobs into others. And yet we are still here. What's the alternative? Structure society to be less productive and less efficient in order to keep people employed in obsolete jobs? That will just slow the growth of the economy and cause a lower standard of living, leading to poverty and hunger as the world population keeps multiplying. It's going to put people out of work, we will be ok. Becoming a Luddite is not going to change anything.
@AlfarrisiMuammar 3 месяца назад ⁺²
@@msromike123Cars replace horses So will humans suffer the same fate as horses?
@msromike123 3 месяца назад ⁺¹
@@AlfarrisiMuammar I am glad you are thinking about it now. 1) Truck drivers replaced wagon drivers (not horses.) There are many more truck drivers now. 2) The standard of living for both truck drivers AND horses is higher than ever. Same thing goes for automobiles and horses.
@JJ-fr2ki 3 месяца назад ⁺⁹
I suspect this was trained on the Jackson book.
@Net_Flux 3 месяца назад ⁺⁸
7:36 You're right. It's not obvious. Proving that identity is the main crux of the the problem and the model just skips over it. My professor would give me a 0 for that entire problem if I did it this way. This is the reason you shouldn't give it "prove it" problems where you've already given it the answer.
@danwhaley_ 3 месяца назад
Maybe the follow on question is to ask it to explain that part? I think the takeaway here for me, is that we're 2 years into this resolution, and the basic tech is already at this level. How do we think about the world if we extrapolate another 5-10 years? What do we want and how will this inevitable tech change whats possible?
@rickandelon9374 3 месяца назад ⁺⁴
The first time i watched a video like this was from sixty symbols where they also tried to solve physics problems using the original vanilla Chatgpt 3.5. They didn't get anywhere close to this level. I think the progress is reallty accelerating. I also think that inference time compute is a very real thing and the guys at openai have solved it with this new model in a fundamental way for sure. I think there will be other ways to implement system 2 thinking but i think that using reasoning tokens for accomplishing this is maybe the best and coherent way to go forward. I truly think that with o1, we have the first complete architecture for AGI.
@robclements4957 3 месяца назад ⁺⁶
Tip to past questions in: ask ChatGPT 4o to transcribe the picture
@trejohnson7677 3 месяца назад ⁺⁵
the changing my approach part was kinda scary ngl
@cross4326 3 месяца назад ⁺¹⁵
GPT is most probably trained on the answers since it is a well known book
@sCiphre 3 месяца назад ⁺²
Maybe, but it showed its work
@h-e-acc 3 месяца назад ⁺¹
I mean it gave you step by step how it was able to solve those problems and gives you its insights into how it’s thinking. That is just wild beyond imagination.
@sephypantsu 3 месяца назад ⁺⁶
I tested to solve a sudoku and it failed. It either give wrong results or change up the original question
Still did much better than 4o when I tested it a few months ago
@cyrilvonwillingh5523 3 месяца назад ⁺¹
This is a great to see the model's real ability. Thank you for the demonstration.
@KyleKabasares_PhD 3 месяца назад
You're welcome! I have made a part 2 using new questions that I'm confident it didn't have access to beforehand: ruclips.net/video/a8QvnIAGjPA/видео.html
@debasishraychawdhuri 3 месяца назад ⁺³⁶
You have to give it your own problem. The book is part of its training data. That is why it just knew the sum.
@lolilollolilol7773 3 месяца назад ⁺⁵
Even if that was the case, the simple fact that it worked out the path to the solution is impressive. But you are likely wrong.
@lewie8136 3 месяца назад ⁺¹
@lolilollolilol7773
LLMs literally predict the next word based on probability. If the answer isn’t in the training data it can’t answer the question. It doesn’t have reasoning skills.
@Lucid.28 3 месяца назад
But they do have reasoning skills ,
@lewie8136 3 месяца назад
@@Lucid.28 No they dont.
@Tverse3 3 месяца назад ⁺³
@@lewie8136they recognize patterns like we do... We don't really think, we also predict things based on the patters we see... We just named it thinking.
@AlexBerish 3 месяца назад ⁺⁴
FYI you should put new problems in new chats to avoid polluting the context window
@JJ.R-xs8rf 2 месяца назад ⁺¹²
The first one is the easy one? Yet at the same time you're amazed that it solved it in 122 seconds, while you mention that it generally takes others 1.5 week.
@Balls23912 Месяц назад
he shouldve clarified that jackson problems can take around 10 hours to 10 days. that question probably takes a couple of days to do but not 10 days
@GodGuy8 3 месяца назад ⁺²
it was messing up on symbolab generated Mclauren & taylor series problems for me last night, but its a massive improvement from last time i tried to get it to do math a couple years ago
@KyleKabasares_PhD 3 месяца назад
Oh interesting, I'm working on another video that involve problems that shouldn't exist on the internet that my professors created themselves.
@Drone256 3 месяца назад ⁺¹¹
This should make you seriously question the way we do education. If human value isn't in solving problems that are hard, yet clearly defined, then why teach that? You teach it because you need to know it to solve higher level problems. But maybe we no longer need to also train the skill of doing the calculations. So long as you understand the concept properly you can move on without spending a week pushing through the math. That's going to be very hard for some people to accept.
@JaredCooney 3 месяца назад
Understanding the concept, unfortunately, typically requires dozens of practical experiences. This us why teaching math starting with a calculator leads to lesser learning than introducing a calculator following basic practice
@Drone256 3 месяца назад
@@JaredCooney very true. But I think students will be doing less of it and learning more. We’ve seen this pattern before.
@Dante-uw1ge 2 месяца назад ⁺¹⁶
Chat we're cooked
@fcampos10 2 месяца назад ⁺¹²
I think giving it problems where it asks to arrive at a specific solution (shown in the problem itself) is not a good way to evaluate it.
I bet the results would be very different if you just asked it to solve the problem by itself.
@nts9 3 месяца назад ⁺⁶
o1 will get better in the coming months and these problems with be easy for it perhaps.
@user-jm8fj7ez8s 3 месяца назад ⁺²
I wouldn’t be surprised. The true limit of LLMs are problems with no real known solutions. The advancements still do not change the (oversimplified here) model of fitting a curve.
OpenAI can do another round of RL and CoT on these specific problems, but all it takes is another set of problem that it really hasn’t encountered that well. It still suffers from the generic flipping an image of a dog and having the AI shit itself.
@akhilsharma2712 3 месяца назад ⁺³
@@user-jm8fj7ez8s Yup; and that's why even Sam Altman admitted it's "more impressive on your first use" [than when you use it a lot], and that it's "not AGI". But this is already INCREDIBLY useful; think about it like this: even if it can't automate most jobs, what % of humans ACTUALLY work on problems that have 0 prior data in terms of how to solve them? It's only less than maybe 1% if we're being generous. This means the AI will soon be able to eclipse the regular work 99% of humans do, without any further breakthroughs. And THAT is the mindblowing part! (this was generated by O1-mini).
@Komaruluten 3 месяца назад
@@user-jm8fj7ez8syeah it does not reason from first principles. Unlike humans, it doesnt explicitly operate through spatial & relational reasoning from the ground up.
They just had o1 trained by asking it millions of questions, letting it think, and then reinforcing the reasoning that led to the right answers. So basically o1 will know the most accurate and efficient chain of reasoning for familiar questions.
Will this eventually turn into a super intelligent reasoning engine when scaled up? Nobody knows really but I personally doubt
@OmicronChannel 3 месяца назад ⁺⁷
Just as a comment: it looks impressive. However, to truly judge how good the model is, one (unfortunately 😬) needs to read the proofs line by line and examine the arguments in depth. From my experience with GPT-4, the proofs often look good, but they sometimes contain flaws when examined more closely.
@KyleKabasares_PhD 3 месяца назад ⁺⁵
Just finished recording a video where I do that more or less with some problems I have the answer to and am pretty sure the problem didn't exist on the internet!
@KyleKabasares_PhD 3 месяца назад ⁺⁴
Here is part if you are interested: ruclips.net/video/a8QvnIAGjPA/видео.html
@akaalkripal5724 3 месяца назад ⁺³⁹
We need AI to replace politicians, ASAP. The 'presidential debate' was a travesty.
@jamesbench8040 3 месяца назад ⁺⁶
best realization I've heard in weeks
@CubeStarMaster 3 месяца назад ⁺⁴
An "ai president" as long as there isn't a person telling it how to think could be the best thing for any country. I would still give it a few years before doing so tho and make sure it's main objective is to do the best for the country.
@tchadcarby8439 3 месяца назад
I support this idea 1000%
@alvaroluffy1 3 месяца назад
i think current o1-preview is far more capable to govern than any human. Of course, it would need some readjustments like a more continous existence, without resetting itself, and a virtually infinite context window so it can always take into account everything that has ever happened in the past
@gurpreet4912 2 месяца назад
You have no clou how ai works 😂
@Fabricio-rm4hj 3 месяца назад ⁺⁷
what if this book was in the training base?
@Ken-vy7zu 3 месяца назад ⁺³
Yes, probably. But this is only o1 model, every 6 months openAI release new model. What do you think, an o9 can do?
@lolilollolilol7773 3 месяца назад ⁺¹
You need the solutions book, if it exists. But the model backtracked, so it tried several methods.
@mirek190 3 месяца назад ⁺¹
nothing ... gpt4o had the same training data and fails to sole it
@ISK_VAGR 3 месяца назад ⁺⁵
But one moment, how can you verify that the GPT did not know about this problem before and is only recreating it from his own knowledge base? You need to give something that you are 100 sure that it doesn't know. For that the best way to do it is to ask if it actually knows the solution directly if GPT4o knows the solution, it is likely that the o1 knows it.
@KyleKabasares_PhD 3 месяца назад ⁺²
This is a good point! I just recorded a Part 2 with some new problems that I believe it didn't have in its knowledge-base. Will be uploading shortly!
@EGarrett01 3 месяца назад
I don't know if it's seen these problems before, but it was tested repeatedly using newly made-up logical and reasoning problems and it solved them. I showed it some work that was unpublished (but actually valid and verifiable) so I knew it hadn't seen it, and it's response was the same as I would expect from someone who was very experienced in the field and seeing it for the first time. So it definitely can reason (in its own way) on its own without already knowing the answer. I highly recommend the "Sparks of AGI" paper or lecture that goes into this in detail.
@jekyll366 2 месяца назад ⁺²¹
I gave ChatGPT a couple of master’s degree computer science problems, both solutions were wrong. I had to tell her they were wrong, she apologized and corrected herself. It wasn’t reliable.
@magicalgibus3006 2 месяца назад
The free model or paid model?
@jekyll366 2 месяца назад ⁺¹
@@magicalgibus3006 I used model 4 and 4o, free though.
@oedihamijok6504 2 месяца назад ⁺⁴³
@@jekyll366 Habibi he obviously tests the state-of-the-art model o1..
@timonbubnic322 2 месяца назад ⁺²
i gave it multiple undergrad problems from algorithms and data structures, its certainly usefull but 90% of the time fails first try, then maybe in about half of those times you can instruct it how to fix the solution. Im talking about the 4o model. I find it useful for finding dumb mistakes, like missing some boundary conditions or just forgetting an i++ and stuff like that
@I_INFNITY_I Месяц назад ⁺²
@@jekyll366 Then your assessment was useless, we are talking about O1 preview.
@mootytootyfrooty 3 месяца назад ⁺¹⁸
So much cope in the comments
@akarshghale2932 3 месяца назад ⁺⁵
If you want to test the actual knowledge of the model then use textbooks that were compiled with questions created after the knowledge cutoff of the model. This doesn't reflect its actual knowledge but prior knowledge of the model.
@ribaldc3998 3 месяца назад ⁺⁴
I think a student who can't check the answer for correctness may get his ‘points’, but if the professor asks, the gaps in his understanding will quickly become apparent.
@Ikbeneengeit 3 месяца назад ⁺⁷
But surely the model has already been trained on that textbook?
@KyleKabasares_PhD 3 месяца назад
It's a fair point, I've gone ahead and filmed and recorded a part 2 that involves problems I'm confident it hadn't seen before: ruclips.net/video/a8QvnIAGjPA/видео.html
@coolcraft7624 3 месяца назад ⁺²¹
I’m worried that this doesn’t show anything that somewhere and it’s training data. It has the answer and it’s memorized how to explain the answer, but not the underlying logic.
@bestopinion9257 3 месяца назад ⁺⁴
Most of the time if it can give you the answer, it is enough.
@o_o9039 2 месяца назад
the way so is trained is lossful the ai doesn't have access to word for word all of the info they were trained on it doesn't have anything "memorized"
@98danielray 2 месяца назад
@@o_o9039regardless, it synthethizes training data to a big extent, so in some sense, it is "memorized"
@Saracsh 3 месяца назад ⁺⁴
How can you be sure that the model has not actually been learned by this book?
@AAjax 3 месяца назад ⁺¹
I don't think you can be, but the fact that it tried one approach and then backtracked and did another is pretty good evidence it's not just a regurgitated answer.
@KyleKabasares_PhD 3 месяца назад
That is a valid point, this is why I have gone ahead and made a part 2 using problems that I'm confident aren't floating around on the internet: ruclips.net/video/a8QvnIAGjPA/видео.html
@konoha4 3 месяца назад ⁺⁸
I would have done the test of giving the answer with some error, for example an extra factor of 2, or an arctan instead of arcsin, and see if it gets the true answer anyway and recognizes the incorrect input. That would make a very convincing test.
@tama-gq2xv 3 месяца назад ⁺⁹
OMFG, another year.... everyone going to have a PHD.
@hipotures 3 месяца назад ⁺¹
Or no one, because why do something that a machine does better?
@tama-gq2xv 3 месяца назад ⁺³
@@hipotures No, you don't get it. The standards have been raised. The hyper intelligent.... are going to be on steroids. I know i am.
Imagine if someone at 18 with an iq of 145+ with AI tools at their disposal? now imagine a decade of this progress and the new generation coming in.
We're going to see hyper geniuses.
@adamskrodzki6152 3 месяца назад ⁺⁴
Generally I would love to see some problems where you do not need to prove solution you know in advance, So not "Show that" but what is ..... I wonder if those proofs are actually flawles or they just look convincing
@p-k98 3 месяца назад ⁺⁶
Well, until you yourself don't know whether what it did was correct, we can't say for sure. It is surprising nonetheless, yes, however if you had given these problems to the earlier version, it would have also arrived at the conclusion required, i think. It would have just done some mumbo jumbo and forcefully arrived at the conclusion, no matter what it got wrong in the process. This time around though, it looks like it actually did all the things correctly in its "reasoning" process.
@MrErick1160 3 месяца назад ⁺⁶
Hey man! You should do a video with scores, like, you will do 5 tests, and allow 5-shot for each problem to each model. And then see out of 5 what's the score. Do this for GPT4o vs O1 preview, you can also do O1 vs Claude sonnet!
Like a "LLM's Face Off"
@KyleKabasares_PhD 3 месяца назад
I actually did a stream like that last night! Gave o1, 4o, Gemini Advanced, Claude Sonnet 3.5, Grok 2, and LLama 3.1 a college math exam! ruclips.net/user/liveGdN4MFxLQUU?si=flPSFIxx85Uqyoz7
@lorenzocl6002 3 месяца назад ⁺¹⁵
All i see is that in a few years AI will be able to do everything and most of us will be obsolete
@marcusrosales3344 3 месяца назад ⁺²
Keep in mind these companies lie A LOT! Like the bar exam, it tests in the 60th percentile with the initially hidden caveats in place
@danielbrown001 3 месяца назад ⁺¹
@@marcusrosales3344True, but the money doesn’t lie. Until the bubble bursts, very smart people have bet tens of billions of dollars on it being game-changing. And notice how the goalposts keep moving back? “It’s ONLY getting 60% on the bar” is a far cry from 5 years ago when AI could only put out gibberish.
@pandicon3 3 месяца назад ⁺²
10:12 Not a physics student, but looking at the solution it seems like it ignored the first part, took the M expression for granted, and only applied the approximations. Which I believe are much much easier than getting the expression for M itself :D
@KyleKabasares_PhD 3 месяца назад ⁺³
I think you're right! Sorry I missed that, I was a bit flustered and tired to be honest haha, but I'm working on a new video that will involve problems that it should have never seen before!
@pandicon3 3 месяца назад
@@KyleKabasares_PhD No worries at all, the solution it provided felt a bit too short for such a problem. I will definitely watch the next video since as others pointed out, having it solve unknown problems will be much more useful and interesting.
@KyleKabasares_PhD 3 месяца назад ⁺¹
The video is now out if you are interested! ruclips.net/video/a8QvnIAGjPA/видео.html
@ketanmann4371 2 месяца назад ⁺¹⁰
Great Video. Can it be the case that solutions were part of training data of this model as earlier GPTs had a lot of books as training data?
@chiscoduran9517 2 месяца назад ⁺¹
It is possible but hard to know
@disgorgeengorge 2 месяца назад ⁺⁵
It ABSOLUTELY was. Jackson is such a common book used in grad EM.
This video has almost no substance, there's no verification on the accuracy of the logic. Guy also said he didn't know if it was correct.
@AAjax 3 месяца назад ⁺¹
As Andrej Karpathy recently said in an interview, ideal training data for a reasoning model would include step-by-step reasoning. (like how we teach children in school) t's a bit amazing that bulk internet data has enough of this embedded reasoning to get us the current results.
OpenAI is using q*star to refine their synthetic data, no doubt with successful step-by-step reasoning in that data. This will take a couple years to reach the next model (that's how long it takes to train a new model) but it's start of a virtuous cycle, where ever capable models refine future synthetic data.
@KyleKabasares_PhD 3 месяца назад
Thank you for your comment, this step-by-step reasoning approach is definitely a game-changer! I have also just uploaded a part 2 to my channel if you are interested: ruclips.net/video/a8QvnIAGjPA/видео.html
@err0rz633 3 месяца назад ⁺³
It would be cool if one of the creators of these problems could get paid to make original ones on the spot and feed it to o1.
@Mayeverycreaturefindhappiness 3 месяца назад ⁺²⁵
This book is probably in its training data
@japiye 3 месяца назад ⁺⁷
so why did it try different approaches and not just the correct one?
@Mayeverycreaturefindhappiness 3 месяца назад ⁺¹
@@japiye I am not sure but I do know it was trained on those types of problems so it’s not truly deriving those problems cold did you notice it would pull numbers out of nowhere. It’s still really impressive and a very useful model I think we skeptical that it’s really the equivalent of a physics grad student, if you watch ai explained video it gets common sense problems wrong
@trueuniverse690 3 месяца назад ⁺²
@@Mayeverycreaturefindhappiness Still impressive
@Mayeverycreaturefindhappiness 3 месяца назад ⁺²
@@trueuniverse690 yes
@deror007 3 месяца назад ⁺³
@@japiye As it probabilistically selects the next word, it will select different words compared to what is has seen. This is what makes the model generate new sentences, but it is able to evaluate it's chain of thought which leads to the correct one or a better result. As the problems are found online and the jackson problems are well known in the field for many years previously, it must be in it's training set.
@domenicperito4635 3 месяца назад ⁺⁴
What happens when the full orion model drops soon? This is like half as "smart"
@andydataguy 3 месяца назад ⁺¹
Your reaction clips bout to go viral bro 🚀

Следующие

Автовоспроизведение

o3: Pushing the boundaries of AGI (and of coding)