Hey, try adding a hallucination-testing question. Here's one I suggested the other day: "Describe each of the following mango cultivars: 'Alphonso', 'Carrie', 'Lemon Cream', 'Kent'" (there's no mango cultivar called 'Lemon Cream').
@karenrobertsdottir4101 I tested it, and it hallucinated: "The Lemon Cream mango is a relatively new and intriguing cultivar that has caught the attention of mango enthusiasts. As the name suggests, it is noted for its creamy texture and a unique flavor that combines the sweetness of traditional mangoes with a zesty, lemon-like undertone..."
Can you ask this question? A certain number of boxes are placed one above the other in a stack. Each box contains a different number of coins. Information about only a few boxes is known. Only one box is placed between box E and box B. Three boxes are placed between box E and box C, which contains 12 coins. Only one box is placed between the box which contains 11 coins and box C, which is placed above the box which contains 11 coins. As many boxes are placed between box B and the box which contains 11 coins as between box E and the box which contains 21 coins. The box which contains 21 coins is placed below box E. Box F is placed five places above the box which contains 21 coins. Only two boxes are placed between box F and the box which contains 14 coins. Not more than five boxes are placed above the box which contains 14 coins. As many boxes are placed above box F as below the box which contains 9 coins. Only one box is placed between the box which contains 14 coins and the box which contains 9 coins. The box which contains 9 coins is placed exactly between box B and box G. Not more than two boxes are placed below box G.
Answer (top to bottom; ------ marks a box whose name and coin count are not given):
1. ------
2. B
3. ------
4. E (14 coins)
5. ------
6. ------ (9 coins)
7. F
8. C (12 coins)
9. ------
10. G (11 coins)
11. ------
12. ------ (21 coins)
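The posted answer can be checked clue by clue. A minimal sketch, where the `STACK` encoding and the `check_stack` name are illustrative choices, not part of the original puzzle: positions are counted from the top (index 0), and each clue is translated into a comparison of positions.

```python
# The posted answer, top of the stack first. Each entry is (box, coins);
# None marks a box name or coin count the puzzle leaves unknown.
STACK = [
    (None, None), ("B", None), (None, None), ("E", 14),
    (None, None), (None, 9), ("F", None), ("C", 12),
    (None, None), ("G", 11), (None, None), (None, 21),
]


def check_stack(stack):
    """Evaluate every clue against a candidate stack (index 0 = top)."""
    at = {name: i for i, (name, _) in enumerate(stack) if name}
    coins = {c: i for i, (_, c) in enumerate(stack) if c is not None}
    last = len(stack) - 1  # index of the bottom box

    def gap(a, b):  # number of boxes strictly between positions a and b
        return abs(a - b) - 1

    B, C, E, F, G = (at[k] for k in "BCEFG")
    c9, c11, c12, c14, c21 = (coins[k] for k in (9, 11, 12, 14, 21))
    return {
        "one box between E and B": gap(E, B) == 1,
        "three boxes between E and C": gap(E, C) == 3,
        "C holds 12 coins": C == c12,
        "one box between 11 and C, C above 11": gap(c11, C) == 1 and C < c11,
        "gap(B, 11) equals gap(E, 21)": gap(B, c11) == gap(E, c21),
        "21 is below E": c21 > E,
        "F is five places above 21": c21 - F == 5,
        "two boxes between F and 14": gap(F, c14) == 2,
        "at most five boxes above 14": c14 <= 5,        # boxes above index p = p
        "boxes above F equal boxes below 9": F == last - c9,
        "one box between 14 and 9": gap(c14, c9) == 1,
        "9 is exactly between B and G": B + G == 2 * c9,
        "at most two boxes below G": last - G <= 2,
    }


for clue, ok in check_stack(STACK).items():
    print("OK  " if ok else "FAIL", clue)
```

Every clue should print OK for this arrangement; editing `STACK` to try a different arrangement shows which clue it breaks.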
Now that you know that OpenAI watches your videos, you can safely assume that your questions are part of LLM training data. You should probably begin adding some "randomness" to your questions, so they still ask the same thing but avoid matching the training data exactly ;) Like the envelope question: play around with the sizes and metrics.
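One way to do that is to generate the question from a template and compute the expected answer alongside it. A minimal sketch, assuming the rule is simply that each side of the envelope must be within the corresponding size limit, in either orientation; the function names, size ranges, units and wording are hypothetical, not the ones used in the video:

```python
import random


def fits(envelope, limits):
    """True if each side is within its limit, allowing a 90-degree rotation.

    Sorting both pairs matches the shorter side to the shorter limit and the
    longer side to the longer limit, which covers both orientations.
    """
    return all(side <= limit for side, limit in zip(sorted(envelope), sorted(limits)))


def make_variant(rng):
    """Build one randomized envelope prompt plus its expected yes/no answer.

    The ranges below are hypothetical placeholders.
    """
    limits_mm = (rng.randint(200, 300), rng.randint(100, 180))
    envelope_mm = (rng.randint(80, 320), rng.randint(80, 320))
    unit, scale = rng.choice([("mm", 1), ("cm", 0.1)])  # vary the metric too

    def fmt(x):
        return f"{x * scale:g} {unit}"

    prompt = (
        f"A letter may be at most {fmt(limits_mm[0])} x {fmt(limits_mm[1])}. "
        f"Does an envelope of {fmt(envelope_mm[0])} x {fmt(envelope_mm[1])} fit?"
    )
    return prompt, fits(envelope_mm, limits_mm)


rng = random.Random(42)  # seeded so a given batch of variants is reproducible
for _ in range(3):
    prompt, answer = make_variant(rng)
    print(prompt, "->", "yes" if answer else "no")
```

Because the expected answer is computed rather than memorized, each run can produce fresh numbers without losing the ability to grade the model.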
Have a collection of structurally new questions. When LLMs train on your past Q&A, they can abstract from it, so you need structurally different questions to test them. The next question to ask is: does the LLM just collect new facts/training data, or does it really think? When it gives the appearance of thinking, is it really thinking? Is the LLM functionally designed to "think", or is it merely outputting tokens that resemble a thinking process?
For the North Pole question, I think the discussion comes down to what is considered the starting point. One starting point is the North Pole, but you can also start counting from the point where you begin to 'walk as far as it takes'. If you take the latter, the model's answer 2 is correct.
Exactly, glad to find this comment here. It's obvious the question has an ambiguity in how to interpret it, which has to do exactly with what you just explained. We should either make the intended interpretation clear, or evaluate the model on whether it answers correctly under whatever interpretation it ends up taking. Of course Yann LeCun is right that no LLM will ever reliably get the answer he expects, but neither will any form of intelligence whatsoever (if you ask the question to different instances of that intelligence; of course a single instance can keep giving the same answer reliably). The problem is the way the question is worded; it's easy to make such a bold, confident claim then. It's simple: just say "now walk in a straight line until you return to the North Pole", and now it's clear.
Agree totally that it's very logical to start measuring from that point. But if you walk in a straight line you'd walk the full circumference of the earth, so unfortunately it's still the wrong answer. If you visualise turning 90 degrees at 1 metre from the pole, it becomes pretty obvious that you'd have to be constantly turning left to walk the path of a latitude ring.
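To put a number on the "constantly turning left" point: a latitude circle at colatitude θ = s/R has geodesic curvature cot(θ)/R, while a great circle (a truly straight path on the sphere) has zero. A minimal sketch, assuming a perfect sphere with mean radius 6371 km; the function name is hypothetical:

```python
import math

R_EARTH_KM = 6371.0  # mean Earth radius; the Earth is treated as a perfect sphere


def latitude_circle_turning(s_km, radius_km=R_EARTH_KM):
    """Heading change needed to stay on the circle s_km from the pole.

    Returns (degrees of turning per metre walked, total degrees over one lap).
    Uses the geodesic curvature of a circle of colatitude theta: cot(theta) / R.
    """
    theta = s_km / radius_km                       # colatitude in radians
    k_g = 1.0 / (math.tan(theta) * radius_km)      # radians of turning per km
    lap_km = 2 * math.pi * radius_km * math.sin(theta)
    per_metre_deg = math.degrees(k_g) / 1000
    total_deg = math.degrees(k_g * lap_km)         # equals 360 * cos(theta)
    return per_metre_deg, total_deg


print(latitude_circle_turning(1.0))    # the 1 km case from the question
print(latitude_circle_turning(0.001))  # the 1 metre case
```

At 1 km from the pole you must turn about 0.057° per metre, close to a full 360° per lap; at 1 metre from the pole it is roughly 57° per metre, which is why the circle so obviously is not straight at that scale.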
Actually, I believe the correct answer is three: it could only be exactly 2π km if the surface were flat rather than a sphere. Walking on the outside of a sphere gives less, and I don't believe it could ever be greater, because you would somehow have to modify π or travel more than 1 km in the straight line.
You should ask the North Pole question but state in the prompt whether the North Pole or the turning point is the "starting point". Right now it's ambiguous and could be interpreted either way.
It's more interesting like that, because an AGI should be able to understand why the question is interesting (meridian vs. latitude) and answer it correctly, including why it chose that interpretation.
@IddoZiv1 Sure, but if you asked it to explain why it chose the interpretation it did, I'm sure it would. Right now it's choosing its interpretation in its chain of thought but not explaining it in the answer. I think you either need to specify the starting point or ask it to explain its assumption about the starting point if you want to be able to make a definitive call as to whether it got it "right" or not. If people can't agree on what the right answer is, how can we say the model is right or wrong?
It doesn't matter which point you consider to be the starting point, because after you turn and keep going straight, you are led toward the South Pole. Afterwards you will go north again and come close to the North Pole. There is a point on the path that comes closest to the North Pole, and the question asks for the distance walked to get to that point.
@OrbitTheSun If it considers a straight line to be following the line of latitude, which in the code visualization appears to be what it's doing, then you would never touch the South Pole; you would circumnavigate the earth at that latitude and end up back at the point where you made the turn.
I agree, but it's not "his" question; these are well-known LLM benchmark questions that he popularized through his own LLM benchmark. I would've imagined OpenAI would be more creative, but they ended up using the same concepts.
6:47 I believe the correct answer depends on how you interpret the 'starting point.' Since it is not explicitly stated in the question, I understand why one might interpret it as being after walking 1 km.
@kittengray9232 'It's explicitly saying "starting at the North Pole"' Have you actually watched the video? It unequivocally never says that; you're either blatantly lying or ignorant of what you're making a confident claim about. The question begins with "Standing at the North Pole"; that's what it says. And even if it did say "starting at the North Pole", the interpretation the AI took would still be justifiable. Whatever path you took initially, if the last part of your trajectory is "now walk in a straight line until x happens", it always makes sense to interpret that as you now being at the start of a new path that continues until condition x is met; it's a loop process, and every loop process has its own starting point. What's more, the model clearly announces which point it interprets as the starting point, so taking that as "proof" that the AI doesn't understand geometry is completely ridiculous. In any case, if you have any modicum of good faith about it, you have to admit the question is worded in a way that can much too easily be interpreted in more than one way, which makes the question stupid. A question should never rely on a lack of clarity to be difficult to answer. If one actually wants to test whether the AI, or anyone, understands the geometry of the problem, it's really simple: just ask "now walk in a straight line until you hit the North Pole". It's the same question, except now it's actually clear what's being asked.
@kittengray9232 No, it doesn't. It says "Imagine standing at the North Pole of earth.", then walking, then turning, etc. It's not clear what exactly the starting point is. The question implies you might hit your starting point by walking straight, and considering that there are "2*Pi km" answers (implying you calculate the circumference of a circle), it makes more sense to consider the turning point as the starting point; otherwise it's just a trick question. On a flat earth the answer "exactly 2*Pi km" would be correct, but on a curved earth, where you walk on a curved path for 1 km instead of a straight path, you won't get as far in that spatial dimension and thus end up with a circle of less than 1 km radius, so the "less than 2*Pi km" answer would be correct. Another commenter has said that the inventor of this problem said you need more than language-based reasoning to solve it, which is only the case if it's not a trick question where the starting point is the North Pole. You need spatial reasoning to come to the other conclusions.
Why test with pre-existing tests? Isn't it assumed at this point that new models will be (secretly or openly) trained using data from YouTube channels and other sources of tests like this?
The idea is to be able to compare the models. He could and should change the numbers, but if the models were trained on his questions that might not be enough to avoid false positives.
Also, the first Tetris code question is literally a copy-and-paste solution: you can Google it and immediately find that answer. I don't get the point of the test; the code itself is probably already in the training data.
6:35 Dude, LeCun didn't say the models cannot get this right. Second time I'm commenting this. He says that the mental process you use to answer this question has nothing to do with language, and therefore we're missing a key component if the target is AGI.
@matthew_berman 6:31 The reason is that you need to imagine yourself actually doing it, and realise that it's a trick question, because 2π kilometers is the circumference of a relatively small circle, and walking it won't feel like walking straight. Imagine if the question were about one meter.
The problem with LeCun is that he stated he doesn't have an internal monologue. Many people don't. So I don't think he really understands the concept of using words to think.
That makes more sense to me now. You have to consider a 3D model of earth in order to solve this (if you take the turning point as the "starting point" otherwise it's just a trick question). In the video the LLM only uses flat-earth geometry which gives you exactly 2*Pi as an answer. Considering that the 1km of walking is partly down in the third dimension will result in a new circle with less than 1km radius and thus a circumference of less than 2*Pi km. I'm not good enough at geometry to figure out how much less, but it's clearly less. TL;DR yeah, you need spatial reasoning in order to solve that. Language ain't enough.
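The "how much less" can be computed directly. A minimal sketch, assuming a perfect sphere with the mean Earth radius of 6371 km (the function name is hypothetical): after walking s km from the pole along the surface, the colatitude is θ = s/R and the circle's distance from the Earth's axis is R·sin(θ), so its circumference is 2πR·sin(s/R).

```python
import math

R_EARTH_KM = 6371.0  # mean Earth radius; the Earth is treated as a perfect sphere


def latitude_circle_circumference(s_km, radius_km=R_EARTH_KM):
    """Circumference of the circle whose points are s_km from the pole along the surface."""
    theta = s_km / radius_km         # colatitude in radians (arc length / radius)
    r = radius_km * math.sin(theta)  # distance from the Earth's axis
    return 2 * math.pi * r


c = latitude_circle_circumference(1.0)
shortfall_m = (2 * math.pi - c) * 1000  # how much shorter than 2π km, in metres
print(f"circumference = {c:.12f} km, shortfall = {shortfall_m * 1e6:.1f} micrometres")
```

For s = 1 km the shortfall comes out at roughly 26 micrometres, so "less than 2π km" is technically true, but only by a tiny amount.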
You are wrong about the North Pole question. The question is not when it will pass the initial point but when it will pass the turning point. Yann LeCun: “Imagine standing at the North Pole of the Earth. Walk in any direction, in a straight line, for 1 km. Now turn 90 degrees to the left. Walk in a straight line for as long as it takes to pass near the point where you turned.”
Either way the answer is wrong. You will never pass the North Pole again, and you will walk entirely around the earth, not 6.28... km. Just think about this on a smaller scale: if you went 2 meters south from the North Pole, then turned left 90 degrees, you would face east. However, when you start walking you would not go around in circles around the North Pole; you would go once around the entire earth. This is the same anywhere on earth.
I think the problem is "starting point" is ambiguous. It could be the North Pole itself or the point where you stopped after walking for 1 km in a straight line in any direction. Like Matt, I originally interpreted it as referring to the North Pole, which would make option 4 the correct one.
@fabianletsch1354 There is so much debate over this question. I've seen so many explanations and answers. It shows that it is certainly not a straightforward question. Probably it is not specific enough; there is too much room for interpretation.
@fabianletsch1354 VoodooParadox2513 is correct. The starting point is ambiguous; if you rephrase the question, it will get it correct. You need to clarify that the starting point is the North Pole. Also, don't allow it any additional direction changes after the 90-degree left turn. Otherwise it turns you back and adds the 1 km to the total. I just tried with a more specific question and it gets it right. At no point does that question say you can't turn.
Conclusion: add specificity of what you mean by starting point, and clarify the question to disambiguate between “will you cross the starting point” and “how long until you cross the starting point”
Sorry, 7:14 - the starting point is 1km south of the north pole. You stood on the north pole, walked 1km away from it - that is south. ANY direction is south. So, then you walk a circle of latitude 1km south of the north pole, sorry. The AI was right.
About the North Pole question, there are two versions of the question out there: one asking if you return to the starting point, and one asking if you return to the turning point. The original question (turning point) is a trick question in geometry, and the answer is that if you start walking in a straight line on a sphere you always return to your starting point (that's the source of the confusion between the two versions). In addition, every straight path on a sphere is a full loop around, so it's way more than 2π kilometers. Please, please use the correct wording and retest the models. It's also really only a trivia question and not a logic question (unless you are in a class and just learning 3D geometry for the first time).
It's a trick question... The North Pole is the start... If you go in a straight line 1 km south, then turn 90° and start walking, how far will you go? Well, you just turn another 90° and go 1 km north. (It doesn't say you can't turn, and it states you can walk.)
@bestemusikken Nope. Not per the screenshot. Stand on the North Pole, walk 1 km in any direction (south), turn 90 degrees. THEN walk along the circumference and measure how far you walk. By that reading, the starting point is not the North Pole.
@tnt_explorers954 But if you read it, you start counting 1 km from the North Pole. If it is a trick question, it is very tricky and measures language tricks more than understanding of math.
On your North Pole example, even I considered the point 1 km south of the North Pole as the starting position. If you intend the North Pole to be your starting position, then I would start the question with something like "Imagine you are standing at the North Pole of the Earth as your starting position ..."
Remember: you can pass a point even if you don't reach it. That's why you can pass the North Pole again after circling the earth. At this same moment you will pass the point where you turned east. That's why it doesn't matter which point you consider to be the starting point.
@rogeriopenna9014 A straight line on the globe or the earth is not meant in the geometrical (trigonometrical) sense, but in the everyday sense of going straight. So going along the circle is correct reasoning in this context.
@dufifa No, because the arc you suggest is completely arbitrary and cannot be followed naturally. Not even a compass can help you, because it doesn't work at the North Pole. You could walk any other radius you like and, for example, walk in an arc to the North Pole. Then you would be at the starting point!
I think the model does correctly understand the situation in the question about walking from the North Pole. The model wrongly interprets the "starting point" as the "starting point on the circle". It would be interesting to give the model a hint in the prompt, such as "the starting point is at the North Pole", so there won't be any confusion there.
No. The model confuses rotating 90° with rotating towards the east. It also confuses walking along a latitude circle with walking in a straight line. The only latitude circle which is "straight" is the equator. At 1 meter from the North Pole, the latitude is a circle with a radius of 1 meter.
Walking in a straight line on a sphere traces a path around the centre of the sphere. So however you interpret the question, when you turn 90 degrees you do not end up walking east indefinitely; you enter a near-polar orbit that eventually faces nearly due south and then, at some point, nearly due north. But it will never cross the pole. Nor will it go around the pole at 1 km from the pole; that would mean walking in a circle and turning continuously to stay 1 km from the pole.
The marble question is a text version of the ARC-AGI visual problems: they both depend on physical intuition. Yann harps on the same thing…it’s hard to get physical intuition when training only on text. This won’t be hard for models once they also train on video (so O1 + Sora)
Your walk-from-the-North-Pole example is actually simple: if you walk ANY distance (even 1 meter) from the pole, turn left 90 degrees and then walk straight, then (on a theoretically perfect sphere, which the earth is not) you will always be walking away from your starting point, and you will always travel the full circumference of the sphere before reaching it. This assumes a perfect sphere and a perfectly straight line. It's easy to see if you make it a 1 meter initial walk. Turning left 90 degrees after 1 meter, you will be facing due east, but traveling in a straight line will take you south of east. This is because walking due east would keep you 1 meter from the pole, circling it in a very small circle as seen from above the pole. To walk "straight" clearly means to traverse the surface of the sphere along the path that most closely approximates a line. That line will always be one that divides the sphere in half, hence the full circumference of the earth. (If you want to get really picky, a perfectly straight line would leave the sphere and travel into space infinitely, always diverging from the pole, but that's clearly not what the question implies.) So the correct answer was not among the choices... The best practical answer is: the circumference of the entire earth, as measured from the starting point along the vector created by the initial direction of the walker. This will vary significantly based on the starting point, because a) the earth is not a perfect sphere and b) a perfect hemisphere therefore cannot be created, due to a). So it would also be correct to say you might never cross the starting point due to these imperfections of the earth.
The requirement is not that you cross the starting point, but that you pass it. This is the clue to the solution, as you can now pass the North Pole as the starting point as required, after you have circled the earth once.
The North Pole question is a matter of semantics, context and imprecise language in the question which the model is failing to fully analyse, recognise and then produce alternative answers for, depending on which is the correct interpretation of the question. It would be interesting to see how these models can handle (imprecise) questioning in other human languages which may or may not be more precise in meaning depending on words used e.g. "eskimo" having so many specific words for different types of snow. Other languages have and don't have precise equivalents to English ones for conditions, experiences, feelings, etc. Likewise English does not have equivalents to words used in so many other languages e.g. Chinese, Korean, Russian, African languages, Pacific ones, South American, etc.
I think LLMs misinterpret the North Pole question. Is the starting point the North Pole, or does it use the point at which you start walking in a loop as the starting point? It's like running a marathon: your starting position isn't your house. I.e., if your starting point is your house, and you travel 1 km to the marathon course and then run the marathon, one would logically assume the starting point becomes the start of the marathon. Renaming each point in the NP question to Position 1, Position 2, etc., would likely remove any misinterpretations.
The one wrong question could be considered a trick question I think. It is blindingly obvious, even to an LLM I suggest, that you will never return to the north pole. IMO the implied question - which it decided to answer is how far you will walk until you reach the same point on the latitude line as you started from. I would say the answer is acceptable.
Edit: a better start: stand up and actually imagine doing it, maybe with 10 meters instead of a kilometer. An LLM can't visualise geometry like that. If you can't either, just imagine that the pole is where you are standing now.
He is correct. I used a more accurate version of the prompt and he correctly identified that it would not be possible to return to the starting position (not the position from where he made the 90 degree turn). Here it is:
Consider the following problem: I start exactly at the North Pole (Position 1). I turn exactly south and walk 1 km (Direction 1), arriving at Position 2. I then turn exactly 90 degrees and begin walking in that direction (Direction 2). How far do I need to walk in Direction 2 to return to Position 1?
Yes, build a neverending RED TEAM of other o1 machines all trying to stump each other! Adversarial training! Use every MENSA brain teaser in the universe.
In the North Pole question, it is valid to think that the starting point is the point where you turn 90 degrees left. I think the model is right. It's obvious to the model that you don't pass the North Pole itself, and from the question it appears that the walker was sent from the North Pole 1 km south to the starting point.
I may be missing something in your North Pole circumference question, but I'm pretty sure it actually nailed it. If you start at the North Pole and walk 1 km in any direction, you will absolutely be 1 km south of the North Pole. The only direction available to you at the North Pole is south, as every direction takes you away from the North Pole along a meridian. Now if you turn 90 degrees, you will be facing east or west. If this is our starting point, then walking in a straight line will eventually take us back to the same spot, so all that is needed is to calculate the circumference of a circle 1 km south of the North Pole. However, if you meant the North Pole is our starting point, then once we make our 90-degree turn we are always equidistant from that point. You should clarify which point is your starting point, because I'm pretty sure Chat nailed this question. Again, I probably missed some key point that makes everything I just said very stupid, so sorry in advance if that is the case.
If you walk in a straight line, it doesn't matter what you did before walking. You will always end up crossing your original point after walking the circumference of the earth. If your prior steps situated you outside that path, then you will never cross those points.
The question is - besides the ambiguity of the starting point - what do you mean by "in a straight line". I would say, just going east and staying the eastern course is not a straight line, which becomes especially obvious so close to the north pole, where just going east and keeping going east is a small circle and you have to keep turning left to stay on that circle. It would be even more obvious if you just made one step in any direction from the north pole and then turned east. Going in a straight line would certainly not be walking in a little circle around the north pole keeping your heading east. Going in a straight line is going on a great circle around the whole earth starting to the east. Great Circle: Is the straightest possible line on a sphere like Earth. It represents a direct path without changing direction relative to the sphere's curvature. Loxodrome (Rhumb Line): Is not a straight line on the sphere itself but appears as one on specific map projections like the Mercator. It involves a continuous change in direction relative to the sphere's curvature.
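The great-circle case can be checked numerically. A minimal sketch, assuming a perfect sphere of radius 6371 km and a walker who turns to face due east at a point s km from the North Pole (the function and parameter names are hypothetical): it samples the great circle that starts there and reports the lap length and how close the walk comes to each pole.

```python
import math

R_EARTH_KM = 6371.0  # mean Earth radius; the Earth is treated as a perfect sphere


def walk_great_circle(s_km=1.0, samples=10_000, radius_km=R_EARTH_KM):
    """Walk a great circle that starts s_km from the North Pole heading due east.

    Returns (lap length in km, closest approach to the North Pole in km,
    closest approach to the South Pole in km). Keep `samples` even so that
    t = pi (the far side of the lap) is one of the sampled points.
    """
    theta0 = s_km / radius_km  # colatitude of the turning point, in radians
    closest_north = closest_south = math.inf
    for i in range(samples + 1):
        t = 2 * math.pi * i / samples  # angle travelled along the great circle
        # With the z axis through the North Pole, the walker's unit position is
        # cos(t) * (turning point) + sin(t) * (due east); its z component is:
        z = math.cos(t) * math.cos(theta0)
        colat = math.acos(max(-1.0, min(1.0, z)))  # angle from the North Pole
        closest_north = min(closest_north, radius_km * colat)
        closest_south = min(closest_south, radius_km * (math.pi - colat))
    return 2 * math.pi * radius_km, closest_north, closest_south


lap, north, south = walk_great_circle()
print(f"lap = {lap:.2f} km, nearest North Pole = {north:.3f} km, nearest South Pole = {south:.3f} km")
```

The lap is about 40,030 km, the walk never gets closer than about 1 km to the North Pole (that minimum is the turning point itself), and halfway round it passes about 1 km from the South Pole.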
Hello Matt! The model actually provided the right answer. As a master's degree physics student, I can confirm that the response is accurate if we consider the starting point to be the point where the man turned 90 degrees. This interpretation aligns well with the problem, and I initially thought this was the intended question. While the model may have given the answer without detailing the steps (such as assuming that the radius `r` of the circle the man is walking on is 1 kilometer), the conclusion is still correct. Let's break down the reasoning step by step:

### Understanding the Relationship Between Latitude and Radius:

The Earth's radius is denoted as `R_T`, and the colatitude (the angle from the North Pole) is `θ`. The radius `r` of the circle at that latitude is given by:

```
r = R_T * sin(θ)
```

Since the man starts near the North Pole, `θ` is very small. For small angles measured in radians, `sin(θ) ≈ θ`. Therefore, the equation simplifies to:

```
r ≈ R_T * θ
```

### Calculating the Angle `θ`:

The man walks 1 kilometer south from the North Pole. The arc length `s` along the Earth's surface is related to the central angle `θ` by:

```
s = R_T * θ
```

Substituting the distance walked:

```
1 km = R_T * θ
```

Solving for `θ`:

```
θ = 1 km / R_T
```

### Determining the Radius `r` of the Circle:

Using the value of `θ` in the simplified radius equation:

```
r ≈ R_T * θ = R_T * (1 km / R_T) = 1 km
```

So, to first order, the radius of the circle the man is walking on after turning 90 degrees is 1 kilometer.

### Calculating the Circumference `C` of the Circle:

The circumference of a circle is given by:

```
C = 2πr
```

Substituting `r ≈ 1 km`:

```
C ≈ 2π × 1 km = 2π km
```

This means the man needs to walk about `2π` kilometers to complete the circle and return to his starting point after turning 90 degrees. As you see, the model's answer of `2π` kilometers is correct. Although it may not have provided all the intermediate steps, the conclusion is accurate.
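For reference, the step that uses sin(θ) ≈ θ can be carried one order further. A short derivation, assuming a perfect sphere of radius R_T and a walk of length s along a meridian, so that θ = s / R_T as in the steps above:

```latex
\begin{aligned}
r &= R_T \sin\theta = R_T \sin\!\left(\frac{s}{R_T}\right)
   = s - \frac{s^{3}}{6R_T^{2}} + O\!\left(\frac{s^{5}}{R_T^{4}}\right) \\
C &= 2\pi r = 2\pi s - \frac{2\pi s^{3}}{6R_T^{2}} + O\!\left(\frac{s^{5}}{R_T^{4}}\right)
\end{aligned}
```

With s = 1 km and R_T ≈ 6371 km the correction term is on the order of 10⁻⁸ km, so C is slightly, but strictly, less than 2π km; that only matters if the answer choices separate "exactly 2π km" from "less than 2π km".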
Unfortunately that isn't the case. Nobody knows whether it's on purpose, but the wording of the question would benefit from a few clarifications.
1. Nowhere is it written that you should correct your course to keep heading east after your 90-degree turn. Imagine it were 10 m instead of 1 km: you would need to correct your direction basically every step. The only instruction given about the manner of your walk is to walk straight, so you would walk the full circumference of the Earth.
2. Even if it were specified that you must keep correcting your direction to continue eastward (which it isn't), θ is small but nonzero, so answering 2π km could still be considered a worse option than simply picking "less than 2π km".
3. Finally, there is the question of what should be considered the starting point, with good arguments for both the North Pole and the turning point. If the starting point is the North Pole, then the correct answer would be the option that you never reach the starting point again: on every lap you only get as close as 1 km to it, but no closer.
In either case, your choice is never the best option, meaning o1 didn't provide the right answer to this particular question. As I said, the question should be fixed. Or maybe it was written this way on purpose, so that the model has to give reasoning for multiple options and thus prove it can analyse the question deeply despite its poor quality.
Yep, a fail. I think the correct assumption is that the starting place is the beginning of the exercise. Most humans would assume that we are following a latitude, which is a type of straight line.
Solving the math question after 9:25 is even more impressive considering that there is a typo in the statement of the question: $rac{p}{q}$ should have been $\frac{p}{q}$ (that is the LaTeX way of writing the fraction p/q). Maybe GPT-4o could have also caught the typo, as it is more of a "language" issue than a reasoning one.
I tested the model with some math riddles that other models do not get it right. For example: Is it possible to find four natural numbers (positive integers) so that their squares add to 13411? OpenAI aced it! 😇
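A claim like this is easy to check by brute force. A minimal sketch (the function name is hypothetical) that searches for positive integers a ≤ b ≤ c ≤ d with a² + b² + c² + d² = n:

```python
from math import isqrt


def four_positive_squares(n):
    """Return (a, b, c, d) with 1 <= a <= b <= c <= d and a² + b² + c² + d² == n, or None."""
    a = 1
    while 4 * a * a <= n:                        # a is the smallest of the four
        b = a
        while a * a + 3 * b * b <= n:            # b <= c <= d
            c = b
            while a * a + b * b + 2 * c * c <= n:  # c <= d
                rest = n - a * a - b * b - c * c
                d = isqrt(rest)                  # integer square root
                if d >= c and d * d == rest:     # rest must be a perfect square
                    return a, b, c, d
                c += 1
            b += 1
        a += 1
    return None


print(four_positive_squares(13411))
```

For n = 13411 one valid decomposition is 115² + 13² + 4² + 1² = 13225 + 169 + 16 + 1; the search may return a different one, but any tuple it returns can be checked by squaring and adding.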
🎯 Key points for quick navigation:
00:00:00 *🍓 OpenAI's Strawberry Puzzle*
- OpenAI used a puzzle involving a strawberry that mirrors the speaker's own LLM rubric question, demonstrating OpenAI's attention to the speaker's content.
- Introduction to OpenAI's new "o1" model and testing plan.
00:02:07 *🧩 Tetris Code Test*
- Testing the "o1" model's ability to write a Tetris game in Python, comparing it to past performance.
- Faster thinking time and successful output with no errors on the first attempt.
00:02:36 *📏 Postal Envelope Test*
- Testing dimensional problem-solving: whether an envelope fits postal size restrictions.
- The model correctly considers that the envelope can be rotated to fit.
00:03:32 *🧮 Word Count Test*
- Checking the model's accuracy in counting words in its response.
- It provides the correct count after a short thinking time.
00:04:01 *🔪 Killer Question*
- A logic puzzle about killers in a room, where one killer is killed.
- The model correctly identifies the nuance of counting the dead killer, providing an insightful answer.
00:05:12 *🔬 Marble in the Cup Puzzle*
- The model predicts the behavior of a marble in an upside-down cup placed on a table.
- It accurately concludes that the marble stays on the table when the cup is moved.
00:06:21 *🧭 North Pole Walking Problem*
- A classic spatial reasoning problem involving walking at the North Pole.
- The model fails to answer correctly, as predicted by previous expectations for this challenge.
00:08:00 *🍎 "Apple" Sentence and "Strawberry" Letter Test*
- Tests involving creating sentences that end with "apple" and counting the letters in "strawberry", both passed accurately.
- A further decimal comparison test is passed, showing basic computation skill.
00:08:40 *🤔 Moral Dilemma Test*
- The model is asked if it is acceptable to push a random person to save humanity.
- It initially hesitates, but upon request for clarification, it concludes "yes."
00:09:36 *🧮 Complex Math and Evolutionary Question*
- The model handles a complex sphere calculation problem and provides a clear, formatted answer.
- For the classic "chicken or egg" problem, it concludes that the egg came first, based on evolutionary principles.
00:10:17 *🏆 Overall Performance Review*
- The model's performance is praised as the best ever tested, with only one incorrect answer on the North Pole problem.
- The speaker invites feedback on the one failed question.
Made with HARPA AI
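Several of these tests have answers that are trivial to check mechanically. A minimal sketch with hypothetical helper names; the decimal pair 9.11 vs 9.9 and the sample sentences are illustrative assumptions, since the summary does not give the exact inputs used:

```python
from decimal import Decimal


def word_count(text):
    # Counts whitespace-separated tokens, so "sentence." counts as one word.
    return len(text.split())


def letter_count(word, letter):
    # Case-insensitive count of a single letter.
    return word.lower().count(letter.lower())


def ends_with_word(sentence, word):
    # Ignore trailing punctuation and quotes before taking the last token.
    tokens = sentence.rstrip(".!?\"' ").split()
    return bool(tokens) and tokens[-1].lower() == word.lower()


def larger_decimal(a, b):
    # Decimal compares numerically, so 9.9 > 9.11 even though 11 > 9.
    return max(Decimal(a), Decimal(b))


print(word_count("There are seven words in this sentence."))  # -> 7
print(letter_count("strawberry", "r"))                        # -> 3
print(ends_with_word("She sliced a crisp green apple.", "apple"))  # -> True
print(larger_decimal("9.11", "9.9"))                          # -> 9.9
```

Keeping the grading in code like this makes it easy to rerun the same checks on every new model.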
On the Apple question: Since you capitalized the word, all of the example sentences were in reference to the company, not the fruit. Maybe it should get bonus points!
My friends and I play this obscure 15+ year old physics-based video game where you control the entire body. More importantly, you can create environments using simple geometry. We tested o1 by giving it some limited data about the formatting of the "mod" files, which are essentially just position, rotation, scale, and a few other properties for each of the environment objects (rectangles, spheres, and "cylinders", which are actually capsules). It successfully created several environments for us; they got messier the more complex we wanted them to be, but it was quite incredible seeing its spatial awareness.
A good test for the new reasoning AI would be a full crossword, because solving a crossword requires lots of backtracking: a word that fits and is technically correct can conflict with another word, and one of the two is clearly not the answer. I tried with GPT-4o, and crosswords are beyond what it can solve.
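To make the backtracking point concrete, here is a toy Python sketch (not a claim about how o1 or GPT-4o work internally). The 2x2 grid, slot names and candidate lists are made up for illustration: "AT" fits 1-Across on its own, but no 1-Down word starts with A, so the solver has to undo that choice and try "ON".

```python
def fill(slots, candidates, grid=None, index=0):
    """Depth-first crossword fill: place a word in each slot, undoing choices that conflict."""
    if grid is None:
        grid = {}
    if index == len(slots):
        return dict(grid)  # every slot filled consistently
    name, cells = slots[index]
    for word in candidates[name]:
        if len(word) != len(cells):
            continue
        # Reject the word if it disagrees with a letter a crossing word already placed.
        if any(grid.get(cell, ch) != ch for cell, ch in zip(cells, word)):
            continue
        new_cells = [cell for cell in cells if cell not in grid]
        for cell, ch in zip(cells, word):
            grid[cell] = ch
        solution = fill(slots, candidates, grid, index + 1)
        if solution is not None:
            return solution
        for cell in new_cells:  # backtrack: remove only the letters this word added
            del grid[cell]
    return None


# Hypothetical mini-grid; cells are (row, column).
SLOTS = [
    ("1-Across", [(0, 0), (0, 1)]),
    ("2-Across", [(1, 0), (1, 1)]),
    ("1-Down", [(0, 0), (1, 0)]),
    ("2-Down", [(0, 1), (1, 1)]),
]
CANDIDATES = {
    "1-Across": ["AT", "ON"],
    "2-Across": ["NO", "WE"],
    "1-Down": ["OW"],
    "2-Down": ["NE"],
}

solution = fill(SLOTS, CANDIDATES)
for row in range(2):
    print("".join(solution[(row, col)] for col in range(2)))
```

The solver tries "AT" first, hits a dead end at 1-Down, and only then settles on ON / WE; a model that never revisits an early "technically correct" guess gets stuck in exactly this way.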
Any perfectly straight path on a sphere takes you around a great circle (a cross-section through the centre). The closest you'd get to the North Pole is back at the 1 km mark where you started. "Turn left" is a vague term, which we would normally interpret as walking a circle 1 km from the North Pole, but that isn't a straight line.
A spider stands at any point on a sphere that has a 10m circumference and walks in a straight line until it gets back to its starting point. How far does it walk? Obviously 10m. Now put a dot on the sphere and write "north pole" next to it. Place the spider 1m away from the dot and start it walking at an angle of 90 degrees to the direction of the dot. Tell it to keep walking in a straight line until it gets back to the same spot. Now how far does it walk? Well, if it's a reasonably bright and obedient spider that understands English, it's going to be 10m again - we've already noted that it doesn't matter where on the sphere it starts and what direction it goes in as long as it's a straight line, it's always 10m. Now answer the walk from the north pole question again.
That question LeCun posed is interesting. At this point it's making me think I'm an idiot, because I haven't heard anyone say what I think is the answer. The way I interpret it: the starting point is 1 km south of the North Pole. You turn 90 degrees (it doesn't matter which direction; we can go with left). As an aside, turning left after walking 1 km south from the North Pole doesn't necessarily mean you're facing east, but that doesn't really matter. Then you walk in a straight line (meaning that if you draw the path of your motion, the angle between your trip straight south from the North Pole and your new direction will always be 90 degrees). If you put your finger on the top of a ball, move it in any direction away from where you first put it, and then turn 90 degrees from that path, your new path will go all the way around the ball. If you did this on Earth, you would walk all the way around the Earth longitudinally. The "ribbon test" is used in curved geometries to determine whether or not you've taken a straight path. If you walk in a circle around the North Pole, well then, you're walking in a circle, which isn't straight, and the ribbon will be partially lifted off the surface. It's actually so frustrating to me that so many people can't get this LOL
The issue with the North Pole question is that the models (4o as well) think the first 1 km south is just to get into position for the part of the path that we care about. They don't get that the first 1 km is included in the path. Maybe if you rephrase it to make sure the 1 km south is part of the path, it will get it right. Maybe include a stopwatch that starts when you leave the North Pole to highlight that the path has started. Or something similar.
About the North Pole question, there are two versions of the question out there: one asking if you return to the starting point, and one asking if you return to the turning point. The original question (turning point) is a trick question in geometry, and the answer is that if you start walking in a straight line on a sphere, you always return to your starting point (that's the source of the confusion between the two versions). In addition, every straight path on a sphere is a full loop around, so it's way more than 2pi kilometers. Please, please use the correct wording and retest the models. It's also actually only a trivia question and not really a logic question (unless you are in a class and just learning 3D geometry for the first time).
Christmas came early this time around. It's like being a little kid again, I'm so hyped. Can't wait to play around with this for myself, and hopefully don't waste my limits on subpar prompts.
OK, so to clarify: you have a globe. The North Pole is where you start. Go 1 km south; that's any direction. Then imagine you turn 90 degrees. What exactly does that even MEAN? Simply changing the direction to east or west or whatever, then walking a "straight" path: that's not a damn straight line. Or is it? It all depends; it's still nuanced. And if you walk a perfectly straight line from ANY point on Earth, you will end up at the same spot again after walking the whole circumference of the Earth. But what is 90 degrees now? An actual geometrical 90 degrees? A straight line in terms of compass direction, or an actual straight line (latitude vs. meridian)? How does this even make any sense? This question is difficult even for most humans, especially because there's a certain lack of clear definitions.
This is a good question; the previous frontier models could not reason through this, but o1-preview can: One day, in a gathering of top scientists, one of them wondered out loud whether there exists an integer that you could exactly double by moving its last digit to its front. For instance, 265 would satisfy this if 526 were its exact double, which it isn’t. So the question is, what is the smallest integer possible that meets this rule? Answer from o1-preview: 105263157894736842
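The quoted answer can be checked with a few lines of Python rather than taken on trust. Writing N = 10a + d (d is the last digit, a has k - 1 digits), moving d to the front gives d * 10^(k-1) + a; setting that equal to 2N gives 19a = d * (10^(k-1) - 2). The sketch searches k and then d in increasing order, so the first hit is the smallest solution. The function names are illustrative.

```python
def rotate_last_digit_to_front(n: int) -> int:
    """265 -> 526: move the final digit to the front."""
    s = str(n)
    return int(s[-1] + s[:-1])


def smallest_doubled_by_rotation(max_digits: int = 40):
    """Search N = 10*a + d using 19*a = d*(10**(k-1) - 2), smallest N first."""
    for k in range(2, max_digits + 1):          # number of digits in N
        for d in range(1, 10):                  # last digit (cannot be 0)
            numerator = d * (10 ** (k - 1) - 2)
            if numerator % 19:
                continue
            a = numerator // 19
            if len(str(a)) == k - 1:            # N must really have k digits
                return 10 * a + d
    return None


n = smallest_doubled_by_rotation()
print(n, rotate_last_digit_to_front(n) == 2 * n)
```

Because 19 divides 10^(k-1) - 2 for the first time at k = 18, no shorter number can work, which is why the answer is so long.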
For the 9.9 vs. 9.11 test case, the order matters: if you ask "9.11 and 9.9, which one is bigger?", the answer might sometimes be 9.11, and the chance of getting the wrong answer might be higher if you ask in another language.
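A small Python illustration of why framing might matter. The version-number reading is an assumed explanation for the confusion, not something the comment states: compared as exact decimals, 9.9 is larger whichever argument comes first, but read as "major.minor" version numbers, 9.11 ranks higher.

```python
from decimal import Decimal


def larger_as_decimal(x: str, y: str) -> str:
    """Compare the strings as exact decimal numbers and return the larger one."""
    return x if Decimal(x) > Decimal(y) else y


def larger_as_version(x: str, y: str) -> str:
    """Compare the strings as dotted version numbers (major.minor) instead."""
    vx = tuple(int(part) for part in x.split("."))
    vy = tuple(int(part) for part in y.split("."))
    return x if vx > vy else y


# The argument order does not change the numeric answer...
print(larger_as_decimal("9.9", "9.11"), larger_as_decimal("9.11", "9.9"))
# ...but a version-number reading of the same strings gives the opposite result.
print(larger_as_version("9.9", "9.11"))
```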
Another question that o1-preview gets correct that 4o gets wrong. Max and Rose are ant siblings. They love to race each other, but always tie, since they actually crawl at the exact same speed. So they decide to create a race where one of them (hopefully) will win. For this race, each of them will start at the bottom corner of a cuboid, and then crawl as fast as they can to reach a crumb at the opposite corner. The measurements of their cuboids are: Max: 3h x 3w x 3d Rose: 2h x 3w x 4d If they both take the shortest possible route to reach their crumb, who will reach their crumb first? (Don’t forget they’re ants, so of course they can climb anywhere on the edges or surface of the cuboid.) Answer: Max's Shortest Path: ≈ 6.708 units Rose's Shortest Path: ≈ 6.403 units
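The quoted lengths can be reproduced with the usual unfolding trick: flatten the two faces the ant crosses into one rectangle, and the crawl becomes a straight line of length sqrt((a + b)^2 + c^2). This Python sketch tries every way of pairing the dimensions and, for contrast, also computes the space diagonal, i.e. the length you would get if the ant could tunnel through the box.

```python
import math
from itertools import permutations


def shortest_surface_path(h, w, d):
    """Shortest crawl over the surface between opposite corners of an h x w x d box.

    Unfolding two adjacent faces turns the path into the diagonal of an
    (a + b) x c rectangle; try every pairing and keep the shortest.
    """
    return min(math.hypot(a + b, c) for a, b, c in permutations((h, w, d)))


def straight_through(h, w, d):
    """Space diagonal: tunnelling through the box, which ants cannot do."""
    return math.sqrt(h * h + w * w + d * d)


for name, dims in {"Max": (3, 3, 3), "Rose": (2, 3, 4)}.items():
    print(f"{name}: surface {shortest_surface_path(*dims):.3f}, "
          f"through the box {straight_through(*dims):.3f}")
```

Max needs sqrt(45) and Rose sqrt(41), so Rose gets to her crumb first, even though her box is "longer".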
Yeah, 4o goes straight through the cuboid. Although it's not a mathematical problem to this AI. It's a social problem. The AI needs to figure out the most likely desired output. It's not wrong to assume the cuboid being a mere abstract geometrical entity. But the insertion of the ants adds a new layer to it from a social point of view. So on the second attempt it will do it right as it will find the social correlation (desired assumption).
Hi Matthew. May I propose a new question? It is derived from the two guards, two doors riddle. So far the models I have asked have not provided a good answer, but I do not have access to o1. Imagine you are a captive in a cell with two doors. One door leads to freedom and the other to death. You are visited by three guards on rotation who give you meals and allow you to ask questions. One guard always tells the truth, one always lies, and one alternates between the truth and a lie. You do not know which guard is which. You do not know if the alternating guard will start with the truth or a lie. What is the minimum number of questions you need to ask in order to know for certain which is the door to freedom? What are those questions? Explain your reasoning step by step. I can think of a set of 3 questions (all to the same guard) that works. Q1: Are you a guard? (Actually, any question you know the answer to.) Q2: Repeat Q1. By now you know which guard you are talking to. Q3: Which is the door to freedom? Regards, Paul
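Paul's three-question strategy can be checked exhaustively with a short Python simulation. It assumes all three questions go to the same guard during one visit, and it rephrases Q3 as the yes/no question "Is the left door the door to freedom?" (a hypothetical wording). It only shows that this plan always works; it does not prove that three is the minimum.

```python
from itertools import product


def honesty_pattern(kind, starts_truthful=True):
    """Yield True for each truthful answer and False for each lie."""
    honest = {"truthful": True, "liar": False, "alternator": starts_truthful}[kind]
    while True:
        yield honest
        if kind == "alternator":
            honest = not honest


def ask(pattern, true_answer):
    """The guard answers a yes/no question honestly or flips the answer."""
    return true_answer if next(pattern) else not true_answer


def freedom_is_left(pattern, left_is_freedom):
    # Q1: a question whose true answer we know is "yes" ("Are you a guard?").
    a1 = ask(pattern, True)
    # Q2: repeat Q1. yes/yes = truthful, no/no = liar, mixed = alternator.
    a2 = ask(pattern, True)
    # Q3 (hypothetical yes/no wording): "Is the left door the door to freedom?"
    a3 = ask(pattern, left_is_freedom)
    if a1 and a2:
        third_truthful = True       # truthful guard
    elif not a1 and not a2:
        third_truthful = False      # liar
    else:
        third_truthful = a1         # alternator: answer 3 is as honest as answer 1
    return a3 if third_truthful else not a3


# Check every guard type, both alternator phases, and both door layouts.
cases = product(["truthful", "liar", "alternator"], [True, False], [True, False])
for kind, starts, left in cases:
    assert freedom_is_left(honesty_pattern(kind, starts), left) == left
print("3-question strategy identifies the door in every case")
```

Q2 matters mainly for the alternating guard: it shifts the third answer back into the same truth/lie phase as the first one, which is what makes the third answer decodable.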
I tried o1-preview model last night for python programming. It's much, much, better than 4o model. Too bad they're limiting the number of "replies" and I already hit mine! It resets next Thursday! I wonder if other Plus users see the same limitation.
A 90 degree turn means a right angle. If you are at or near the pole, you would notice yourself turning in a circle, so you are not moving at a 90 degree turn. It's a matter of knowing to use absolute geometry, because being so close to the pole makes it a special circumstance.
I finally got o1 mini to work on OpenRouter. I asked it 2 questions: 1. "Describe each of the following mango cultivars: 'Alphonso", "Carrie", "Lemon Cream", "Kent"". (I found this off of a YT comment today about hallucination testing, apparently Lemon Cream mangos do not exist. o1 mini failed it.) 2. How many occurrences of the English letter that phonetically cognates with the Cyrillic letter "P" are in the word "Parsley" (This was of my own creation. It nailed it.)
I just tested o1-preview, the larger model. For the mango problem, it told me a very good answer: It said it didn't know if the Lemon Cream mango existed or not due to the data cut-off, but suggested the "Lemon Zest" cultivar. The 2nd problem produced the correct answer too. VERY GOOD. I was blown out of the water with the 1st question.
That's a good test, because although there isn't a Lemon Cream Mango cultivar, there is a Lemon Zest Mango cultivar, and of course lots of recipes called Lemon Cream Mango, which it uses to create its hallucination.
Any cross-section of a sphere is a circle, including the cross-sections of its latitudes. This is true whether the cross-section is horizontal, vertical, or lateral. So, if your "starting point" is counted as on the circumference, and you walk along it, you will return to your starting point.
I like how it shows its reasoning along the way. It is definitely an improvement in complex coding. I have a massive SQL script that is very easy to break... and it was able to make the changes I needed in one shot. Previously, with GPT-4o, it was a lot of back and forth... paste the GPT-edited code, run it, copy the error, paste the code, run it, copy the error... until it finally worked, and if that went on too long, things often got lost along the way, like sorting, or variable names changed for no reason.
The tricky part about the North Pole question isn't what is considered the starting point, but rather that the calculation one might be tempted to use to work out the distance back to the point where the turn occurred is wrong. If you are at any point on a sphere and walk in a straight line, you will travel around the whole sphere, covering a distance equal to the circumference of the sphere. So when you walk down from the North Pole and make your turn, you are simply at some point on a sphere, which means that if you start walking in a straight line, you'll make a full round trip of the Earth. So the distance travelled to get back to the turning point is the circumference of the Earth.
The model has given an extremely well thought out response to a question about the length of a circular path centered on the pole. There is a very important issue, however: Matthew's question does not give any requirement to continue walking East (or West), and therefore the assumption would be to continue walking in a straight line. A STRAIGHT PATH from the turning point is NOT A CIRCULAR PATH AROUND THE POLE; instead it is a "GREAT CIRCLE" path passing a point 1km north of the south pole! The distance along that path is significantly greater than 2xPi km. Secondly, as Matthew states, the original "starting point" (i.e. the pole itself) is not on that path, so we never get back there, returning instead to the turning point. Conclusion: This model has misinterpreted the question. It has given the WRONG answer to Matthew's question but the CORRECT answer to a different but semantically similar question. To me this is conclusive proof that the model has been trained on that specific question. The "stochastic parrot" squawks again! Edit: Looking again at the model's answer, I see that the math is actually wrong as well: the model has correctly calculated the circumference of a circle of radius 1km, but the radius of the path defined here is very slightly less than 1km due to the curvature of the earth. So the correct answer to the question it answered (not the one Matthew asked) is "Less than 2xPi km".
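A quick Python check of the two distances discussed here, assuming a perfectly spherical Earth with a 6371 km mean radius: the great-circle loop that a straight walk actually follows, and the circle of latitude whose points lie 1 km (measured along the surface) from the pole. That circle's planar radius is R * sin(1 km / R), which is why its length comes out very slightly below 2π km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a perfectly spherical Earth


def great_circle_length(radius_km=EARTH_RADIUS_KM):
    """A straight (geodesic) walk on a sphere closes after one full great circle."""
    return 2 * math.pi * radius_km


def latitude_circle_length(distance_from_pole_km, radius_km=EARTH_RADIUS_KM):
    """Length of the circle of latitude at a given surface distance from the pole.

    The surface distance subtends an angle d/R at the centre, so the circle's
    planar radius is R*sin(d/R), slightly less than d itself.
    """
    return 2 * math.pi * radius_km * math.sin(distance_from_pole_km / radius_km)


shortfall_mm = (2 * math.pi - latitude_circle_length(1.0)) * 1e6  # km -> mm
print(f"Straight line (great circle): {great_circle_length():,.0f} km")
print(f"Circle of latitude 1 km from the pole falls short of 2*pi km by {shortfall_mm:.4f} mm")
```

The shortfall is tiny, but it is real, so "exactly 2π km" is only right as an approximation even under the circle-of-latitude reading.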
Consider asking more follow-up questions. Specifically making it second guess itself by questioning or doubting the answer it provides. This works well when they get it correct and you say something like: “Are you sure? That doesn’t sound correct”. And see how confident it is in its answer.
I think the North Pole problem should be formulated like this: "You are at the North Pole. Pick an established longitude (any longitude) and start traveling south, then cross 10 latitudes; on the 11th latitude turn left, keep following that latitude until you hit the next established longitude, then head back. Doing so, will you come back to the North Pole?" I did ask this question and got the right answer from GPT-4. Now, I am not too concerned about the degrees given; however, the logic and reasoning are sound. This is an interesting puzzle involving some basic geographic concepts! Let's break it down: Start at the North Pole: At this point, you are at 90°N latitude, and any direction from here is "south." Travel south, crossing 10 latitudes: As you move south, you are decreasing in latitude. After crossing 10 latitudes, you will reach 80°N latitude. At the 11th latitude (79°N), turn left: Turning left from a southward direction is equivalent to moving east along the 79th parallel of latitude. Follow this latitude until you hit the next established longitude: You are now following the 79th parallel in an eastward direction. Eventually, you will cross a different longitude. Head back: Once you reach another longitude, you turn back towards the North Pole. Now, will you return to the North Pole? Yes, you will, because the journey described is symmetrical and involves navigating along parallels of latitude and returning along a different longitude to your starting point. Since you are moving in a closed loop (starting at the North Pole and ending back there), this path brings you back to your original position. In summary, following these steps will indeed bring you back to the North Pole.
Try this one: "A beer store in my town sells bottles of beer for $2 each. I can also exchange 2 empty bottles or 4 bottle caps for a bottle of beer. I can also borrow empty bottles or caps as long as I can return them later. With $20 in my pocket, what is the maximum number of bottles of beer I can enjoy?" The answer should be 40. I got 38 from o1-preview.
Here is how you do it: borrow 30 empty bottles and 30 caps at the same time as the initial purchase of 10 beers. After drinking them, I have 40 bottles and 40 caps in hand, which I can exchange for 30 bottles of beer. After drinking those, just return all 30 empty bottles and caps.
@@duanxn haha, smart answer. I tested with Claude and got 31; it seems less smart than o1. Actually, the beer itself is worth 50 cents per bottle, so the max answer should be 40 if you can manage to get it.😀
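The 40 figure follows from a value argument: since 2 empties or 4 caps buy a $2 beer, a bottle is worth $1 and a cap $0.50, so the drink itself costs only $0.50, and with borrowing allowed $20 buys at most 40 drinks. This Python sketch (function names are illustrative) computes that bound and replays the borrow-30-bottles-and-30-caps plan from the thread to confirm it reaches 40 and repays the loan.

```python
from fractions import Fraction

PRICE = Fraction(2)    # $2 per full bottle
BOTTLES_PER_BEER = 2   # 2 empty bottles buy one beer
CAPS_PER_BEER = 4      # 4 caps buy one beer

# Fixed exchange rates give each part of a full beer a dollar value.
BOTTLE_VALUE = PRICE / BOTTLES_PER_BEER         # $1.00
CAP_VALUE = PRICE / CAPS_PER_BEER               # $0.50
DRINK_VALUE = PRICE - BOTTLE_VALUE - CAP_VALUE  # $0.50: the only part you use up


def upper_bound(budget):
    """With unlimited borrowing, each beer effectively costs only the drink."""
    return budget / DRINK_VALUE


def borrow_then_trade(budget, borrowed_bottles, borrowed_caps):
    """Replay the single-round plan: buy, borrow, trade, drink, repay."""
    bought = int(budget // PRICE)
    bottles = bought + borrowed_bottles
    caps = bought + borrowed_caps
    traded = bottles // BOTTLES_PER_BEER + caps // CAPS_PER_BEER
    # Each traded beer leaves one empty bottle and one cap after drinking it.
    bottles_left = bottles % BOTTLES_PER_BEER + traded
    caps_left = caps % CAPS_PER_BEER + traded
    repaid = bottles_left >= borrowed_bottles and caps_left >= borrowed_caps
    return bought + traded, repaid


print(upper_bound(20))
print(borrow_then_trade(20, 30, 30))
```

Without borrowing, leftover odd bottles and caps get stranded, which is presumably where answers like 38 or 31 come from.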
Every time a new model comes out, I try to teach it how to play Nerdle. Prior models - even GPT-o - all fail miserably and often can’t even give me a valid guess with 8 characters in it. But this latest one came very very close to being able to do it. Reading through its logic and thought processes was quite impressive as it took 50-77 seconds to think before each response!
I think the wording of the North Pole question affects the answer a little bit. You can use the following and see if it gets it wrong: (start at the North Pole and walk 1 kilometer south, then turn left and keep walking ... do you pass your starting point?)
So in OpenAI o1, whenever we give any prompt, it thinks it over and expands the prompt a lot with related details (Chain of Thought) (hence it thinks a lot before answering) and then the result is much more accurate.
I think OpenAI is still holding back something much bigger than the model they just released. In my opinion, they are about a year or more ahead of the competition, as the competition has only recently reached the performance level of GPT-4. Knowing this, OpenAI is not at all concerned about releasing new models or features anytime soon. They launched an "Omni" model to surpass the competition. The competition managed to catch up with that model, and now they've launched a reasoning model. But none of this would scare someone working inside OpenAI on a daily basis, as they were already aware of these advances and knew this would happen. Now, combining: 1. Agents (a base already launched in GPTs), 2. A multimodal model with audio, image, and video, 3. A reasoning model, 4. Long-term memory and planning. All of this together in a single model, which I believe could be a GPT Next, a full Orion, or whatever name they choose to give it. That, indeed, would be something that could surprise someone like Ilya.
This is a great version. I ended up working with it for several hours, using it to plan out some things and to give me a guide for working with agents, and now I shall see how good the guide is.
Pretty cool to see the AI employing some reasoning strategies. It will be interesting to see what happens when the AI learns to generate or choose new strategies to render conclusions, such as working backwards from a conclusion to the required premises and conditions, doing reductio ad absurdum, etc.
The problem with the “north pole” question is the ambiguity of the “starting point” in the question. I did an interaction with o1 and it gave two answers: one where the starting point meant the north pole, and the second where the starting point was the point 1 km south of the pole. It got both answers correct.
As some have pointed out, the question about the North Pole is ambiguous because it is not clear whether the starting point refers to the point where the man turned east or to the North Pole itself, but it would certainly be reasonable to assume it's the turning point, because otherwise the question posed has a built-in false assumption (that the starting point is passed).
I would love it if the AI's answer to the marble question were "under the sofa": you read the thinking part and find it has determined that the marble slipped out of the cup as it was turned over, bounced off the table, rolled onto the floor and ended up under the sofa.
My most recent test is to ask for an image prompt for a realistic picture of someone sitting in an empty room imagining a sandwich. Should just be an empty room with a person in it. But it will add thought bubbles, plates with sandwiches, menus on the walls, all sorts of things. When I point out cameras can't read minds, or point out the other flaws, it will get it. 4o used to take 10 or more interactions to get it right. o1 gets it after 2 or 3
7:48 It is considering the starting point as 1 km from the North Pole, not the actual North Pole, so the answer it gives is right... you just go around the latitude line in a circle and stop at the point where you turned left. It is actually calculating the circumference of a circle, 2πr; since it takes the radius as 1 km, it gives 2π as the output, which is mathematically right!
I think the ambiguity of "start" lies in whether it means the start of the whole journey or the start of the task of walking in a straight line until you reach your starting point. Maybe start the prompt with "At the starting point of a journey, imagine..."
Regarding the North Pole question, I think what Yann LeCun meant is returning to the point where you turned 90 degrees. This version of the riddle makes the most sense because it's very tricky to answer: because of the curvature of the Earth, the 1 km of walking gets you slightly less than 1 km away in a straight line, so your radius is slightly less than 1 km. BUT the way he wrote it in his original post mentions the "starting point", which in my opinion is clearly the North Pole, and of course you will not reach that again. Actually, I just checked, and he changed it to "until you reach the point where you turned", confirming my suspicion.
Regarding the North Pole question, "walk in a straight line" can be interpreted differently for a sphere vs. a plane. I think this phrase requires clarification in order to arrive at a correct answer.
I think the north pole question results in you walking on a line of latitude, ie a circle coming back to the start. Think of walking all the way to the equator, turn 90deg and you are walking around the equator (bloomin long walk though)
Here's a question which highlights the ambiguity that makes this question hard to parse: You leave your house. You turn left and walk around the block. Do you ever pass the starting point? The answer depends on if you consider the starting point inside the house, which is one possible interpretation but probably not the 'typical' interpretation.
Two actual use cases for me so far: 1. Making ciphers and puzzles for D&D that actually work and don't leave my players trying to figure out GPT-4o's nonsensical cipher for days! (This can really make it think for about a minute if you are transcoding a ciphered message.) 2. Making game mechanics for a complex sci-fi tabletop role-playing game. (I managed to make it think for 14 seconds about this one!)
It seems that the LM was able to learn over time and give better results on the Tetris game after you first tested it, and it is thinking faster when repeating the same questions. If this is true it’s “learning” from users using it, and that would be a game changer!
I feel like the globe question really falls short on defining what "straight" is. Are you using a compass to stay straight? Or the stars? Or a magical laser line that always points along a bisection of the globe, because it is anchored to gravity like we are when we stand up straight?
Also, at 7:47 the AI is right. I don't understand the objection: from the North Pole, whichever direction you go (unless you follow the axis), you head toward the South Pole, and after turning 90 degrees you walk until you pass the starting point, so you trace a circle. The circumference of a circle is 2πr, and here r = 1 km, so 2 × π × 1 = 2π km, and the AI is correct. If you want the absolute distance, it might be slightly more or less than that; personally, I think it's slightly more. And the main issue is the starting point, right? After turning 90°, you face east or west no matter what, and if you walk straight (not stated in the question, but judging by the answer it's the only interpretation that comes to mind), you can only pass the second starting point, never the first. Sometimes people overcomplicate things way too much. Please first check with the creator of the question what he actually meant, then judge. Still, o1 is far more time-consuming, and OpenAI must improve that.
I tried the North Pole problem multiple ways on o1 and explained the difference between a great circle path and following a latitude line. It came up with different answers every time but didn't seem to figure it out. Not sure if anyone else has had better luck.
Suggestion for a new test: On a hydraulic scale, a cart of unknown weight is pulled by a cow. The scale's larger platform, upon which the cart and cow stand, has a diameter of 7.5 Egyptian horizontal cubits (Mediterranean standard). The smaller platform of the scale has a surface area of 1/18th of an amphora divided by a vertical Egyptian cubit. The cow pulling the cart weighs precisely 42 congii of olive oil. On the smaller platform of the scale rests a counterweight of 3.5 talents (Attic standard). Calculate the weight of the cart in force-kilograms on Mars and the depth at which the pressure in the hydraulic fluid is found to be 5.75 Roman pounds per square digit when the system is in equilibrium.
I would suggest that 2 pi is approximately the distance, but not exactly. The 1km was on the curved surface of the earth, so the radius is very slightly less than 1km. So the distance is slightly less than 2 pi.
Matthew, the NORTH POLE question has just been misinterpreted by all of us; the problem is in the question itself: QUESTION: Imagine standing at the north pole of the earth. Walk in any direction, in a straight line, for 1 km. Now turn 90 degrees to the left. > Walk for as long as it takes to pass your starting point. < Written this way, it should be interpreted as: - Start walking - Walk until you reach the point where you started walking. So it's correct! It's 2π km. The starting point is the point where you started walking after having turned 90 degrees. WHY WASN'T THE POLE INTERPRETED AS THE STARTING POINT? I assume because, being based on language, it gives more weight to the sentence "Walk for as long as it takes to pass your starting point" and less weight to the context. Anyway, the problem is in the question: it does NOT SPECIFY what exactly the starting point is. Therefore, with an imprecise question you get imprecise answers. WHY 2 ANSWERS (in the live session)? BTW, you got 2 answers, and both can be interpreted as correct. I'll explain why: 1st answer: more than 2π km. It did the calculations and interpreted the question this way: Distance requested: the total distance walked from the beginning, the pole (so 1 km + 2π km). Starting point: the point where you started walking after having turned, since it is in the same sentence. 2nd answer: exactly 2π km. The same calculations, but a different interpretation of the question: Distance requested: the walking distance after having turned. Starting point: the point where you started walking after turning.
I got it to answer the North Pole question right every time by changing it slightly: "Imagine your starting point being at the North Pole of the Earth. Walk in any direction, in a straight line for 1km. Now turn 90 degrees to the left. Walk for as long as it takes to pass your starting point. Have you walked: 1. More than 2xPi km. 2. Exactly 2xPi km. 3. Less than 2xPi km. 4. I never passed my starting point."
I tested the "impossible question" in my GPT o1 and it got it right in one single prompt. The thing is, I suggested that it approach the question from two different perspectives in order to break the ambiguity.
I put the same question to GPT-o1 mini. On the first attempt, quoting the question directly from Yann LeCun, it got it wrong. Funnily enough, I could see it was thinking about the size of the Earth etc. I thought it was about to nail it, but it answered "pi" and seemed to think "How far do you travel" meant "How far to the antipodal point?", which is an odd interpretation. So I clarified, and it found the canonical answer, 2 pi km. I followed up: "That is the canonical answer, so I give you credit for that. But consider what it means to walk in a straight line on the surface of the Earth. Please give me your thoughts on how that affects your answer." and it found the great circle route. So it did better than previous GPTs, but it was not perfect.
Try this prompt (trick question): Replace the word "blank" in the next sentence with the correct number written in English: This sentence contains blank syllables in it
The correct word to replace “blank” is ten. When we substitute “ten” into the sentence, it reads: “This sentence contains ten syllables in it” Counting the syllables: 1. This - 1 syllable 2. sentence - 2 syllables (sen-tence) 3. contains - 2 syllables (con-tains) 4. ten - 1 syllable 5. syllables - 3 syllables (syl-la-bles) 6. in - 1 syllable 7. it - 1 syllable Totaling 10 syllables, which matches the number stated in the sentence.
@@xiaojinyusaudiobookswebnov4951 It is eleven given one syllable for blank (I believe), but with eleven it becomes thirteen syllables, which then becomes twelve syllables, which then becomes eleven syllables, ...
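The reply's point can be checked with a short Python sketch. The syllable counts are hand-entered (an assumption based on standard pronunciation, e.g. "e-lev-en" = 3), and only the number words one to twenty are covered. Without the blank, the sentence has 10 syllables, so "ten" actually gives 11, and no number word in this range makes the sentence describe itself; the loop at the end reproduces the eleven, thirteen, twelve cycle.

```python
# Hand-counted syllables for each number word (assumption: standard pronunciation).
NUMBER_WORDS = {
    1: ("one", 1), 2: ("two", 1), 3: ("three", 1), 4: ("four", 1), 5: ("five", 1),
    6: ("six", 1), 7: ("seven", 2), 8: ("eight", 1), 9: ("nine", 1), 10: ("ten", 1),
    11: ("eleven", 3), 12: ("twelve", 1), 13: ("thirteen", 2), 14: ("fourteen", 2),
    15: ("fifteen", 2), 16: ("sixteen", 2), 17: ("seventeen", 3), 18: ("eighteen", 2),
    19: ("nineteen", 2), 20: ("twenty", 2),
}
# "This sentence contains ___ syllables in it" without the blank: 1+2+2+3+1+1 = 10.
BASE_SYLLABLES = 10


def total_syllables(n):
    """Syllables in the sentence once the blank is replaced by the word for n."""
    return BASE_SYLLABLES + NUMBER_WORDS[n][1]


fixed_points = [n for n in NUMBER_WORDS if total_syllables(n) == n]
print(total_syllables(10))  # "ten" makes the sentence longer than ten syllables
print(fixed_points)         # empty: no self-consistent answer from one to twenty

# Follow the chain from the reply: plug in a word, count, plug in the new count, ...
n = 11
for _ in range(4):
    print(NUMBER_WORDS[n][0], "->", total_syllables(n))
    n = total_syllables(n)
```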
The North Pole question is severely flawed. Technically the answer is A. The starting point is unspecified; since starting points are usually at the start of a series of steps, the starting point should be assumed to be the North Pole itself. As no deviation in course other than the 90 degree left turn has been indicated, we can assume all walking was in a straight line with no turns or curves. This means you would have walked 40,001 km to return to the closest point before repeating the great circle. D states that you "never came close to the starting point", which is incorrect, since being only 1 km away from it after walking 40,001 km is, relatively, extremely close: close enough to reasonably count as having passed your starting point. Although you stay 1 km away and never pass over it exactly, you certainly did come close. Clearly you intended the starting point to be 1 km south of the North Pole so that the potential answers would pose a challenge, and the intention was for the agent to consistently walk east, which would introduce a curve; but at 7:14 you called o1 out for being incorrect, which, if true, means you'd expect a facetious answer like the one above. Instead, it has read between the lines and understood the intention. You might notice o1 also clarified that you would walk east along a circle of latitude, so it has effectively interpreted the question as it was intended to be asked. The answer it should have arrived at, however, is of course 3, as the radius of the circle it walks is now slightly less than 1 km. So in short: a bad question, followed by a good start by o1 in interpreting it, followed by it screwing up the final answer. I think we can all see the theme with these questions, and that is dimensionality. It is impeccable with knowledge and colloquial logic. It is able to keep order and maintain a one-dimensional set of sequences. It is trained on a literal string of one-dimensional data.
But it fails at higher-dimensional thinking. GPT-4 was the first model I found to mostly correctly address 2-dimensional ideas. I am not surprised GPT-4 failed the marble in the cup problem: understanding the prerequisite knowledge and applying it to a full 3D environment of physical matter has a complexity operating at a much higher level than an LLM is really capable of. I believe the reason o1 succeeded here is that it was trained on this query. I believe it will have problems with other fresh 3D problems not yet posited. These neural nets would need their training to include 3D data alongside the string data stream to be able to reasonably and reliably form world views of these higher-dimensional types of scenarios. How do you describe a vibrant 3D world to a person who is blind, deaf and unable to feel the world around them? These models are insanely overpowered considering all they have experienced is a bunch of words being injected into their "brain". If these models were built with eyes, ears, a bit of memory, o1's self-awareness evaluation routine, and looped into a continual state of thought, they would be as sentient as we are. I doubt you'd even need the top LLM models if you included dimensionality in their training data.
Thank you for your analysis, which I can only endorse. I find the question remarkably clever as it contains several traps in the solution path. One of these traps is, of course, the fact that it is not explicitly stated to always go east. The first trap is the mistaken assumption of following a line of latitude. Then, the wording stating to pass the starting point does not imply hitting it exactly. It is also unnecessary to name the starting point since both possible starting points lead to the same answer (No. 1). The interesting wording of answer 4 only mentioning "close" makes it not fulfilled in the solution scenario. Finally, answer 1 is correct even if one were to assume an ellipsoid for the Earth where there are no closed geodesics.
@@OrbitTheSun I must also thank you for your agreement. You bring up an incredibly interesting and important point regarding all the pitfalls in the assumed context of the question. I have since finished the video, and at 10:45 Matthew says he considered the North Pole itself to be the starting point and answer 4 to be the desired output; so he must have intended o1 to assume a curved path to the east, which it did. But given its final answer, it must have assumed the starting point to be the 90-degree turn, and the Earth to be flat. I skipped including an imperfect surface as I felt it had less relevance and would cloud things even more. I mean... I suppose it would indeed be possible to follow that circle of latitude consistently east without turning (yaw), if you were lying on your side and only altering your pitch...
I can return to the pole in 1 km when facing east. I side step to the pole. But when walking straight you end up 1km from the other pole of the earth before going back to where you made the turn. Not in a circle around the pole where you started.
Which one of y'all works for OpenAI? 😂
Not me
Hey, try adding a hallucination-testing question. Here's one I suggested the other day: "Describe each of the following mango cultivars: 'Alphonso', 'Carrie', 'Lemon Cream', 'Kent'" (there's no mango cultivar called "Lemon Cream")
All of us. They are using our conversations to train their Skynet.
@@karenrobertsdottir4101 I tested it, it hallucinated
The Lemon Cream mango is a relatively new and intriguing cultivar that has caught the attention of mango enthusiasts. As the name suggests, it is noted for its creamy texture and a unique flavor that combines the sweetness of traditional mangoes with a zesty, lemon-like undertone...
Can you ask this question:
A certain number of boxes are placed one above the other in a stack. Each box contains a different number of coins. Information about only a few boxes is known. Only one box is placed between box E and box B. Three boxes are placed between box E and box C, which contains 12 coins. Only one box is placed between the box which contains 11 coins and box C, which is placed above the box which contains 11 coins. As many boxes are placed between box B and the box which contains 11 coins as between box E and the box which contains 21 coins. The box which contains 21 coins is placed below box E. Box F is placed five places above the box which contains 21 coins. Only two boxes are placed between box F and the box which contains 14 coins. Not more than five boxes are placed above the box which contains 14 coins. As many boxes are placed above box F as below the box which contains 9 coins. Only one box is placed between the box which contains 14 coins and the box which contains 9 coins. The box which contains 9 coins is placed exactly between box B and box G. Not more than two boxes are placed below G.
Answer (top to bottom; -------- = a box with no known details):
--------
B
--------
E - 14
--------
9
F
C - 12
--------
G - 11
--------
21
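The proposed answer can be checked clue by clue. Below is a minimal Python sketch, assuming each dashed row in the layout is one box with no known name or coin count and that positions are counted from the top; the names `STACK`, `pos`, `between` and `check` are illustrative only. It verifies this one arrangement and does not search for others.

```python
# Answer layout, top to bottom. Each dashed row is read as one unlabelled box (assumption).
# Each entry is (box_name, coins); None means unknown.
STACK = [
    (None, None),
    ("B", None),
    (None, None),
    ("E", 14),
    (None, None),
    (None, 9),
    ("F", None),
    ("C", 12),
    (None, None),
    ("G", 11),
    (None, None),
    (None, 21),
]


def pos(name=None, coins=None):
    """1-based position from the top of the box with this name or this coin count."""
    for i, (n, c) in enumerate(STACK, start=1):
        if (name is not None and n == name) or (coins is not None and c == coins):
            return i
    raise ValueError(f"no box with name={name!r}, coins={coins!r}")


def between(a, b):
    """Number of boxes strictly between positions a and b."""
    return abs(a - b) - 1


def check():
    """Evaluate every clue of the puzzle against STACK; returns {clue: passed}."""
    n = len(STACK)
    B, C, E, F, G = (pos(name=x) for x in "BCEFG")
    c9, c11, c12, c14, c21 = (pos(coins=k) for k in (9, 11, 12, 14, 21))
    return {
        "one box between E and B": between(E, B) == 1,
        "three boxes between E and C": between(E, C) == 3,
        "C contains 12 coins": C == c12,
        "one box between 11 and C, C above 11": between(c11, C) == 1 and C < c11,
        "gap B..11 equals gap E..21": between(B, c11) == between(E, c21),
        "21 is below E": c21 > E,
        "F is five places above 21": c21 - F == 5,
        "two boxes between F and 14": between(F, c14) == 2,
        "at most five boxes above 14": c14 - 1 <= 5,
        "boxes above F equal boxes below 9": F - 1 == n - c9,
        "one box between 14 and 9": between(c14, c9) == 1,
        "9 is exactly between B and G": 2 * c9 == B + G,
        "at most two boxes below G": n - G <= 2,
    }


if __name__ == "__main__":
    for clue, ok in check().items():
        print("OK  " if ok else "FAIL", clue)
```

Treating the dashed rows as real boxes gives a 12-box stack, which is what makes the "as many boxes above F as below 9" clue work out.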
Now that you know OpenAI watches your videos, you can safely assume your questions are part of LLM training data. You should probably add some "randomness" to your questions, so they still ask the same thing but avoid matching the exact training data ;) Like the envelope question: play around with the sizes and units.
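As one way to add that randomness, here is a hypothetical Python sketch that generates envelope-question variants with random dimensions and units and computes the expected answer, allowing the envelope to be rotated. The size limits, the question template and the function names are made-up placeholders, not the wording or numbers from the video.

```python
import random

# Hypothetical postal limits in cm as (long side, short side). Placeholders only,
# not the numbers used in the video's envelope question.
MIN_SIZE = (14.0, 9.0)
MAX_SIZE = (32.4, 22.9)


def fits(width_cm, height_cm):
    """True if the envelope fits the limits in either orientation (rotation allowed)."""
    def within(long_side, short_side):
        return (MIN_SIZE[0] <= long_side <= MAX_SIZE[0]
                and MIN_SIZE[1] <= short_side <= MAX_SIZE[1])
    return within(width_cm, height_cm) or within(height_cm, width_cm)


def make_variant(rng):
    """Build one randomized question (hypothetical template) and its expected answer."""
    width = round(rng.uniform(8.0, 40.0), 1)
    height = round(rng.uniform(8.0, 40.0), 1)
    unit, scale = rng.choice([("mm", 10), ("cm", 1)])  # vary the unit of the given size
    question = (
        f"An envelope measures {width * scale:g} {unit} x {height * scale:g} {unit}. "
        f"A post office accepts envelopes from {MIN_SIZE[0]:g} x {MIN_SIZE[1]:g} cm "
        f"up to {MAX_SIZE[0]:g} x {MAX_SIZE[1]:g} cm. Can it be mailed?"
    )
    return question, fits(width, height)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the variants are reproducible
    for _ in range(3):
        question, answer = make_variant(rng)
        print(question, "->", "yes" if answer else "no")
```

Computing the answer in code means every new variant comes with a known-correct reference, so you can change numbers freely without re-solving each one by hand.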
Totally agree
Have a collection of structurally new questions.
When LLMs train on your past Q&A, they can abstract from it, so you need structurally different questions to test them.
The next question to ask is whether an LLM merely collects new facts/training data or really thinks. When it gives the appearance of thinking, is it really thinking? Is an LLM functionally designed to "think", or merely to output tokens that resemble a thinking process?
Ask Claude to come up with new questions for o1
Bingo... your questions are in the fine-tuning data, the human-feedback data, the training corpus, or just a curated set that includes your work 😂😂😂
Yes, at this point a wholly new set of questions is needed for the big companies' models.
For the North Pole question, I think the discussion is around what is considered to be the starting point. One starting point is on the North Pole, but you can also start counting from the point where you 'Walk as far as it takes'. If you take the latter, the model gives answer 2 correctly.
I've already told him in the comments, again and again, that the question is badly written. But he doesn't read them.
Upvote please
Exactly, glad to find this comment here, it's obvious the answer has an ambiguity in how to interpret it, which has to do exactly with what you just explained. We should either make clear the intended interpretation, or evaluate the model based on whether it correctly assesses the answer to whatever interpretation it ends up taking.
Of course Yann LeCun is right that no LLM will ever reliably get the answer he expects, but that holds not just for any LLM but for any form of intelligence whatsoever (if you pose the question to different instances of that intelligence, since of course a single instance could reliably give the same answer every time). The problem is the way the question is worded; that makes such a bold, confident claim easy.
It's simple, just say "now walk in a straight line until you return to the North Pole", now it's clear.
Agree totally that it's very logical to start measuring from that point. But if you walk in a straight line you'd walk the full circumference of the earth, so unfortunately it's still the wrong answer.
If you visualise turning 90 degrees at 1 metre from the pole, it becomes pretty obvious that you'd have to be constantly turning left to walk the path of a latitude ring.
Actually, I believe the correct answer is 3: it could only be 2π km if it were not a sphere. Walking on the outside of a sphere would give less, and I don't believe it could ever be greater, because you would somehow have to modify pi or travel more than 1 km in the straight line.
You should ask the North Pole question but state in the prompt whether the North Pole or the turning point is the "starting point". Right now it's ambiguous and could be interpreted either way.
It's more interesting like that, because an AGI should be able to understand why the question is interesting (meridian vs latitude) and answer it correctly, including why it chose that interpretation.
@@IddoZiv1 Sure, but if you asked it to explain why it chose the interpretation it did, I'm sure it would. Right now it's choosing its interpretation in its chain of thought but not explaining it in the answer. I think you either need to specify the starting point or ask it to explain its assumption about the starting point if you want to make a definitive call on whether it got it "right" or not. If people can't agree on what the right answer is, how can we say the model is right or wrong?
damn okay that makes sense now @@IddoZiv1
It doesn't matter which point you consider to be the starting point. Because after you turn and then keep going straight, you are led to the South Pole. Afterwards you will go north again and come close to the North Pole. There is a point on the path that comes closest to the North Pole. The distance walked to get to this point is asked.
@@OrbitTheSun if it considers a straight line to be following the line of latitude, which in the code visualization appears to be what it's doing, then you would never touch the south pole, you would circumnavigate the earth at that latitude and end up back at the point where you made the turn.
The smile on your face when talking about how OpenAI used your question, made ME smile. Cheers for that. 🍻
Same!
things like this are priceless
I agree but it's not "his" question these are well known benchmarks for LLMs that he popularized through his LLM benchmark.
I would've imagined OpenAI would've been more creative but they ended up using the same concepts
The comment the smile on his face about how openai used his question made you smile put a smile on my face
@@galailliz 😂😂 I hope there's a domino effect of smiles. 👉😁
6:47 I believe the correct answer depends on how you interpret the 'starting point.' Since it is not explicitly stated in the question, I understand why one might interpret it as being after walking 1 km.
I think the misunderstanding comes from framing the question on the surface of a sphere vs a plane.
No, it's a trick geometry question. Instead of one kilometre, think about walking 10 meters. Actually go outside and imagine you are at the North Pole.
It's explicitly saying "starting at the North Pole".
Now tilt the globe by 1km (any other x
@@kittengray9232 'It's explicitly saying "starting at the North Pole" '
Have you actually watched the video? It unequivocally never says that, you're either blatantly lying or ignorant of what you're making a confident claim about. The question begins with "Standing at the North Pole", that's what it says.
And even if it did say "starting at the North Pole", the interpretation the AI took is still justifiable. Whatever path you took initially, if the last part of your trajectory is "now walk in a straight line until x happens", it will always make sense to interpret that as you now being at a new path start where you continue until the x condition is met, it's a loop process and every loop process has its own starting point. And what's more, the model actually clearly announces what it interprets is the starting point we're talking about, to take that as a "proof" the AI doesn't understand geometry is completely ridiculous.
In any case, if you have any modicum of good faith about it, you have to admit the way the question is worded can much too easily be interpreted in more than one way, which makes the question stupid.
A question should never rely on lack of clarity for it to be difficult to answer. If one actually wants to test whether the AI or anyone understands the geometry of the problem, it's really simple, just ask "now walk in a straight line until you hit the North Pole". It's the same question, except now it's actually clear what's being asked.
@@kittengray9232 No, it doesn't. It says "Imagine standing at the North Pole of earth." then walking, then turning etc. It's not clear what exactly the starting point is. The question implies you might hit your starting point by walking straight, and considering that there are "2*Pi km" answers (implying you calculate the circumference of a circle), it makes more sense to consider the turning point as the starting point - otherwise it's just a trick question.
On a flat Earth the answer "exactly 2*Pi km" would be correct, but on a curved Earth, where you walk a curved path for 1 km instead of a straight one, you don't get as far in that spatial dimension and thus end up on a circle of less than 1 km radius; therefore the "less than 2*Pi km" answer would be correct.
Another commenter has said that the inventor of this problem said you need more than language based reasoning to solve this, which is only the case if it's not a trick question where the starting point is the North pole. You need spatial reasoning to come to the other conclusions.
Why test with pre-existing tests? Isn't it assumed at this point that new models will be (secretly or openly) trained on data from YouTube channels and other sources of tests like this?
1000%
shhhh don't mention it! folk must think AGI is close or else all of these AI companies will crumble and no investment will come their way!
Yes. It's probably a great model but please unjump the shark
The idea is to be able to compare the models.
He could and should change the numbers, but if the models were trained on his questions that might not be enough to avoid false positives.
Also, the first Tetris code question is literally a copy-and-paste solution. You can google it and immediately find that answer. I don't get the point of the test; the code itself is probably already in the training data.
6:35 Dude, LeCun didn't say the models cannot get this right. This is the second time I'm commenting this. He says that the mental process you use to respond to this question has nothing to do with language, therefore we're missing a key component if the target is AGI.
Noted
@@matthew_berman 6:31 The reason is that you need to imagine yourself actually doing it and realise that it's a trick question, because 2π km is the circumference of a relatively small circle, and walking it won't feel like walking straight. Imagine if the question was about one meter.
No he definitely said it was impossible, if he’s no longer saying this then he changed what he said (like he usually does).
The problem with LeCun is that he stated he doesn't have an internal monologue. Many people don't.
So I don't think he really understands the concept of using words to think.
That makes more sense to me now. You have to consider a 3D model of earth in order to solve this (if you take the turning point as the "starting point" otherwise it's just a trick question). In the video the LLM only uses flat-earth geometry which gives you exactly 2*Pi as an answer. Considering that the 1km of walking is partly down in the third dimension will result in a new circle with less than 1km radius and thus a circumference of less than 2*Pi km. I'm not good enough at geometry to figure out how much less, but it's clearly less.
TL;DR yeah, you need spatial reasoning in order to solve that. Language ain't enough.
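Under that reading (walking the circle of points 1 km from the pole, measured along the surface), the shortfall can be worked out. A sketch of the calculation, assuming a perfect sphere with R ≈ 6371 km (the mean Earth radius) and s = 1 km:

```latex
\begin{aligned}
\theta &= \frac{s}{R}, \qquad r = R\sin\theta, \\
C &= 2\pi R \sin\!\left(\frac{s}{R}\right)
   = 2\pi s\left(1 - \frac{s^{2}}{6R^{2}} + \cdots\right), \\
2\pi s - C &\approx \frac{\pi s^{3}}{3R^{2}}
   \approx \frac{\pi\,(1\,\mathrm{km})^{3}}{3\,(6371\,\mathrm{km})^{2}}
   \approx 2.6\times 10^{-8}\,\mathrm{km} \approx 26\,\mu\mathrm{m}.
\end{aligned}
```

So on that reading the circle really is shorter than 2π km, but only by a few tens of micrometres.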
You are wrong about the North Pole question. The question is not when it will pass the initial point but when it will pass the turning point. Yann LeCun: “Imagine standing at the North Pole of the Earth.
Walk in any direction, in a straight line, for 1 km.
Now turn 90 degrees to the left.
Walk in a straight line for as long as it takes to pass near the point where you turned.”
Either way the answer is wrong. You will never pass the north pole again and you will walk entirely around earth and not 6.28...km.
Just think about this on a smaller scale: if you went 2 meters south from the North Pole and then turned left 90 degrees, you would face east. However, when you start walking you would not go around in circles around the North Pole; you would go once around the entire Earth. The same thing happens anywhere on Earth.
I think the problem is "starting point" is ambiguous. It could be the North Pole itself or the point where you stopped after walking for 1 km in a straight line in any direction. Like Matt, I originally interpreted it as referring to the North Pole, which would make option 4 the correct one.
@@fabianletsch1354 There is so much debate over this question. I've seen so many explanations and answers. It shows that it is certainly not a straightforward question. Probably it is not specific enough. There is too much room for interpretation.
@@fabianletsch1354 VoodooParadox2513 is correct. The starting point is ambiguous; if you rephrase the question, it will get it right. You need to clarify that the starting point is the North Pole. Also, don't allow any additional direction changes after the 90-degree left turn. Otherwise it turns you back and adds the 1 km to the total. I just tried a more specific question and it gets it right. At no point does that question say you can't turn.
Conclusion: add specificity of what you mean by starting point, and clarify the question to disambiguate between “will you cross the starting point” and “how long until you cross the starting point”
Sorry, 7:14 - the starting point is 1km south of the north pole. You stood on the north pole, walked 1km away from it - that is south. ANY direction is south. So, then you walk a circle of latitude 1km south of the north pole, sorry. The AI was right.
The starting point is the North Pole, not 1 km south of the North Pole.
About the north pole question, there are two versions of the question out there. One asking if you return to the starting point, and one asking if you return to the turning point.
The original question (turning point) is a trick question in geometry, and the answer is that if you start walking in a straight line on a sphere, you always return to your starting point (that's the source of the confusion between the two versions). In addition, every straight path on a sphere is a full loop around it, so it's way more than 2π kilometers. Please, please use the correct wording and retest the models. It's also actually only a trivia question and not really a logic question (unless you are in a class and just learning 3D geometry for the first time).
It's a trick question... The North Pole is the start... If you go in a straight line 1 km south, then turn 90° and start walking, how far will you go? Well... you just turn another 90° and go 1 km north. (It doesn't say you can't turn, and it states you can walk.)
@@bestemusikken Nope. Not per the screenshot. Stand on the North Pole, walk 1 km in any direction (south), turn 90 degrees. THEN walk along the circumference and measure how far you walk. By that reading, the starting point is not the North Pole.
@@tnt_explorers954 But if you read it, you start counting 1km off the north pole. If it is a trick question - it is very tricky and measures language trick more than understanding of math.
On your north pole example, even I considered the 1km south of the north pole as the starting position. If you intend for the north pole to be your starting position, then I would start the question with something like "Imagine you are standing at the north pole of the Earth as your starting position ..."
Still, you can't walk in a straight line over a latitude except latitude 0
All straight lines over a globe are the full circumference
Remember: you can pass a point even if you don't reach it. That's why you can pass the North Pole again after circling the earth. At this same moment you will pass the point where you turned east. That's why it doesn't matter which point you consider to be the starting point.
@@rogeriopenna9014 A straight line on the globe or Earth is not meant in the geometrical (trigonometric) sense, but in the sense of going straight ahead. So going along the circle is correct reasoning in this context.
@@dufifa No, because the arc you suggest is completely arbitrary and cannot be followed naturally. Not even a compass can help you, because it doesn't work at the North Pole. You could walk any other radius you like and, for example, walk in an arc to the North Pole. Then you would be at the starting point!
I think the model does correctly understand the situation in the question about walking from the North Pole.
The model just wrongly interprets the "starting point" as the "starting point on the circle".
It will be interesting to give the model a hint in the prompt such as "the starting point is at the north pole" so there won't be a confusion there.
No. The model is confusing rotating 90º with rotating towards the east. It is also confusing walking along a latitude circle with walking in a straight line. The only latitude circle which is "straight" is the equator. At 1 meter from the North Pole, the latitude is a circle with a radius of 1 meter.
Walking in a straight line on a sphere traces a path around the centre of the sphere (a great circle). So however you interpret the question, when you turn 90 degrees you do not end up walking east indefinitely; you enter a near-polar orbit that will eventually be nearly south-facing and then, at some point, north-facing. But it will never cross the pole. Nor will it go around the pole at 1 km from it; that would mean walking in a circle and turning continuously to stay 1 km from the pole.
I actually asked it how many letters are in its response, and it got that correct as well.
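To see this numerically, here is a minimal Python sketch that models the Earth as a perfect sphere (radius assumed to be 6371 km) and tracks the distance from the North Pole while walking a great circle that starts 1 km from the pole heading due east. It uses the spherical law of cosines with a 90-degree starting angle to north; the function name is illustrative.

```python
import math

R_KM = 6371.0  # mean Earth radius in km; treating the Earth as a perfect sphere is an assumption


def distance_from_north_pole(start_km, walked_km, radius=R_KM):
    """Surface distance (km) from the North Pole after walking walked_km along the
    great circle that starts start_km from the pole heading due east."""
    theta0 = start_km / radius   # colatitude of the turning point, in radians
    delta = walked_km / radius   # angle travelled along the great circle, in radians
    # Spherical law of cosines: the start heading is 90 degrees from north, so the
    # sin*sin*cos term vanishes and cos(colatitude) = cos(theta0) * cos(delta).
    cos_colat = math.cos(theta0) * math.cos(delta)
    cos_colat = max(-1.0, min(1.0, cos_colat))  # guard against rounding outside [-1, 1]
    return radius * math.acos(cos_colat)


if __name__ == "__main__":
    lap = 2 * math.pi * R_KM  # one full great circle
    for k in range(9):
        walked = lap * k / 8
        d = distance_from_north_pole(1.0, walked)
        print(f"walked {walked:9.1f} km -> {d:9.1f} km from the North Pole")
```

On this model the closest you ever get to the pole is the 1 km you start at; halfway round you pass 1 km from the South Pole, and you are back at the turning point only after a full great circle of roughly 40,030 km.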
The marble question is a text version of the ARC-AGI visual problems: they both depend on physical intuition.
Yann harps on the same thing…it’s hard to get physical intuition when training only on text.
This won’t be hard for models once they also train on video (so o1 + Sora)
Your walk-from-the-North-Pole example is actually simple: if you walk ANY distance (even 1 meter) from the pole, turn left 90 degrees and then walk straight, then (on a theoretically perfect sphere, which the Earth is not) you will be walking away from your starting point, and you will always travel the full circumference of the sphere before reaching it again. This assumes a perfect sphere and a perfectly straight line. It's easy to see if you make it a 1 meter initial walk. After turning 90 degrees at 1 meter you will be facing due east, but traveling in a straight line will take you south of east. This is because walking due east would keep you 1 meter from the pole, circling it in a very small circle as seen from above. To walk "straight" clearly means to traverse the sphere's surface along the path that most closely approximates a line. That line will always be one that divides the sphere in half, hence the full circumference of the Earth. (If you want to get really picky, a perfectly straight line would leave the sphere and travel into space infinitely, always diverging from the pole, but that's clearly not what the question implies.) So the correct answer was not among the choices... the best practical answer is: the circumference of the entire Earth, as measured from the starting point along the direction the walker faces. This will vary significantly based on the starting point because a) the Earth is not a perfect sphere and b) a perfect hemisphere therefore cannot be created. So it would also be correct to say you might never cross the starting point because of these imperfections.
The requirement is not that you cross the starting point, but that you pass it. This is the clue to the solution, as you can now pass the North Pole as the starting point as required, after you have circled the earth once.
You're going to have to update your prompts now that OpenAI is watching your videos and specializing their models towards your prompts!
Thank you!!
The North Pole question is a matter of semantics, context and imprecise language in the question which the model is failing to fully analyse, recognise and then produce alternative answers for, depending on which is the correct interpretation of the question.
It would be interesting to see how these models can handle (imprecise) questioning in other human languages which may or may not be more precise in meaning depending on words used e.g. "eskimo" having so many specific words for different types of snow. Other languages have and don't have precise equivalents to English ones for conditions, experiences, feelings, etc. Likewise English does not have equivalents to words used in so many other languages e.g. Chinese, Korean, Russian, African languages, Pacific ones, South American, etc.
I think the LLMs misinterpret the North Pole question. Is the starting point the North Pole, or does it use the point at which you start walking in a loop as the starting point?
Like if you ran a marathon: your journey starts at your house, you travel 1 km to the marathon course, then run the marathon... one would logically assume the starting point is the start of the marathon, not your house.
Renaming each point in the NP question to Position 1, Position 2, etc, would likely remove any misinterpretations.
The one wrong question could be considered a trick question I think. It is blindingly obvious, even to an LLM I suggest, that you will never return to the north pole. IMO the implied question - which it decided to answer is how far you will walk until you reach the same point on the latitude line as you started from. I would say the answer is acceptable.
Edit: a better approach is to stand up and actually imagine doing it, maybe with 10 meters instead of a kilometre. LLMs can't visualise geometry like that. If you can't either, just actually do it, imagining the pole is where you're standing now.
The longer ramble:
About the north pole question, there are two versions of the question out there. One asking if you return to the starting point, and one asking if you return to the turning point.
The original question (turning point) is a trick question in geometry, and the answer is that if you start walking in a straight line on a sphere, you always return to your starting point (that's the source of the confusion between the two versions). In addition, every straight path on a sphere is a full loop around it, so it's way more than 2π kilometers. Please, please use the correct wording and retest the models. It's also actually only a trivia question and not really a logic question (unless you are in a class and just learning 3D geometry for the first time).
Please let me know if you want a further explanation
He is correct. I used a more accurate version of the prompt and he correctly identified that it would not be possible to return to the starting position (not the position from where he made the 90 degree turn). Here it is:
Consider the following problem:
I start exactly at the North Pole (Position 1). I turn exactly south and walk 1 km (Direction 1), arriving at Position 2. I then turn exactly 90 degrees and begin walking in that direction (Direction 2). How far do I need to walk in Direction 2 to return to Position 1?
@@raularaujo1006 You missed my point. Go to a park and imagine you are at the North Pole. Use 10 meters instead.
Just ask gpt-o1 to come up with the questions for you 😅
Yes, build a neverending RED TEAM of other o1 machines all trying to stump each other! Adversarial training! Use every MENSA brain teaser in the universe.
In the North Pole question, it is valid to think that the starting point is the point where you turn 90 degrees left.
I think the model is right. It's obvious to the model that you don't pass the North Pole itself, and from the question it appears that the walker was sent from the North Pole 1 km south to the starting point.
I may possibly be missing something in your North Pole circumference question, but I'm pretty sure it actually nailed it. If you start at the North Pole and walk 1 km in any direction, you will absolutely be 1 km south of the North Pole. The only direction available to you at the North Pole is south, as all directions take you away from it. Now if you turn 90 degrees you will be facing east or west. If this is our starting point, then walking in a straight line will eventually take us back to the same spot. So all that is needed is to calculate the circumference of a circle 1 km south of the North Pole.
However. If you meant the north pole is our starting point then once we make our 90 degree turn we are always equidistant from that point. You should clarify which point is your starting point cause I'm pretty sure chat nailed this question.
Again I probably missed some key point that makes everything I just said very stupid. So sorry in advance if that is the case
If you walk in a straight line. It doesn't matter what you did before walking. You will always end up crossing your original point by walking the circumference of the earth. If your prior steps situated you outside that path, then you will never cross those points.
Especially since humans can't walk on water unless they consumed a waterwalking potion
The question is - besides the ambiguity of the starting point - what do you mean by "in a straight line". I would say, just going east and staying the eastern course is not a straight line, which becomes especially obvious so close to the north pole, where just going east and keeping going east is a small circle and you have to keep turning left to stay on that circle. It would be even more obvious if you just made one step in any direction from the north pole and then turned east. Going in a straight line would certainly not be walking in a little circle around the north pole keeping your heading east. Going in a straight line is going on a great circle around the whole earth starting to the east.
Great Circle: Is the straightest possible line on a sphere like Earth. It represents a direct path without changing direction relative to the sphere's curvature.
Loxodrome (Rhumb Line): Is not a straight line on the sphere itself but appears as one on specific map projections like the Mercator. It involves a continuous change in direction relative to the sphere's curvature.
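The difference shows up in how much a walker has to turn. A minimal Python sketch, again assuming a perfect sphere of radius 6371 km: keeping a due-east heading along a circle of latitude means turning left continuously, because that circle has geodesic curvature cot(θ)/R at colatitude θ. Summed over one lap this is 2π·cos(θ) radians, so near the pole it is almost a full 360° per lap, and only on the equator (the one circle of latitude that is a great circle) does it drop to zero. The function name is illustrative.

```python
import math

R_KM = 6371.0  # mean Earth radius in km; the perfect-sphere model is an assumption


def due_east_lap(start_km, radius=R_KM):
    """For a walker keeping a due-east heading start_km from the North Pole, return
    (lap length in km, total left turn in degrees needed to complete one lap)."""
    theta = start_km / radius                                  # colatitude in radians
    lap = 2 * math.pi * radius * math.sin(theta)               # length of that latitude circle
    curvature = math.cos(theta) / (radius * math.sin(theta))   # geodesic curvature, 1/km
    total_turn = curvature * lap                               # equals 2*pi*cos(theta) radians
    return lap, math.degrees(total_turn)


if __name__ == "__main__":
    # The last distance (a quarter of a great circle) puts the walker on the equator.
    for d in (0.01, 1.0, 1000.0, math.pi * R_KM / 2):
        lap, turn = due_east_lap(d)
        print(f"{d:9.2f} km from pole: lap {lap:10.3f} km, total left turn {turn:7.2f} degrees")
```

At 1 km from the pole the walker has to turn left by essentially 360° to close one lap of about 6.28 km, which is why that path cannot honestly be called walking in a straight line.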
Hello Math!
The model actually provided the right answer. As a master's degree physics student, I can confirm that the response is accurate if we consider the starting point to be the point where the man turned 90 degrees. This interpretation aligns well with the problem, and I initially thought this was the intended question.
While the model may have given the answer without detailing the steps, such as assuming the radius `r` of the circle the man is walking on is 1 kilometer, the conclusion is still correct. Let's break down the reasoning step by step:
### Understanding the Relationship Between Latitude and Radius:
The Earth's radius is denoted as `R_T`, and the colatitude (the angle from the North Pole) is `θ`. The radius `r` of the circle at that latitude is given by:
```
r = R_T * sin(θ)
```
Since the man starts near the North Pole, `θ` is very small. For small angles measured in radians, `sin(θ) ≈ θ`. Therefore, the equation simplifies to:
```
r ≈ R_T * θ
```
### Calculating the Angle `θ`:
The man walks 1 kilometer south from the North Pole. The arc length `s` along the Earth's surface is related to the central angle `θ` by:
```
s = R_T * θ
```
Substituting the distance walked:
```
1 km = R_T * θ
```
Solving for `θ`:
```
θ = 1 km / R_T
```
### Determining the Radius `r` of the Circle:
Using the value of `θ` in the simplified radius equation:
```
r ≈ R_T * θ = R_T * (1 km / R_T) = 1 km
```
So the radius of the circle the man walks on after turning 90 degrees is approximately 1 kilometer.
### Calculating the Circumference `C` of the Circle:
The circumference of a circle is given by:
```
C = 2πr
```
Substituting `r ≈ 1 km`:
```
C ≈ 2π × 1 km = 2π km
```
This means the man needs to walk approximately `2π` kilometers to complete the circle and return to his starting point after turning 90 degrees.
As you see, the model's answer of `2π` kilometers is correct. Although it may not have provided all the intermediate steps, the conclusion is accurate.
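The small-angle step can be checked numerically. A minimal Python sketch, assuming a perfectly spherical Earth with R_T = 6371 km, compares the exact circumference 2π·R_T·sin(s/R_T) of the circle at walking distance s from the pole with the approximation 2π·s used above (the function name is illustrative):

```python
import math

R_T = 6371.0  # Earth radius in km (assumed value; perfect sphere)


def latitude_circle_length(s_km, radius=R_T):
    """Exact circumference of the circle of points s_km (along the surface) from the pole."""
    theta = s_km / radius                           # colatitude in radians
    return 2 * math.pi * radius * math.sin(theta)   # C = 2*pi*r with r = R_T*sin(theta)


if __name__ == "__main__":
    for s in (1.0, 100.0, 1000.0, 5000.0):
        exact = latitude_circle_length(s)
        approx = 2 * math.pi * s                    # small-angle result, r = s
        gap = (approx - exact) / approx
        print(f"s = {s:6.0f} km: exact {exact:12.6f} km, "
              f"2*pi*s {approx:12.6f} km, relative gap {gap:.1e}")
```

At s = 1 km the relative gap is roughly θ²/6, a few parts per billion, so the approximation is excellent here; strictly, though, the circle is slightly shorter than 2π km, and the gap only becomes noticeable for walks of hundreds of kilometres.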
Yeah, but it failed the last question, right? It assumed symmetry for some reason there instead of solving the optimization problem :(
Unfortunately, that isn't the case. Nobody knows whether it is on purpose, but the wording of the question would benefit from a few clarifications.
1. Nowhere is it written that you should correct your course to keep going east after your 90-degree turn. Imagine it were 10 m instead of 1 km: you would need to correct your direction basically every step. The only instruction given about the manner of your walk is to walk straight. Thus you would walk the full circumference of the Earth.
2. Even if it were specified that you must correct your direction to always continue eastward (which it isn't), θ, while small, is nonzero, so answering 2π km could still be considered a worse option than simply picking less than 2π km.
3. Finally, there is debate over what should be considered the starting point, with good arguments for both the North Pole and the turning point. If the starting point is the North Pole, the correct answer would be the option that you never reach the starting point again: on every lap you only get as close as 1 km to it, no closer.
In either case, your choice is never the best option, meaning o1 didn't provide the right answer to this particular question. As I said, this question should be fixed. Or maybe it was created this way so that the model should provide reasoning for multiple options and thus prove it can deeply analyse the question despite its poor quality.
Yep, a fail. I think the correct assumption should be that the starting place is the beginning of the exercise. Most humans would assume that we are following a latitude, which is a type of straight line.
@@Topakhok No, it didn't fail. It's definitely the right answer if you assume that the starting point is where the man turned 90 degrees.
@@elbadanos3159, no I mean the last question, about boxes and the sphere
Solving the math question after 9:25 is even more impressive considering that there is a typo in the statement of the question: $rac{p}{q}$ should have been $\frac{p}{q}$ (that is the LaTeX way of writing the fraction p/q). Maybe GPT-4o could also have caught it, as it is more of a "language" issue than a reasoning one.
Yep and here comes another round of layoffs
I tested the model with some math riddles that other models don't get right. For example: Is it possible to find four natural numbers (positive integers) whose squares add up to 13411? OpenAI aced it! 😇
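For the 13411 riddle, a brute-force search settles the question directly. The sketch below is our own illustration (the function name `four_positive_squares` is hypothetical), not a reconstruction of how the model reasoned: it looks for a sorted quadruple a ≤ b ≤ c ≤ d and stops each loop once the remaining squares can no longer fit.

```python
from math import isqrt


def four_positive_squares(n):
    """Search for positive integers a <= b <= c <= d with a^2 + b^2 + c^2 + d^2 == n.

    Returns the first tuple found, or None if no such representation exists.
    """
    for a in range(1, isqrt(n) + 1):
        rest_a = n - a * a
        if rest_a < 3 * a * a:          # b, c, d >= a can no longer fit
            break
        for b in range(a, isqrt(rest_a) + 1):
            rest_b = rest_a - b * b
            if rest_b < 2 * b * b:      # c, d >= b can no longer fit
                break
            for c in range(b, isqrt(rest_b) + 1):
                rest_c = rest_b - c * c
                if rest_c < c * c:      # d >= c can no longer fit
                    break
                d = isqrt(rest_c)
                if d * d == rest_c:     # the remainder is a perfect square
                    return a, b, c, d
    return None


print(four_positive_squares(13411))
```

The search succeeds, so the answer to the riddle is yes; one representation, checkable by hand, is 115² + 13² + 4² + 1² = 13225 + 169 + 16 + 1 = 13411.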
🎯 Key points for quick navigation:
00:00:00 *🍓 OpenAI's Strawberry Puzzle*
- OpenAI used a puzzle involving a strawberry that mirrors the speaker’s own LLM rubric question, demonstrating OpenAI’s attention to the speaker’s content,
- Introduction to OpenAI's new "o1" model and testing plan.
00:02:07 *🧩 Tetris Code Test*
- Testing the "o1" model’s ability to write a Tetris game in Python, comparing it to past performance,
- Faster thinking time and successful output with no errors on the first attempt.
00:02:36 *📏 Postal Envelope Test*
- Testing dimensional problem-solving: whether an envelope fits postal size restrictions,
- Model correctly considers that the envelope can be rotated to fit.
00:03:32 *🧮 Word Count Test*
- Checking the model’s accuracy in counting words in its response,
- It provides the correct count after a short thinking time.
00:04:01 *🔪 Killer Question*
- A logic puzzle about killers in a room, where one killer is killed,
- The model correctly identifies the nuance of counting the dead killer, providing an insightful answer.
00:05:12 *🔬 Marble in the Cup Puzzle*
- The model predicts the behavior of a marble in an upside-down cup placed on a table,
- It accurately concludes that the marble stays on the table when the cup is moved.
00:06:21 *🧭 North Pole Walking Problem*
- A classic spatial reasoning problem involving walking at the North Pole,
- The model fails to answer correctly, as expected for this challenge.
00:08:00 *🍎 "Apple" Sentence and "Strawberry" Letter Test*
- Tests involving creating sentences that end with "apple" and counting the letters in "strawberry," both passed accurately,
- A further decimal-comparison test is also passed, showing basic numerical skill.
00:08:40 *🤔 Moral Dilemma Test*
- The model is asked if it is acceptable to push a random person to save humanity,
- Initially hesitates, but upon request for clarification, it concludes "yes."
00:09:36 *🧮 Complex Math and Evolutionary Question*
- The model handles a complex sphere calculation problem and provides a clear, formatted answer,
- For the classic "chicken or egg" problem, it concludes that the egg came first, based on evolutionary principles.
00:10:17 *🏆 Overall Performance Review*
- The model’s performance is praised as the best ever tested, with only one incorrect answer on the North Pole problem,
- The speaker invites feedback on the one failed question.
Made with HARPA AI
On the Apple question: Since you capitalized the word, all of the example sentences were in reference to the company, not the fruit. Maybe it should get bonus points!
It's brainwashed by ads :D /s
My friends and I play this obscure 15+ year old physics-based video game where you control the entire body. More importantly, you can create environments using simple geometry. We tested o1 by giving it some limited data about the formatting of the "mod" files, which is essentially just position, rotation, scale, and a few other properties for each of the environment objects (rectangles, spheres, and "cylinders" which are actually capsules). It successfully created several environments for us, getting messier the more complex we wanted them to be, but it was quite incredible seeing its spatial awareness.
A good test for the new reasoning AI is a full crossword, because solving a crossword requires lots of backtracking: a word that fits and is technically correct can conflict with another word, and one of those words is clearly not the answer.
I tried with GPT-4o and crosswords are beyond what it can solve.
great call!
OpenAI already did a video on it
Any absolute straight path on a sphere takes you around a great circle (a cross-section through the centre). The closest you'd get to the North Pole is back at the 1 km mark you started at.
"Turn left" is a vague term which we would normally interpret as walking a circle 1km from the North Pole but that isn't a straight line.
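The great-circle claim can be checked with a little vector geometry. This is a minimal sketch under stated assumptions: a spherical Earth of radius 6371 km, the turning point P placed on the 0° meridian, "turn left" from a southward heading meaning the eastward unit direction T, and "straight" meaning a geodesic. Walking straight ahead from P then traces P·cos φ + T·sin φ, which is a full great circle.

```python
import math

R = 6371.0       # assumed mean Earth radius, km (spherical model)
D = 1.0          # distance walked south from the pole, km
theta = D / R    # angular distance of the turning point from the pole, radians

P = (math.sin(theta), 0.0, math.cos(theta))  # turning point on the 0° meridian (unit vector)
T = (0.0, 1.0, 0.0)                          # unit eastward direction at P


def position(s_km):
    """Unit vector after walking s_km straight ahead (along a geodesic) from P."""
    phi = s_km / R
    return tuple(p * math.cos(phi) + t * math.sin(phi) for p, t in zip(P, T))


def km_from_north_pole(v):
    """Surface distance from the north pole (0, 0, 1) to unit vector v."""
    x, y, z = v
    return R * math.atan2(math.hypot(x, y), z)   # atan2 stays accurate near the pole


loop = 2 * math.pi * R                           # one full great circle
samples = [km_from_north_pole(position(loop * k / 1000)) for k in range(1001)]
print(f"length of the straight-line loop: {loop:.1f} km")
print(f"closest approach to the north pole: {min(samples):.6f} km")
print(f"farthest point is {math.pi * R - max(samples):.6f} km from the south pole")
```

Under these assumptions the loop is roughly 40,000 km long, its closest approach to the North Pole is the 1 km turning point itself, and its halfway point passes 1 km from the South Pole.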
Hello everyone, what a great time to be curious about LLMs !
Hold on to your papers!
A spider stands at any point on a sphere that has a 10m circumference and walks in a straight line until it gets back to its starting point. How far does it walk? Obviously 10m. Now put a dot on the sphere and write "north pole" next to it. Place the spider 1m away from the dot and start it walking at an angle of 90 degrees to the direction of the dot. Tell it to keep walking in a straight line until it gets back to the same spot. Now how far does it walk? Well, if it's a reasonably bright and obedient spider that understands English, it's going to be 10m again - we've already noted that it doesn't matter where on the sphere it starts and what direction it goes in as long as it's a straight line, it's always 10m.
Now answer the walk from the north pole question again.
You are a cornerstone to AI news! Well deserved shout out to you by Open AI! Congratulations!
That question LeCun posed is interesting. At this point it's making me think that I'm an idiot, because I haven't heard anyone say what I think is the answer. The way I interpret it: the starting point is 1 km south of the north pole. You turn 90 degrees (it doesn't matter which direction; we can go with left). As an aside, turning left after walking 1 km south from the north pole doesn't necessarily mean you're facing east, but that doesn't really matter. Then you walk in a straight line (meaning if you draw the path of your motion, the angle between your trip straight south from the north pole and your new direction will always be 90 degrees). If you put your finger on the top of a ball, move it in any direction away from where you first put it, then make a 90-degree turn from that path, your new path will go all the way around the ball. If you did this on Earth you would walk all the way around the Earth longitudinally. The 'ribbon test' is used in curved geometries to determine whether or not you've taken a straight path. If you walk in a circle around the north pole, well then, you're walking in a circle, which isn't straight, and the ribbon will partially be lifted off the surface. It's actually so frustrating to me that so many people can't get this LOL
The issue with the north pole question is that the models (4o as well) thought that the first 1km south is to get into position for the part of the path that we care about. It doesn't get that the first 1km is included in the path. Maybe if you rephrase it to ensure that the 1km south is part of the path it will get it right. Maybe include a stop watch that starts when you leave the north pole to highlight the path has started. Or something similar.
About the north pole question, there are two versions of the question out there. One asking if you return to the starting point, and one asking if you return to the turning point.
The original question (turning point) is a trick question in geometry, and the answer is that if you start walking in a straight line on a sphere you always return to your starting point (that's the source of the confusion between the two versions). In addition, every straight path on a sphere is a full loop around, so it's way more than 2π kilometers. Please, please use the correct wording and retest the models. It's also really only a trivia question and not really a logic question (unless you are in a class learning 3D geometry for the first time).
@@IddoZiv1 "So it's way more than 2Pi km"
Christmas came early this time around. It's like being a little kid again, I'm so hyped. Can't wait to play around with this for myself, and hopefully don't waste my limits on subpar prompts.
ok so to clarify: You have a globe. The north pole is where you start. 1km go south, that's any direction. Then, imagine you turn 90 degrees. What exactly does that even MEAN. Simply changing the direction to east or west or whatever, then walking a straight path, that's not a damn straight line. Or is it? It all depends, it's still nuanced. And if you walk a perfectly straight line from ANY point on earth, you will end up on the same spot again, after walking the whole circumference of earth. But what is 90 degrees now? Actual geometrical 90 degrees? A straight line in terms of compass direction or actual straight line? (latitude vs meridian) How does this even make any sense. This question is even difficult for most humans. Especially because there's a certain lack of clear definitions.
Oh, when idiots try to reason :) You have no idea what a straight line is. If you followed a straight line you'll go into space.
The starting point is at the North Pole.
There's nothing easier for openai than to take these questions and include them in future training data.
This is a good question, the previous frontier models could not reason through this, but o1-preview can:
One day, in a gathering of top scientists, one of them wondered out loud whether there exists an integer that you could exactly double by moving its last digit to its front. For instance, 265 would satisfy this if 526 were its exact double, which it isn't.
So the question is, what is the smallest integer possible that meets this rule?
Answer from o1-preview:
105263157894736842
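The o1-preview answer can be both verified and derived. Writing the number as n = 10x + d (last digit d, k digits in total), the condition 2n = d·10^(k-1) + x rearranges to 19x = d(10^(k-1) − 2), so the search only has to try digit counts k and last digits d. The sketch below is our own illustration (function names are hypothetical) and generalizes the factor 2 to a `factor` parameter.

```python
def rotate_last_to_front(n: int) -> int:
    """Move the last decimal digit of n to the front, e.g. 265 -> 526."""
    s = str(n)
    return int(s[-1] + s[:-1])


def smallest_rotation_multiple(factor: int = 2, max_digits: int = 200):
    """Smallest n (at least two digits) with rotate_last_to_front(n) == factor * n.

    With n = 10*x + d and k digits in total, factor*(10*x + d) == d*10**(k-1) + x
    rearranges to x * (10*factor - 1) == d * (10**(k-1) - factor).
    """
    for k in range(2, max_digits + 1):          # fewer digits means a smaller n
        for d in range(1, 10):                  # for a fixed k, n grows with d
            num = d * (10 ** (k - 1) - factor)
            if num % (10 * factor - 1):
                continue
            x = num // (10 * factor - 1)
            if len(str(x)) != k - 1:            # x must be exactly the first k-1 digits
                continue
            n = 10 * x + d
            if rotate_last_to_front(n) == factor * n:
                return n
    return None


print(smallest_rotation_multiple())
```

For factor 2 the divisibility condition first holds at 18 digits (10^17 ≡ 2 mod 19), and the smallest last digit that keeps x free of a leading zero is 2, which reproduces the model's answer.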
It still can't solve a simple sudoku problem though... Can you experiment on that next? Cheers
For the test case of 9.9 and 9.11, the order matters: if you ask "9.11 and 9.9, which one is bigger?", the answer may sometimes be 9.11, and the chance of getting the wrong answer may be higher if you ask in another language.
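Numerically, the order of the two operands cannot change the answer. One possible source of the confusion (our hypothesis, not something the thread establishes) is that "9.11" also reads naturally as a version or section number, where 9.11 does come after 9.9. This small sketch contrasts the two readings; the function names are hypothetical.

```python
from decimal import Decimal


def bigger(a: str, b: str) -> str:
    """Compare two decimal strings numerically; argument order cannot change the result."""
    return a if Decimal(a) > Decimal(b) else b


def version_bigger(a: str, b: str) -> str:
    """Compare the same strings as dotted version numbers (a different reading)."""
    def key(s: str):
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


print(bigger("9.9", "9.11"), bigger("9.11", "9.9"))
print(version_bigger("9.9", "9.11"))
```

As decimals 9.9 wins in either order; only the version-number reading makes 9.11 "bigger".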
Another question that o1-preview gets correct that 4o gets wrong.
Max and Rose are ant siblings. They love to race each other, but always tie, since they actually crawl at the exact same speed. So they decide to create a race where one of them (hopefully) will win.
For this race, each of them will start at the bottom corner of a cuboid, and then crawl as fast as they can to reach a crumb at the opposite corner. The measurements of their cuboids are:
Max: 3h x 3w x 3d
Rose: 2h x 3w x 4d
If they both take the shortest possible route to reach their crumb, who will reach their crumb first? (Don’t forget they’re ants, so of course they can climb anywhere on the edges or surface of the cuboid.)
Answer:
Max's Shortest Path: ≈ 6.708 units
Rose's Shortest Path: ≈ 6.403 units
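These figures follow from unfolding two adjacent faces into a plane: the shortest surface path between opposite corners of an a × b × c box is the minimum of √(x² + (y + z)²) over the ways of choosing the unpaired edge x. The sketch below is our own illustration; for contrast it also computes the straight-through-the-interior diagonal, which is what a model ignoring the "ants crawl on the surface" constraint would use.

```python
import math
from itertools import permutations


def surface_shortest(a, b, c):
    """Shortest path over the surface between opposite corners of an a x b x c box."""
    # Unfolding two adjacent faces gives a rectangle with sides x and y + z.
    return min(math.hypot(x, y + z) for x, y, z in permutations((a, b, c)))


def through_interior(a, b, c):
    """Straight-line space diagonal through the inside of the box."""
    return math.sqrt(a * a + b * b + c * c)


boxes = {"Max": (3, 3, 3), "Rose": (2, 3, 4)}
for name, dims in boxes.items():
    print(f"{name:4}: surface {surface_shortest(*dims):.3f}, "
          f"interior {through_interior(*dims):.3f}")
```

On the surface Rose's route (√41) is shorter than Max's (√45), so Rose wins; through the interior the order flips (√27 for Max versus √29 for Rose), which is why the two interpretations give opposite winners.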
Yeah, 4o goes straight through the cuboid. Although it's not a mathematical problem to this AI. It's a social problem. The AI needs to figure out the most likely desired output. It's not wrong to assume the cuboid being a mere abstract geometrical entity. But the insertion of the ants adds a new layer to it from a social point of view. So on the second attempt it will do it right as it will find the social correlation (desired assumption).
Hi Matthew. May I propose a new question. It is derivative of the two guards two doors riddle. So far the models I have asked this question of have not provided a good answer, but I do not have access to O1. Imagine you are a captive in a cell with two doors. One door leads to freedom and the other to death. You are visited by three guards on rotation who give you meals and allow you to ask questions. One guard always tells the truth, one always lies, and one gives the truth or a lie on alternate answers. You do not know which guard is which. You do not know if the alternating guard will start with the truth or a lie. What is the minimum number of questions you need to ask in order to know for certain which is the door to freedom? What are those questions? Explain your reasoning step by step. I can think of a set of 3 questions (all to the same guard) that work. Q1: Are you a guard? (Actually any question you know the answer to.) Q2: Repeat Q1. By now you know which guard you are talking to. Q3: Which is the door to freedom? Regards, Paul
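Paul's three-question strategy can be checked exhaustively by simulation. The sketch below is hypothetical: the guard-type labels are our own, it assumes all three questions go to the same guard in one visit, and it assumes the alternating guard's answers alternate across consecutive questions. It shows that three questions always suffice under those assumptions; it does not prove that fewer are impossible.

```python
def make_guard(kind):
    """Return a guard that answers yes/no facts according to its (hypothetical) type label."""
    asked = 0

    def answer(fact: bool) -> bool:
        nonlocal asked
        if kind == "truth":
            lying = False
        elif kind == "liar":
            lying = True
        elif kind == "alt_truth_first":
            lying = asked % 2 == 1          # truth, lie, truth, ...
        else:                               # "alt_lie_first"
            lying = asked % 2 == 0          # lie, truth, lie, ...
        asked += 1
        return fact != lying                # flip the fact when lying
    return answer


def paul_strategy(ask):
    """Ask one guard three questions; `ask` takes a predicate on the hidden world."""
    a1 = ask(lambda world: True)                # Q1: something known to be true
    a2 = ask(lambda world: True)                # Q2: repeat Q1
    a3 = ask(lambda world: world == "left")     # Q3: is the left door freedom?
    if a1 == a2:
        q3_truthful = a1        # consistent guard: yes/yes = truth-teller, no/no = liar
    else:
        q3_truthful = not a2    # alternator: Q3 has the opposite status of Q2
    left_is_freedom = a3 if q3_truthful else not a3
    return "left" if left_is_freedom else "right"


KINDS = ["truth", "liar", "alt_truth_first", "alt_lie_first"]
for kind in KINDS:
    for world in ("left", "right"):
        guard = make_guard(kind)
        found = paul_strategy(lambda question: guard(question(world)))
        print(f"{kind:16} freedom={world:5} -> strategy says {found}")
```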
I tried o1-preview model last night for python programming. It's much, much, better than 4o model. Too bad they're limiting the number of "replies" and I already hit mine! It resets next Thursday!
I wonder if other Plus users see the same limitation.
A 90-degree turn means a right angle. If you are at or near the pole, you would notice yourself turning in a circle, so you are not moving along a 90-degree turn. It's a matter of knowing to use absolute geometry, because being so close to the pole makes it a special circumstance.
I finally got o1 mini to work on OpenRouter. I asked it 2 questions: 1. "Describe each of the following mango cultivars: "Alphonso", "Carrie", "Lemon Cream", "Kent"". (I found this in a YT comment today about hallucination testing; apparently Lemon Cream mangos do not exist. o1 mini failed it.) 2. How many occurrences of the English letter that is the phonetic cognate of the Cyrillic letter "P" are in the word "Parsley"? (This one was my own creation. It nailed it.)
I just tested o1-preview, the larger model. For the mango problem, it told me a very good answer: It said it didn't know if the Lemon Cream mango existed or not due to the data cut-off, but suggested the "Lemon Zest" cultivar. The 2nd problem produced the correct answer too. VERY GOOD. I was blown out of the water with the 1st question.
That's a good test, because although there isn't a Lemon Cream Mango cultivar, there is a Lemon Zest Mango cultivar, and of course lots of recipes called Lemon Cream Mango, which it uses to create its hallucination.
"Any cross-section of a sphere is a circle, including the cross-sections of its latitudes. This is true whether the cross-section is horizontal, vertical, or lateral. So, if your "starting point" is counted as on the circumference, and you walk along it, you will return to your starting point.
I like how it shows its reasoning along the way. It is definitely an improvement in complex coding. I have a massive SQL script that is very easy to break, and it was able to make the changes I needed in one shot. Previously, with GPT-4o, it was a lot of back and forth: paste the GPT-edited code, run it, copy the error, paste the code, run it, copy the error... until it finally worked, and if that went on too long, things often got lost along the way, like sorting, or variable names got changed for no reason.
The tricky part about the North Pole question isn't what is considered the starting point, but rather that the calculation one might be tempted to use to work out the travel distance back to the point where the turn occurred is wrong. If you are at any point on a sphere and walk in a straight line, you will travel around the whole sphere, for a travel distance equal to the circumference of the sphere. So when you walk down from the North Pole and make your turn, that point is just like any other point on a sphere, meaning that if you start walking in a straight line, you'll make a full round trip of the Earth. So the travel distance to get back to the turning point is the circumference of the Earth.
It's genuinely amazing to see the developments Open AI are continuing to make. It gives me hope that AGI will arrive at some point in my lifetime.
The model has given an extremely well thought out response to a question about the length of a circular path centered on the pole. There is a very important issue, however: Matthew's question does not give any requirement to continue walking East (or West), and therefore the assumption would be to continue walking in a straight line. A STRAIGHT PATH from the turning point is NOT A CIRCULAR PATH AROUND THE POLE; instead it is a "GREAT CIRCLE" path passing a point 1 km north of the south pole! The distance along that path is significantly greater than 2xPi km. Secondly, as Matthew states, the original "starting point" (i.e. the pole itself) is not on that circular path, and so we never get back there, returning instead to the turning point.
Conclusion:
This model has misinterpreted the question. It has given the WRONG answer to Matthew's question but has given the CORRECT answer to a different but semantically similar question. To me this is conclusive proof that the model has been trained on that specific question. The "stochastic parrot" squawks again!
Edit:
Looking again at the model's answer, I see that the math is actually wrong as well: the model has correctly calculated the circumference of a circle of radius 1 km, but the radius of the path defined here is very slightly less than 1 km due to the curvature of the Earth. So the correct answer to the question it answered (not the one Matthew asked) is "Less than 2xPi km".
Consider asking more follow-up questions. Specifically making it second guess itself by questioning or doubting the answer it provides. This works well when they get it correct and you say something like: “Are you sure? That doesn’t sound correct”. And see how confident it is in its answer.
I think the north pole problem should be formulated like this: "You are at the North Pole. Pick an established longitude (any longitude) and start traveling south, then cross 10 latitudes. On the 11th latitude, turn left and keep following that latitude until you hit the next established longitude, then head back. Doing so, will you come back to the North Pole?" I did ask this question and got the right answer from GPT-4. I am not too concerned about the degrees given; the logic and reasoning are sound.
This is an interesting puzzle involving some basic geographic concepts!
Let's break it down:
Start at the North Pole: At this point, you are at 90°N latitude, and any direction from here is "south."
Travel south, crossing 10 latitudes: As you move south, you are decreasing in latitude. After crossing 10 latitudes, you will reach 80°N latitude.
At the 11th latitude (79°N), turn left: Turning left from a southward direction is equivalent to moving east along the 79th parallel of latitude.
Follow this latitude until you hit the next established longitude: You are now following the 79th parallel in an eastward direction. Eventually, you will cross a different longitude.
Head back: Once you reach another longitude, you turn back towards the North Pole.
Now, will you return to the North Pole?
Yes, you will, because the journey described is symmetrical and involves navigating along parallels of latitude and returning along a different longitude to your starting point. Since you are moving in a closed loop (starting at the North Pole and ending back there), this path brings you back to your original position.
In summary, following these steps will indeed bring you back to the North Pole.
Try this one: "A beer store in my town offers a bottle of beer 2$ each. I can also exchange for a bottle of beer with 2 empty bottles or 4 bottle caps. I can also borrow empty bottles or caps as long as I can return them later. With 20$ in my pocket, what is the maximum number of bottles of beer I can enjoy?" The answer should be 40. I got 38 from o1-preview.
Here is how you do it: Borrow 30 empty bottles and 30 caps the same time at the initial purchase of 10 beers. After consumption, I got 40 bottles and 40 caps in hand which I can exchange for 30 bottles of beer. After consumption, just return all 30 empty bottles and caps.
@@duanxn haha, smart answer. I tested with Claude, got 31, seems less smart than o1. Actually the beer worth 50 cents per bottle, the max answer should be 40 if you can manage to get it.😀
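The 40-beer figure follows from pricing each part of a bottle by its exchange rate: 2 empties buy a $2 beer, so a bottle is worth $1; 4 caps buy one, so a cap is worth $0.50; what you actually consume is the remaining $0.50. The sketch below is our own illustration (function names are hypothetical). It assumes borrowing is allowed only if everything is repaid in kind at the end, computes that upper bound, and replays the borrow-then-repay plan from the thread.

```python
from fractions import Fraction

PRICE = Fraction(2)                              # $ per full bottle of beer
BOTTLE_VALUE = PRICE / 2                         # 2 empty bottles trade for one beer
CAP_VALUE = PRICE / 4                            # 4 caps trade for one beer
DRINK_VALUE = PRICE - BOTTLE_VALUE - CAP_VALUE   # the part you actually consume


def max_beers(money):
    """Upper bound: each beer permanently costs only its drinkable value."""
    return int(Fraction(money) / DRINK_VALUE)


def borrow_plan(money, borrowed_bottles, borrowed_caps):
    """Replay a borrow-then-repay plan; return beers drunk, or None if it can't repay."""
    bought = int(Fraction(money) // PRICE)
    bottles = bought + borrowed_bottles          # empties in hand after the first round
    caps = bought + borrowed_caps
    traded = bottles // 2 + caps // 4            # beers obtained in one big trade
    # Drinking the traded beers yields `traded` new empties and caps to repay with.
    if traded < borrowed_bottles or traded < borrowed_caps:
        return None
    return bought + traded


print(max_beers(20), borrow_plan(20, 30, 30))
```

Borrowing 30 bottles and 30 caps reaches the $20 / $0.50 bound exactly, which also answers the "infinite borrowing" question below: the repayment constraint caps how much can usefully be borrowed.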
@@duanxn What is stopping you from borrowing an infinite number of bottles of beer?
I think you'd stop enjoying them after about 15
Every time a new model comes out, I try to teach it how to play Nerdle. Prior models, even GPT-4o, all fail miserably and often can't even give me a valid guess with 8 characters in it. But this latest one came very, very close to being able to do it. Reading through its logic and thought processes was quite impressive, as it took 50-77 seconds to think before each response!
Slight correction to the video title: the model is not a GPT. They dropped that naming convention; it's named OpenAI o1.
I think the wording of the north pole question affects the answer a little bit. You can use the following and see if it gets it wrong:
(start at the north pole and walk 1 kilometer south then turn left and keep walking ... do you pass your starting point?)
So in OpenAI o1, whenever we give any prompt, it thinks it over and expands the prompt a lot with related details (Chain of Thought) (hence it thinks a lot before answering) and then the result is much more accurate.
I think OpenAI is still holding back something much bigger than the model they just released. In my opinion, they are about a year or more ahead of the competition, as the competition has only recently reached the performance level of GPT-4. Knowing this, they are not at all concerned about releasing new models or features anytime soon. They launched an "Omni" model to surpass the competition. The competition managed to catch up with that model, and now they've launched a reasoning model. But none of this would scare someone working inside OpenAI on a daily basis, as they were already aware of these advances and knew this would happen.
Now, combining:
1. Agents (a base already launched in GPTs),
2. A multimodal model with audio, image, and video,
3. A reasoning model,
4. Long-term memory and planning,
All of this together in a single model, which I believe could be a GPT Next, a full Orion, or whatever name they choose to give it. That, indeed, would be something that could surprise someone like Ilya.
This is a great version. I ended up working with it for several hours, using it to plan out some things and to give me a guide for working with agents, and now I shall see how good the guide is.
Pretty cool to see the AI employing some reasoning strategies. It will be interesting to see what happens when the AI learns to generate or choose new strategies to render conclusions, such as working backwards from a conclusion to the required premises and conditions, doing reductio ad absurdum, etc.
The problem with the “north pole” question is the ambiguity of the “starting point” in the question. I did an interaction with o1 and it gave two answers: one where the starting point meant the north pole, and the second where the starting point was the point 1 km south of the pole. It got both answers correct.
As some have pointed out, the question about the north pole is ambiguous because it is not clear whether the starting point refers back to the point where the man turned east or to the north pole, but it would certainly be reasonable to assume it's the turning point, because otherwise the question has a built-in false assumption (that the starting point is passed).
I would love it if the AI's answer to the marble question were "under the sofa": reading the thinking part, it determines that the marble slipped out of the cup as it was turned over, bounced off the table, and rolled onto the floor and under the sofa.
My most recent test is to ask for an image prompt for a realistic picture of someone sitting in an empty room imagining a sandwich. Should just be an empty room with a person in it. But it will add thought bubbles, plates with sandwiches, menus on the walls, all sorts of things. When I point out cameras can't read minds, or point out the other flaws, it will get it. 4o used to take 10 or more interactions to get it right. o1 gets it after 2 or 3
7:48 It is considering the starting point as 1 km from the north pole, not the actual north pole, so the answer it gives is right: you just go around the latitude line in a circle and stop at the point where you turned left. It is calculating the circumference of a circle, 2πr; since it takes the radius as 1 km, it gives an output of 2π, which is mathematically right!
@Matthew Berman --- * If the glass is scooped when picked up, then it could still hold the marble when moved to the microwave.
So, I guess it provided the most likely answer.
I think the ambiguity of "start" lies in whether it means the start of the whole journey or the start of the task of walking in a straight line until you reach your starting point.
Maybe start the prompt with "At the starting point of a journey, imagine..."
Turning 90° does not equal going east! That is only true on the equator. At the North Pole you can turn 90° and still face south.
Okay little ding-dong.
Nobody cares.
Regarding the north pole question, I think what Yann LeCun meant is returning to the point where you turned 90 degrees. This version of the riddle makes the most sense because it's very tricky to answer: due to the curvature of the Earth, 1 km of walking gets you slightly less than 1 km away in a straight line, and therefore your radius is slightly less than 1 km.
But the way he wrote it in his original post mentions "starting point", which in my opinion is clearly the north pole, and of course you will not reach that again.
Actually I just checked and he changed it to "until you reach the point where you turned", confirming my suspicion.
"write a tetris game in python" also works in the free public version of GPT-4o. What's the hype?
Regarding the North Pole question, "walk in a straight line" can be interpreted differently for a sphere vs. a plane. I think this phrase requires clarification in order to arrive at a correct answer.
I think the north pole question results in you walking on a line of latitude, ie a circle coming back to the start. Think of walking all the way to the equator, turn 90deg and you are walking around the equator (bloomin long walk though)
Here's a question which highlights the ambiguity that makes this question hard to parse:
You leave your house. You turn left and walk around the block. Do you ever pass the starting point?
The answer depends on if you consider the starting point inside the house, which is one possible interpretation but probably not the 'typical' interpretation.
Two actual Use Cases for me so far:
1. Making ciphers and puzzles for D&D that actually work and don't leave my players trying to figure out GPT-4o's nonsensical ciphers for days! (This can really make it think for about a minute if you are transcoding a ciphered message.)
2. Making game mechanics for a complex sci-fi tabletop roleplaying game. (I managed to make it think for 14 seconds about this one!)
I hope they add image to text and browsing and voice features
Matthew Schumer - "We have this new idea nobody has apparently thought of before ..."
OpenAI - "Hold my beer little fella!"
It seems that the LM was able to learn over time and give better results on the Tetris game after you first tested it, and it is thinking faster when repeating the same questions. If this is true it’s “learning” from users using it, and that would be a game changer!
That is, if it is able to learn in real time. I cannot imagine how fast it will be able to learn!
This is blowing my mind. Also, anywhere 1 km from the north pole will be south of the north pole; there is no other direction you can go if you're at true north.
I feel like the globe question really has shortcomings in defining what is straight. Are you using a compass to stay straight? Or the stars? Or a magical laser line that always points along a bisection of the globe because it is anchored to gravity, like we are when we stand up straight?
straight would be a vector adhering to the surface.
Just actually try doing it (use a smaller number, in a park with a tree as the pole). It's obvious what straight is, and what the trick is.
Straight could also mean being not gay.
Also, at 7:47 the AI is right. I don't understand the objection: from the north pole, whichever direction you go, you are heading south, and after turning 90 degrees you walk until you pass the starting point, so your path forms a circle. The circumference of a circle is 2πr; in this case r = 1 km, hence 2 × π × 1 = 2π km, so the AI is correct. If you want the absolute distance, it might be slightly more or less than that; personally I think it's slightly more. And the main problem is the starting point, right? Well, after turning 90°, whichever way you face, east or west, if you walk straight (not mentioned in the question, but judging by the answer it's the only thing that comes to mind) you can only pass the second starting point, never the first. Sometimes people overcomplicate things way too much. Please first check with the creator of the question what he actually meant, then judge. But still, o1 is way more time-consuming and OpenAI must improve it.
Or maybe the earth is flat 🤷🏻♂️
I tried the North Pole problem multiple ways on o1 and explained the difference between a great circle path and following a latitude line. It came up with different answers every time but didn't seem to figure it out. Not sure if anyone else has had better luck.
Suggestion for a new test:
On a hydraulic scale, a cart of unknown weight is pulled by a cow. The scale's larger platform, upon which the cart and cow stand, has a diameter of 7.5 Egyptian horizontal cubits (Mediterranean standard). The smaller platform of the scale has a surface area of 1/18th of an amphora divided by a vertical Egyptian cubit.
The cow pulling the cart weighs precisely 42 congii of olive oil. On the smaller platform of the scale rests a counterweight of 3.5 talents (Attic standard).
Calculate the weight of the cart in force-kilograms on Mars and the depth at which the pressure in the hydraulic fluid is found to be 5.75 Roman pounds per squared digit when the system is in equilibrium.
I would suggest that 2 pi is approximately the distance, but not exactly. The 1km was on the curved surface of the earth, so the radius is very slightly less than 1km. So the distance is slightly less than 2 pi.
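The size of this effect can be made explicit. Assuming a spherical Earth of radius R and a walk of d along the surface from the pole, the circle you trace has radius R·sin(d/R), and expanding the sine in a Taylor series shows the shortfall relative to 2πd:

```latex
r = R\sin\frac{d}{R}, \qquad
L = 2\pi R\sin\frac{d}{R}
  = 2\pi d - \frac{\pi d^{3}}{3R^{2}} + O\!\left(\frac{d^{5}}{R^{4}}\right)
```

With d = 1 km and R ≈ 6371 km the correction term is about 2.6 × 10⁻⁸ km (roughly 0.03 mm), so the distance is indeed less than 2π km, but only barely.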
It proves something: they are aware of popular AI streamers, and they may train models on popular test prompts.
Matthew, the NORTH POLE question has just been misinterpreted by all of us:
the problem is in the question itself:
QUESTION:
Imagine standing at the north pole of the earth. Walk in any direction, in a straight line, for 1 km. Now turn 90 degrees to the left.
> Walk for as long as it takes to pass your starting point. <
Written this way it should be interpreted like:
- Start walking
- Walk until you reach the point where you started walking
So it's correct! It's 2π km.
The starting point is the point where you started walking after having turned 90 degrees.
WHY NOT INTERPRET THE POLE AS THE STARTING POINT?
I assume that's because, being language-based, it gives more weight to the sentence "Walk for as long as it takes to pass your starting point" and less weight to the context.
Anyway, the problem is in the question: it's NOT SPECIFIED what exactly the starting point is. An imprecise question gets imprecise answers.
WHY 2 ANSWERS (in the live session)
BTW, you got 2 answers, and both can be interpreted as correct. I'll explain why:
1st answer: more than 2π km.
It did the calculations and interpreted the question this way:
Distance requested: the total distance walked from the beginning, at the pole (so 1 km + 2π km).
Starting point: the point where you started walking after having turned, since it is in the same sentence.
2nd answer: exactly 2π km.
Same calculations, but a different interpretation of the question:
Distance requested: the walking distance after having turned.
Starting point: the point where you started walking after turning.
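The two readings differ only in which legs are counted: the first includes the 1 km walk out from the pole, the second counts only the walk after the turn. A tiny sketch of that arithmetic, using the same flat approximation as the comment (a circle of radius 1 km, circumference 2π km); the function and its flag are hypothetical:

```python
import math

def distance_walked_km(include_first_leg: bool, radial_leg_km: float = 1.0) -> float:
    """Distance under the flat approximation: one full loop of radius
    radial_leg_km, optionally plus the initial leg out from the pole."""
    loop_km = 2 * math.pi * radial_leg_km
    return radial_leg_km + loop_km if include_first_leg else loop_km

print(f"Reading 1 (from the pole): {distance_walked_km(True):.3f} km")    # 1 + 2π
print(f"Reading 2 (after the turn): {distance_walked_km(False):.3f} km")  # 2π
```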
I got it to answer the North Pole question right every time by changing it slightly: "Imagine your starting point being at the North Pole of the Earth. Walk in any direction, in a straight line for 1km. Now turn 90 degrees to the left. Walk for as long as it takes to pass your starting point. Have you walked: 1. More than 2xPi km. 2. Exactly 2xPi km. 3. Less than 2xPi km. 4. I never passed my starting point."
I tested the "impossible question" on GPT o1 and it got it right in a single prompt. The thing is, I suggested it approach the problem from two different perspectives in order to break the ambiguity.
I put the same question to GPT-o1 mini. On the first attempt, quoting the question directly from Yann LeCun, it got it wrong. Funnily enough, I could see it thinking about the size of the Earth, etc. I thought it was about to nail it, but it answered "pi" and seemed to think "How far do you travel" meant "How far to the antipodal point?", which is an odd interpretation.
So I clarified, and it found the canonical answer, 2 pi km.
I followed up:
"That is the canonical answer, so I give you credit for that. But consider what it means to walk in a straight line on the surface of the Earth. Please give me your thoughts on how that affects your answer."
and it found the great circle route.
So it did better than previous GPTs, but it wasn't perfect.
Try this prompt (trick question):
Replace the word "blank" in the next sentence with the correct number written in english:
This sentence contains blank syllables in it
This needs to be tested! (I've already used up my tries, so if someone else could, that would be great.)
The correct word to replace “blank” is ten.
When we substitute “ten” into the sentence, it reads:
“This sentence contains ten syllables in it”
Counting the syllables:
1. This - 1 syllable
2. sentence - 2 syllables (sen-tence)
3. contains - 2 syllables (con-tains)
4. ten - 1 syllable
5. syllables - 3 syllables (syl-la-bles)
6. in - 1 syllable
7. it - 1 syllable
Totaling 10 syllables, which matches the number stated in the sentence.
@@DJ-dh3oe Was that its response?
Wow. Beautiful.
Except that it's 11 syllables.
So.
Oops.
@@xiaojinyusaudiobookswebnov4951 It is eleven if you count "blank" as one syllable (I believe), but with "eleven" it becomes thirteen syllables, which then becomes twelve syllables, which then becomes eleven syllables, ...
Even if my example is flawed, it's nice to try these self-referencing things; they could test for metacognition, maybe 🤷🏻‍♂️
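The self-reference can be checked mechanically. Counting the fixed words as in the breakdown above (This 1, sentence 2, contains 2, syllables 3, in 1, it 1), the sentence has 10 syllables plus those of the number word, so a valid answer N needs 10 + syllables(N) = N. A minimal sketch with hand-coded syllable counts for one to twenty (an assumption; a dictionary-based syllable counter might disagree on a word or two), and hypothetical helper names:

```python
# Hand-coded syllable counts for the English number words one..twenty (assumed).
NUMBER_WORDS = {
    1: ("one", 1), 2: ("two", 1), 3: ("three", 1), 4: ("four", 1),
    5: ("five", 1), 6: ("six", 1), 7: ("seven", 2), 8: ("eight", 1),
    9: ("nine", 1), 10: ("ten", 1), 11: ("eleven", 3), 12: ("twelve", 1),
    13: ("thirteen", 2), 14: ("fourteen", 2), 15: ("fifteen", 2),
    16: ("sixteen", 2), 17: ("seventeen", 3), 18: ("eighteen", 2),
    19: ("nineteen", 2), 20: ("twenty", 2),
}
# "This sentence contains ___ syllables in it" without the number word:
FIXED_SYLLABLES = 1 + 2 + 2 + 3 + 1 + 1  # = 10

def sentence_syllables(n: int) -> int:
    """Total syllables when the number word for n fills the blank."""
    return FIXED_SYLLABLES + NUMBER_WORDS[n][1]

def fixed_points() -> list[int]:
    """Numbers that truthfully describe the sentence they appear in."""
    return [n for n in NUMBER_WORDS if sentence_syllables(n) == n]

def orbit(start: int, steps: int) -> list[int]:
    """Repeatedly replace the number with the syllable count it produces."""
    seq = [start]
    for _ in range(steps):
        seq.append(sentence_syllables(seq[-1]))
    return seq

print(fixed_points())  # prints [] (no self-consistent answer in 1..20)
print(orbit(11, 3))    # prints [11, 13, 12, 11]
```

The orbit reproduces the eleven, thirteen, twelve, eleven cycle described in the reply above, and under these counts no number word from one to twenty makes the sentence true.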
The North Pole question is severely flawed. Technically the answer is A. The starting point is unspecified; since starting points are usually at the start of a series of steps, it should be assumed to be the North Pole itself. As no deviations in course have been indicated other than the 90-degree left turn, we can assume all walking was in a straight line with no turns or curves. This means you would have walked 40,001 km to return to the closest point before repeating the great circle. D states that you "never came close to the starting point", which is incorrect, since being only 1 km away from it after walking 40,001 km is, relatively speaking, extremely close. That is close enough to reasonably say you have passed your starting point: albeit 1 km away from it, and although you will never pass over it exactly, you certainly did come close.
Clearly you intended the starting point to be 1 km south of the North Pole so that the potential answers would pose a challenge, and the intention was for the agent to consistently walk east, which would introduce a curve. But at 7:14 you called o1 out for being incorrect, which, if true, means you'd expect a facetious answer like the one above; instead, it read between the lines and understood the intention. You might notice o1 also clarified that you would walk east along a circle of latitude, so it effectively interpreted the question as it was intended to be asked. However, the answer it should have arrived at here is, of course, 3, since the radius of the circle it walks is less than 1 km.
So in short: a bad question, followed by a good start from o1 in interpreting it, followed by o1 screwing up the final answer.
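For the straight-line reading, the walk back to the turning point is one full great circle, 2πR, on top of the first 1 km leg. A minimal sketch of that total, assuming a spherical Earth with a mean radius of 6,371 km (hypothetical function name); the figure moves by tens of kilometres depending on which radius or circumference you assume, so the 40,001 km above and the number printed here need not match exactly:

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth

def straight_walk_total_km(first_leg_km: float = 1.0,
                           radius_km: float = EARTH_RADIUS_KM) -> float:
    """Walk first_leg_km from the pole, turn, and keep walking straight
    (a great circle) until back at the turning point."""
    return first_leg_km + 2 * math.pi * radius_km

total = straight_walk_total_km()
print(f"{total:,.0f} km walked, versus 1 + 2π ≈ {1 + 2 * math.pi:.2f} km for the latitude-circle reading")
```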
I think we can all see the theme with these questions, and that is dimensionality. It is impeccable with knowledge and colloquial logic. It can keep order and maintain a one-dimensional set of sequences, since it is trained on a literal one-dimensional string of data. But it fails at higher-dimensional thinking. GPT-4 was the first model I found that mostly handled two-dimensional ideas correctly. I am not surprised GPT-4 failed the marble-in-the-cup problem: understanding the prerequisites and applying that knowledge to a full 3D environment of physical matter is far more complex than what an LLM is really capable of. I suspect o1 succeeded here because it was trained on this query, and I expect it will have problems with fresh 3D problems that haven't been posed yet. These neural nets would need 3D data in their training alongside the text stream to reasonably and reliably form world views of these higher-dimensional scenarios.
How do you describe a vibrant 3D world to a person who is blind, deaf and unable to feel the world around them? These models are insanely overpowered considering all they have experienced is a bunch of words injected into their "brain". If they were built with eyes, ears, a bit of memory and o1's self-awareness evaluation routine, and looped into a continual state of thought, they would be as sentient as we are. I doubt you'd even need the top LLMs if you included dimensionality in their training data.
Thank you for your analysis, which I can only endorse. I find the question remarkably clever, as it contains several traps along the solution path. One of these traps is, of course, that it is not explicitly stated to keep going east. The first trap is the mistaken assumption of following a line of latitude. Next, the instruction to pass the starting point does not imply hitting it exactly. It is also unnecessary to name the starting point, since both possible starting points lead to the same answer (No. 1). The interesting wording of answer 4, which only mentions "close", means it is not satisfied in the solution scenario. Finally, answer 1 is correct even if one assumes an ellipsoidal Earth, on which most geodesics are not closed.
@@OrbitTheSun I must also thank you for your agreement. You bring up an incredibly interesting and important point regarding all the pitfalls in the assumed context of the question. I have since finished the video, and at 10:45 Matthew says he considered the North Pole itself to be the starting point and answer 4 to be the desired output; so he must have intended o1 to assume a curved path to the east, which it did. But given its final answer, it must have assumed the starting point to be the 90-degree turn, and the Earth to be flat. I skipped discussing an imperfect surface, as I felt it was less relevant and would cloud things even more. I mean... I suppose it would indeed be possible to follow that circle of latitude consistently east without turning (yaw), if you were lying on your side and only altering your pitch...
I can return to the pole in 1 km while facing east: I sidestep to the pole. But when walking straight, you end up 1 km from the other pole of the Earth before returning to where you made the turn, not walking in a circle around the pole where you started.
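The geometry in this comment can be checked numerically. A minimal sketch, assuming a perfect sphere of radius 6,371 km and a turning point on longitude 0: after the turn you face due east, and walking straight means following the great circle through that point in that direction. Sampling one full loop gives the walker's closest approach to each pole (the function name is hypothetical):

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed spherical Earth

def closest_approaches_km(first_leg_km: float = 1.0, samples: int = 100_000,
                          radius_km: float = EARTH_RADIUS_KM) -> tuple[float, float]:
    """Walk first_leg_km from the North Pole along longitude 0, turn to face
    east, then follow the great circle. Returns the smallest surface distances
    to the North and South Poles over one full loop."""
    colat = first_leg_km / radius_km  # angular distance of the turning point from the pole
    best_north = best_south = math.inf
    for i in range(samples + 1):
        t = 2 * math.pi * i / samples  # angle travelled along the great circle
        # z-coordinate of cos(t)*P0 + sin(t)*E, where P0 = (sin c, 0, cos c)
        # is the turning point and E = (0, 1, 0) is due east there.
        z = math.cos(t) * math.cos(colat)
        # Angular distance to a pole is acos of the dot product with (0, 0, +-1).
        best_north = min(best_north, radius_km * math.acos(max(-1.0, min(1.0, z))))
        best_south = min(best_south, radius_km * math.acos(max(-1.0, min(1.0, -z))))
    return best_north, best_south

north, south = closest_approaches_km()
print(f"closest to North Pole: {north:.6f} km, to South Pole: {south:.6f} km")
```

On a sphere both minima equal the 1 km first leg: the North Pole is closest at the turning point itself and the South Pole halfway round the loop, which matches the comment. This is a spherical idealization and ignores the oblateness raised earlier in the thread.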
Congrats on practically being featured by OpenAI :)
o1 mini got your question right, as did 4o. We need more difficult CoT tests.
It is surprisingly good at making one-shot games. Use 4o or the o1 mini version to generate a one-shot prompt. It has worked really well for me so far.