Let's go! I'm all in on this. I will say: don't count out the power of one-shot.
Nice! Love it - let us know if you need anything along the way
Is there a way to know all the priors you embed into the puzzles?
So far I’ve identified:
1. Translations - Shifting objects or patterns across the grid.
2. Rotations - Rotating objects or patterns at different angles.
3. Reflections - Flipping objects or patterns across a line.
4. Scaling - Changing the size of objects or patterns.
5. Repetition and symmetry - Repeating patterns or creating symmetrical designs.
6. Color changes - Altering the color of objects or patterns.
7. Compositions - Combining multiple operations or transformations.
8. Object addition or removal - Adding or removing elements within the grid.
9. Changes in matrix size - Modifying the dimensions of the grid or the objects within it.
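Several of the transformations listed above can be sketched on a grid of color indices. This is a minimal illustration in plain Python, not the puzzle authors' own definition of the priors. Every function name below is hypothetical, and the grid is assumed to be a non-empty rectangular list of lists of integers.

```python
# Illustrative helpers only; these names are hypothetical and not part of any ARC tooling.
# A grid is assumed to be a non-empty rectangular list of rows of integer color indices.

def translate(grid, dx, dy, fill=0):
    """Shift every cell by (dx, dy); cells pushed off the grid are dropped."""
    h, w = len(grid), len(grid[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = grid[y][x]
    return out

def rotate90(grid):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip_horizontal(grid):
    """Mirror the grid left to right."""
    return [row[::-1] for row in grid]

def scale(grid, k):
    """Enlarge every cell into a k x k block."""
    return [[c for c in row for _ in range(k)] for row in grid for _ in range(k)]

def recolor(grid, mapping):
    """Replace colors according to mapping; unmapped colors are kept."""
    return [[mapping.get(c, c) for c in row] for row in grid]

def compose(*fns):
    """Chain transformations, applied left to right."""
    def run(grid):
        for fn in fns:
            grid = fn(grid)
        return grid
    return run

example = [[1, 2], [3, 4]]
print(rotate90(example))                            # [[3, 1], [4, 2]]
print(compose(flip_horizontal, rotate90)(example))  # [[4, 2], [3, 1]]
```

Composition is the interesting part: even this handful of primitives yields a large space of combined rules, which is one reason an exhaustive list is hard to pin down.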
There have been a bunch of attempts at this.
Table 4 in this paper leans in that direction:
arxiv.org/pdf/2403.11793
There isn't a way to know all the priors; listing them would essentially help give away the answers to the test set.
When you think about it, the optimal network should be like a physics simulator, since every example has its own stable rules. My guess is that a recurrent network would have the best chance. The parameter count would need to be huge, though, so we could perhaps use a hypernetwork to generate the weights from scratch.
I have an idea now, thanks. I’ll probably check out ARC after my PhD qualifying exam. Finetuning is gonna be fun 🤩
I am new to programming, but this challenge really interests me and I'd like to give it a try. Could you create a tutorial on how to submit an entry to the ARC challenge? Maybe with a model that produces some minimal results?
Totally! We have a ton of templates here
arcprize.org/guide
As for a submission tutorial, we don't have a video of this directly, but this video shows how to work with Kaggle notebooks.
ruclips.net/video/crhrzhVjWog/видео.html
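For a sense of what a "minimal results" entry might look like, here is a hedged sketch of a trivial baseline that copies each test input as its prediction. The task layout (`train` and `test` lists of `input`/`output` grids) follows the public ARC JSON files. The `attempt_1`/`attempt_2` submission layout is an assumption about the Kaggle format and should be checked against arcprize.org/guide. The names `predict` and `build_submission` and the task ID are made up for illustration.

```python
import json

# A tiny hand-written task in the public ARC JSON layout ("train"/"test" lists of grids).
TASK_JSON = """
{"train": [{"input": [[1, 0], [0, 0]], "output": [[1, 0], [0, 0]]}],
 "test":  [{"input": [[0, 2], [0, 0]]}]}
"""

def predict(task, test_input):
    """Hypothetical baseline: ignore the training pairs and return the input unchanged."""
    return [row[:] for row in test_input]

def build_submission(tasks):
    """Build a dict of predictions per task.

    ASSUMPTION: two attempts per test input under the keys "attempt_1" and
    "attempt_2"; verify this against the official guide before submitting.
    """
    submission = {}
    for task_id, task in tasks.items():
        submission[task_id] = [
            {
                "attempt_1": predict(task, pair["input"]),
                "attempt_2": predict(task, pair["input"]),
            }
            for pair in task["test"]
        ]
    return submission

tasks = {"example_task": json.loads(TASK_JSON)}  # "example_task" is a made-up ID
submission = build_submission(tasks)
print(json.dumps(submission))
```

An identity baseline like this will score close to zero, but it exercises the whole load, predict, and write pipeline, which is the part a first submission usually gets stuck on.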
Why is the train/evaluation set so small?
The tasks are handmade, which limits how many can be produced.
They focus on diversity rather than quantity at this stage.
Does your submission count if you make use of private models like GPT-4 at some point in your algorithm?
Thanks for the demonstrations. These tasks feel like an arbitrary single-step state transition in a cellular automaton. It also looks like fun to play 😄
Nice! Yes please go try it out and let us know what you think
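To make the cellular-automaton analogy concrete, here is a minimal sketch of a single state transition, using Conway's Game of Life as a stand-in rule. That rule is an assumption chosen for illustration only. ARC tasks each have their own hidden rule, and `life_step` is a hypothetical name.

```python
def life_step(grid):
    """Apply one Game of Life update to a rectangular grid of 0/1 cells.

    Cells outside the grid are treated as dead (no wrap-around).
    """
    h, w = len(grid), len(grid[0])

    def live_neighbors(y, x):
        # Count live cells in the 3x3 window around (y, x), excluding the cell itself.
        return sum(
            grid[ny][nx]
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))
            if (ny, nx) != (y, x)
        )

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            n = live_neighbors(y, x)
            alive = grid[y][x] == 1
            # Birth on exactly 3 neighbors; survival on 2 or 3.
            row.append(1 if n == 3 or (alive and n == 2) else 0)
        out.append(row)
    return out

blinker = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
print(life_step(blinker))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The difference from ARC is that here the rule is known and the output is computed, whereas in an ARC task the rule has to be inferred from a few input/output pairs.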
Could someone please explain how the AI soccer players in a simulation can go from physically flopping around on the ground to teaching themselves team strategy but AI can't solve these ARC tasks?
Read about the idea of one-shot learning and you'll understand the difference.
The key is understanding the goal. Traditional benchmarks test skill on known tasks. ARC tests general intelligence: solving novel problems. Its training set isn't for memorizing, but for development, familiarization, and implementing core knowledge priors.
Thank you! I know this has been around for a while, but I'm happy to see a legitimate attempt at testing intelligence that isn't "It passed the Turing test." LLMs sound smart because they speak our language, but are they really doing anything more than regurgitating memorized information? This test shows that most likely not really.
Thanks! yes we agree
are humans really doing anything more than perceiving, memorizing and recombining patterns though?
@@kliersheed Maybe. There have definitely been instances of pure innovation or creativity in humans, but that's extremely rare. Our ability to understand new patterns quickly and apply them in novel scenarios is one of the big differences right now (as shown in ARC). So is the ability to be critical of ourselves: we can analyze what we are doing and figure out whether we made a mistake, and AI can't do that yet.
@@InfiniteQuest86 Most humans can't either, and AI definitely can if you give it the ability to. It's a simple causal action. If your go-to AI gets something wrong, give it a hint or say "are you sure you are right?" It very often double-checks, corrects and apologizes. How is that not reflection? You could easily hardcode this process (like it is in humans) to be triggered when it thinks it solved something, wants to apply it, but fails. That is also the only time humans reflect: if they HAVE TO. It's causal.
Humans can't be creative beyond combining old patterns. AI fails in "simple" tasks we can do because WE made them. They are simple to us because of how our perception and evolved processing work. We have values and standards we have not fully implemented into AI yet, but we expect it to act EXACTLY like we do. Take art, for example. People complain AI art isn't creative, but WE were the ones that trained it on our data and OUR perception of "good". How is AI supposed to be "creative" and make "new and original" art if we narrow down so much on what we "like"? Obviously it's going to feel generic. Beauty is objective to a point. AI learned that.
At the same time, if humans could create truly new and original things, they could, e.g., "imagine" a NEW color: one that has no name yet, one you have never seen in your life. And you CAN'T. You can imagine different shades, brightness, COMBINATIONS, contrasts, etc., but NOTHING you haven't perceived yet in at least some variant you can extrapolate from. It's the same principle with anything else.
@@kliersheed Yeah I mean you are just completely uninformed. I said it was rare for humans to be truly creative, but it has happened. There's no denying that. There are niche examples you wouldn't understand, but say when Einstein came up with general relativity. That was a truly unique thing that wasn't just recombining old stuff.
If what you said about AI being able to check itself was as easy as you make it sound, then it would be done already. Billions are being spent on making AI viable and they haven't been able to do it. I'm sorry, but it doesn't cost billions to put a for loop in and then break out when it is sure. There are MAJOR technical hurdles to overcome before we are even close to it being able to reflect on itself.
I want to do the 2025 challenge. Does it have to be a pure LLM, or can I do something more interesting (to me)?
You can do whatever system you want! Doesn't have to be an LLM, though many are finding them useful.
@@ARCprize Thank you!
Let's get to the bottom of this. How much for getting 90% accuracy on a free LLM? How much do I get for that?
The threshold for a Kaggle score is 85%. Reach that with a valid submission and you're eligible for a prize.
@@ARCprize thanks
So have you collaborated with any psychologists to make this test?
Check out section 11.1 of On the Measure of Intelligence. François digs into how human psychology influenced his work.
I think I might have unintentionally laid the groundwork for solving this in a project I did a couple of months ago.
We'd love to see a submission!
@@ARCprize Working on it. I just handed in my graduation project, so I have time to work on this now.
There are also birds such as Sulphur Crested Cockatoos that have shown problem solving skills. Hopefully it's proof enough that a basic reasoning model won't require a trillion parameters.
Children can solve these puzzles, but I don't think LLMs can.
We haven't seen an LLM do this yet.
@@ARCprize How about VLM? I think this task requires strong spatial understanding.
@@ARCprize How?
I get that this is a stepping stone, but calling it a test for AGI is just ludicrous. This isn't even close to AGI, it's just a toy.
It can't be that hard, right?
Try it out! We'd love to see a submission
If you can't design an AI architecture to solve this problem, you aren't as smart as you think.
Hot take)
Don’t forget to design great design to sell subscriptions😅
Sorry, this isn't general intelligence. This is just reasoning. It is painful watching a whole industry try to reinvent psychology when there is already a century of research there.
Thanks for the comment! We'd love to hear your ideas and thoughts about how to get closer to AGI
If you have a unique algorithm that solves those problems, please publish it. I'm so excited to see how it performs.