Explore ARC-AGI Data + Play

  • Published: Jan 24, 2025

Comments • 47

  • @s.dotmedia
    @s.dotmedia 7 months ago +8

    Let's go! I'm all in on this, I will say: don't count out the power of one shot

    • @ARCprize
      @ARCprize 7 months ago +2

      Nice! Love it - let us know if you need anything along the way

  • @omarnomad
    @omarnomad 7 months ago +9

    Is there a way to know all the priors you embed into the puzzles?
    So far I’ve identified:
    1. Translations - Shifting objects or patterns across the grid.
    2. Rotations - Rotating objects or patterns at different angles.
    3. Reflections - Flipping objects or patterns across a line.
    4. Scaling - Changing the size of objects or patterns.
    5. Repetition and symmetry - Repeating patterns or creating symmetrical designs.
    6. Color changes - Altering the color of objects or patterns.
    7. Compositions - Combining multiple operations or transformations.
    8. Object addition or removal - Adding or removing elements within the grid.
    9. Changes to matrix size - Modifying the dimensions of the grid or the objects within it.
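
As a rough illustration of the list above, here is a minimal Python sketch of a few of these transformations on a grid stored as a list of rows of integer color indices (the layout used in the ARC task JSON files). The function names (`translate`, `rotate90`, `flip_horizontal`, `scale`, `recolor`, `compose`) are hypothetical and chosen for this example; they are not part of any official ARC tooling, and they do not cover every prior in the list.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # rows of color indices (assumed 0-9)


def translate(grid: Grid, dy: int, dx: int, fill: int = 0) -> Grid:
    """Shift contents dy rows down and dx columns right; vacated cells get `fill`."""
    h, w = len(grid), len(grid[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = grid[r][c]
    return out


def rotate90(grid: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror the grid left-to-right."""
    return [row[::-1] for row in grid]


def scale(grid: Grid, k: int) -> Grid:
    """Enlarge the grid by an integer factor k in both directions."""
    return [[cell for cell in row for _ in range(k)] for row in grid for _ in range(k)]


def recolor(grid: Grid, mapping: Dict[int, int]) -> Grid:
    """Replace colors according to `mapping`; unmapped colors stay unchanged."""
    return [[mapping.get(cell, cell) for cell in row] for row in grid]


def compose(*steps: Callable[[Grid], Grid]) -> Callable[[Grid], Grid]:
    """Chain transformations, applied left to right."""
    def run(grid: Grid) -> Grid:
        for step in steps:
            grid = step(grid)
        return grid
    return run


if __name__ == "__main__":
    g = [[1, 2], [3, 4]]
    print(compose(rotate90, flip_horizontal)(g))
```

Composing a clockwise rotation with a horizontal flip, as in the last lines, is equivalent to transposing the grid, which shows how item 7 (compositions) builds on the simpler operations.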

    • @ARCprize
      @ARCprize 7 months ago +3

      There have been a bunch of attempts at this.
      Table 4 in this paper leans in that direction:
      arxiv.org/pdf/2403.11793
      There isn't a way to know all the priors; listing them would essentially give away the answers to the test set.

  • @anantkeepershome4327
    @anantkeepershome4327 7 months ago +1

    When you think about it, the optimal network should be like a physics simulator, every example has its own stable rules. My guess is that a recurrent network would have the best chance. Though the parameter count would need to be huge so we could perhaps make a Hypernet to generate the weights from scratch.

  • @DistortedV12
    @DistortedV12 7 months ago +4

    I have an idea now, thanks. I’ll probably check out ARC after my PhD qualifying exam. Finetuning is gonna be fun 🤩

  • @D3cast
    @D3cast 6 months ago +1

    I am new to programming, but this challenge and task really interests me and I'd like to give it a try. Could you create a tutorial on how to submit an entry to the arc challenge? maybe with a model which will produce some minimal results?

    • @ARCprize
      @ARCprize 6 months ago +1

      Totally! We have a ton of templates here
      arcprize.org/guide
      As for a submission tutorial, we don't have a video of this directly, but this video shows how to work with Kaggle notebooks.
      ruclips.net/video/crhrzhVjWog/видео.html

  • @mosca204
    @mosca204 6 months ago +1

    Why is the train/evaluation set so small?

    • @ARCprize
      @ARCprize 6 months ago +1

      The tasks are handmade, which limits the scale that can be done.
      They focus on diversity rather than quantity at this stage.

  • @alvaromros8127
    @alvaromros8127 6 months ago

    Does your submission count if you make use of private models like gpt4 at some point in your algorithm?

  • @mriz
    @mriz 7 months ago +5

    Thanks for the demonstrations. These tasks feel like an arbitrary single-step state transition in a cellular automaton. It also looks like fun to play 😄

    • @ARCprize
      @ARCprize 7 months ago

      Nice! Yes please go try it out and let us know what you think

  • @RPHacker777
    @RPHacker777 6 months ago +4

    Could someone please explain how the AI soccer players in a simulation can go from physically flopping around on the ground to teaching themselves team strategy but AI can't solve these ARC tasks?

    • @MMahdiM1998
      @MMahdiM1998 1 month ago

      Read about the idea of one-shot learning. You'll understand the difference.

    • @joegibes
      @joegibes 1 month ago +1

      The key is understanding the goal. Traditional benchmarks test skill on known tasks. ARC tests general intelligence: solving novel problems. Its training set isn't for memorizing, but for development, familiarization, and implementing core knowledge priors.

  • @InfiniteQuest86
    @InfiniteQuest86 7 months ago +11

    Thank you! I know this has been around for a while, but I'm happy to see a legitimate attempt at testing intelligence that isn't "It passed the Turing test." LLMs sound smart because they speak our language, but are they really doing anything more than regurgitating memorized information? This test shows that most likely not really.

    • @ARCprize
      @ARCprize 7 months ago +1

      Thanks! yes we agree

    • @kliersheed
      @kliersheed 24 days ago

      are humans really doing anything more than perceiving, memorizing and recombining patterns though?

    • @InfiniteQuest86
      @InfiniteQuest86 23 days ago

      @@kliersheed Maybe. There have definitely been instances of pure innovation or creativity in humans, but that's extremely rare. Our ability to understand new patterns quickly and apply them in novel scenarios is one of the big differences right now (as shown in ARC). And the ability to be critical of ourselves. Like we can analyze what we are doing and figure out if we made a mistake, and AI can't do that yet.

    • @kliersheed
      @kliersheed 23 days ago

      @@InfiniteQuest86 most humans can neither and AI def. can if you give it the ability to. its a simple causal action. if your go-to AI gets something wrong, give it a hint or say "are you sure you are right?" it very often double checks, corrects and apologizes. how is that not reflection? you could easily hardcode this process (like its in humans) to be triggered when it thinks it solved smth, wants to apply it but fails. that is also the only time humans reflect. if they HAVE TO. its causal.
      humans cant be creative beyond combining old patterns. AI fails in "simple" tasks we can do, because WE made them. its simple to us because of how our perception and evolved processing works. we have values and standards we have not fully implemented into AI yet but expect it to act EXACTLY like we do. take art for example. people complain AI art isnt creative but WE were the ones that trained it on our data and OUR perception of "good". how is AI supposed to be "creative" and make "new and original" art, if we narrow down so much on what we "like"? obviously its gonna feel generic. beauty is objective to a point. AI learned that.
      at the same time, if humans could create truly new and original things, they could e.g. "imagine" a NEW color. one that has no name yet, one you have never seen in your life. and you CANT. you can imagine different shades, brightness, COMBINATIONS, contrasts, etc. but NOTHING you havent perceived yet in at least some variant you can extrapolate from. its the same principle with anything else.

    • @InfiniteQuest86
      @InfiniteQuest86 23 days ago

      @@kliersheed Yeah I mean you are just completely uninformed. I said it was rare for humans to be truly creative, but it has happened. There's no denying that. There are niche examples you wouldn't understand, but say when Einstein came up with general relativity. That was a truly unique thing that wasn't just recombining old stuff.
      If what you said about AI being able to check itself was as easy as you make it sound, then it would be done already. Billions are being spent on making AI viable and they haven't been able to do it. I'm sorry but it doesn't cost billions to put a for loop in and then break out when it is sure. There's MAJOR technical hurdles to overcome before we are even close to it being able to reflect on itself.

  • @wjrasmussen666
    @wjrasmussen666 23 days ago

    I want to do the 2025 challenge. Does it have to be a pure LLM or can I do something more interesting (to me).

    • @ARCprize
      @ARCprize 23 days ago

      You can do whatever system you want! Doesn't have to be an LLM, though many are finding them useful.

    • @wjrasmussen666
      @wjrasmussen666 23 days ago

      @@ARCprize Thank you!

  • @ignaciosavi7739
    @ignaciosavi7739 7 months ago

    Lets get to the bottom of this. How much for getting 90 % accuracy on a free llm model? How much do i get for that?

    • @ARCprize
      @ARCprize 7 months ago

      The threshold for a Kaggle score is 85%. Reach that with a valid submission and you're eligible for a prize.

    • @ignaciosavi7739
      @ignaciosavi7739 7 months ago

      @@ARCprize thanks

  • @geospatialindex
    @geospatialindex 7 months ago

    So have you collaborated with any psychologists to make this test

    • @ARCprize
      @ARCprize 7 months ago

      Check out section 11.1 of Measure/Intelligence. Francois digs into how human psychology influenced his thinking.

  • @Aemond-qj4xt
    @Aemond-qj4xt 7 months ago +1

    i think i might have unintentionally set the basis for solving this in a project i did a couple months ago

    • @ARCprize
      @ARCprize 7 months ago

      We'd love to see a submission!

    • @Aemond-qj4xt
      @Aemond-qj4xt 7 months ago

      @@ARCprize working on it i just handed in my graduation project i have time to work on this now

  • @robbielualhati1731
    @robbielualhati1731 7 months ago

    There are also birds such as Sulphur Crested Cockatoos that have shown problem solving skills. Hopefully it's proof enough that a basic reasoning model won't require a trillion parameters.

  • @ManwithNoName-t1o
    @ManwithNoName-t1o 7 months ago +4

    children can solve these puzzles but i dont think LLM's can

    • @ARCprize
      @ARCprize 7 months ago +1

      We haven't seen an LLM do this yet

    • @shure-youtube
      @shure-youtube 7 months ago

      @@ARCprize How about VLM? I think this task requires strong spatial understanding.

    • @ignaciosavi7739
      @ignaciosavi7739 7 months ago

      How? @@ARCprize

  • @clerothsun3933
    @clerothsun3933 4 months ago

    I get that this is a stepping stone, but calling it a test for AGI is just ludicrous. This isn't even close to AGI, it's just a toy.

  • @JirkaKlimes_
    @JirkaKlimes_ 7 months ago

    It can't be that hard right

    • @ARCprize
      @ARCprize 7 months ago +1

      Try it out! We'd love to see a submission

  • @ps3301
    @ps3301 7 months ago +3

    If u cant design an ai architecture to solve this problem, u arent as smart as you think.

    • @denisblack9897
      @denisblack9897 7 months ago

      Hot take)
      Don’t forget to design great design to sell subscriptions😅

  • @geospatialindex
    @geospatialindex 7 months ago

    Sorry, this isn’t general intelligence. This is just reasoning. It is painful watching a whole industry trying to reinvent psychology when there is already a century of research there.

    • @ARCprize
      @ARCprize 7 months ago +1

      Thanks for the comment! We'd love to hear your ideas and thoughts about how to get closer to AGI

    • @PoppinBichy
      @PoppinBichy 14 days ago

      If you have the unique algorithm that solves those problems, publish it please. I'm so excited to see your results