François Chollet recommends this method to solve ARC-AGI

  • Published: Oct 1, 2024
  • ARC Prize is a $1,000,000+ public competition to beat and open source a solution to the ARC-AGI benchmark.
    Hosted by Mike Knoop (Co-founder, Zapier) and François Chollet (Creator of ARC-AGI, Keras).
    --
    Website: arcprize.org/
    Twitter/X: / arcprize
    Newsletter: Signup @ arcprize.org/
    Discord: / discord
    Try your first ARC-AGI tasks: arcprize.org/play

Comments • 33

  • @el_chivo99 · 3 months ago · +17

    Chollet has been my favorite voice on AI for 5+ years, and I don't see that changing!

    • @ARCprize · 3 months ago · +1

      Thanks Chivo!

  • @BigFatSandwitch · 3 months ago · +9

    The question is: if someone manages to get a high score on ARC-style problems, will they really share the code for such a small amount of money, rather than raise venture capital for a startup built on it?

    • @ARCprize · 3 months ago · +5

      We encourage an open source solution to ARC-AGI!

  • @davefaulkner6302 · 3 months ago · +5

    So he is saying that LLMs guiding a hyperparameter-space search will get us to AGI? Seems a little simplistic to me ... and there are better ways to search that space.

  • @gustavnilsson6597 · 3 months ago · +2

    Perhaps we want the model to search actively, as humans do, by manipulating its environment.
    I don't think this problem is going to be easy to solve.

  • @fayezsalka · 3 months ago · +7

    But we, as humans, don't do tree search when solving ARC. We solve it in one shot, almost immediately, without trying / searching through different solutions, because the spatial patterns look very obvious, no?
    Large multimodal models with the ability to input AND output images natively will be able to solve this in one shot. Case in point: GPT-5 with native image output.

    • @stevo-dx5rr · 3 months ago · +2

      I’m not a researcher, but the notion that ‘this is obviously not what humans do’ seems moot given that the same can be said about transformers.

    • @ARCprize · 3 months ago · +10

      > We solve it in one shot
      > Spatial patterns look very obvious
      The human brain is very good at using 'intuition' to prune a search space. Though it may happen quickly, there are many (maybe infinite) possible decisions humans could make, yet we prune that down to just a few very quickly.
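The "intuition prunes a huge search space" point can be sketched as brute-force program search over a tiny, hypothetical grid DSL (all primitives here are made up for illustration); a learned prior would order or cut down the candidate programs that the enumeration visits:

```python
from itertools import product

# Hypothetical grid primitives (a toy ARC-like DSL); real solvers use far richer ones.
PRIMITIVES = {
    "identity": lambda g: g,
    "rot90": lambda g: [list(r) for r in zip(*g[::-1])],  # clockwise rotation
    "flip_h": lambda g: [r[::-1] for r in g],             # mirror left-right
    "flip_v": lambda g: g[::-1],                          # mirror top-bottom
}

def search_program(example_in, example_out, max_depth=2):
    """Enumerate compositions of primitives, shortest first, and return
    the first program that maps the example input to the example output."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            g = example_in
            for name in names:
                g = PRIMITIVES[name](g)
            if g == example_out:
                return names  # a program explaining the example pair
    return None

grid = [[1, 2], [3, 4]]
target = [[3, 1], [4, 2]]  # grid rotated 90 degrees clockwise
print(search_program(grid, target))  # → ('rot90',)
```

With richer primitives this enumeration explodes combinatorially, which is exactly where the "intuition" (a learned prior scoring candidates) would come in.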

    • @stevo-dx5rr · 3 months ago

      @@ARCprize What do you think of MIT's "Introduction to Program Synthesis" course as a starting point?

    • @fayezsalka · 3 months ago · +2

      @@ARCprize This implies that multimodal models could do much better if there were an internal "for loop" that allowed them to iterate through different solutions in the hidden-space manifold before decoding into output. The only question becomes how to train such a model and what dataset / loss-function objective could be used.
      Alternatively, could we imitate such a process by having the "thinking" happen in the decoded output in an autoregressive fashion? We humans have the ability to "think out loud" as one option, and having an LLM think in the decoded output space might be easier and more familiar to train.
      (Basically, is chain of thought considered one crude form of discrete search?)

    • @shawnvandever3917 · 3 months ago · +1

      We don't do it in one shot. We go through hundreds or thousands of prediction updates to answer a question, updating our mental models in real time while this is happening.

  • @picklenickil · 17 days ago

    Oh, it gives me an 💡 idea.
    It sounds to me like watching my nephew learning to walk: my wife comes and provides him with a walker (the hyperparameter space).
    What doesn't make sense to me is: are we trying to build the equivalent of the kid, the walker, or the wife... or a God-forbidden amalgamation of the three?

  • @bedev1087 · 3 months ago · +2

    Did Francois say he would like to see a solution which can solve these puzzles without having been trained on a lot of "ARC-like" input/output pairs?
    This benchmark seems to exist as a subset of the permutation group of operations on coloured grids
    (expanding/contracting the grid, extruding masses, rotating masses, filling in holes, adding single colours, etc…).
    If the "core knowledge" claim about ARC is true, then the discovery of the correct set of "core knowledge" matrix operations could be used to synthetically generate a dataset.
    You could then sample thousands of games from a pre-trained policy network given only the first training input, and reinforce the trajectories closest to the revealed answer.
    Then sample games given the 1st training input, the first training output, and the 2nd training input, and reinforce again before the test set.
    Or is this not allowed?
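The synthetic-generation step proposed above can be sketched in Python. The operations, grid sizes, and colour palette here are hypothetical stand-ins, since the actual generating operations behind ARC are not public:

```python
import random

# Hypothetical "core knowledge" operations on coloured grids.
def rotate(g):
    return [list(r) for r in zip(*g[::-1])]  # rotate 90 degrees clockwise

def mirror(g):
    return [r[::-1] for r in g]  # flip left-right

def recolour(g, mapping):
    return [[mapping.get(c, c) for c in r] for r in g]  # swap colours

OPS = [rotate, mirror, lambda g: recolour(g, {1: 2, 2: 1})]

def synthetic_pair(size=3, n_ops=2, rng=random):
    """Sample a random coloured grid and a random sequence of operations,
    yielding one synthetic input/output training pair."""
    grid = [[rng.randrange(3) for _ in range(size)] for _ in range(size)]
    out = grid
    for op in rng.sample(OPS, n_ops):
        out = op(out)
    return grid, out

inp, out = synthetic_pair()
print(len(inp), len(out))  # both are 3x3 grids
```

Pairs sampled this way could then feed the policy-network reinforcement loop the comment describes, though whether such synthetic tasks match ARC's hidden distribution is an open question.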

    • @ARCprize · 3 months ago · +1

      Francois said he would *like* to see a solution that isn't trained on a bunch of input/output ARC tasks, but there isn't a rule that says this isn't allowed.
      As long as your submission only makes 2 final attempts per task, you can use it. This means you can test and iterate on the example pairs as much as you'd like.

    • @bedev1087 · 3 months ago · +1

      @ARCprize Hey, thanks for the reply! :)
      You guys obviously have the generating operations locked away, so this method would only be able to train on "ARC-like" operations.
      So can I ask what his intuition is on a model being able to center on "core knowledge" priors from training on tasks orthogonal to ARC?
      Thanks for making such a great opportunity for the community 👍

  • @-mwolf · 3 months ago · +1

    These recent code diffusion models sound similar to this idea 🤔

  • @wwkk4964 · 3 months ago · +2

    Could this be achieved fairly easily by employing a diffusion-based LLM reasoning module with accurate image-captioning capabilities that ignore irrelevant details when feeding the LLM during inference or test time?

    • @Hohohohoho-vo1pq · 3 months ago · +2

      Stop overusing the term LLM. GPTs are not LLMs. LLMs are GPTs.

    • @wwkk4964 · 3 months ago

      @@Hohohohoho-vo1pq Please refer to the presentation here; the author of the research calls it an LLM. ruclips.net/video/kYtvqbgCxFA/видео.htmlsi=nrFEIED7mmZAE_Zf

    • @RecursiveTriforce · 3 months ago · +7

      ​@@Hohohohoho-vo1pq
      LLMs can be GPTs.
      GPTs can be LLMs.
      GPT is the architecture. LLM is the size.
      They neither imply nor contradict each other.

    • @Hohohohoho-vo1pq · 3 months ago · +2

      @@RecursiveTriforce Reducing GPTs to "mere LLMs" is very misleading. People don't even understand what it means when they say that.

  • @bladekiller2766 · 3 months ago · +3

    This is how Stockfish (Chess Engine) works.
    Not sure whether it will lead to AGI.
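For context, the core of a Stockfish-style engine is alpha-beta minimax search paired with a position-evaluation function. A minimal sketch on a toy game tree (leaves are integer evaluations; the tree shape is made up for illustration):

```python
def alphabeta(node, depth, alpha, beta, maximizing):
    """Minimal alpha-beta minimax over a toy game tree: leaves are ints,
    internal nodes are lists of children. Engines like Stockfish pair this
    kind of search with a strong evaluation function and deep pruning."""
    if depth == 0 or isinstance(node, int):
        return node  # leaf: return the static evaluation
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if beta <= alpha:
                break  # prune: the opponent would never allow this line
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
        beta = min(beta, value)
        if beta <= alpha:
            break  # prune symmetrically for the minimizing player
    return value

tree = [[3, 5], [6, 9], [1, 2]]  # two-ply toy tree
print(alphabeta(tree, 2, float("-inf"), float("inf"), True))  # → 6
```

The open question the comment raises is whether this search-plus-evaluation recipe, which works in chess's fixed rule set, transfers to the open-ended program spaces ARC requires.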

    • @ARCprize · 3 months ago · +3

      We'd love a submission that tried this approach to see how it goes - super interesting.

  • @2394098234509 · 3 months ago · +1

    Love this

  • @tycrenshaw6968 · 3 months ago · +2

    I don't know if he is nervous or what, but his face is totally red and looks like it is hurting a lot.

    • @ARCprize · 3 months ago · +4

      Francois is on fire, with knowledge.