Volo
  • Videos: 40
  • Views: 378,834
The Simple $1,000,000 Problem AI Can't Solve
The Arc Prize is a new challenge to surpass human-level performance at the ARC-AGI benchmark - a benchmark that has been a serious challenge for modern AI systems. ARC-AGI focuses on skill acquisition in order to solve puzzles - to be able to learn from a few examples and apply the skill to a new input. The intent is to push the boundary on intelligence and move one step closer to AGI.
📚 Resources:
- Arc Prize: arcprize.org/
- Example puzzles: arcprize.org/play
- Francois Chollet & Mike Knoop on Dwarkesh Podcast: ruclips.net/video/UakqL6Pj9xo/видео.html
🚀 In This Video, You'll learn:
- What is the Arc Prize
- ARC-AGI Benchmark
- AI reasoning and planning
- Why LLMs struggle at Arc
- Can LLMs reason...
Views: 5,237

Videos

Can We Just Unplug AI?
482 views · 28 days ago
Once AI becomes autonomous and tries to take over the world, can we just unplug it? Well, no, not really. But that also doesn't matter as much as some might think. In this video I share my thoughts on AI safety, the difference between AI models and AI agents, and what we can do. 💡 Perfect for Viewers Interested in: - Future of AI - AI Safety - AI Regulations - AI Agents vs AI Models - Learning ...
AI Automating Coding = More Software Engineers
1.6K views · 1 month ago
Will AI make software engineering pointless or more important than ever? Is it a good time to get into software engineering or will it be automated? In this video, I'll explain why Jevons Paradox tells us that demand for software will be higher than ever, how the role of engineers evolves in the coming years, and why it's never been a better time to get into software development. 🚀 In This Vide...
Can AI create NEW ideas?
2K views · 2 months ago
Generative AI has been a complete game-changer, but can AI models actually create new ideas? Are the ideas that AI generates simply combinations of existing training data or do they amount to something truly original? In this @veritasium-inspired video, I dig into the science of the brain, how humans come up with new ideas and compare that with how AI systems such as ChatGPT generate ideas and ...
How To Learn Coding FAST (using AI)
938 views · 2 months ago
It's never been easier to learn how to code thanks to AI. I will show you how I would learn to code if I could start over and how I would use ChatGPT to quickly learn coding. I will also show you how AI can write code for you and how you can troubleshoot that code with ChatGPT. Will AI automate coding? Large portions of it, yes - but that will only empower problem solvers like us to add more va...
How To Actually Get A Tech Job in 2024
669 views · 3 months ago
AI has quickly changed the landscape for tech jobs and has already automated a lot of coding activities. So what is the new best way to find a job as a software engineer? As a Director of Engineering and hiring manager, let me share some tips for getting your first tech job and ways to use AI to get a job. 🚀 In This Video, You'll learn: - How to use AI to get a job - How to get your first tech job ...
Devin AI Agent is WAYYY overhyped...
10K views · 3 months ago
Cognition Labs recently announced Devin the AI Software Engineer. This startup believes they have crafted an AI Agent that can automate software engineering. I don't buy the hype and I was not particularly impressed with the demo or the way the Devin team deceptively presented their benchmark results. This comes just weeks after Jensen Huang, CEO of Nvidia told the world that people no longer n...
Is Your Job Safe? AI Risk Tier List
2K views · 3 months ago
This is my tier ranking of AI impact on jobs. I discuss which jobs are most likely to be automated by AI and which ones can actually benefit from AI. Please note - this AI risk tier list is just my own opinion and is meant to foster conversations about the future of work and AI impact on work. I cover a wide variety of jobs to highlight not only risk but also upside of an AI future economy. 🚀 I...
Is Learning To Code Still Worth It in 2024?
214K views · 3 months ago
AI has changed the educational landscape and has already automated a lot of coding activities. So should you learn how to code now that AI can do it? Recently at the World Governments Summit, Jensen Huang, the CEO of Nvidia, stated that people should not bother learning how to code anymore. He says that Gen AI can write code for us! As a career software engineer and owner of an AI consulting ag...
How does OpenAI Sora actually work?
6K views · 4 months ago
OpenAI Sora is a state of the art text-to-video generative AI model and it is the best AI video generation model BY FAR. OpenAI has released a technical research blog post explaining how Sora actually works and today I am getting hands on and doing a deep dive of how this diffusion transformer model actually works. OpenAI Sora explained. Sora makes it possible to create one minute long photorea...
OpenAI Sora is the BEST text to video AI by FAR
465 views · 4 months ago
OpenAI just announced Sora, a state of the art text-to-video generative AI model and it is BY FAR the best AI video generation model that has been demonstrated. Sora makes it possible to create one minute long photorealistic videos that remain consistent - something that no other company or research group has been able to show as of yet. The model is not yet broadly available but is being teste...
This Song Is NOT Real
314 views · 4 months ago
AI Music is here and you won't believe how good it sounds. I'll show you how to use two of the best AI Music tools available - Google MusicFX and Suno.ai. With these AI tools you can make incredible music that sounds as good as music you'll hear on the radio. I'll also go over usecases for AI Music and how it can help creators, musicians, game developers, and more. Later in the video I have a h...
Rabbit R1, GPT Store + Top AI News (Jan '24)
305 views · 4 months ago
How To Use MidJourney Without Discord
249 views · 4 months ago
Top 5 GPT ideas to easily make MONEY on the GPT Store
252 views · 5 months ago
Top 5 GPT Store ideas to AVOID and how to make each one better
473 views · 5 months ago
How to deploy Custom GPT Actions for FREE
2.2K views · 5 months ago
Goals don't matter. Here's why.
331 views · 5 months ago
Google Gemini API tutorial - is it EPIC or does it SUCK?!
2K views · 6 months ago
What is GPT Knowledge Retrieval? (GPT Beginner basics)
289 views · 6 months ago
What are GPT Actions? (GPT beginner basics)
2.9K views · 6 months ago
How to use OpenAI Assistants API with YOUR data
5K views · 6 months ago
GPTs vs Assistants API - which one is best for you?
23K views · 6 months ago
OpenAI board drama summary: Sam Altman's firing and return
200 views · 7 months ago
How to make GPTs with Actions and Knowledge | D&D Homebrew GPT
9K views · 7 months ago
How to create custom GPTs in under 5 minutes!
602 views · 7 months ago
How to use the OpenAI Assistants API (Discord Bot Tutorial)
8K views · 7 months ago
OpenAI DevDay Announcement breakdown - it changed everything!!
1.1K views · 7 months ago
What is RAG? Use LLMs like ChatGPT with your data!
713 views · 7 months ago
AI app idea? Do THIS first... (save 6 months!)
474 views · 8 months ago

Comments

  • @ashokn3698
    @ashokn3698 54 minutes ago

    As the CEO of NVIDIA, Jensen's goal is to sell GPUs. To achieve this, he tends to exaggerate even minor details and creates the impression that AI can solve everything, ultimately encouraging people to purchase his GPUs.

  • @smooth2477
    @smooth2477 2 hours ago

    What an evil man. He needs replacing himself. Let's build a humanoid robot to replace this guy.

  • @supersoniqamanyi3075
    @supersoniqamanyi3075 6 hours ago

    Do people just learn how to code only?

    • @VoloBuilds
      @VoloBuilds 3 hours ago

      That's the nice thing about learning coding - you learn a ton of other skills along the way (as well as how a bunch of these systems actually work)

  • @goldentiger1841
    @goldentiger1841 9 hours ago

    I am 55 and learning to code because it's more fun than sudoku.

    • @VoloBuilds
      @VoloBuilds 3 hours ago

      Heck yeah! That is awesome - keep going!

  • @Charles_E_Moses
    @Charles_E_Moses 1 day ago

    Does that mean that humans do not have to think? No. He is absolutely not correct. Man has to think for himself, act on his thoughts, and feel that he has accomplished things. Therefore, AI has a possibility in solving certain problems.

    • @VoloBuilds
      @VoloBuilds 3 hours ago

      Yeah - we will still have our own thoughts/feelings and needs so will need to solve our own problems, whether using AI or some other way.

  • @luisrosario7542
    @luisrosario7542 1 day ago

    Seems like someone wants to pull up a curtain and make sure nobody else is able to see

  • @nobo6687
    @nobo6687 1 day ago

    Tray so solve it Hering saxophone music

    • @VoloBuilds
      @VoloBuilds 3 hours ago

      Could you elaborate? :)

  • @user-me4zt2hg4e
    @user-me4zt2hg4e 1 day ago

    ruclips.net/video/97tG3HSqxvs/видео.html

  • @drhxa
    @drhxa 1 day ago

    Yes, "unplug AI" is like saying "disconnect generators from the power grid". You could maybe do it in principle, but the problem is that people in hospitals would die, our logistics systems are dependent on it to deliver food to cities, etc, etc. It's not possible to "unplug" once it's out and the world is dependent on it. I do think we need to have kill-switches on large datacenters to have the capability of stopping new AI models in training or alignment processes that get out of control before they are released (or while in alpha release). This is possible but we need to build the capability and have the processes in place. For example if a rogue actor hacks OpenAI in 2029 and does a rogue internal deployment of GPT-7 before it has been aligned, we really don't know what kinds of chaos it could unleash. Being able to have a stop button is key for both accelerating AND for security.

    • @VoloBuilds
      @VoloBuilds 3 hours ago

      You got it! I think the big thing we need to start differentiating is raw AI models that can do simple input/output and matrix multiplication in the middle VS AI Agents which are really just programs that make use of AI models as part of their workflow. AI Agents are what pose the vast majority of the risk IMO.

  • @drhxa
    @drhxa 1 day ago

    Look up Ryan Greenblatt's solution he posted in his blog for ARC-AGI. He used an LLM in an AI system to get 50% on the public eval dataset. Specifically what he did is have the LLM (GPT-4o) write ~8000 possible solutions in Python per problem and then check the outputs against the examples for the given problem. The closest 2 or 3 are used to generate the final result. He open sourced his code, which is fascinating to look through as well. There are lots of interesting details in the blog and code I'm leaving out, including test-time compute scaling curves, speculation on what it will take to get 85%, implementation details, etc. He finished this project in 6 days which is to say, it's likely far from optimal in cost and performance.
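
The generate-then-check loop described in this comment can be sketched roughly as follows. This is a toy illustration only, not the code from the blog post: the three hand-written functions stand in for model-generated programs, and the names `score` and `pick_best` are hypothetical.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)


def score(program: Callable[[Grid], Grid], examples: List[Example]) -> int:
    """Count how many example pairs the candidate reproduces exactly."""
    hits = 0
    for grid_in, expected in examples:
        try:
            if program(grid_in) == expected:
                hits += 1
        except Exception:
            # A generated program that crashes simply earns no credit.
            pass
    return hits


def pick_best(candidates, examples, top_k=2):
    """Return the top_k candidates ranked by how many examples they match."""
    ranked = sorted(candidates, key=lambda p: score(p, examples), reverse=True)
    return ranked[:top_k]


# Hand-written stand-ins for programs an LLM might propose (assumption).
def identity(g: Grid) -> Grid:
    return [row[:] for row in g]


def flip_vertical(g: Grid) -> Grid:
    return g[::-1]


def flip_horizontal(g: Grid) -> Grid:
    return [row[::-1] for row in g]


examples = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0]], [[0, 3, 3]]),
]
best = pick_best([identity, flip_vertical, flip_horizontal], examples)
prediction = best[0]([[0, 5], [6, 0]])
```

In a real pipeline the candidate list would hold thousands of sampled programs, and partial matches would likely be scored more finely than this exact-match count.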

  • @prodromosregalides3402
    @prodromosregalides3402 1 day ago

    It does not matter if AI can read, write, code, solve math, philosophize, etc. What matters is if humans can read, write, code, solve math, philosophize. It does not matter if AI can write essays, do arithmetic, play chess, perform better at the stock market. What matters is if humans can.

    • @VoloBuilds
      @VoloBuilds 3 hours ago

      Yes, we should strive to not forget these skills for the sake of keeping our dignity and purpose - even if they become "unnecessary" from an economic perspective

  • @technicalboy1816
    @technicalboy1816 1 day ago

    You are irresponsible by telling people not to learn CS. AI is a bloody lie.

  • @technicalboy1816
    @technicalboy1816 1 day ago

    The CEO is a narcissist who talks smack.

  • @vuksekicki6913
    @vuksekicki6913 1 day ago

    Dude, you really don’t need to make an 8 min video when you can say something in 3 min.

  • @beginning252
    @beginning252 1 day ago

    Imagine Know 5 types of different coding language off ur own!! and Saying shit like this to new gen people 🙄🙄

  • @mathijs9365
    @mathijs9365 2 days ago

    When self BI arrived u still required to code.

  • @BryanLanders
    @BryanLanders 3 days ago

    This is 🔥! I’m on the ARC Prize team and this was a great rundown of everything. Thrilling to see my design work in the video, too. 😊 Hope this inspires people to jump in and participate. Thanks!

    • @renedworschak8670
      @renedworschak8670 2 days ago

      I think this type of benchmark will become more and more important. Neural networks and LLM are trained with "infinite" test sets. The energy required to form the models will become ever greater - this benchmark shows how the training amount could be reduced or how inflexible the models are. I think it will be particularly crucial for small LLM on the edge (IOT, smartphones).

    • @VoloBuilds
      @VoloBuilds 2 days ago

      🤩 awesome to hear from you Bryan! You all have done an amazing job with this prize! Hey, if you're up for it - would love it if you could share the video on X - could be a good way to introduce more people to the prize and keep engagement going!

  • @Skybasegame
    @Skybasegame 3 days ago

    no you shouldnt learn code. more jobs for me :)

  • @petasimcak7853
    @petasimcak7853 4 days ago

    1234th comment

    • @VoloBuilds
      @VoloBuilds 3 hours ago

      You win this award: 🥇 Well done!

  • @softwareengineer705
    @softwareengineer705 4 days ago

    Programmers should not develop AI models to write code because it's to replace themselves just for the benefits of some big giants or companies

    • @VoloBuilds
      @VoloBuilds 3 hours ago

      The cool thing is that now individual programmers can actually compete with entire companies by building their own stuff much faster! At big companies, the coordination overhead and bureaucracy will keep them slow despite the advances in tech. I think we will see a big shift toward smaller coding agencies

  • @yashraj_karthikey
    @yashraj_karthikey 5 days ago

    AI will be employed by humans until AI will be able to employ humans. In both the cases human will have their jobs and roles to do!

  • @atomikg
    @atomikg 5 days ago

    Keep learning to code, build your own ai and start charging cheaper prices! Coding also includes Embedded Design (i.e. Robotic Automation)

    • @VoloBuilds
      @VoloBuilds 3 hours ago

      Yep! So many things are possible now because of the accelerated pace of development

  • @wesley25101
    @wesley25101 5 days ago

    Personal Satisfaction in a world where the economy is getting worse. Cool. Idk where you came from but I do not have time to "Personal Satisfaction". I need to pay bills.

    • @VoloBuilds
      @VoloBuilds 3 hours ago

      Best bet to pay the bills is to take advantage of the AI wave and make money with it while the opportunity is there

  • @sykaax
    @sykaax 5 days ago

    I agree that AI just increases the speed of work, and in the future developers may have fewer jobs. You need to learn additional skills such as analytics, science, etc., so that you are not only a programmer but a programmer and something else. That is a good idea, I think.

    • @VoloBuilds
      @VoloBuilds 3 hours ago

      Absolutely - more and more people will become 'generalists' capable of doing many things - and I think current coders are well equipped to manage/control the AI because they can structure their thoughts well and create clear requirements for the AI

  • @Mohammedsufiyanalic7
    @Mohammedsufiyanalic7 6 days ago

    Programming doesn't help in Life until the advanced technology but once it's developed completely then its the end. Where as maths it really helps a lot in life that is why we don't use calculators beside mental arithmetic. Example: chatgpt = teachers

  • @rainesaysdie863
    @rainesaysdie863 6 days ago

    "why learn how to draw and paint or film or write, ai can do all these things now!!" Fuck off.

  • @handyalley2350
    @handyalley2350 7 days ago

    This contradicts the idea that AI will create more jobs. (And it makes sense. I think it will probably create the next high-level language to deal with all the complexity, not just of AI, but of information and technology, not just from Earth, but from space.)

  • @philliplam2704
    @philliplam2704 7 days ago

    lol ur trash

  • @leorium
    @leorium 7 days ago

    this is what is youtube for. great vid😊

    • @VoloBuilds
      @VoloBuilds 7 days ago

      :)) Thank you so much!

  • @jonathanSpg
    @jonathanSpg 7 days ago

    Does anyone know a career path that won't be completely affected by AI?

    • @VoloBuilds
      @VoloBuilds 7 days ago

      Nothing is completely certain, but some things appear safer than others. In fact, some areas may even see a boost thanks to AI. I compare various roles and the AI risk in this video: ruclips.net/video/4KSs29EPd8M/видео.html

  • @MarinersBasebaII
    @MarinersBasebaII 8 days ago

    Cope harder script peasants. It's OVER for you. You wasted 4 years of your Iife. Get over it, cIown.

  • @AMightyStorm
    @AMightyStorm 8 days ago

    wild how that logic works "oh hey yeah all you coders you just coded the path to your career skills being obsolete, good job btw"

  • @stevenmorales6011
    @stevenmorales6011 8 days ago

    You don’t need a degree to learn to code but you need one if you wanna get past the HR hiring system.

    • @VoloBuilds
      @VoloBuilds 7 days ago

      Haha this is so true but drives me crazy. I don't understand why companies are still prioritizing formalities over actual ability. Would love to see more "work trials" where people just get to do the job for like a month and then if it works for everyone, they stay.

  • @medhavimanus
    @medhavimanus 8 days ago

    AI can help build a code from scratch. But, humans are needed to debug it, correct the errors, make it as per our needs (AI will never 100% understand our needs).

    • @VoloBuilds
      @VoloBuilds 7 days ago

      The funny thing is, even we will never 100% understand our own needs 😂

  • @kapngod.-
    @kapngod.- 8 days ago

    copium

  • @pihi42
    @pihi42 8 days ago

    I've never met a great programmer that studied CS. Great programmers studied Math & Physics. So while we may see a large reduction in run-of-the-mill coders, we'll still need deep thinkers and carriers of programming techniques even with LLMs of 3-rd gen. As far as we know, LLMs are not creative enough outside the box (they are somewhat creative inside it).

  • @pcpc1289
    @pcpc1289 8 days ago

    In near future human brains are going to be useless. It is time to bid goodbye to human brain. We shall become brainless when AI takes over 🙃

  • @mfpears
    @mfpears 9 days ago

    I don't get why it's so hard for people to understand that AI can do anything with recursive transformations. Do people have no introspection ability? Or do they think that AI is supposed to solve this in a completely different way from the way humans solve it? Maybe it's harder than it looks, but on the other hand when researchers like Francois Chollet are throwing away brilliant ideas from the grad students because they are too human-like, I'm suspicious about stuff like this.

    • @mfpears
      @mfpears 9 days ago

      What I'm referring to is the ability to solve long arithmetic problems. One of his grad students had the idea to do it recursively like the way humans learn, and he threw it away because it wasn't as reliable or as fast as calculators or whatever.

    • @mfpears
      @mfpears 9 days ago

      Just analyze what's going on. Zoom. Why did you know what it was? The rectangle looked important. You cut the example inputs up and started matching against the output examples. It's an extremely incremental, recursive process that you just can't see in a single pass-through. But if you give it a series of transformations that it can perform, and then let it recursively apply them until it figures it out, it should be able to use the examples the way humans do and find the transformations, and then it's a matter of knowing what transformations are possible. I think these examples draw on human-centric perceptions. Alignment. Zooming. All of these things relate to how humans see the world. Taking things apart. We have hands, and we have done it millions of times. That's the only reason we try it out as a potential transformation. It just comes to our minds. When we see blocks, we see things to pick up and move.
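
The idea of giving a solver a fixed set of transformations and letting it combine them until the examples are explained can be sketched as a brute-force search. This is a swapped-in simplification, not the neural approach proposed in this thread: the three primitives, the depth limit, and the names `apply_chain` and `find_chain` are all assumptions chosen for illustration.

```python
from itertools import product
from typing import List, Optional, Tuple

Grid = List[List[int]]


def rotate_cw(g: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*g[::-1])]


def flip_h(g: Grid) -> Grid:
    """Mirror each row left to right."""
    return [row[::-1] for row in g]


def transpose(g: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(row) for row in zip(*g)]


# Hypothetical primitive set; a real solver would need many more.
PRIMITIVES = {"rotate_cw": rotate_cw, "flip_h": flip_h, "transpose": transpose}


def apply_chain(names: Tuple[str, ...], grid: Grid) -> Grid:
    """Apply the named primitives to the grid, in order."""
    for name in names:
        grid = PRIMITIVES[name](grid)
    return grid


def find_chain(examples, max_depth: int = 3) -> Optional[Tuple[str, ...]]:
    """Try ever-longer chains until one maps every input to its output."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            if all(apply_chain(names, i) == o for i, o in examples):
                return names
    return None


# Two examples of a 180-degree rotation, which no single primitive explains.
examples = [
    ([[1, 2], [3, 4]], [[4, 3], [2, 1]]),
    ([[5, 6, 7]], [[7, 6, 5]]),
]
chain = find_chain(examples)
prediction = apply_chain(chain, [[0, 9], [0, 0]])
```

The search space grows exponentially with chain length, which is one reason knowing which transformations are worth trying matters so much.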

    • @mfpears
      @mfpears 9 days ago

      So if I were trying to solve this problem, I would set up a recursive neural network and train it to be able to treat the pixels as objects to manipulate. The output should be a transformation, not a full set of pixels. Reality renders the result of our actions. The thing that has to understand is that objects are rigid though. Or that they can merge. Or whatever. But that expectation in humans is the result of actual real-world experience. These puzzles rely on implicit understandings of how physics works.

    • @mfpears
      @mfpears 9 days ago

      What this means is that if there is a certain law of physics that is accounted for in the puzzles that aren't made available for training on but are in the actual test itself, it will be impossible to pass it without full human intuition about how the world works. So it's going to take basically a humanoid robot to be able to know how to solve these things.

    • @mfpears
      @mfpears 9 days ago

      AI researchers should listen to Jordan Peterson or learn how to think on their own.

  • @RamiSobhani
    @RamiSobhani 9 days ago

    Remember CEOs will try to bump their stock up using AI. It is a buzzword now.

  • @asherwiggin6456
    @asherwiggin6456 10 days ago

    Anyone sensing a coming anti-AI Jihad like in Dune?

  • @hobrin4242
    @hobrin4242 10 days ago

    tbf to chatgpt tho, maybe the method of inputting the data is a problem. Like we humans could also not read that json and see patterns like that. I think this would be a very hard challenge for us if we had to read a 1d json like gpt did. Also spatial reasoning is kind of a ridiculous thing to ask to an LLM.

    • @VoltLover00
      @VoltLover00 10 days ago

      It points out that LLMs are not a path to AGI, as some lunatics think they are

    • @VoloBuilds
      @VoloBuilds 10 days ago

      What I love about this challenge is that the data structure is actually sooo simple. Computer vision isn't "looking" at anything like we are, it's analyzing huge arrays of numbers that represent the colors of each pixel. So you can think of this puzzle as a super simplified image. When we used GPT vision up till now, it used a vision model to understand the contents and then passed that to GPT-4. Now with GPT-4o it should be native and pass it in as a compressed version of the image (you can read more on their blogs) but I've only gotten very poor results from using it unfortunately. Still interesting to see that a fine tuned LLM is the current SOTA for Arc.

    • @hobrin4242
      @hobrin4242 10 days ago

      ​@@VoltLover00 speaking this number of languages and not to mention programming languages pretty fluently sounds pretty general to me. A shit ton of human problems get solved by thinking in a language as well. I think human eyes are pretty much like an API as well, considering how we only have a narrow focus point where we can actually see properly.

  • @ideacharlie
    @ideacharlie 10 days ago

    I can guarantee that this is mostly just the way you are giving it inputs. Just send an image so it’s not translating across inputs of what's supposed to be visual

    • @VoltLover00
      @VoltLover00 10 days ago

      I guess you don't understand how LLMs work? You can't input an image as a prompt to an LLM

    • @VoloBuilds
      @VoloBuilds 10 days ago

      Given your confidence, you should create a solution and claim the prize :) but I assure you, plenty of smart folks have tried all sorts of LLM prompting tricks and vision models for this benchmark and none have worked well at all so far. That's what I find so interesting about it!

    • @VoloBuilds
      @VoloBuilds 10 days ago

      Additionally, consider that vision models don't "see" things - they accept huge arrays of numbers representing the colors of each pixel. In that sense, this puzzle's data should be x100 easier to interpret.

    • @drj92
      @drj92 9 days ago

      It's not a problem with the inputs. You can stick these in as json and the LLM will happily memorize rules. It'll even figure out how to apply the rules it's memorized to slight variations of the problem.

  • @ideacharlie
    @ideacharlie 10 days ago

    You know it can see images right?

  • @ckq
    @ckq 10 days ago

    How to solve ARC in my opinion: Inputs: LLM (for thinking), Vision model + generator (fine tuned for grids as in ARC) Train on the example arc puzzles. Convert the jsons to images and create a tokenizer specialized for ARC (i.e. Tetris pieces could be a token). For each of the 400 (i think) public puzzles, give a detailed description of the solution in natural language. Fine tune on this data. That method should reach 80% accuracy on unseen ARC (no one has done it yet probably because we have bigger problems)

    • @hobrin4242
      @hobrin4242 10 days ago

      go for the prize!

    • @stevenru4516
      @stevenru4516 10 days ago

      Which problems? Like half of nlp papers are about prompts or model evals

    • @VoltLover00
      @VoltLover00 10 days ago

      You have no reason to make such predictions

    • @bladekiller2766
      @bladekiller2766 2 days ago

      This has been tried, achieves less than 20% on the public set which is very bad

  • @DatOleEditGuy
    @DatOleEditGuy 10 days ago

    Start getting good at making coffee, You may be out of a job soon.

  • @ckq
    @ckq 10 days ago

    I keep posting this on the Dwarkesh videos about this, I'll post it here too. LLMs are trained on language, of course they'll master that but not visual tasks. They suck at Sudoku (which has an easy solution in code). If you want to solve ARC you'll need to do a convolutional neural network. I simply think the vision models are much "dumber" than the text models since there's way more knowledge in text form. The training data for images doesn't necessarily correspond to intelligence but rather a basic understanding of light and physics which 5 year olds (and plenty of animals) probably have.

    • @VoloBuilds
      @VoloBuilds 10 days ago

      Would love to see a vision-based approach!

    • @drj92
      @drj92 9 days ago

      These tasks are not primarily visual -- they can easily be represented via json, or as flattened sequences. The LLMs have no problem memorizing arbitrary manipulations to those sequences, showing that they don't actually have any problem with the input data-type. You don't need CNNs for the network to figure out how to memorize the training set. What they can't do is come up with new, simple combinations of rules that they haven't seen before. The problem isn't that they can't see, it's that they can't think.

    • @bladekiller2766
      @bladekiller2766 2 days ago

      You can represent the grids as a 2D matrix of numbers that denote the colors; you don't need a CNN at all.
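
As a rough illustration of the point about grids being plain matrices, here is a minimal sketch that loads a toy task and inspects one grid. The `train`/`test` and `input`/`output` key layout is assumed to follow the public ARC task files, and the tiny task itself is invented for the example.

```python
import json

# Invented toy task; the key layout is an assumption based on the
# publicly available ARC task files.
task_json = """
{"train": [{"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}],
 "test":  [{"input": [[0, 0], [1, 1]]}]}
"""

task = json.loads(task_json)
grid = task["train"][0]["input"]

# A grid is just a list of rows, and each cell is an integer color index.
height, width = len(grid), len(grid[0])
colors_used = sorted({cell for row in grid for cell in row})

# The same grid as a flat 1D sequence, roughly what a text-only model sees.
flattened = [cell for row in grid for cell in row]
```

No image decoding is involved at any point: the whole puzzle fits in a few nested lists of small integers.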

  • @nonprecedent
    @nonprecedent 10 days ago

    I'm not 100% sure future LLMs won't solve this. I bet that from now on these types of puzzles will be aggressively put into the training data in enormous amounts.

    • @VoloBuilds
      @VoloBuilds 10 days ago

      Yeah that's the worry - Francois expressed this concern on Dwarkesh as well - that someone might create a successful but unsatisfactory solution of just synthesizing a ton of arc-like training data and solving it effectively through memorization. I hope some new ideas come about and take a different approach

    • @jonmichaelgalindo
      @jonmichaelgalindo 3 days ago

      You're 100% wrong. ARC is only a challenge because you're not allowed to use LLMs. You have to use a small, local program. GPT-4o is already at 50% accuracy without any training data. Just explain the concept in a prompt, convert the images to text, and it can not only solve them, it can invent computer programs to solve them. And it's the worst it's ever going to be. Claude 3.5 just surpassed GPT4-o's performance this week. Two years ago, it had 0% accuracy. Where will it be next year? Two years from now? Three? Last week, a user on Twitter / X converted these challenges into text-based-squares, built a prompt (less than 32k tokens), and had GPT-4o write python programs to solve them, then submitted those programs to the ARC challenge. GPT-4o's work scored over 50% accuracy, which is higher than MindsAI's 39% currently topping the ARCPrize leaderboard.

    • @tommiest3769
      @tommiest3769 3 days ago

      @@jonmichaelgalindo Still, the fact that I can sit with my coffee never having seen these puzzles before, and leisurely solve all the ones I have tried so far with relative ease, and yet it takes AI an enormous amount of energy and "compute" just to hit 50% accuracy shows that AGI is still elusive. That said, my mind is the product of 4 billion years of evolution whereas these Chatbots are just getting started. I expect that AGI will be reached within 10-20 years even though we aren't exactly sure what it will take to get there. After all, who predicted 5 years ago that AI would be where it is today in terms of being able to pass medical exams etc...

    • @jonmichaelgalindo
      @jonmichaelgalindo 3 days ago

      @@tommiest3769 I'm basically incompetent at these puzzles. Way lower than average. :-( And I'm not stupid. I play several instruments. I'm a lot better at coding than GPT-4. I've self published novels. I enjoy philosophy. I could go on. But these stupid squares never make sense. I get it right after someone tells me the trick and then it seems super obvious, but there's just something not quite right in my head.

    • @tommiest3769
      @tommiest3769 3 days ago

      Isn't the best way to test whether a system is an AGI to place it in a completely novel environment and ask it to figure out a puzzle for which it has no experience whatsoever? So in some ways, we might need embodiment before this can happen. An example might be an escape room or placing it out in the middle of a deep woods and seeing if it can figure out how to get from point A to point B (e.g. orienteering). Another test for AGI/ASI would be to set it loose on one of the "Millennium Prize Problems" in mathematics.

  • @kissmyaft
    @kissmyaft 10 days ago

    Well honestly if i were presented such a puzzle in json format i'd probably struggle too

  • @duytdl
    @duytdl 10 days ago

    -But didn't IQ tests already have such pattern matching questions that AIs have passed to average human level? Or am I misinformed?- nvm, watched the full video and understood what I was missing. Fascinating insight!

    • @VoloBuilds
      @VoloBuilds 10 days ago

      Thanks for watching! I hope we will see a non-memorization based solution for Arc!

  • @whismerhillgaming
    @whismerhillgaming 10 days ago

    I wonder how GPT omni would fare at this task since GPT omni is capable of understanding all kinds of input directly and is much better at having a broader understanding of stuff

    • @VoloBuilds
      @VoloBuilds 10 days ago

      The model I used was GPT-4o but admittedly I only did text input, not visual. I believe others have tried visual based approaches and had similar results. Will be interesting to see if someone can create an effective solution on the public leaderboard using this approach!