Was I Wrong About AI Agents? | INSANE OpenAI-o1 Planning Capabilities

  • Published: Sep 30, 2024

Comments • 46

  • @mrpocock 1 day ago +21

    It's just beginning to get useful. It would be neat to see an agent community that automates a software development cycle, starting with documentation, then acceptance tests, specification/contract tests, implementation, test verification and debugging. At the moment all the demos of LLMs making code I see are basically just asking it to write snake, and seeing if it does it, or when it doesn't, if it can fix the code given the compiler/runtime error. But I've found with my own projects that if you invest heavily in the textual and programmatic documentation up-front, they are much better at generating code that actually works.

  • @40bombala 2 days ago +6

    Wow, this is so good. I don't think structured output or JSON mode is available for o1 yet though. Will be even more powerful once those and function calls are available.

  • @Leto2ndAtreides 1 day ago +11

    You're making me feel bad about not burning $1,000 on the OpenAI API.

    • @wurstelei1356 1 day ago

      Did the requests for this video really cost $1000? Sounds expensive. One could buy a decent GPU and run agents forever locally with that money.

    • @gramnegrod 1 day ago

      OpenAI is releasing to Tier 4 now. OpenAI's Tier 4 for API usage has specific requirements and benefits:
      • Requirements: You must have a payment history of at least 14 days and have spent at least $250 on the API.

    • @eirikgg 1 day ago

      @gramnegrod But you could also use it via OpenRouter. I did, but so far haven't seen much improvement over Claude 3.5 in my tests.

    • @thedinoman7828 1 day ago

      just use openrouter

  • @hqcart1 1 day ago +4

    If you really think about it, o1 did not do anything: you had already broken the task into the 15 steps, and you could have just sent these 15 steps to o1 to do everything, except o1 can't generate images (for now), so I do not see what the benefit is...

    • @idea_list 1 day ago +1

      It's not that; you should just try to create such a sequence of agent instructions with anything else. You'll immediately understand the difference. And the thing is, with a moderately complicated agentic system the user shouldn't even be aware of which agents exist or what functions they have. The user should just generate a task, and the system's mainbrain (o1 in this case) should do all the planning. No other model could do that before: there were lots of tiny (or not so tiny) inaccuracies, and while trying to get rid of them via prompting or structural adjustments you'd generate a thousand more. It was just not viable.

  • @janweber1699 1 day ago +2

    wow, that was more impressive than expected, kinda crazy; the scaling laws seem even more absurd now.
    great video, what a cool dude

  • @wildfotoz 2 days ago +3

    I apologize if you've covered this in a previous video but, what if you do a video where you have an agent that will create a web application that talks to a database? Will it actually create a new database or tables in an existing database. Will it create the different pages or forms in the app and give you a menu system? Another idea, have it search different websites for new TV shows or movies that are coming out and have it show you only the stuff that matches your tastes. Stuff like this would be a more real world use case for agents.

  • @wurstelei1356 1 day ago +1

    I wonder if the new LLaMa (3.2) models are capable of doing similar things. The smaller models seem better than those from the 3.1 version.

  • @TheTruthIsGonnaHurt 5 hours ago +1

    *How long did it take to do the planning?*
    Kind of a big step in testing capabilities and it wasn't discussed.

  • @pin65371 1 day ago +1

    One project that could be interesting to work on would be to think of a bunch of tasks that you do pretty often, use Cursor to make a bunch of scripts for you, and build almost like a home page with all the scripts and a front end for them. So let's say you want to do transcripts for your videos: you could just have a drag and drop where you drag the video file on and it creates a transcript. You could also add a little chat for each tool, so if you want to add custom instructions you could do that as well. With how quick and cheap it is to do stuff like this now, you could quickly build up a lot of tools that are useful.

  • @VaibhavShewale 1 day ago +5

    wow, damn!
    you're telling us for free? not asking us to join your paid community to learn this?
    either this is not that good or you just wanted to show good will??
    well, seeing a full video from you after so much time feels good!

  • @alizaman239 1 day ago +1

    How is this different from having lambda functions rather than agents?

  • @egoincarnate 2 days ago +2

    Is source for this available? I don't see it on the linked github...

    • @AyushBhatt-g7q 1 day ago

      same, i was looking for the files so that i can understand them but they aren't there.

    • @VaibhavShewale 1 day ago +2

      you guys need to be a paid user of his community to check out the code

  • @mrpro7737 1 day ago

    WOW, you did a great job with your prompting method
    Can you make a web search agent that
    - chats with the user about a problem
    - then generates search queries for Google search
    - then exports the first 10 websites from every search query
    - scrapes all the websites and analyses the data with a web scraping analyst agent
    - then saves this data as vectors
    - gives the saved data to an agent, along with the main prompt + system prompt, that makes it generate an HTML page of an article that has all the solutions for the problem along with sources and images
    it's really a cool project, i am working on it 😅 , i am challenging myself to complete it in 1 month
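
The steps listed in this comment can be sketched as a plain function pipeline. This is a minimal sketch under stated assumptions, not the commenter's or the video author's code: `run_pipeline`, `Page`, `PipelineResult`, and every `demo_*` helper are hypothetical names, the search and scrape steps are stubs that return placeholder data instead of calling any real service, and the vector-store step is reduced to an in-memory list of pages.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Page:
    url: str
    text: str


@dataclass
class PipelineResult:
    queries: list[str]
    pages: list[Page]
    html: str


def run_pipeline(
    problem: str,
    make_queries: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
    scrape: Callable[[str], str],
    write_article: Callable[[str, list[Page]], str],
    results_per_query: int = 10,
) -> PipelineResult:
    """Run the steps in order: queries -> search -> scrape -> store -> article."""
    queries = make_queries(problem)

    # Keep the first N URLs per query, skipping duplicates across queries.
    urls: list[str] = []
    for query in queries:
        for url in search(query)[:results_per_query]:
            if url not in urls:
                urls.append(url)

    # Stand-in for the vector store: keep the scraped pages in a list.
    pages = [Page(url, scrape(url)) for url in urls]

    html = write_article(problem, pages)
    return PipelineResult(queries, pages, html)


# Hypothetical stubs so the sketch runs without network access or an LLM.
def demo_make_queries(problem: str) -> list[str]:
    return [f"{problem} fix", f"{problem} tutorial"]


def demo_search(query: str) -> list[str]:
    slug = query.replace(" ", "-")
    return [f"https://example.com/{slug}/{i}" for i in range(12)]


def demo_scrape(url: str) -> str:
    return f"text from {url}"


def demo_write_article(problem: str, pages: list[Page]) -> str:
    items = "".join(f'<li><a href="{p.url}">{p.url}</a></li>' for p in pages)
    return f"<h1>{problem}</h1><ul>{items}</ul>"


if __name__ == "__main__":
    result = run_pipeline(
        "slow laptop", demo_make_queries, demo_search, demo_scrape, demo_write_article
    )
    print(len(result.pages), "pages collected")
```

Passing each step in as a callable keeps the control flow fixed while letting any stub be swapped for a real search client, scraper, or model call later.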

  • @lystic9392 1 day ago +1

    What kind of costs are to be expected with things like this?

    • @pnwadventures2955 1 day ago

      Between $1 and $25.000, or more. You could also do it for free.

  • @gramnegrod 1 day ago

    Awesome video! THX for all the creative applications! I think getting agents to program and deploy browser actions like through Selenium would really open up many valid use cases, basically limitless. I'm talking about having o1 produce workflows similar to what Mullion is trying to do but instead using a library like Skyvern into your agent tools. It is probably a bit grandiose, but it might work.

  • @idea_list 1 day ago

    Quite excited to see this, thanks a bunch for pioneering and sharing, it saves others a lot of time) You probably should've explained the crucial role of the master agent in more detail though. Half of the comments are from those who don't understand the importance of the o1-preview output in your system.

  • @iPloox 23 hours ago

    1 file, now make it pull any open source repository and fix a bug issue on that repo. If you don't give it steps, it always fails

  • @TheAIVarietyShow 1 day ago

    For the snow, 16F would be snow weather. Not sure what 16C is off the top of my head. Maybe that's why the Bart pic didn't come out as expected
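
For reference, the conversion this comment is unsure about follows the standard formulas F = C × 9/5 + 32 and C = (F − 32) × 5/9. A short sketch (the function names are illustrative):

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def fahrenheit_to_celsius(fahrenheit: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


if __name__ == "__main__":
    print(round(celsius_to_fahrenheit(16), 1))  # 60.8
    print(round(fahrenheit_to_celsius(16), 1))  # -8.9
```

So 16 °C is about 61 °F, well above freezing, while 16 °F is about -9 °C.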

  • @aaronag7876 1 day ago

    Is all AI agent code written in Python, or are there other languages that can be used, like JavaScript?

  • @nessrinetrabelsi8581 1 day ago

    Can you just ask it to do one task with no details like you did? Like generate a .md which contains 3 images of the weather of next 3 days in SF?

  • @raymond_luxury_yacht 1 day ago

    I'm gonna hook up to Sonnet and tell it to just spit out the code via API cos I'm fed up copying and pasting to VS Code. It just works!

  • @wicktorinox6942 1 day ago

    It would be great to see a demo that does not start as "do a basic...", because this is where my struggles start.

  • @SebKrogh 1 day ago

    Could this be done in something like flowise or similar?
    Just thinking if there are any glaring limitations of the low-code setups?

  • @xspydazx 2 days ago

    i don't think you can be wrong my friend !
    you just realise it's using a graph and a router with an intent detector !
    and the guardrails !
    the model has not changed ! they did say they fine-tuned the model on the step by step !
    you have produced these works yourself ! ( you just need a graph ).. this is the best way to create an agentic system :
    you can also use open router ( this can be an agent in the chain to detect which route to pick , ie the routes can be graphs ) ..
    Each node can be an agent in the graph with its own specific tools !
    graphs are recursive until the problem or step is solved !
    but it may take a long time but it is heavily directed as well as being quite intelligent with latitude for the agents to create and perform !
    SO now we can create many types of graph for various types of task ! and the router to pick which path to take !
    Master mind ! can be the fat controller !

  • @MarkAlmeida-Cardy 1 day ago

    Great content! Any chance we can have access to the code you used in the video?

  • @aiamfree 1 day ago

    I have a feeling the stream of the answer at the end of o1 is faux-stream

  • @micbab-vg2mu 2 days ago

    very interesting - thank you for sharing:)

  • @fredericherrera 2 days ago

    very good. reverse engineering pub trivia questions

  • @nufh 1 day ago

    Can I get this code via AI Rookie?

  • @evil_duck6405 1 day ago

    mate, how did you use o1 api? it's not available yet unless you pay $5000. Or did you use it from openrouter?

  • @fmbetterforms5900 1 day ago

    These are not agents; they are functions.

  • @TomlinsonHume-h6m 6 hours ago

    Williamson Loaf

  • @finalfan321 1 day ago

    so many use cases

  • @Rh22-c9l 1 day ago

    Holy s....... I'm learning to program ... why ?