Evals for AI Agents, the right way!!!

Поделиться
HTML-код
  • Опубликовано: 28 окт 2024
  • НаукаНаука

Комментарии • 10

  • @MindForeverVoyaging
    @MindForeverVoyaging 2 месяца назад +2

    I have a similar agent setup that takes a similar approach and have found that Anthropic's claude-3-5-sonnet-20240620 model (not shown in the table) seems much better than OpenAI's GPT-4o model at determining what function tool to use in a given context. The approach I took was not to provide information in the main agent's system prompt about the functions that it has available to it, but instead, the agent should be able to 'associate' which function to call from the OpenAPI definitions which are part of each available function in the agent's tools.
    This is all subjective, but in my conversations with the main agent I found that when I asked for something to be done, the 3.5 sonnet model would use the correct function and arguments the majority of the time but the Gpt4-o model would quite often have to be reminded that it had the function available to it, having been reminded, the agent would then make the correct call. As the paper pointed out about the open source models, their context to function 'association' is much lower and as a result, cannot be relied upon and are therefore mostly useless for this type of approach.(I was using llama3.1 thru groq)

    • @1littlecoder
      @1littlecoder  2 месяца назад

      This is a great validation of the paper

  • @GNARGNARHEAD
    @GNARGNARHEAD 2 месяца назад +1

    oh heck yeah, love some paper reviews 👍

    • @1littlecoder
      @1littlecoder  2 месяца назад +1

      @@GNARGNARHEAD glad there's some interest still 😁🙏🏾

  • @MichealScott24
    @MichealScott24 2 месяца назад

  • @AI-Wire
    @AI-Wire 2 месяца назад +1

    When do you think we will have computer-using agents?

    • @1littlecoder
      @1littlecoder  2 месяца назад

      Open interpreter kind of tried. People have tried the same with GPT-4o but nothing is quite yet there

  • @dhruvmehta2377
    @dhruvmehta2377 2 месяца назад +2

    💯❤️‍🔥