Introducing The New Champion of Function Calling!

  • Published: 3 Oct 2024

Comments • 20

  • @thetagang6854
    @thetagang6854 2 months ago +10

    There is no rest when you're in this industry. There's always some part of the tech stack being developed, some new feature. Thanks for covering the best bits!

    • @samwitteveenai
      @samwitteveenai  2 months ago

      Yeah, I agree with you on the whole, though I'm finding the increments of improvement with each of these models are getting smaller for a lot of them lately. There have been a few where I decided not to make a video because I felt there wasn't enough value in switching, etc. The interesting stuff is moving away from the model itself now, I feel.

  • @SirajFlorida
    @SirajFlorida 2 months ago +2

    Wow! How exciting! Man you're my hero Sam. You are literally 8 steps ahead of the curve.

  • @mitchellmigala4107
    @mitchellmigala4107 2 months ago +1

    I wish they would release a mixture-of-agents option for people to use natively through their API. I have my own setup I can use, but I see a lot of people using LLMs who don't have the ability to do that.
    Function calling has great utility, but any model can do this. If you give it the tool list with definitions and the schema to use, plus a few back-and-forth user/assistant examples in your messages array that show the assistant using the tools in various scenarios, most decent models will do really well with them. In places where you're 100% sure it should be using at least one tool, you simply pair this with a function that re-asks the same question recursively until you parse the response you're looking for (see the sketch below).

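A minimal sketch of the few-shot plus recursive re-ask approach described above, using an OpenAI-compatible chat client. The tool definition, model name, and retry helper are illustrative assumptions, not anything the model's authors or the video specify.

```python
import json
from openai import OpenAI  # any OpenAI-compatible client/endpoint works here

client = OpenAI()  # assumes an API key in the environment

# Hypothetical tool definition, included in the system prompt as plain schema text.
TOOLS = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Few-shot user/assistant turns showing the assistant emitting a tool call as JSON.
FEW_SHOT = [
    {"role": "user", "content": "Is it raining in Paris?"},
    {"role": "assistant",
     "content": json.dumps({"tool": "get_weather", "arguments": {"city": "Paris"}})},
]

def ask_until_tool_call(question: str, max_retries: int = 3) -> dict:
    """Re-ask the same question until the reply parses as a known tool call."""
    messages = (
        [{"role": "system",
          "content": "You can call these tools. Reply ONLY with JSON of the form "
                     '{"tool": ..., "arguments": ...}.\n'
                     f"Tools: {json.dumps(TOOLS)}"}]
        + FEW_SHOT
        + [{"role": "user", "content": question}]
    )
    for _ in range(max_retries):
        reply = client.chat.completions.create(
            model="llama3-8b-8192",  # illustrative model name
            messages=messages,
        ).choices[0].message.content
        try:
            call = json.loads(reply)
            if isinstance(call, dict) and call.get("tool") in {t["name"] for t in TOOLS}:
                return call
        except json.JSONDecodeError:
            pass  # response didn't parse; ask the same question again
    raise RuntimeError("Model never produced a parseable tool call")
```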
  • @j4cks0n94
    @j4cks0n94 2 months ago +3

    From my limited testing, it's significantly more prone to hallucinations than the GPT family of models I've been using (it hallucinates argument values, creates argument values out of thin air, and even invents new functions). For my use case, even gpt-3.5-turbo and the vanilla version of llama3 that they're hosting are doing better on my custom evals than this new one, which is honestly kinda disappointing. I'm starting to feel like those benchmarks are not as good a source of evaluation as they want us to believe.

  • @geniusxbyofejiroagbaduta8665
    @geniusxbyofejiroagbaduta8665 2 months ago +1

    This is amazing

  • @unclecode
    @unclecode 2 months ago +1

    I don't think they'll release the dataset, as Groq wants to keep it as a competitive advantage to grow their developer base. Anyway, you mentioned query rewriting, so let me share something. From my actual production experience, it's too bold to ship software with function calling but without query rewriting. Recently, in a project where we needed function calling and tried many models, we faced unpredictability. Instead of fine-tuning those models, we fine-tuned GPT-2 specifically for query rewriting using synthetic data tailored to our case, and voilà: once we implemented that, all the nuances and unpredictability were gone. Query rewriting, whether done with a strong model or with our approach, lets you use many function-calling language models effectively without fine-tuning the entire model (a rough sketch follows below). Like in your last example, with or without the keyword "search," query rewriting is definitely one of the best steps in the pipeline.

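A rough sketch of a query-rewriting step like the one described above. The checkpoint name is made up; it stands in for a GPT-2 fine-tuned on synthetic (raw query → canonical query) pairs, and the prompt format is just one plausible choice.

```python
from transformers import pipeline  # Hugging Face transformers

# "your-org/gpt2-query-rewriter" is a hypothetical checkpoint: a small GPT-2
# fine-tuned on synthetic pairs mapping messy user queries to canonical ones.
rewriter = pipeline("text-generation", model="your-org/gpt2-query-rewriter")

def rewrite_query(raw_query: str) -> str:
    """Normalize the user query before it reaches the function-calling model."""
    prompt = f"Rewrite: {raw_query}\nCanonical:"
    out = rewriter(prompt, max_new_tokens=32, do_sample=False)[0]["generated_text"]
    return out.split("Canonical:", 1)[1].strip()

# Example: "when do the olympics start" might come back as something like
# "search: start date of the next Olympic Games", which the downstream
# function-calling model maps onto a tool call far more reliably.
```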
  • @ringpolitiet
    @ringpolitiet 2 months ago

    Thanks for the video, an interesting model. Am I right in thinking that what this model is good at is extracting data from text to produce properly formatted input for tool calls, but that it's weaker at deciding whether to call a tool at all? Like you showed with your "(search) when do the olympics start" example, I was a bit surprised that a 70b model couldn't get that one. I see they also mention this in their blog post, a hybrid/routing approach. It would be interesting to see the benchmarks/performance if the models were allowed such a "reasoning layer" on top (a rough sketch of that idea follows below).

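For what it's worth, a minimal sketch of the "reasoning layer on top" idea: a cheap routing call decides whether a tool is needed at all, and only then is the request handed to the function-calling model for argument extraction. Model names and the one-word protocol are illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible endpoint; model names are illustrative

ROUTER_PROMPT = (
    "Decide whether the user request needs an external tool (search, calculator, API). "
    "Answer with exactly one word: TOOL or DIRECT.\n\nRequest: {query}"
)

def route(query: str) -> str:
    """Tiny 'reasoning layer': classify the request before the FC model sees it."""
    decision = client.chat.completions.create(
        model="llama3-70b-8192",  # illustrative router model
        messages=[{"role": "user", "content": ROUTER_PROMPT.format(query=query)}],
    ).choices[0].message.content.strip().upper()
    return "TOOL" if decision.startswith("TOOL") else "DIRECT"

query = "when do the olympics start"
if route(query) == "TOOL":
    pass  # hand the query to the function-calling model for argument extraction
else:
    pass  # answer directly with a general chat model
```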
  • @tpadilha84
    @tpadilha84 2 months ago

    In my local testing, Llama 3 8B already seems pretty good at function calling (I couldn't find cases where it fails).
    It would be interesting to see in which function-calling cases these high-performing FC models succeed where Llama 3 from Meta fails.

    • @samwitteveenai
      @samwitteveenai  2 months ago

      Agreed. I hope they release the dataset so we can see what they added, etc. I'm still testing it and just got the Ollama version going, and it seems a bit hit and miss there.

  • @choiswimmer
    @choiswimmer 2 months ago

    That was fast!

  • @not_a_human_being
    @not_a_human_being 2 months ago

    Noice!

  • @raymond_luxury_yacht
    @raymond_luxury_yacht 2 months ago

    sick

  • @micbab-vg2mu
    @micbab-vg2mu 2 months ago

    great:)

  • @teddyfulk
    @teddyfulk 2 months ago

    I think phidata does the best open source function calling

  • @sanchaythalnerkar9577
    @sanchaythalnerkar9577 2 months ago

    We can still fine-tune it further, right?
    Would that make a difference?

  • @hqcart1
    @hqcart1 2 months ago

    I really don't understand why we need this. Can't you just send a prompt to the LLM, "calculate this formula and return the result in JSON format
    [ {
    "formula": "",
    "result": ""
    } ]"?
    Why do we complicate things with a lot of text where you're 100% going to have a typo somewhere and spend hours finding that typo, to achieve what exactly???

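For contrast, a minimal sketch of the prompt-only JSON approach the comment above asks about; the model name is illustrative, and the try/except marks exactly where this tends to break and where structured function calling is meant to help.

```python
import json
from openai import OpenAI

client = OpenAI()  # illustrative client/model; any chat-completions API works

prompt = (
    "Calculate 12 * (3 + 4) and return the result in JSON format:\n"
    '[ { "formula": "", "result": "" } ]'
)

reply = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

try:
    data = json.loads(reply)  # only works if the model emitted bare JSON
    print(data[0]["formula"], "=", data[0]["result"])
except (json.JSONDecodeError, KeyError, IndexError, TypeError):
    # The failure mode structured function calling targets: the model wrapped
    # the JSON in prose or markdown, or changed the schema, and the parse broke.
    print("Could not parse:", reply)
```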
  • @davidrobertson6371
    @davidrobertson6371 2 months ago

    This model is trash. I'm sorry, but whoever did the benchmarking needs to be fired. It fails roughly once every 3-4 calls, quite regularly. It's OK for super, super simple function calls, and it's no better than the base Llama 3 model. Thumbs down on this model for me.