There is no rest when you're in this industry. There's always some part of the tech stack being developed, some new feature. Thanks for covering the best bits!
Yeah, I agree with you on the whole, though I'm finding the increments of improvement with each of these models lately are getting smaller for a lot of them. There have been a few where I decided not to make a video because I felt there wasn't enough value in the change. The interesting stuff is moving away from the model itself now, I feel.
Wow! How exciting! Man you're my hero Sam. You are literally 8 steps ahead of the curve.
I wish they would release a mixture-of-agents option for people to use natively through their API. I have my own setup I can use, but I see a lot of people using LLMs who don't have the ability to do that.
Function calling has great utility, but any model can do this. If you give it the tool list with definitions and the schema to use, and include a few examples in your messages array (a back-and-forth of user and assistant messages showing the assistant using the tools in various scenarios), most decent models will do really well with them. In places where you're 100% sure it should be using at least one tool, you simply pair this with a function that re-asks the same question recursively until you parse the response you know you're looking for.
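A minimal sketch of that approach, assuming a generic chat-completion client passed in as `call_model` (the tool definition, few-shot examples, and prompt wording here are all illustrative, not from any specific API):

```python
import json

# Hypothetical tool definition in a JSON-Schema-like shape.
TOOLS = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]},
}]

# Few-shot examples showing the assistant emitting a tool call as JSON.
FEW_SHOT = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant",
     "content": '{"tool": "get_weather", "arguments": {"city": "Paris"}}'},
]

def extract_tool_call(text):
    """Return a dict if `text` is a well-formed call to a known tool, else None."""
    try:
        call = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None
    names = {t["name"] for t in TOOLS}
    if isinstance(call, dict) and call.get("tool") in names and "arguments" in call:
        return call
    return None

def ask_until_tool_call(call_model, question, max_tries=5):
    """Re-ask the same question until the response parses as a tool call."""
    messages = (
        [{"role": "system",
          "content": "Use the tools below, replying only with JSON:\n"
                     + json.dumps(TOOLS)}]
        + FEW_SHOT
        + [{"role": "user", "content": question}]
    )
    for _ in range(max_tries):
        call = extract_tool_call(call_model(messages))
        if call is not None:
            return call
    raise RuntimeError("model never produced a parseable tool call")
```

The retry loop is the key part: because you know a tool call must appear, any unparseable reply is simply discarded and the question asked again.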
From my limited testing, it's significantly more prone to hallucinations than the GPT family of models I've been using (it makes up argument values out of thin air and even invents new functions). For my use case, even gpt-3.5-turbo and the vanilla version of llama3 that they're hosting are doing better on my custom evals than this new one, which is honestly kind of disappointing. I'm starting to feel those benchmarks are not as good a source of evaluation as they want us to believe.
This is amazing
I don't think they'll release the dataset, as Groq wants to keep it as a competitive advantage to increase their developer base. Anyway, you mentioned query rewriting, so let me share something. You know, from my actual production experience, it's too bold to release software with function calling without query rewriting. Recently, in a project where we needed function calling and tried many models, we faced unpredictability. Instead of fine-tuning those models, we fine-tuned GPT-2 specifically for query rewriting using synthetic data tailored to our case. And voila! Once we implemented that, all the nuances and unpredictability were gone. Query rewriting, either using a strong model or our approach, allows for effective use of many language models supporting function calling without fine-tuning the entire model. Like in your last example, with or without the keyword "search," query rewriting is definitely one of the best steps in the pipeline.
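The pipeline shape described above can be sketched roughly like this, with both the rewriter and the function-calling model abstracted as callables. The toy rule-based rewriter and its canonical phrasing are illustrative stand-ins, not the commenter's actual fine-tuned GPT-2 setup:

```python
def with_query_rewriting(rewrite, call_functions, user_query):
    """Normalize the query first, then hand the canonical form to the
    function-calling model, so it only ever sees predictable phrasing."""
    canonical = rewrite(user_query)
    return call_functions(canonical)

# Toy stand-in for a fine-tuned rewriter: map free-form questions onto
# a canonical form the downstream model handles reliably.
def toy_rewrite(query):
    q = query.strip().rstrip("?").lower()
    if "olympics" in q and ("when" in q or "start" in q):
        return "search: Olympics start date"
    return q
```

The point is the separation of concerns: a small, cheap model absorbs the phrasing variance, so the function-calling model never has to.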
Thanks for the video, an interesting model. Am I right in thinking that what this model is good at is extracting data from text to build properly formatted inputs for tool calls, but that it's weaker at deciding whether to call a tool at all? Like you showed with your "(search) when do the olympics start" example, I was a bit surprised that a 70b model couldn't get that one. I see they also mention this in their blog post, a hybrid/routing approach. It would be interesting to see the benchmarks/performance if the models were allowed such a "reasoning layer" on top.
In my local testing, Llama 3 8B already seems pretty good at function calling (I couldn't find cases where it fails).
It would be interesting to see in which function-calling cases these high-performing FC models succeed while Meta's Llama 3 fails.
Agreed, I hope they release the dataset so we can see what they added. I'm still testing it; I just got the Ollama version going and it seems a bit hit-and-miss there.
That was fast!
Noice!
sick
great:)
I think phidata does the best open-source function calling
We can still fine-tune it further, right?
Would that make a difference?
I really don't understand why we need this. Can't you just send a prompt to the LLM, "calculate this formula and return the result in json format
[ {
"formula": "",
"result": ""
} ]
"? Why do we complicate things with a lot of text where you're 100% going to have a typo somewhere and spend hours finding it? To achieve what, exactly?
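The "just ask for JSON" approach the comment describes might look like this, with `call_model` as a stand-in for any completion client. Note that even this path needs parsing and validation of the reply, which is essentially what function-calling schemas formalize:

```python
import json

def calculate_via_prompt(call_model, formula):
    """Ask for JSON directly in the prompt, then parse and sanity-check it."""
    prompt = ('calculate this formula and return the result in json format '
              '[ { "formula": "", "result": "" } ]\n\nFormula: ' + formula)
    reply = call_model(prompt)
    data = json.loads(reply)  # raises if the model strays from the format
    if not (isinstance(data, list) and data and "result" in data[0]):
        raise ValueError("reply did not match the requested shape")
    return data[0]["result"]
```

So the prompt-only version works, but the moment the model strays from the format you are back to parsing and retrying by hand.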
This model is trash. I'm sorry, but whoever did the benchmarking needs to be fired. It fails roughly every 3-4 calls, quite regularly. It's OK for super simple function calls, but it's no better than the base Llama 3 model. Thumbs down on this model for me.