Do you have another channel that you publish your experiments on? Like for your work with robots?
I would love to see videos like that either on this channel or a similar channel that you run.
Very interesting! I can't wait to see this in practice. Can you make a video of its practical applications and how to implement them?
That's why I like the Swarm framework: as the model gets better, it can use those powers to pick the best agents and then the best tools/functions, rather than doing the extended work itself. The tools get everything done, and the agents just help integrate the fuzzy communication between processes.
I tend to agree. SWARM demonstrates how this "fuzzy communication" can be realized recursively through function calling. I wrote about this in an article titled "SWARMing Conversational AI: Integrating No-Code and Code in Agent-Based Workflows," which you can find on LI.
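The handoff pattern described above can be sketched in plain Python. This is an illustrative stand-in, not the actual Swarm API: the names `Agent`, `run`, and `transfer_to_math` are my own, and the "model" step is simulated. The core idea it demonstrates is real, though: a tool call that returns another agent is treated as a handoff, so the deterministic work lives in ordinary functions and the model's only job is picking which function to call.

```python
# Minimal sketch of Swarm-style handoff via function calling (illustrative
# stand-in, not the real Swarm API): a tool that returns an Agent triggers
# a handoff; concrete tools do the actual work.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    name: str
    functions: list = field(default_factory=list)


def add(a: int, b: int) -> int:
    # A concrete tool: deterministic work lives here, not in the model.
    return a + b


math_agent = Agent("math", functions=[add])


def transfer_to_math() -> Agent:
    # Returning an Agent is the handoff signal.
    return math_agent


triage = Agent("triage", functions=[transfer_to_math])


def run(agent: Agent, picked: Callable, *args):
    """Stand-in for the model loop: assume the model already 'picked' a tool."""
    result = picked(*args)
    if isinstance(result, Agent):
        # Handoff: continue the conversation with the new agent.
        return result
    return result


# Triage hands off to the math agent, which then uses its calculator tool.
current = run(triage, transfer_to_math)
answer = run(current, add, 2, 2)
```

The point of the pattern: as models improve, only the `picked` step gets smarter, while the tools and handoff plumbing stay fixed.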
Alice isn't wozz, but Scott most definitely is.
AI models don't actually "understand" reasoning? News flash: neither do humans. It's all about efficacy. If it walks like a duck and quacks like a duck, guess what?
Oh MAN! I love your stuff but you REALLY dropped the ball here. Ok, all they've done is say "Hey! The model is bad at math! Give it a calculator!" There is no reasoning here. This EXCLUSIVELY applies to consistent mathematical reasoning suitable for programming. Why are they even using a model here? All it is doing is coding. No one is testing the model's logic AT. ALL.
Look, if you can parameterize and regularize it, USE A COMPUTER! LLMs are good at stuff Turing machines aren't. You want to impress? Do it WITHOUT an external tool. Now, I like your logic test. THAT'S good. Until you run it on a non-model environment.
No, they ask "How do we evaluate and improve the model's reasoning?" and their answer is "Outsource the reasoning to something that isn't a model."
LAME.
I mean, even their basic test that came up with the 20% error rate: that didn't show a failure of _logic_. It could much more easily be explained by basic innumeracy. The model can have the logic cold, but if 2 + 2 = 5 because it had a brainfart, it will still fail.
bootiful