Love the concept of phone offloading to a desktop for a closed local loop.
Thanks, this seems very interesting. llama.cpp already supports Octopus V2 inference (technically it's the same architecture as Gemma 2B), and it seems 2-3x faster than Llama 2 7B for token generation, but strangely not much faster at prompt processing (essential for RAG). Maybe I'm just stupid, but Octopus did not generate good function-calling code for me based on my input. It just picked a function-call example that seemed to match the problem, but did not parameterize it correctly; it simply returned the example as its response.
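For reference, here is roughly how I tested it: a minimal sketch assuming a GGUF quantization of Octopus V2 and the llama-cpp-python bindings. The model file name and the plain-text prompt wording are my own assumptions, not the official Octopus recipe.

```python
# Minimal sketch: run an Octopus V2 GGUF through llama-cpp-python and ask
# for a function call. The model file name and prompt wording are assumptions.
from llama_cpp import Llama

llm = Llama(model_path="octopus-v2-q4_k_m.gguf", n_ctx=2048)

prompt = (
    "Below is a user query. Respond with the correct function call, "
    "including all arguments filled in.\n\n"
    "Query: Take a selfie with the front camera\n"
    "Response:"
)

out = llm(prompt, max_tokens=64, temperature=0.0)
print(out["choices"][0]["text"])  # I expected a parameterized call here
```

This is where I saw the behaviour described above: the reply echoed an example call rather than filling in arguments derived from my query.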
P.S.: Greetings from Vienna
What about Gemma? Groq hosts LLMs (Mixtral, Llama, and Gemma) which you can use in Open Interpreter on Android, and OI hardly drains my battery when the model isn't on my phone locally at all. The OI --os version still isn't working at this time, but hopefully soon.
It would be cool if someone put Octopus into Mixtral as one of its experts; that could put Mixtral way up there with the other models, if that were even possible.
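If anyone wants to try it, here is a rough sketch of pointing Open Interpreter at a Groq-hosted model from Python; the model ID and the attribute path may differ between OI versions, so treat them as assumptions.

```python
# Rough sketch: use a Groq-hosted model with Open Interpreter instead of a
# local one. The model ID and attribute names may vary by OI version.
from interpreter import interpreter

interpreter.llm.model = "groq/mixtral-8x7b-32768"  # Groq-hosted Mixtral via LiteLLM
interpreter.llm.api_key = "YOUR_GROQ_API_KEY"      # placeholder, not a real key

interpreter.chat("List the five largest files in my home directory")
```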
Thank you for the effort, but that's just too many beautiful ideas in one sitting.
Aloha from da 9th Island! 🤙
Functional tokens sound like macros, so you're just giving the LLM a macro to run.
Maybe I'm mistaken, but to me it's far more than that. LLM function calling is supposed to know when the LLM should trigger an external call, and then generate correct JSON for calling the right external function, with all the function-call parameters filled in correctly by the LLM. The LLM answers not in human language but with correctly formatted JSON for calling specific functions that do the actual work.
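Concretely, something like this toy sketch is what I mean; the function name and arguments are made up for illustration, not taken from any specific model's tool set.

```python
import json

# Hypothetical tool exposed to the model; the name and parameters are illustrative.
def set_alarm(hour: int, minute: int) -> str:
    return f"Alarm set for {hour:02d}:{minute:02d}"

TOOLS = {"set_alarm": set_alarm}

# What a well-behaved function-calling model should emit for
# "wake me up at 6:30": the right function name AND concrete arguments,
# as JSON rather than a natural-language answer.
model_output = '{"name": "set_alarm", "arguments": {"hour": 6, "minute": 30}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # -> Alarm set for 06:30
```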
We'll go back and listen again