I was eagerly waiting for this video from you. I also think the model is awesome, and I was looking forward to your review.
Thank you for the video!
How fast is it? Can we use it on some fast platform like Groq?
Much appreciated
Wow, impressed with how fast it was able to debug the code!
Can you make a video about how to host a NIM on Google Cloud?
Can you test it like Apple did: make some small changes to the names or numbers in the test questions and see whether the response quality gets worse?
I've tried many open source models, but I still can't figure out how to make a chatbot that does tool calls with chat history. Besides answering from its general knowledge instead of using a tool when it should, the model can't even use very simple tools. I think it looks at the conversation history and says, "Oh, okay, I can handle this without using a tool."
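For anyone hitting the same wall: what usually helps is (a) a system prompt that mandates tool use and (b) making sure every tool call *and* its result get written back into the history, so the model keeps seeing that tools exist and have been used. A minimal sketch of that loop — the message shapes, the `get_weather` tool, and the field names are all made up for illustration, not a specific library's API:

```python
import json

def get_weather(city: str) -> str:
    """Toy tool: returns canned data instead of a real lookup."""
    return json.dumps({"city": city, "temp_c": 21})

# Registry mapping tool names (as the model would emit them) to callables.
TOOLS = {"get_weather": get_weather}

def run_tool_call(history: list, name: str, arguments: str) -> list:
    """Execute a tool the model asked for and append both the call and
    its result to the conversation history, so later turns can see it."""
    result = TOOLS[name](**json.loads(arguments))
    history.append({"role": "assistant",
                    "tool_call": {"name": name, "arguments": arguments}})
    history.append({"role": "tool", "name": name, "content": result})
    return history

# System prompt pinned at the top of the history to bias toward tool use.
history = [{"role": "system",
            "content": "Always use a tool when one matches the request, "
                       "even if you think you already know the answer."}]
history.append({"role": "user", "content": "Weather in Paris?"})

# Pretend the model responded with this tool call; dispatch it.
history = run_tool_call(history, "get_weather", '{"city": "Paris"}')
```

The key design point is that the tool result goes back in as its own message rather than being discarded, so on the next turn the model has evidence that tool use is the expected path.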
I loaded a 4.3bpw exl2 quantized version of it on my 2x RTX 3090. I'm getting 15-18 tokens per second. This thing is super fast and smart! The only downside is that it's censored. Hopefully someone makes an uncensored version soon.
"Downside."
@rportella9357 Yes, that is a downside.
Where can I test this model like you did?
I heard it will use more tokens per query due to the built-in CoT?
Thank you for the video. What hardware did you run this on?
I used hugging chat.
Ideally you need a high-spec computer with a good GPU.
Can we use it with Groq?
Nice!
Do you get the same results if you redo the whole test?
I bet not, 'cause I've seen some people get the strawberry question right.
I have an issue, please help me, people. I waited for my new PC with an i9-14900KF, 128 GB Kingston 6000 MHz RAM, a 980 Pro NVMe, and an RTX 4090, BUT 70B models can't fit in my VRAM, so how do I use all my power? Or is there a model for coding tasks that will give good coding results and fit my hardware?
Ask ChatGPT ;)
🎉
I tested it on my own Python tasks and also asked it to write a few simple games like Arkanoid. A very weak model. I think even Qwen 2.5 6B handles them better.
I am a Chinese user, and I also feel that this model is weak. Why is everyone praising it?