I believe it is time to change your 13 test questions. I feel the Microsoft Phi team is following you and training the model around your questions :) You can create a different set of questions similar in concept.
😂😂
hahha classic benchmark issue
Actually, the best benchmark is the Aider leaderboard. Whichever LLM is on top is the best, period.
You should make a longer video by creating a full-stack application using the models that scored really well on your benchmarking questions. That way we'll know which one's the best.
then join the membership, simple
@aculz thx for the info 🎉
My only concern is the model only being good on benchmarking questions, because of the history of Phi models being trained specifically to score high in benchmarks rather than real world performance.
But this model seems promising, I'm excited to try it out.
Next level and local. I expect that tool use will become a feature. It would greatly enhance the potential. I've run it at q8, q6, and q4 and got basically the same performance. Trying it now with the settings you recommended. Thanks for sharing, CodeKing.
Can you make simple, non-impactful changes to the questions? For example, "2 plums" instead of "2 apples".
Okay, got it!
excellent video dude!!!!
Does Open WebUI normally display generated pages (like with the confetti button @ ~9:11)?
Yes
The best test is asking for a modern, sleek landing page; you quickly see how good or bad the model is.
Can't wait for Phi-4 small (~7B) and Phi-4 mini (~3B) to crush all benchmarks in these ranges.
The Phi-4 you're showcasing here is a Phi-4 medium.
Lmao, I won't trust Phi models until I see real-world benchmarks like Arena/LiveBench.
Yep been disappointed too many times
You can even run it on an M4 Mac mini.
That's quite a good model. Thanks.
I think it's time to update your test questions!!
Next time, paste all the questions at once and let's see the fun.
Can you compare other small models?
OpenAI ... it's gonna be open
Woooooooow! It is incredible
This is quite insane 😮
Wow
🐿️🐿️🐿️🐿️🐿️🐿️