I use LLMs daily for coding assistance. I tried the experimental Gemini yesterday as a substitute and it was a mixed bag: for simple tasks it produced cleaner code than other LLMs, but for complex tasks it would greatly overcomplicate the code and get stuck in suggestion loops when its code didn't compile.
If it ain't coding then it's useless.
well i dont care about coding so... its perfect
Solipsism is a helluva drug
It's already better than most coders. That's why everyone uses AI for coding.
But only Claude is good at coding… GPT-4o is pretty crap; it makes up libraries like every 2 seconds. I'd actually rather use Qwen 2.5 Coder 32B.
Qwen better than o1 and 3.5 Sonnet in coding?
Thank you for making us all aware of this
I also found something strange on AI Studio this morning: Gemini 1.5 Pro latest was giving amazing answers. Probably not related, but this is awesome.
for OpenAI's o1-preview I think the model temperature is fixed to 1, does google allow changing the temperature for this new model?
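For anyone curious what temperature actually does: it rescales the logits before the softmax, so low values concentrate probability on the top token and high values flatten the distribution toward uniform. A minimal, self-contained sketch with toy logits (not any real model's API):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random.Random(0)):
    """Temperature-scale logits, softmax them, then sample a token index."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()                               # inverse-CDF sampling
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

# Near-zero temperature is effectively greedy decoding (always the argmax);
# temperature 1.0 samples from the raw softmax distribution.
```

With temperature fixed to 1, as the comment says o1-preview does, you simply always sample from the unscaled distribution.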
Just tested it on programming and it's much weaker. I think it's not using Monte Carlo tree search.
@@elawchess If it's not using search like o1-preview, that's a bit worrying, because any new breakthrough after the transformer is unlikely to be made public, and the 32k context length suggests it's a model fresh out of the oven. I hope it's not a breakthrough on the scale of the transformer; any other improvement is welcome.
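On the "search at inference time" point: one simple form of it is best-of-N sampling, where you draw several candidate answers and keep the one a scorer likes best. What o1-style systems actually do is more elaborate and not public; the generator and verifier below are purely hypothetical stand-ins.

```python
import random

def best_of_n(generate, score, n=8, rng=None):
    """Inference-time search sketch: draw n candidates, keep the best-scoring one."""
    rng = rng or random.Random(0)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: a 'model' that guesses numbers and a 'verifier'
# that rewards closeness to a target answer of 42.
guess = lambda rng: rng.randint(0, 100)
closeness_to_42 = lambda x: -abs(x - 42)
best = best_of_n(guess, closeness_to_42, n=16)
```

The appeal is that a weak generator plus a decent verifier can beat a single greedy sample, which is one reason people read so much into whether a model searches or not.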
The reason it mentions 1 "B" and 0 "b" in "bananas" is the context. You started the conversation asking for jokes, so it's probably trying to be funny, hence this response and the wink.
Exactly, that's why I think the model gave a decent answer
testing the logic of a model by asking it to write a joke about a famous person is pretty useless.
Benchmarks don't mean anything. I don't trust benchmarks anymore; I have to use a model myself to believe it, especially with Google. Tried it just now to generate code, but it didn't finish the generation.
bs means bull-sh*t
Another great video, thanks for letting us know about Gemini Exp 1114. I've done a little testing myself; it seems very smart, on a similar level to o1-preview but more concise. Interestingly, I had it generate multiple paragraphs on various topics, and in every test it passed as 75%-100% human on 8 different AI checkers. As for your banana question, it may have been reading that as "bs" (b******t) and not B's, which could be why it was giving an odd answer.
Thanks for the content. Comparing two models from the same company is challenging, especially when one builds on top of the other, and random questions may not reveal the new model's strengths. I don't think Google is promoting this as a replacement for Gemini 1.5; the shorter token window suggests it's an intelligent model meant to complement Gemini 1.5, handling cases it cannot. Instead, testing it on hard problems like math and logic where Gemini 1.5 struggles would better show the improvements. In production, we could use the new model for complex tasks, switching back to 1.5 as needed. Please try to collect some of these problems and test it. Appreciated.
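The "use the new model for complex tasks, fall back to 1.5" idea can be sketched in a few lines. Everything here is hypothetical: the model names are just the two under discussion, and the keyword/length heuristic is a placeholder for whatever hardness classifier you would actually use in production.

```python
def route(prompt: str) -> str:
    """Toy router: send hard-looking prompts to the experimental model,
    everything else to the long-context workhorse."""
    hard_keywords = ("prove", "theorem", "optimize", "derive")
    looks_hard = len(prompt) > 500 or any(k in prompt.lower() for k in hard_keywords)
    return "gemini-exp-1114" if looks_hard else "gemini-1.5-pro"
```

A real router would also need a fallback path for when the experimental model's smaller context window can't fit the prompt at all.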
Didn’t have enough tokens to use it lol 😂 It literally couldn’t understand the question, and by the time it understood, I ran out of tokens 😂😂
Those jokes 😂. Do you have a video describing how these LMSYS ratings are calculated? You know what they say about measures and targets.
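For reference, Chatbot Arena ratings come from pairwise "battles" between anonymized models; a simplified Elo-style update looks like the sketch below (the real leaderboard fits something closer to a Bradley-Terry model over all battles at once, so treat this only as intuition for why upsets move ratings more than expected wins).

```python
def elo_update(r_winner, r_loser, k=32):
    """One Elo update after a pairwise battle: the winner gains what the
    loser sheds, scaled by how surprising the result was."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

# An upset (lower-rated model wins) swings more points than a favourite winning.
a, b = elo_update(1000, 1200)   # underdog wins: big swing
c, d = elo_update(1200, 1000)   # favourite wins: small swing
```

This is also where the "measures and targets" worry bites: once a rating like this becomes the target, vendors can optimize for winning battles rather than being useful.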
With Google’s track record, they probably just told them it’s best
I think the bananas answer was because the model thinks you're expecting it to be funny, perhaps?
Oh the wink
Good point. It's best to clear the context before asking several separate questions, since otherwise the model will assume it's an ongoing discussion.
it seems this is google's answer to o1-preview
It's not. On coding its score is much lower.
LLMs won't be able to generate new logic in code, because their code comes from the human mind. Generating something genuinely new will only be possible when an LLM can think exactly like a human.
Just think for a sec: all the code LLMs are generating right now is already available on the internet. To generate code that isn't on the internet, an LLM would have to think like a human being, like a fully developed programmer's mind; otherwise it's not possible.
I really like Gemini 1.5 Pro with its 1/2 million context window. Also, it's free, which is really cool. Can't wait for long output to drop.
Gemini 2 or upcoming models would push OpenAI to drop their stuff. Can't wait to see what's next.
great content lately
Google FTW? Wait... What?
this should have been the title of this video
@@1littlecoder Ping me for more quirky and free ideas.
BTW.. huge fan!
When is the 1400 Elo model out?
Let me know when it is #1 on coding and not on BS metrics. Still a good video, but I don't trust Google for shit.
You don't trust the company that made this all possible, and gives you the best free API access to its models!
Man, I didn't like that banana response. Like, it's jokey, but there shouldn't be a personality built into any model unless the developers want to give it one.
It isn't that good