There is A NEW KING!!!

  • Published: 14 Nov 2024

Comments • 47

  • @Roenbaeck 5 hours ago +2

    I use LLMs daily for coding assistance. I tried the experimental Gemini yesterday as a substitute and it was a mixed bag: for simple tasks it produced cleaner code than other LLMs, but for complex tasks it would greatly overcomplicate the code and get stuck in suggestion loops when the result didn't compile.

  • @Macorelppa 19 hours ago +39

    If it ain't coding then it's useless.

    • @Kaoru8168 19 hours ago +2

      Well, I don't care about coding, so... it's perfect.

    • @CaptainSnackbeard 18 hours ago

      Solipsism is a helluva drug.

    • @Cat-vs7rc 16 hours ago

      It's already better than most coders. That's why everyone uses AI for coding.

    • @Nick_With_A_Stick 15 hours ago +2

      But only Claude is good at coding… GPT-4o is pretty crap. It makes up libraries like every two seconds. I'd actually rather use Qwen 2.5 Coder 32B.

    • @samuelgarcia1802 14 hours ago

      Qwen better than o1 and 3.5 Sonnet in coding?

  • @Shaunmcdonogh-shaunsurfing 8 hours ago +1

    Thank you for making us all aware of this

  • @freddiechipres 15 hours ago +1

    I also found something strange on AI Studio this morning: Gemini 1.5 Pro latest was giving amazing answers. Probably not related, but this is awesome.

  • @leeme179 18 hours ago +2

    For OpenAI's o1-preview I think the model temperature is fixed at 1. Does Google allow changing the temperature for this new model? (See the sketch after this thread.)

    • @elawchess 18 hours ago +2

      Just tested it on programming and it's much weaker. I think it's not using Monte Carlo search.

    • @leeme179 18 hours ago +2

      @elawchess If it is not using search like o1-preview, then that is a bit worrying, because any new breakthrough after the transformer is unlikely to be made public. The 32k context length suggests it's a model fresh out of the oven. I hope it's not a breakthrough like the transformer was; any other improvement is welcome.
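
    On the temperature question above: a minimal sketch, assuming the google-generativeai Python SDK and that the experimental model accepts the standard generation_config like other Gemini endpoints; the model name "gemini-exp-1114" is the one mentioned in this thread, not something confirmed here.

        import google.generativeai as genai

        genai.configure(api_key="YOUR_API_KEY")  # assumption: an AI Studio API key

        # Temperature is passed through generation_config; o1-preview reportedly
        # keeps this knob fixed, while Gemini models normally expose it.
        model = genai.GenerativeModel(
            "gemini-exp-1114",
            generation_config={"temperature": 0.2, "max_output_tokens": 1024},
        )
        print(model.generate_content("Explain attention in two sentences.").text)

    Leaving generation_config out entirely falls back to the model's defaults, which is the quickest way to check whether the endpoint accepts the setting at all.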

  • @annamalainarayanan1192 12 hours ago +2

    The reason it mentions 1 B and 0 b in "bananas" is the context. You started the conversation asking for jokes, so it's probably trying to be funny, hence this response and the wink.

    • @ammadali5799 2 hours ago +2

      Exactly, that's why I think the model gave a decent answer

  • @aaronpaulina 17 hours ago +3

    Testing the logic of a model by asking it to write a joke about a famous person is pretty useless.

  • @Cingku 11 hours ago

    Benchmarks don't mean anything. I don't trust benchmarks anymore; I have to use it to believe it, especially with Google. Tried it just now to generate code, but it didn't finish the generation.

  • @atypocrat1779 18 hours ago +4

    bs means bull-sh*t

  • @Serifinity 15 hours ago

    Another great video, thanks for letting us know about Gemini Exp 1114. I have done a little bit of testing myself; it seems very smart, on a similar level to o1-preview but more concise. Interestingly, I had it generate multiple paragraphs on various topics, and in every test it passed as 75% to 100% human on 8 different AI checkers. As for your banana question, it may have been reading that as "bs" (b******t) and not B's, which could be why it was giving an odd answer.

  • @unclecode 13 hours ago

    Thanks for the content. Comparing two models from the same company is challenging, especially when one builds on top of the other, and random questions may not reveal the new model's strengths. I don't think Google is promoting this as a replacement for Gemini 1.5. The shorter token window suggests it's an intelligent model meant to complement Gemini 1.5, handling cases it cannot. Instead, testing it on hard problems like math and logic, where Gemini 1.5 struggles, would better show the improvements. In production, we could use the new model for complex tasks, switching back to 1.5 as needed. Please try to collect some of these problems and test it. Appreciate it.
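
    A minimal sketch of the complement-not-replace idea above, assuming the google-generativeai Python SDK; the keyword heuristic and both model names ("gemini-exp-1114", "gemini-1.5-pro") are illustrative assumptions, not a recommended router.

        import google.generativeai as genai

        genai.configure(api_key="YOUR_API_KEY")  # assumption: an AI Studio API key

        # Crude heuristic: send math/logic-heavy prompts to the experimental
        # model, keep long-context 1.5 Pro as the everyday default.
        COMPLEX_HINTS = ("prove", "derive", "step by step", "logic puzzle")

        def pick_model(prompt: str) -> str:
            lowered = prompt.lower()
            if any(hint in lowered for hint in COMPLEX_HINTS):
                return "gemini-exp-1114"   # shorter context, harder reasoning
            return "gemini-1.5-pro"        # large context, general use

        def ask(prompt: str) -> str:
            model = genai.GenerativeModel(pick_model(prompt))
            return model.generate_content(prompt).text

        print(ask("Prove that the sum of two even numbers is even, step by step."))

    In a real system the routing signal would come from something better than keywords (a classifier, task metadata, or a failed first attempt on 1.5), but the switch-per-request shape stays the same.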

  • @trojanhell7639 16 hours ago

    Didn't have enough tokens to use it lol 😂 It literally couldn't understand the question. By the time it understood, I had run out of tokens 😂😂

  • @d_b_ 19 hours ago +1

    Those jokes 😂. Do you have a video describing how these LMSYS ratings are calculated? You know what they say about measures and targets. (A small rating sketch follows after this thread.)

    • @zacboyles1396 16 hours ago

      With Google’s track record, they probably just told them it’s best
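
    On how the arena ratings above are calculated: LMSYS builds its leaderboard from pairwise human votes (its published methodology fits a Bradley-Terry model over all battles); the online Elo-style update below is only a simplified sketch of that idea, with a single simulated vote and starting ratings chosen arbitrarily.

        # Simplified Elo-style update from one pairwise "arena" vote.
        K = 32  # step size per vote

        def expected(r_a: float, r_b: float) -> float:
            # Probability that A beats B under the Elo model.
            return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

        def update(r_a: float, r_b: float, a_won: bool) -> tuple[float, float]:
            e_a = expected(r_a, r_b)
            s_a = 1.0 if a_won else 0.0
            return r_a + K * (s_a - e_a), r_b + K * ((1.0 - s_a) - (1.0 - e_a))

        ratings = {"gemini-exp-1114": 1300.0, "o1-preview": 1300.0}
        # One simulated vote where the Gemini answer was preferred:
        ratings["gemini-exp-1114"], ratings["o1-preview"] = update(
            ratings["gemini-exp-1114"], ratings["o1-preview"], a_won=True
        )
        print(ratings)

    The "measures and targets" caveat applies: ratings built from votes reward whatever voters happen to prefer, not coding ability specifically.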

  • @maninzn 19 hours ago +5

    I think the bananas answer was because the model thinks you are expecting it to be funny, perhaps?

    • @1littlecoder 19 hours ago +1

      Oh the wink

    • @supercurioTube 18 hours ago +3

      Good point. It's best to clear the context before asking several separate questions, since the model will otherwise assume it's an ongoing discussion.
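
    A minimal sketch of the "clear the context" advice, assuming the google-generativeai Python SDK and the "gemini-exp-1114" model name used in this thread.

        import google.generativeai as genai

        genai.configure(api_key="YOUR_API_KEY")  # assumption: an AI Studio API key
        model = genai.GenerativeModel("gemini-exp-1114")

        # Ongoing chat: the earlier joke request stays in the prompt and can
        # colour the later, unrelated question (hence the wink and "1 B and 0 b").
        chat = model.start_chat(history=[])
        chat.send_message("Tell me a joke about bananas.")
        joking = chat.send_message("How many b's are in the word bananas?")

        # Fresh session: no prior joking context reaches the same question.
        fresh = model.start_chat(history=[])
        plain = fresh.send_message("How many b's are in the word bananas?")
        print(plain.text)

    A one-off generate_content call has the same effect, since it carries no history at all.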

  • @leeme179 18 hours ago +3

    It seems this is Google's answer to o1-preview.

    • @elawchess 18 hours ago

      It's not. On coding it's much lower.

  • @dogenrinzai6699 2 hours ago

    LLMs won't be able to generate new logic in code, because what they produce comes from the human mind. Generating something new will only be possible for an LLM when it can think exactly like a human mind.
    Just think for a second: all the code LLMs are generating right now is already available on the internet. For code that is not available on the internet, an LLM would have to think like a human being, like a fully developed programmer's mind; otherwise it is not possible.

  • @MichealScott24 16 hours ago

    I really like Gemini 1.5 Pro with the 1-2 million context window, and it's free, which is really cool. Can't wait for long output to drop.
    Gemini 2 or upcoming models would push OpenAI to drop their things; can't wait to see what's next.

  • @mort-ai 14 hours ago

    great content lately

  • @antrikshtewari 19 hours ago +1

    Google FTW? Wait... What?

    • @1littlecoder 19 hours ago +1

      This should have been the title of this video.

    • @antrikshtewari 19 hours ago

      @1littlecoder Ping me for more quirky and free ideas.
      BTW, huge fan!

  • @zen1tsu-sam 12 hours ago

    When is the 1400 Elo model out?

  • @augmentos 12 hours ago +1

    Let me know when it is #1 on coding and not on BS metrics. Still a good video, but I don't trust Google for shit.

    • @TheReferrer72 3 hours ago

      You don't trust the company that made this all possible, and gives you the best free API access to its models!

  • @FactsNoCare 18 hours ago

    Man, I didn't like that banana response. Like, it's jokey, but there shouldn't be a personality built into any model unless the developers want to give it one.

  • @MoFields 18 hours ago

    It isn't that good