NEW Gemini 2 Flash Thinking 0121 | First Impressions (Coding vs DeepSeek R1)

  • Published: Feb 1, 2025

Comments • 32

  • @georgezorbas9036 10 days ago +3

    Bravo... it seems like you always have something really useful

  • @ragibhasan2.0 10 days ago +7

    This Open-source model is truly revolutionary!🔆🔆

    • @MarvijoSoftware 9 days ago +2

      @@ragibhasan2.0 it's not Open Source

    • @ragibhasan2.0 9 days ago +3

      @@MarvijoSoftware I mean DeepSeek R1
      🤗

  • @JayBallentine 10 days ago +4

    It's suuuuuch a relief to see your videos in my feed, regarding something I'm interested in, and know instantly I WILL NOT be disappointed. YouTube is a long game and the cream always rises, my good man. 💪

    • @MarvijoSoftware 10 days ago +1

      @@JayBallentine 🙏🏾 I appreciate you my good man!

  • @anandkanade9500 10 days ago +2

    It's so satisfying to see the performance spreadsheet at the end 👍

  • @TheBuzzati 10 days ago +1

    Appreciate your videos. Thanks for the comparisons and insights.

    • @MarvijoSoftware 10 days ago

      @@TheBuzzati I appreciate your viewing 🙏🏾

  • @MrParad0x 10 days ago +13

    #1 my ass. I tested it with 10+ prompts and it hallucinated a lot. It would suddenly stop generating a response in AI Studio. As of today, it's quite buggy and unreliable. I wouldn't recommend using this.

    • @MarvijoSoftware 10 days ago +12

      @@MrParad0x yep, I don't trust LM Arena in the slightest, especially after it ranked the weak o1-mini so high for coding

  • @moidrugag 4 days ago +1

    Can you compare OpenHands with the agents you've already tested? I’d like to see how it stacks up against them using DeepSeek V3 or R1.

  • @sLoNcE 10 days ago +1

    Thank you

  • @ninjaxae 9 days ago +2

    Please use this voice in every video, and also please do a Cursor (DeepSeek R1) vs Windsurf (Sonnet 3.5) video.

    • @MarvijoSoftware 9 days ago

      Alright, I'll queue it up. The problem is that Cursor + R1 don't support Composer. Cursor vs Windsurf: ruclips.net/video/duLRNDa-CR0/видео.html

  • @Arron_Mottram 9 days ago +3

    I don't trust the Chatbot Arena; they put Claude 3.5 Sonnet in 11th place

    • @MarvijoSoftware 9 days ago +1

      I also don't trust it, but I get it. It's not devs who vote on coding tasks; people just cast random votes. That's why I believe actual benchmarks are needed, like we do on the channel, and what Aider does. The problem with the Aider benchmarks is that LLMs can train on them because they are public

  • @gemini_537 10 days ago +1

    ❤ Gemini 2.0

  • @andrewandreas5795 10 days ago +2

    Thanks for the nice video. Please make a comparison of Roo-Cline vs Aider vs Cursor

    • @MarvijoSoftware 10 days ago +1

      @@andrewandreas5795 A Roo-Cline video is incoming soon, after the R1 Architect video in a larger codebase

  • @mlsterlous 10 days ago +2

    How the f is it number one on the arena just one or two days after release? I had to wait a lot longer to see Phi-4 on that list.

    • @MarvijoSoftware 10 days ago

      I asked myself the same question after it was just released! So quick, who voted? Bots? Something might be off

  • @ilyass-alami 9 days ago +4

    No, DeepSeek R1 is number one; it's better than the new Gemini 2 Flash Thinking

    • @MarvijoSoftware 9 days ago +1

      I agree. That's the LMArena leaderboard, which is based on random people voting

  • @abdusalamolamide 10 days ago +4

    lol.. these aren't three Rs, Gemini..😂😂

  • @Dom-zy1qy 10 days ago +1

    I think the new Gemini models have been tuned pretty well in terms of human preference (at least I like the newer models more than the older ones).
    Claude IMO is usually #1, then everyone else is about the same. However, from my usage of the model, it seems like Gemini 2.0 Exp 1206's responses get pretty bad/mediocre after 60k tokens of context.

    • @MarvijoSoftware 10 days ago

      All models get dumber as tokens increase. Also, they start to output random characters at a certain context length