Sorry Sam - gemini-exp-1121 !!!

  • Published: 24 Nov 2024

Comments • 31

  • @1littlecoder
    @1littlecoder  2 days ago

    New Video Generation AI model - ruclips.net/video/CwvN2Ccddgk/видео.html

  • @pixelperfectpravin
    @pixelperfectpravin 3 days ago +21

    I have noticed this - OpenAI just tries to grab the attention whenever Google does something

  • @theJatak
    @theJatak 3 days ago +3

    I think the smaller context window, for now, is a way to let people know about these models, let them use them, and collect their feedback. Later they'll turn it into another Pro model, like Gemini 2 Pro. So this could be a testing model (as the name already suggests) that will be ready for commercial release very soon.
    Another model arriving just one week after the last one means they are speeding up their work on them.

  • @MattRodriguez-h7j
    @MattRodriguez-h7j 3 days ago +5

    The more awesome thing is that Google uses its own TPU chips. Sam and OpenAI will just burn MSFT's cash flow on capex costs.

  • @idea_list
    @idea_list 3 days ago +4

    Interestingly enough, Magnus is actually mentioned 3 times on that page (3rd time it's just 'Carlsen'). But, again, 3rd mention is not covered in the chunk of text copy&pasted by you in the prompt, it was later in sources. But I wonder if he was mentioned in that chunk as well the third time, just not by his name..? It should be possible to catch by llm probably. I'm too lazy to check though, so I'll just leave this thought here

    • @1littlecoder
      @1littlecoder  3 days ago +1

      Woah. That's very interesting. Let me check it again.

  • @Shaunmcdonogh-shaunsurfing
    @Shaunmcdonogh-shaunsurfing 3 days ago

    Really appreciate you covering this

  • @TheReferrer72
    @TheReferrer72 3 days ago +1

    Chatbot Arena must be horribly broken if Claude models are not top in coding.

  • @dkgrinder
    @dkgrinder 3 days ago

    The McKinsey comment is hilarious and so true

  • @AarreLisakki-s5e
    @AarreLisakki-s5e 3 days ago +1

    The one result it's not (tied for) leading on is not, as you say, "for style"; it's rather the leaderboard's attempt at _controlling_ for style, so that only substance counts. So it's essentially the exact reverse: it's saying that when you discount human preferences for, say, greater response length even when the response isn't saying anything more, or for the way it uses markdown, Gemini drops a rank to #2 in the overall category. Still pretty impressive, of course.
    This is how Chatbot Arena describes it in their blogpost about the criterion:
    " The goal here is to understand the effect of style vs substance on the Arena Score. Consider models A and B. Model A is great at producing code, factual and unbiased answers, etc., but it outputs short and terse responses. Model B is not so great on substance (e.g., correctness), but it outputs great markdown, and gives long, detailed, flowery responses. Which is better, model A, or model B?
    The answer is not one dimensional. Model A is better on substance, and Model B is better on style. Ideally, we would have a way of teasing apart this distinction: capturing how much of the model’s Arena Score is due to substance or style."
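
To make the "controlling for style" idea concrete, here is a toy sketch. It is not the leaderboard's actual code: the battle data is invented, and the only style feature is a hypothetical `length_diff`. It fits a pairwise win model twice, once with only a strength term for Model A and once with the length covariate added, to show how a terse model can look worse on raw votes yet come out ahead once length is accounted for.

```python
import math

def fit_logistic(rows, outcomes, steps=3000, lr=0.5):
    """Fit logistic-regression weights by plain gradient ascent on the log-likelihood."""
    weights = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, outcomes):
            z = sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xi in enumerate(x):
                grad[j] += (y - p) * xi
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Invented battles between Model A (terse) and Model B (long-winded).
# Each entry is (length_diff, a_wins); length_diff = -1 means B's answer was longer.
battles = (
    [(-1, 1)] * 12 + [(-1, 0)] * 18   # B longer: A wins 12 of 30
    + [(0, 1)] * 6 + [(0, 0)] * 4     # equal length: A wins 6 of 10
)
outcomes = [won for _, won in battles]

# Raw fit: a single "A vs B" strength term, no style information.
raw_strength = fit_logistic([[1.0] for _ in battles], outcomes)[0]

# Style-controlled fit: the same strength term plus the length covariate.
controlled_strength, length_effect = fit_logistic(
    [[1.0, float(diff)] for diff, _ in battles], outcomes
)

print(f"raw strength of A vs B:        {raw_strength:+.3f}")
print(f"controlled strength of A vs B: {controlled_strength:+.3f}")
print(f"effect of being longer:        {length_effect:+.3f}")
```

In this made-up data, A wins only 45% of all battles, so the raw strength is negative, but at equal length A wins 60%, so the controlled strength is positive and the length term absorbs the rest.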

  • @Ann-yo5sb
    @Ann-yo5sb 3 days ago +1

    Regularly follow your channel, keep it up. But do your thumbnails always have to feature Elon Musk and Sam Altman when talking about their businesses? There are a lot of hard-working people behind them who are building the companies.

  • @Zbezt
    @Zbezt 3 days ago +1

    Gemini was "refined" into oblivion. It has little to nothing to offer compared to other models. I tried it myself, and sadly it has been reduced to nothing more than a lazy person's prompt gadget.

  • @twobob
    @twobob 3 days ago

    so, I could make a bot that runs my model locally and asks remotely and regardless of the "best" answer, simply choose the answer that matches my local output, and in this way skew the results to look like my model was best...
    And this is why clever people can't have nice things.

  • @Nick-h7f
    @Nick-h7f 3 days ago +1

    Your fav Indian chess player, apart from Vishy?

    • @1littlecoder
      @1littlecoder  2 days ago

      Arjun Erigaisi, mostly, at this point! But nothing is so solid tbh. I used to root a lot for Nepo (RUS), but I think his time for championships is gone! Wesley is another favorite player - very humble!

  • @1voice4all
    @1voice4all 1 day ago

    Arena votes are not really a good way to assess models. It's subjective.

  • @vaibhavgeek
    @vaibhavgeek 3 days ago

    McKinsey employees - "Mujhe kyu toda?" (Why did you break me?) 😂😂

  • @alx8439
    @alx8439 3 days ago

    Gemini's lead is within the margin of error, but nevertheless it is leading

  • @BrianMosleyUK
    @BrianMosleyUK 3 days ago +2

    Try this prompt... Watch it fail dismally. Only o1-preview comes close to success.
    Find pairs of words where:
    1. The first and last letters of the first word are different from the first and last letters of the second word. For example, "TeacH" and "PeacE" are valid because:
    The first letters are "T" and "P" (different).
    The last letters are "H" and "E" (different).
    2. The central sequence of letters in both words is identical and unbroken. For example, the central sequence in "TeacH" and "PeacE" is "eac".
    3. The words should be meaningful and, where possible, evoke powerful, inspiring, or thought-provoking concepts. Focus on finding longer words for a more varied and extensive list.
    Examples
    1. Banged Danger
    2. Bated Gates
    3. Beached Reaches
    4. Belief Relied
    5. Blamed Flames
    6. Blamed Flamer
    7. Blazed Glazer
    8. Blended Slender
    9. Bolted Jolter
    10. Boned Toner
    11. Braced Traces
    12. Branded Grander
    13. Braved Craves
    14. Braved Graves
    15. Braver Craved
    16. Brushed Crusher
    17. Busted Luster
    18. Busted Muster
    19. Causes Paused
    20. Chased Phases
    21. Chaser Phased
    22. Cracked Tracker
    23. Craved Graves
    24. Crated Grates
    25. Creamy Dreams
    26. Created Greater
    27. Dared Bares
    28. Dancer Lanced
    29. Dreamed Creamer
    30. Fabled Tables
    31. Faith Baits
    32. Fallen Baller
    33. Favoured Savourer
    34. Famed Gamer
    35. Famed Cameo
    36. Fared Cares
    37. Fasten Master
    38. Fated Gates
    39. Faved Caves
    40. Feared Bearer
    41. Fiery Piers
    42. Fired Tires
    43. Flared Glares
    44. Flashed Clashes
    45. Flipped Slipper
    46. Foamed Roamer
    47. Folded Bolder
    48. Founder Sounded
    49. Gifted Lifter
    50. Gleaned Cleaner
    51. Graced Traces
    52. Hades Wader
    53. Hardened Gardener
    54. Hated Fates
    55. Laced Racer
    56. Laced Races
    57. Lasted Faster
    58. Leader Beaded
    59. Leaves Heaved
    60. Lighted Fighter
    61. Lives Given
    62. Manned Banner
    63. Mailer Sailed
    64. Mended Bender
    65. Missed Kisses
    66. Mounted Counter
    67. Moved Lover
    68. Named Games
    69. Paced Laces
    70. Paced Racer
    71. Paced Races
    72. Pained Gaines
    73. Painted Fainter
    74. Parched Marches
    75. Placed Glaces
    76. Plates Slated
    77. Popes Roped
    78. Races Faced
    79. Racer Laced
    80. Rarer Cares
    81. Rated Dates
    82. Raver Waves
    83. Rested Tester
    84. Saved Waver
    85. Seated Beater
    86. Sailer Wailed
    87. Sainted Painter
    88. Seeder Needed
    89. Slayer Played
    90. Tainted Painter
    91. Tamed Games
    92. Tailed Raider
    93. Teach Peace
    94. Tested Fester
    95. Tinker Linked
    96. Tired Siren
    97. Traced Graces
    98. Treated Greater
    99. Warmed Farmer
    100. Wasted Baster
    101. Watched Catcher
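
The rules in the prompt above can be checked mechanically. The sketch below is one possible reading, not part of the original comment: the function names are made up, and rule 1 is taken to mean first letter vs. first letter and last letter vs. last letter, as in the "TeacH"/"PeacE" example. Rule 3 (meaningful, inspiring words) is subjective and is not checked.

```python
from collections import defaultdict
from itertools import combinations

def split_word(word):
    """Return (first letter, middle letters, last letter) of a lower-cased word."""
    w = word.lower()
    return w[0], w[1:-1], w[-1]

def is_valid_pair(a, b):
    """Check rules 1 and 2: different first letters, different last letters, same middle."""
    if len(a) < 3 or len(b) < 3:
        return False  # no central sequence to compare
    a_first, a_mid, a_last = split_word(a)
    b_first, b_mid, b_last = split_word(b)
    return a_first != b_first and a_last != b_last and a_mid == b_mid

def find_pairs(words):
    """Group words by their middle letters and return every valid pair, sorted."""
    by_middle = defaultdict(set)
    for word in words:
        if len(word) >= 3:
            by_middle[split_word(word)[1]].add(word.lower())
    pairs = []
    for middle in sorted(by_middle):
        for a, b in combinations(sorted(by_middle[middle]), 2):
            if is_valid_pair(a, b):
                pairs.append((a, b))
    return pairs

print(is_valid_pair("Teach", "Peace"))   # rules 1 and 2 both hold
print(is_valid_pair("Teach", "Reach"))   # same last letter, so rule 1 fails
print(find_pairs(["teach", "peace", "reach"]))
```

Feeding `find_pairs` a full dictionary word list would enumerate candidates for a person (or a model) to then filter by rule 3.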

  • @AbuBakr1
    @AbuBakr1 3 days ago

    Many of the new models were simply trained to pass benchmark questions. For example, the new Qwen 2.5 model (which was once a favorite for coding) passed all the benchmark questions, so you'd think it's better than Anthropic's Claude, but it's complete trash when used in real life 😅

  • @alvarobyrne
    @alvarobyrne 3 days ago

    attention is all you need

  • @Fatman305
    @Fatman305 3 days ago +1

    Gave it a shot. Sucked as usual. Was hoping deep thinking with web access would give it the edge, but it sure didn't... 4o (not o1) did a much better job...

  • @TashiDorjeLinas
    @TashiDorjeLinas 3 days ago +2

    Gemini is useless, it is so censored that it refuses to do things like hash a simple password.

    • @sgttomas
      @sgttomas 3 days ago +2

      I uploaded an entire textbook into the context window, and now I ask questions of it daily with useful results.

    • @MrKellvalami
      @MrKellvalami 3 days ago +1

      Me too. I use it daily in similar and other ways. Sliders down and it does its job in a huge context.

  • @faaz12356
    @faaz12356 3 days ago +1

    It's not that impressive at coding