Gemini vs Claude 3 vs ChatGPT4 | Who will win? | Is Claude 3 Good? | Anthropic Claude 3 Opus

  • Published: 11 Sep 2024
  • Well, Claude 3 just got dropped, and we are going to check how it compares to everyone else.
    In case you are looking for someone to help you with AI, please book a meeting - cal.com/akshat...
    Subscribe to learn more!
    Hi, if you are new to the channel, I am Akshat Bahety (the AI guy), I mostly create content around AI, where you can learn and grow your skillset in AI and be ready for the biggest gold rush ever.
    Here are my other socials
    Instagram - / akshat.bahety
    Twitter - / akshatbahety
    LinkedIn - / akshatbahety

Comments • 40

  • @imperfectmammal2566
    @imperfectmammal2566 6 months ago +27

    I personally value quality of answers more than time.

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago

      +1, but it also depends on the type of user. Most of the time, people are doing tasks that depend less on quality than on the speed at which they can execute them. Also, what I found when comparing Gemini and GPT-4 on quality is that, for me, Gemini has worked significantly better as a pair programmer than GPT.

    • @sebastyanpapp
      @sebastyanpapp 6 months ago +1

      EXACTLY!!!!

  • @bikashkumardash5652
    @bikashkumardash5652 3 months ago

    Hey Akshat, this was a fantastic breakdown of the latest AI advancements! Your explanations are clear and engaging, and I really appreciate how you break down complex topics into understandable chunks. Keep up the great work!
    It would be amazing if you could do a comparison video on the latest Google Gemini, ChatGPT-4o, and Claude 3 Opus. I'm really interested in seeing how they stack up against each other in terms of performance and capabilities. Thanks in advance.

    • @AKSHATBAHETY
      @AKSHATBAHETY 3 months ago

      100%, that's on the way, coming in a couple of weeks (right now, MS Build launches tomorrow).

  • @hypnogri5457
    @hypnogri5457 6 months ago +10

    Please don't test for time at all. I would also rather have you ask questions where you actually look at the entire answer (you skipped the contents of the entire email).

    • @user-mv1qi5pq6o
      @user-mv1qi5pq6o 6 months ago +2

      Speed is one of the most important measures, especially when comparing tools; the biggest boon from AI is time saving. Keep it up!

    • @hypnogri5457
      @hypnogri5457 6 months ago +4

      @user-mv1qi5pq6o The AI won’t save you any time if you need to re-run the same prompt multiple times because of the AI’s stupidity.

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago

      I try to look at the whole answer, but I'm also optimizing for retention rate on YouTube (sad life of a creator :( ). I'll admit I did skip the contents of the email; I'll probably optimize the prompts next time.

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago

      Again, it depends on the use case. Most importantly, in the early-adopter stage it needs to be fast + accurate.

  • @arushikworld
    @arushikworld 6 months ago +1

    Thoroughly enjoyed the comparison! I think it’s great that you measured time because it is an important metric for efficiency. If it takes more than a few seconds, I really lose my cool 😂 Great effort putting this together 👏 Do create more!

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago

      Thanks a lot! Yes, for most users time is a super important metric, and even more so for business users.

  • @DynamicUnreal
    @DynamicUnreal 6 months ago +3

    Claude did better on the question about WW2 and explaining it to a 5-year-old. This is how a real person would explain it to a 5-year-old who can’t understand more than a few basic things.
    In this case the best answer was the simpler one.

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago +2

      Agreed, I think I didn't put myself in the proper frame of mind. Thanks so much for your input; I'll take care next time.

  • @user-jv2km7sl4j
    @user-jv2km7sl4j 6 months ago +1

    Answer speed is not the thing. ChatGPT obviously has many more users using it at the same time than the other two LLMs. The dudes at OpenAI just need more capacity (data centres or whatever) for ChatGPT to run smoothly during high demand. We all remember them refusing new user sign-ups recently.

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago +1

      Agreed, I'll take that more into consideration; GPT-4 as a model outperforms in a lot of tasks. I'm doing another video soon comparing them on more complex tasks, solely based on results.

  • @aberroa1955
    @aberroa1955 6 months ago +1

    I'd say this isn't a good test, and the grading is even worse. All models did their job fine, except that GPT was the slowest (but that's to be expected, since it is probably the most loaded one) and Gemini failed with the web page, but these questions were too easy. I'd say a good test should include typical jailbreaking attempts, some philosophical questions or dilemmas to see common sense, some very complex technical questions (either math or programming, and not as simple as a linear equation) to see the ability to solve complex tasks, and some creativity test (like the Divergent Association Task).

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago +1

      Testing LLM speed might not be a great thing, but with a complete product, speed matters to a lot of users. I really do love your suggestion about more difficult tests and will surely try to bring more of that.
      Thanks

  • @sandeepsrinivas7
    @sandeepsrinivas7 6 months ago +5

    Your point counting is so bad lol. You need to give points to all of them when they're successful; you can't just choose one.
    For the math question, did you give an entire point to Gemini only because it was a bit faster and looked better?

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago +1

      They're LLMs; they were all going to solve the question, which is super easy, so if I had to give a point, I had to decide based on speed and ease of use.

    • @sandeepsrinivas7
      @sandeepsrinivas7 6 months ago +1

      @AKSHATBAHETY Then why did you give points to both GPT and Gemini in the email question? Why not choose one there as well?

    • @sebastyanpapp
      @sebastyanpapp 6 months ago

      @sandeepsrinivas7 He’s just stupid

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago

      Because emails are subjective, I can't be super clear on which email everyone will prefer. But I will try to find a better comparison metric next time, thanks a lot.

  • @kbystryakov
    @kbystryakov 6 months ago +2

    All 3 LLMs answered the 5th question about WW2 incorrectly.

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago +1

      Hey, I am not an expert on history by any means. What would you say a better answer could be? Thanks for watching the video.

    • @kbystryakov
      @kbystryakov 6 months ago +1

      @AKSHATBAHETY There's a phrase: Winners write history.
      A very large part of humanity's knowledge consists not of Facts, but of Opinions (of certain people).
      One of the functions of Censorship is to make everyone believe that there is only one absolutely correct opinion and call it a fact.
      Historical events and their evaluation are precisely the area where there can be no clear answers.
      Therefore, talking about the war is an exchange of opinions, not the hammering of facts into your head (like LLMs do).

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago +1

      For some reason I feel you should be teaching this; it really blew me away. Even though deep down you know it, it doesn't hit you.

    • @snowy7236
      @snowy7236 6 months ago

      To be precise, Germany together with Russia started WW2.
      Germany attacked Poland on Sep 1, 1939, and Russia attacked Poland on Sep 17, 1939.

    • @kbystryakov
      @kbystryakov 6 months ago

      @snowy7236 Yes! By the way, here's the official reason. The note of the Soviet government, handed to the Polish ambassador to the USSR in Moscow on the morning of September 17, 1939, stated the reasons for the start of the operation:
      The Polish state and its government effectively ceased to exist. Thus, the treaties concluded between the USSR and Poland ceased to have effect. Left to itself and left without leadership, Poland became a convenient field for all sorts of accidents and surprises that could pose a threat to the USSR. Therefore, being neutral up to now, the Soviet Government can no longer treat these facts and the defenceless situation of the Ukrainian and Byelorussian population with neutrality. In view of this situation, the Soviet Government has ordered the Red Army General Command to order the troops to cross the frontier and to take under their protection the lives and property of the population of Western Belorussia and Western Ukraine.

  • @heresmypersonalopinion
    @heresmypersonalopinion 6 months ago +2

    No way GPT scored higher than Claude 3.

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago

      Which points would you disagree with? Would love to learn and implement better user testing.

  • @e-Course.
    @e-Course. 4 months ago

    Please make a video about Llama 3 in the Voiceflow knowledge base.

    • @AKSHATBAHETY
      @AKSHATBAHETY 4 months ago +1

      Hi, glad you liked it! I will surely make one when it's available.

  • @user-wf9vc8oz4b
    @user-wf9vc8oz4b 6 months ago

    We pretty much see things differently on almost all the tests. I can't really blame you though, since this is all about personal views, and hey, I might be wrong too. It feels like you're going with your gut rather than thinking it through. But even when you take a step back and look at it logically, these models are neck and neck in a bunch of tests. Still, you ended up giving a tiny bit more points to one and zilch to the other.

    • @AKSHATBAHETY
      @AKSHATBAHETY 6 months ago

      I'll be honest, I love the response. Yes, it's a personal view with some objectivity, and I am trying to test them as a product + model.
      What I have understood is that to test the models, and only the models, perfectly, I need to do a deep dive and find something quite challenging for each of them.

  • @d3mist0clesgee12
    @d3mist0clesgee12 6 months ago

    nice