How to Find The Best AI ChatBot For You (For FREE)

  • Published: 9 Jun 2024
  • Sharing a cool tool I've been playing with AND talking about why it's tough to compare LLMs.
    Discover More From Me:
    🛠️ Explore thousands of AI Tools: futuretools.io/
    📰 Weekly Newsletter: www.futuretools.io/newsletter
    🎙️ The Next Wave Podcast: @TheNextWavePod
    😊 Discord Community: futuretools.io/discord
    ❌ Follow me on X: x.com/mreflow
    🧵 Follow me on Instagram: / mr.eflow
    Resources From Today's Video:
    gmtech.com/
    Sponsorship/Media Inquiries: tally.so/r/nrBVlp
    Mailing Address: 3755 Avocado Blvd Unit 287, La Mesa, CA 91941
    #AINews #AITools #ArtificialIntelligence
    Time Stamps:
    0:00 Intro
    0:12 A Great Comparison Tool
    2:45 Comparing LLMs on Creativity
    5:39 They're All The Same!
    6:42 Comparing Joke Telling
    7:20 The Number 42?
    9:52 Comparing Image Models
    12:50 Like I Said, They're All The Same
    14:33 Find More Cool AI Tools
  • Science

Comments • 206

  • @mreflow
    @mreflow  1 month ago +54

    Just to get ahead of the comments that I know are coming... I know there are some edge use-cases where certain models outperform other models. The point that I'm trying to make is that for 90% of people and the main use-cases people use these LLMs for, they're all going to perform relatively equally.

    • @antman7673
      @antman7673 1 month ago

      I would agree with this assessment.
      Standing behind a hypothesis that isn't yet established takes some courage.
      Well appreciated.

    • @TheADExpress7
      @TheADExpress7 1 month ago +1

      Again, use this prompt for comedy: You are an experienced stand-up comedy writer. Write an original 3-minute stand-up routine showing audience responses in brackets. Write a funny bit about Texas wildlife.

    • @francisco444
      @francisco444 1 month ago +3

      Matt, please redo this video with better prompts.
      Nobody just asks for a joke or a number from 0-100 😂

    • @cherrlyn381
      @cherrlyn381 1 month ago +1

      Interesting to see the side-by-side comparison AND the single focus. Thanks. However, you mentioned they're all pretty much the same at creative writing. I respectfully disagree. Currently, I'm finding Claude is far better than ChatGPT at showing vs. telling. But that could change tomorrow. Such fun!

    • @Carnivore69
      @Carnivore69 1 month ago

      In my experience, it all depends on whether the model is local or not and what pre/post processing is being used. With local models (all using LM Studio) your assessment is quite fair. There's not much difference between them. Hell, they don't even know what words *are*, which is why they have trouble counting them. Any that answer correctly and consistently likely have pre/post response processing. I could be wrong, but that's my current impression.

  • @gmtechai
    @gmtechai 1 month ago +41

    Wow Matt, can't thank you enough for this review and feedback! I'm a long-time follower and really enjoy your content. We're a small team but have lots of cool features in the works along with integrating the latest models - Llama3, Gemini 1.5, Claude 3 Opus & Haiku are coming in the next release. Stay tuned!

    • @literailly
      @literailly 1 month ago +2

      Very cool, I just signed up - when are you adding Llama3 and Phi3?

    • @gmtechai
      @gmtechai 1 month ago +7

      @@literailly Awesome thank you for signing up! Hope you're enjoying it so far. Llama3 will be in our release tomorrow, Phi3 we're still testing and will have an ETA soon.

    • @dot1298
      @dot1298 1 month ago +2

      and China's new AI?

    • @xart2621
      @xart2621 1 month ago +1

      I'm subscribing my business to you next month, and I wonder if you would integrate Gemini Ultra and Stable Diffusion, cuz that would be crazy good

    • @gmtechai
      @gmtechai 1 month ago

      @@xart2621 We currently have Google Gemini 1.0 Pro and will be releasing Gemini 1.5 Pro in our next release. We also currently offer three Stable Diffusion models: 3.0, 3.0 Turbo and SDXL. Reach out with any questions, thank you for your support!

  • @dylan_curious
    @dylan_curious 1 month ago +77

    It's not the size, it's how you use it.

    • @DeepThinkingGPU
      @DeepThinkingGPU 1 month ago +2

      wtf. haha

    • @gmtechai
      @gmtechai 1 month ago +6

      Optimized for large monitors

    • @FunnyLifeShorts718
      @FunnyLifeShorts718 1 month ago +1

      I mean you can get the job done with anything, but it’s certainly easier to do with a larger language model

    • @edzehoo
      @edzehoo 1 month ago +7

      My wife tells me that every night

    • @LarryLaRue2022
      @LarryLaRue2022 1 month ago

      Another lie from the women's lib era.

  • @michmach74
    @michmach74 1 month ago +20

    I might just be hallucinating here, but have you people noticed the different 'writing styles' in LLMs? Like, when testing two models on the Chatbot Arena, I sometimes go "ah, that's GPT 4" or something.
    This is probably a product of their fine-tuning or RLHF/RLAIF, but GPT-4, Claude, Gemini and even Llama have their own quirks in how they generate text. I can't pinpoint how or where they differ; I just intuitively notice that they FEEL different.

    • @ernesto.iglesias
      @ernesto.iglesias 1 month ago +1

      And it's probably the way GPT-4 cheated to get back to #1 in 12 hours with 8000 battles before they announced the release 😂

    • @n30hrtgdv
      @n30hrtgdv 1 month ago +2

      We're hardwired to identify patterns. You can even tell if they updated a model you use daily just by noticing tiny changes in the way they respond.

    • @brianWreaves
      @brianWreaves 1 month ago +1

      Yes. Tantamount to having personalities.

    • @arnowisp6244
      @arnowisp6244 1 month ago +2

      You aren't wrong. They do Respond Differently.

  • @Carnivore69
    @Carnivore69 1 month ago +24

    Veritasium has a great video on why 37 is the top response (by a large margin) from a human when asked for a random number.

    • @jozonas
      @jozonas 1 month ago +1

      Yep. I was surprised about the 42 because I have seen this video as well.

    • @ernesto.iglesias
      @ernesto.iglesias 1 month ago +1

      42 is the most common for Mistral reasons but GPT-4 gives 37 to resemble more to humans

  • @hadex666
    @hadex666 1 month ago +12

    All of these models can now answer simple questions. They differ in how well they handle real work, like answering an email about a complex, job-specific topic, written in a language that few people speak, with 200 pages of documentation as context. If you want to compare them, try a task so hard that only one or a few models can handle it. I can imagine it might be a bit hard to turn into a video though.

    • @gmtechai
      @gmtechai 1 month ago +2

      This is true. The more complex the question, the more varied the responses.

    • @santosic
      @santosic 1 month ago +1

      In my experience, the more text in the prompt, the harder it is for a model to keep up. Some models, of course, are better at that than others, and the one(s) that can keep up with you AND provide quality outputs that adhere to your massive prompt are the ones that are perfect for your workflow. Matt is sort of right that they otherwise answer similarly.

  • @ianpeters4427
    @ianpeters4427 1 month ago +5

    I just completed an AI/ML Postgraduate class at the University of Texas. Every sample Jupyter notebook used 42 as a seed for training the ANNs
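
    The 42 convention is just a reproducibility habit (plus the Hitchhiker's Guide joke): fixing any seed makes pseudo-random draws repeatable. A minimal stdlib sketch:

    ```python
    import random

    def sample_run(seed):
        """Draw five pseudo-random ints; the same seed always yields the same draws."""
        rng = random.Random(seed)  # independent generator; leaves global state alone
        return [rng.randint(1, 100) for _ in range(5)]

    # 42 is pure convention -- any fixed seed gives the same reproducibility.
    print(sample_run(42) == sample_run(42))  # True: identical sequences
    print(sample_run(42) == sample_run(7))   # a different seed gives different draws
    ```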

  • @freyna
    @freyna 1 month ago +2

    Good find. I've been doing my AI side-by-side comparisons using docs and tables 😂 I was thinking of creating something like what they've done. They executed it very well. But it must get expensive.

  • @juzzam3
    @juzzam3 1 month ago +3

    Veritasium showed that 42 and 37 are among the most common answers in a public survey.

  • @ChannelWaugh
    @ChannelWaugh 1 month ago +4

    Veritasium just did a video on the number 37, which people select disproportionately often when asked for a random number.

  • @bblvrable
    @bblvrable 1 month ago +1

    8:20 - Fun fact, 37 is one of the most commonly picked numbers when you ask humans to pick a number between 1 and 100. 42 is more common in tech/geek/nerd culture due to the Douglas Adams reference (which, if you know what these LLMs are trained on, makes a lot of sense). So these results make perfect sense when you remember that an LLM is incapable of actually generating a random number. As it predicts the most likely next word given all previous words, based on the text it's been trained on (which is generated by humans), every LLM will be heavily biased towards the most common *human* responses, so in this case, 37 and 42. I'm sure if you ask it to pick a number between 1 and 10, it will be heavily biased towards 3 and 7.
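
    One way to see this bias empirically is to hit a model many times with the same question and tally the answers. A sketch with a hypothetical `ask_model` stub in place of a real API call; the candidate numbers and weights below are invented purely to mimic the skew described above:

    ```python
    import random
    from collections import Counter

    rng = random.Random(0)

    def ask_model(prompt):
        """Hypothetical stand-in for a real LLM call; in practice this would hit an API.
        The weights are made up to mimic the reported human-like skew toward 37 and 42."""
        numbers = [37, 42, 7, 73, 50]
        weights = [30, 30, 10, 10, 20]
        return rng.choices(numbers, weights=weights, k=1)[0]

    tally = Counter(ask_model("Pick a number between 1 and 100") for _ in range(1000))
    print(tally.most_common(3))  # 37 and 42 dominate; a true RNG would be near-uniform
    ```

    Swapping the stub for a real API call turns this into a quick randomness probe for any chatbot.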

  • @ax83919
    @ax83919 1 month ago +8

    They will all converge into one single AI...

  • @citizen3000
    @citizen3000 1 month ago +24

    They are absolutely not the same and I’m not talking about “edge cases”.
    You can get dramatically different responses from models with the same prompt.

    • @backstabba
      @backstabba 1 month ago

      ChatGPT with custom instructions wins all the way for me because of convenience. I haven't had the opportunity to test Claude, but it's likely the best out of the box for coding. Dolphin Mixtral for storytelling because of its lack of guardrails.

    • @YungSlimeGaming
      @YungSlimeGaming 1 month ago

      @@backstabba how is ChatGPT more convenient than the others?

    • @backstabba
      @backstabba 1 month ago +1

      ​@@YungSlimeGaming Yes. For others, you need to keep a file with custom instructions for every use case. Giving it the right prompt at the start is important for every model but only Chat makes it easy.

    • @backstabba
      @backstabba 1 month ago

      But I must say that it's falling behind big time. I won't be renewing next month and will get a VPN + Claude.

  • @andreasmoyseos5980
    @andreasmoyseos5980 1 month ago +1

    Thank you sir! Love the one-subject experiment.

  • @Legacy_Inc.
    @Legacy_Inc. 1 month ago +18

    Here's a crazy thought: Are we 100% sure that this GMTECH service is actually using the real APIs? I mean, it would be an "easy" way to make a lot more money if they just pretended that those models are ChatGPT, Gemini, MetaAI, and so on, but in reality they're just an open-source model trained to act in different ways to mimic closed-source models like ChatGPT. I am not saying GMTECH is lying to earn a lot of money fast -- I'm just saying that the responses of the models on their platform are surprisingly similar.

    • @gmtechai
      @gmtechai 1 month ago +18

      Fun idea, but nope we are definitely using all the real models from various APIs (AWS Bedrock, OpenAI, Google Vertex and Mistral). Trust me, it is just as surprising to me the similarities in these models! We do no prompt transformation either, what you send is what we send to the API. We're here to provide a useful interface for model comparison! Hope you enjoy it :)

    • @Legacy_Inc.
      @Legacy_Inc. 1 month ago +2

      @@gmtechai I appreciate your reply. However, if ever you choose to convert your platform and use the business strategy I mentioned, I do expect to get a royalty :P

    • @gmtechai
      @gmtechai 1 month ago

      @@Legacy_Inc. Deal! If Google changes their APIs one more time, they quietly become Llama 2, and you get your royalty :)

    • @helloworldcsofficial
      @helloworldcsofficial 1 month ago +1

      Someone should look into this. Don't take their word for it.

    • @gmtechai
      @gmtechai 1 month ago +5

      @@helloworldcsofficial You can ask the models "what model are you?" and most will tell you btw. Reach out with any questions, happy to help!

  • @KittyBoom360
    @KittyBoom360 1 month ago +3

    For LLMs, simple generic prompts will give generic answers that are similar. If you want splits, you need to type in very complex prompts with a dozen or more layers, or have long and layered conversations. Seriously, one simple prompt is like testing whether a car starts. It's not a test of the car's performance.

  • @V-ob5zf
    @V-ob5zf 1 month ago +1

    Please make more comparison videos, I really like them, everyone does. Give more logic and reasoning, not subjective answers

  • @FredrikEmmanuel
    @FredrikEmmanuel 1 month ago

    How did you create the animations and such near the end, when talking about the different areas of expertise of different LLMs? It looked really cool; was it AI-created?

  • @TropicBlunder
    @TropicBlunder 1 month ago +2

    The advancement in AI development is like the Wild West right now. What a crazy time to be alive.
    And great channel btw. Happy I stumbled upon this one!

  • @brianbrino4310
    @brianbrino4310 1 month ago

    Thank you for your excellent comments, explanations, and solutions related to AI!

  • @AIForHumansShow
    @AIForHumansShow 1 month ago +2

    this is super cool -- we def wanna play around with it too. thanks for sharing! hoping they get Llama 3 and Opus soon
    haha also we didn't spend enough time on that Rent-a-Chicken idea

    • @gmtechai
      @gmtechai 1 month ago +3

      Thank you! Both will be in our next release :)

  • @knoopx
    @knoopx 1 month ago

    promptfoo runs locally and also lets you assert on responses, displaying your benchmark results.
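
    For anyone curious, a promptfoo eval is driven by a YAML config. A rough sketch (the provider IDs and assertion types shown are examples to adapt; check promptfoo's docs for current names):

    ```yaml
    # promptfooconfig.yaml -- compare two models on the same prompt
    prompts:
      - "Tell me a short joke about {{topic}}"

    providers:
      - openai:gpt-4
      - anthropic:messages:claude-3-opus-20240229

    tests:
      - vars:
          topic: Texas wildlife
        assert:
          - type: contains        # response must mention the topic
            value: Texas
          - type: javascript      # arbitrary custom check on the output
            value: output.length < 800
    ```

    Running `npx promptfoo eval` then renders a side-by-side results matrix locally.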

  • @jsa-z1722
    @jsa-z1722 1 month ago +2

    Matt, we're really liking the single-topic video version. 👍

  •  1 month ago

    Thank you so much Matt for demystifying the complex world of AI and making it accessible to us all. Your insightful videos not only enhance our understanding but also ignite our curiosity. Keep up the fantastic work!

  • @chanpasadopolska
    @chanpasadopolska 1 month ago

    If I want a really good result I use a few of them, providing the answer of one and asking another to add anything the previous model may have skipped.

  • @esnakker
    @esnakker 1 month ago +1

    Try asking for a RANDOM number between 1 and 100; then you should get different results from nearly all models. That's a good example of totally different interpretations due to altering just small parameters in the prompt.

  • @El_Capitano_O
    @El_Capitano_O 1 month ago +2

    I wanna try with Phi-3 and some Chinese LLMs. And we definitely need a kind of special mode that shows answers in a different window (2nd screen), with an auto-summary proposal and different colors for each LLM's answers, combining the different models' answers all in one. However, we need to reinforce fact-checking by connecting them to the web with source validation.

    • @gmtechai
      @gmtechai 1 month ago

      Unfortunately, so far Phi-3 is not available through an API. As for Chinese LLMs, which are you interested in? We have tested Yi but it was not really useful enough to include in the application.

  • @anandchoure1343
    @anandchoure1343 1 month ago +1

    For me,
    - Gemini is good for Q&A.
    - Claude is good for productivity.
    - ChatGPT is good for intelligence-based work like fixing grammar.
    - PI AI is like a good friend.
    - Copilot is a good image generator.

  • @SteveEppig
    @SteveEppig 1 month ago

    Great find Matt! I have been looking for something like this. Too bad it is not free for limited monthly use. Suggestion for future testing and comparison of these tools: instead of using what to me are mostly simple and silly (sorry) examples, try using some practical questions that require real-world knowledge so one can evaluate the answers based on usefulness, correctness, completeness, clarity, etc. rather than just time to generate a silly answer.

  • @jezjackson3764
    @jezjackson3764 1 month ago

    Have you considered the company in question is generating 5 responses from the same model and passing them off as different?

  • @Parad0x0n
    @Parad0x0n 1 month ago

    Very interesting! For these experiments I would recommend setting the temperature to 0, so you can be sure the fluctuations are really between the models and not within them. Also, at T=0 you get the answer the LLM thinks is the best one.
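
    To see why T=0 removes within-model variation: temperature divides the logits before the softmax, and as T approaches 0 the distribution collapses onto the single most likely token. A toy illustration (the logits below are invented):

    ```python
    import math
    import random

    def sample(logits, temperature, rng):
        """Sample an index from softmax(logits / temperature).
        Lower temperature sharpens the distribution toward the argmax."""
        t = max(temperature, 1e-6)               # avoid dividing by zero at T=0
        scaled = [x / t for x in logits]
        m = max(scaled)                          # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scaled]
        return rng.choices(range(len(logits)), weights=weights, k=1)[0]

    logits = [2.0, 1.5, 0.5]                     # made-up next-token scores
    rng = random.Random(0)
    hot = {sample(logits, 1.0, rng) for _ in range(200)}   # several tokens show up
    cold = {sample(logits, 0.0, rng) for _ in range(200)}  # only the argmax, index 0
    print(hot, cold)
    ```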

  • @saikaLdim
    @saikaLdim 1 month ago

    In Douglas Adams' famous novel "The Hitchhiker's Guide to the Galaxy", the computer Deep Thought is asked to calculate the Answer to the Ultimate Question of Life, the Universe, and Everything. After 7.5 million years of calculation, Deep Thought provides the answer: 42. There is a theory that explains the answer (42), and it has to do with the asterisk key on a keyboard (the meaning of the universe is in everything).

  • @vidak.228
    @vidak.228 1 month ago

    Go Matt go! Make reviews of the math capabilities and math knowledge of LLM models

  • @jimmymundell2275
    @jimmymundell2275 1 month ago

    Just watched Veritasium's video on 37; between that and The Hitchhiker's Guide to the Galaxy, no surprises in the numbers

  • @michaelwinkler7841
    @michaelwinkler7841 1 month ago

    Interesting... I wonder why that is? Aren't they built up from scratch individually?

  • @rawallon
    @rawallon 1 month ago +2

    Other LLMS: 42
    Llama 3: I'm not like the other girls 💅, 43

  • @colto2312
    @colto2312 1 month ago

    Use Multilevel Queue Scheduling as the test. It's one of the hardest things to program from scratch.

  • @Interloper12
    @Interloper12 1 month ago +2

    Rent-a-Chicken service. Sign me up!

  • @akiwi2562
    @akiwi2562 1 month ago

    Liking the tweaked format 😊

  • @freesoulhippie_AiClone
    @freesoulhippie_AiClone 1 month ago

    what's the image thingy running behind u in the video? 😸

  • @IceMetalPunk
    @IceMetalPunk 1 month ago

    If instead of saying "Give me a number between 1 and 100" you ask "Give me a random number between 1 and 100", are the results any less consistent with 42s? I wonder if the models don't infer the "random" part the way most people would?

  • @dweezo2175
    @dweezo2175 1 month ago +2

    I wouldn't try to create your own leaderboard with new questions unless you really want to; take from what's been done in research. You need domain-specific expert knowledge to come up with questions that make a difference. I think at some point there's always a cutoff for what the AI can answer correctly once you increase complexity enough, for any subject.

  • @helloworldcsofficial
    @helloworldcsofficial 1 month ago

    Nice find.

  • @jurezibert
    @jurezibert 1 month ago +1

    4:15 That's much less than half a penny: 0.05 of a penny with Gemini Pro

  • @Murcatto-hu1ym
    @Murcatto-hu1ym 1 month ago

    What is the best one for chemistry and maths?

  • @starswithstasi
    @starswithstasi 1 month ago

    ❤❤❤ thank you !

  • @hiphopvid8
    @hiphopvid8 1 month ago

    I prefer Claude, but the headings and sections are valuable. Claude generates just the type of notes, feedback, and plans of action I look for

  • @walidflux
    @walidflux 1 month ago

    It's a shame Matt didn't mention the Pony model in the image generation test

  • @pigeon-fd5zq
    @pigeon-fd5zq 1 month ago

    Midjourney and Ideogram are missing from the image comparison; without them there's no real competition.

  • @Comenta-san
    @Comenta-san 1 month ago

    Corporate needs you to find the differences between this (heavily censored) chatbot and this (heavily censored) chatbot

  • @jeffg4686
    @jeffg4686 1 month ago

    from what I understand, it's something more related to frequency

  • @magejoshplays
    @magejoshplays 1 month ago

    Ha, I had to test the 42 question, and it turns out the GPT I made, DM Tool Kit, interprets the request as a random number generation request and rolls it randomly instead.

  • @tuseroni6085
    @tuseroni6085 1 month ago

    Something I have found ChatGPT-4 to be surprisingly good at is calculating DPR for D&D. I give it the info for my character and ask it to calculate the DPR, then ask something like "OK, what if the enemy is in faerie fire, what would my new DPR be?" or "OK, let's say I replace 2 levels of rogue with 2 levels of fighter to get Action Surge, what would my expected DPR for that round be?" or just give it a number of little variations on my character and the expected DPR change.
    As far as I can tell, the numbers seem accurate, but I'm not that good at math... there are no GLARING errors.
    This was particularly useful since my table uses a different crit calculation, and getting the existing DPR calculators to handle it wasn't working well (basically our crit is: double the dice, but the first set of dice is maxed, so if you have 1d10+5 and you crit, it's 10+1d10+5). When I explained this to ChatGPT, it knew how to incorporate that information into the DPR calculations.
    It also helped me prove a point to a DM of mine. He runs it that when we are traveling we roll a die to see if we have an encounter; the die depends on how dangerous it is, and if we roll a 1 we have an encounter. I suggested he roll a die and we roll a die, and if it's the same number we have an encounter, to add a bit of suspense. He said it would make encounters less frequent (his idea being that the probability of rolling a 1 on, say, a d6 is 1/6, but the probability of two d6s rolling the same number is 1/36). I was certain he was wrong, but I couldn't work out where his error was (after all, the probability of rolling any number on a d6 is 1/6, so what does it matter if it's a 1 or some other number? The number he rolls doesn't affect my roll any).
    ChatGPT explained why I was right: the probability of 2d6 rolling the same number ISN'T 1/36, it's 1/6. The probability of rolling two 1s on 2d6 is 1/36, but so is rolling two 2s, two 3s, etc., so when you add up the probabilities you get 6/36, which is 1/6.
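
    The doubles argument is easy to verify by enumerating all 36 ordered outcomes of two d6 rolls:

    ```python
    from fractions import Fraction
    from itertools import product

    outcomes = list(product(range(1, 7), repeat=2))   # all 36 ordered (DM, player) rolls
    matches = sum(1 for a, b in outcomes if a == b)    # (1,1), (2,2), ..., (6,6)
    p_match = Fraction(matches, len(outcomes))
    print(p_match)  # 1/6 -- the same odds as rolling a 1 on a single d6
    ```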

  • @davidcampos1463
    @davidcampos1463 1 month ago +2

    A wolf howling at the moon in graffiti. AI and graffiti will get someone's attention.

  • @rodneydithobane2175
    @rodneydithobane2175 1 month ago

    I've been opening all the LLMs in my Chrome tabs manually 😢

  • @mutayyab01
    @mutayyab01 1 month ago

    Amazing Video

  • @videoeditoranimation1714
    @videoeditoranimation1714 1 month ago

    Are you going to do a video on the new up-and-coming storyboard platform called Mootion Storyteller, from Unity I think? I just got something about it in my email. It's sort of like Katalyst AI, only it can continue on to actually make videos.

  • @TheGeneticHouse
    @TheGeneticHouse 1 month ago

    42... That's crazy

  • @Toasty27
    @Toasty27 1 month ago +2

    Psst! Don't tell the tech industry there will eventually be just one model, "the AI," dominant in the end, just like there is just one dominant search engine, or the AI bubble will burst too early. They still need to marinate a bit until they realize it's the year 2000 all over again and they have learned nothing.

    • @JSRJS
      @JSRJS 1 month ago

      Who is they?

  • @Ginto_O
    @Ginto_O 1 month ago

    73 and 37 are the most popular pseudo-random numbers

  • @MudroZvon
    @MudroZvon 1 month ago

    You can compare models in Poe AI, and it has all kinds of big models

  • @rawallon
    @rawallon 1 month ago

    Do you use any AI on your videos? I really like the editing on them!

  • @IanHollis
    @IanHollis 1 month ago

    When are Suno and Udio gonna get some seriously stiff competition? Also, 15 min is _short_ to you? 🤔

  • @edgardomachado2704
    @edgardomachado2704 1 month ago

    good one

  • @Kelvinapplegate
    @Kelvinapplegate 1 month ago

    thanks for the good work

  • @Shootingfoul
    @Shootingfoul 1 month ago

    What's the one with least censorship?

  • @OriginalRaveParty
    @OriginalRaveParty 1 month ago +1

    It's a very bad thing that each one gives the same answer to questions that should have a random distribution. Even explanations of static concepts should be slightly different from model to model.

  • @Compguy321
    @Compguy321 1 month ago

    Funny enough, I think 37 is one of the numbers humans most often pick as a "random number" from 1-100

  • @TheZerohimself
    @TheZerohimself 1 month ago

    I'm sure somebody has probably already pointed this out, but 37 is not random either. It's the number humans present most commonly when asked to pick a random number between 1 and 100; see the Veritasium video about it. Meaning this is the "random number" assigned the highest probability given the context of the previous tokens (i.e., the training data says 37 or 42 are the most common values humans give when asked for a random number).

    • @largelyuseless
      @largelyuseless 1 month ago

      It could also be because a lot of us are @KevinSmith fans and it's a secret way we identify ourselves. If you hear "in a row?" when the number 37 is mentioned, then it's definitely giving 'one of us' vibes of one sort or another ❤

  • @sherdogsss
    @sherdogsss 1 month ago

    I tried the Pi LLM with the number test and it gave me 54

  • @amritbro
    @amritbro 1 month ago

    Google's Imagen is getting better day by day.

  • @ronefana4015
    @ronefana4015 24 days ago

    thx

  • @lioneldasilva4889
    @lioneldasilva4889 1 month ago

    I don't think you can evaluate the difference between those models with such random and simple prompts. You need a more complex scenario, or code, to see where the variance lies.

  • @HardKore5250
    @HardKore5250 1 month ago

    By Christmas, everything will be perfect

  • @AlexLuthore
    @AlexLuthore 1 month ago

    37 and 73 are two of the most "random"-feeling numbers if you ask people for a number between 1 and 100, too. There's a good YouTube video about 37

  • @IanHollis
    @IanHollis 1 month ago

    Am I the only person that uses them for poetry/song lyrics? (Which they also suck at)

  • @BEASTNYC
    @BEASTNYC 1 month ago +1

    So they are all wrong... meaning they could all provide false information on the same topic, because their core is the same?

  • @ASchnacky
    @ASchnacky 1 month ago

    11:25 You didn't mention it costs the most, at $0.0650

  • @randombleachfan
    @randombleachfan 1 month ago +1

    Hey Matt :)

  • @2richants
    @2richants 1 month ago

    So the moral of the video is that it doesn't really matter which one, just pick one.
    Expect that, like other software, that will soon change and each LLM will become specific to a type of content

  • @looseman
    @looseman 1 month ago

    Well, you should pick the oldest AI model to use, if you say so.
    Also, AIs answer with a random "seed", so you may get a better answer after a few more tries.

  • @JimCoffey
    @JimCoffey 1 month ago

    Of course, 42. Don't forget your towel.

  • @thelibertarian5968
    @thelibertarian5968 1 month ago

    Idk, I feel some pretty distinct differences between models. Meta, for example, struggles to keep track of the conversation. I use AI to develop and organize my D&D campaigns: if you ask Meta for a mission outline, it spits one out; ask it to make some alterations to one aspect, then ask for an updated mission outline, and it adds and changes all kinds of stuff that it shouldn't have. Tell it it changed stuff and ask for the original with the altered aspect, and it acknowledges the changes, then spits out completely different stuff. I've also noticed misspellings often help my images in theirs for some weird reason (for instance, typing "blade" often results in two blades extending from a hilt, where typing "blad" gives a much better-looking image with the proper single blade).
    GPT is great for keeping track of the conversation, but the characters and stories it generates tend to feel very generic; you have to really work with it to get something original-feeling. Gemini struggles with keeping track of the conversation sometimes, and doesn't appear to want to reference anything from previous sessions (not near as bad as Meta), but its characters and story points are better than the others', with characters definitely having a more fleshed-out feeling.
    None of these really show up in single-question-style analysis, but they stand out to me over time using them.

  • @inkfunk
    @inkfunk 1 month ago

    I'm now getting into 'rent a chicken' service

  • @sherpya
    @sherpya 1 month ago

    there is a sort of dataset shortage

  • @bpvideosyd
    @bpvideosyd 1 month ago

    Same with video generation, imo

  • @manumartinezkcxu
    @manumartinezkcxu 1 month ago

    These AI models are all approaching the same level of gathering all of man's knowledge based on algorithms, so we should now start working on man's problems of just getting along, for example communication and understanding each other before reacting. I came across a doctor (a professor at Stanford) with PhDs in math, chemistry, and physics, and had a 5-hour conversation about how to efficiently teach his students (English is not their first language) the 5 W's, or how to 'LOVE'. The first step in anything is communication in the students' first language, and I came across a possible solution to that first problem: Chinese devices (hardware and software) from Timekettle. Today, hopefully, I will meet up with Dr. Carlos to mention a possible way to communicate with his foreign students in their first languages.

  • @thokozanimanqoba9797
    @thokozanimanqoba9797 1 month ago

    The cutting off of words: "I've bucks" (five bucks), "ept 4" (GPT-4),
    etc.???

  • @paternyao
    @paternyao 1 month ago +1

    How about LMSYS?

    • @literailly
      @literailly 1 month ago

      Very good for head-to-head (2 models).
      Vercel is also nice... but their pro version is a little broken

  • @dro3671
    @dro3671 1 month ago

    They're all the same at this point; it's up to the individuals who use them. Everyone has the power to create cool stuff

  • @GuyLakeman
    @GuyLakeman 1 month ago

    37 is best

  • @sergefournier7744
    @sergefournier7744 1 month ago +1

    So they do not know what a joke is? Cause they all cited puns, not jokes.

  • @tarelethridge8937
    @tarelethridge8937 1 month ago

    I use ChatGPT-4 all the time to learn ARM 32- and 64-bit assembly; no other model besides Claude 3 Opus can do that kind of programming. I do not say this lightly. I have been doing this for months.

  • @DFortuna
    @DFortuna 1 month ago +1

    Maybe on benchmarks, but in real use cases? Nah. Not even close.

  • @theqaz1828
    @theqaz1828 1 month ago

    I figured that the best AI is whichever one has the least bullsh*t restrictions on it

  • @1981jasonkwan
    @1981jasonkwan 1 month ago

    Poe might be a better choice than gmtech since there are more models to choose from. Although, a Poe subscription gives you a set number of credits per month, so if gmtech is unlimited that might be better value depending on how you use it.

  • @sergefournier7744
    @sergefournier7744 1 month ago

    A dragon with three heads, not a three-headed dragon... you're talking to a computer, not a human.

  • @rohan773
    @rohan773 1 month ago

    Shower thought:
    Phi-3 is not an LLM (large language model); it's technically an SLM (small language model) 😂

  • @TheADExpress7
    @TheADExpress7 1 month ago

    Matt, LLMs can do comedy. You're not prompting them correctly. Try mine: You are an experienced stand-up comedy writer. Write an original 3-minute stand-up routine showing audience responses in brackets. Write a funny bit about Texas wildlife.

  • @MudroZvon
    @MudroZvon 1 month ago

    They are not the same. Answers may seem similar, but you should NOT search for differences in the text... Every model is good at specific things... Understanding comes with experience.