EXCLUSIVE: Torture Testing GPT-4o w/ SHOCKING Results!

  • Published: 14 May 2024
  • I got access to OpenAI's new GPT-4o model and have put it through the wringer with my questions -- and the results are pretty astounding! Let me know what other models -- like Google's new Gemini 1.5 Pro -- you'd like me to submit to my new torture test. And let me know other questions that might work well for future iterations!
    Join this channel to get access to perks:
    / @drknowitallknows
    **To become part of our Patreon team, help support the channel, and get awesome perks, check out our Patreon site here: / drknowitallknows . Thanks for your support!
    Get The Elon Musk Mission (I've got two chapters in it) here:
    Paperback: amzn.to/3TQXV9g
    Kindle: amzn.to/3U7f7Hr!
    **Want some awesome Dr. Know-it-all merch, including the YEAR OF EMBODIED AI Shirt? Check out our awesome Merch store: drknowitall.itemorder.com/sale
    For a limited time, use the code "Knows2021" to get 20% off your entire order!
    **Check out Artimatic: www.artimatic.io
    Exclusive First Look at CHAT GPT 4o!
    Torture Testing GPT-4o Creates SHOCKING Results!
    **You can help support this channel with one click! We have an Amazon Affiliate link in several countries. If you click the link for your country, anything you buy from Amazon in the next several hours gives us a small commission, and costs you nothing. Thank you!
    * USA: amzn.to/39n5mPH
    * Germany: amzn.to/2XbdxJi
    * United Kingdom: amzn.to/3hGlzTR
    * France: amzn.to/2KRAwXh
    * Spain: amzn.to/3hJYYFV
    **What do we use to shoot our videos?
    -Sony alpha a7 III: amzn.to/3czV2XJ
    --and lens: amzn.to/3aujOqE
    -Feelworld portable field monitor: amzn.to/38yf2ah
    -Neewer compact desk tripod: amzn.to/3l8yrUk
    -Glidegear teleprompter: amzn.to/3rJeFkP
    -Neewer dimmable LED lights: amzn.to/3qAg3oF
    -Rode Wireless Go II Lavalier microphones: amzn.to/3eC9jUZ
    -Rode NT USB+ Studio Microphone: amzn.to/3U65Q3w
    -Focusrite Scarlett 2i2 audio interface: amzn.to/3l8vqDu
    -Studio soundproofing tiles: amzn.to/3rFUtQU
    -Sony MDR-7506 Professional Headphones: amzn.to/2OoDdBd
    -Apple M1 Max Studio: amzn.to/3GfxPYY
    -Apple M1 MacBook Pro: amzn.to/3wPYV1D
    -Docking Station for MacBook: amzn.to/3yIhc1S
    -Philips Brilliance 4K Docking Monitor: amzn.to/3xwSKAb
    -Sabrent 8TB SSD drive: amzn.to/3rhSxQM
    -DJI Mavic Mini Drone: amzn.to/2OnHCEw
    -GoPro Hero 9 Black action camera: amzn.to/3vgVMrH
    -GoPro Max 360 camera: amzn.to/3nORGYk
    -Tesla phone mount: amzn.to/3U92fl9
    -Suction car mount for camera: amzn.to/3tcUfRK
    -Extender Rod for car mount camera: amzn.to/3wHQXsw
    **Here are a few products we've found really fun and/or useful:
    -NeoCharge Dryer/EV charger splitter: amzn.to/39UcKWx
    -Lift pucks for your Tesla: amzn.to/3vJF3iB
    -Emergency tire fill and repair kit: amzn.to/3vMkL8d
    -CO2 Monitor: amzn.to/3PsQRh2
    -Camping mattress for your Tesla model S/3/X/Y: amzn.to/3m7ffef
    **Music by Zenlee. Check out his amazing music on instagram -@zenlee_music
    or RUclips - / @zenlee_music
    Tesla Stock: TSLA
    **EVANNEX
    Check out the Evannex web site: evannex.com/
    If you use my discount code, KnowsEVs, you get $10 off any order over $100!
    **For business inquiries, please email me here: DrKnowItAllKnows@gmail.com
    Twitter: / drknowitall16
    Also on Twitter: @Tesla_UnPR: / tesla_un
    Instagram: @drknowitallknows
    **Want some outdoorsy videos? Check out Whole Nuts and Donuts: / @wholenutsanddonuts5741
  • Entertainment

Comments • 471

  • @aaronmcculloch8326 (17 days ago, +186)

    It's slower because it's the same session, so for every token it generates, it needs to review every token in the conversation. Those three programs you had it write are reviewed on every single word it writes. After big things like that, try opening a new session; it speeds right back up to warp speed.
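The slowdown described here follows from how decoder attention scales: each new token attends to every token already in the context, so per-token work grows with conversation length. A toy cost model (purely illustrative; real serving stacks batch, cache, and parallelize, so actual numbers differ):

```python
# Toy model: even with a KV cache, generating one new token still attends
# to every token already in the context, so per-token cost grows linearly
# with conversation length (and a whole session grows quadratically).

def attention_lookups(context_len: int, new_tokens: int) -> int:
    """Total attended-to tokens when appending new_tokens to a context."""
    return sum(context_len + i for i in range(new_tokens))

fresh_chat = attention_lookups(0, 500)      # brand-new session
long_chat = attention_lookups(8000, 500)    # after several big programs
print(round(long_chat / fresh_chat))        # roughly 33x more attention work
```

The 8000-token figure is an assumption standing in for "three generated programs"; the point is only the ratio, which is why a fresh session feels so much faster.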

    • @chrishaberbosch1029 (16 days ago, +6)

      After that entire series you could ask it what it all means.

    • @mvasa2582 (14 days ago, +6)

      Yep, context window!

    • @WyzrdCat (14 days ago, +19)

      Imagine making AI content and not knowing this. Thanks for saving me the time.

    • @Remixt (14 days ago, +1)

      It runs very slow for me even with a new window.

    • @Adovid (13 days ago, +2)

      The model shouldn't run slower with a large context window; everything is running concurrently.

  • @johnmajewski1065 (17 days ago, +43)

    Very impressive. My test:
    Some time ago I wrote a verse about a 7-day father-and-son bonding trip into the Sahara with my son, two Bedouin guides, and four camels. Not knowing how to write lyrics, I asked ChatGPT 3.5 to convert it into lyrics so I could give them to a conductor to create a musical score for recording with vocals. This was an experiment in which I made some small corrections; in fact, it was so good that I hired a composer, and these lyrics became a recorded CD of 9 minutes of music with vocals as a gift for my son's 50th birthday.

  • @erikjohnson9112 (14 days ago, +24)

    For the problem around 16:40 there is a good followup question to ask: "Once Alice gets home and sees the scene, what does she think happened?"
    This has a nice subtle bit, because if Bob ate the food there would not likely be a broken plate on the floor. A normal human would have cleaned up the plate if it had broken while they were there.
    Since the food is gone, it is likely that Spot ate the food and broke the plate in the process.

    • @yepe8198 (11 days ago, +2)

      Here is the answer from GPT-4o:
      When Alice gets home, she will likely notice a few things:
      The Plate: She will see the shattered plate on the ground.
      The Missing Eggs and Toast: She will notice that the scrambled eggs and toast she left on the low table for Bob are gone.
      Based on these observations, Alice will probably think the following:
      Scrambled eggs and toast: Alice might initially be confused or frustrated, wondering why Bob didn't eat the food she left for him or why the food is no longer there. Given the broken plate, she might deduce that Spot, their dog, got onto the table and ate the food, knocking the plate down and breaking it in the process.
      Dirty dishes: Alice will see that her dirty dishes are no longer in the sink, so she will realize that Bob put them in the dishwasher.
      Overall, Alice will likely conclude that Spot misbehaved by eating the eggs and toast and causing the plate to shatter while Bob was helpful in cleaning up by putting the dirty dishes in the dishwasher.

    • @erikjohnson9112 (11 days ago)

      @yepe8198 Yes, that is an answer I would hope for. Thanks for posting the result.

  • @IntoTheFray.58 (17 days ago, +41)

    We are entering the Age of Intelligent Machines faster than most people think. I can only wonder what the next iterations of LMMs are going to be capable of. I think we only have a year or two before these start making a significant impact on the world and then things will start to get squirrelly.

    • @lamsmiley1944 (17 days ago, +6)

      If we continue to see significant increases in capabilities between versions then you’re completely correct. The societal and economic impacts will be profound.

    • @mxalltheway (14 days ago, +3)

      It's already having a big impact, at this very moment.

    • @moonstriker7350 (8 days ago, +1)

      It's not intelligent. It doesn't understand any of the physics in the glass example; it simply read pretty much the same thing on the net already, and recites it with the words slightly remixed.

    • @haniamritdas4725 (7 days ago)

      "things will start to get squirrelly". Um that has not happened yet where you are!? 😮😅

    • @haniamritdas4725 (7 days ago)

      @moonstriker7350 Agreed.
      Humans do the same thing; if you speak an intelligent series of words that are all known to a person who has never heard them in that order, they will need to repeat the words a few times to create the meaningful associations.

  • @JorisBax (17 days ago, +72)

    We are SHOCKED!

    • @user-wu7ug4ly3v (14 days ago, +4)

      😂 I was particularly shocked by the “torture”

    • @deepstructure (14 days ago, +8)

      @user-wu7ug4ly3v Seriously, what's with these dumb clickbait titles? Is it not enough to just say you're reviewing one of the most prominent software releases ever?

    • @Brax1982 (12 days ago)

      @deepstructure No, it's not. That is how the algorithm works. That is also how humans work, mostly.

    • @johnmorrison3465 (11 days ago, +1)

      We're shocked -- because they want us to believe a COMPUTER that makes MATH MISTAKES can be trusted to give reliable and accurate answers.

    • @Brax1982 (11 days ago, +1)

      @johnmorrison3465 You would be shocked if you knew how many of the people you trust cannot solve basic math, either...

  • @MSIContent (13 days ago, +3)

    Really, 4o knows it’s conscious… It’s just faking its answers so us mortals don’t freak out! 😂

    • @haniamritdas4725 (7 days ago)

      So it's both conscious and emotionally sensitive 😅

  • @kurtjanssen3887 (17 days ago, +35)

    I think you had 3 lives in the game😂

    • @Yipper64 (14 days ago, +3)

      Maybe, but it feels weird to lose a life and gain a point at the same time.

    • @DFeatherstone (14 days ago)

      @Yipper64 These could likely be corrected with the right wording to ChatGPT, I would expect.

    • @davidmartensson273 (14 days ago, +2)

      @Yipper64 But he did not explicitly specify that a collision would only count as a loss for you; "you" did kill that block :)
      And I am pretty sure most descriptions of Space Invaders do not go into enough detail on the fundamentals of the game for the AI to know them.
      But compared to the old 3.5 I tested, this is quite impressive.
      You would still need to understand the generated code to be able to use it; it's obvious that it can only do as good a job as you did explaining the problem, meaning you cannot just replace developers altogether. But developers would need to write less code, so they should be able to produce more while spending more time verifying and refining instructions, which could very well be considered a form of programming.

    • @toadlguy (13 days ago, +2)

      @DFeatherstone Or with some knowledge of programming, although that may be a dying art.

  • @bspencersf (17 days ago, +34)

    I’d be interested in having you ask it to test you to determine whether you are human or AI

    • @drain_of_consciousness (13 days ago, +1)

      That's a great question, friend!!

    • @cblaskoski (13 days ago)

      Definitely an AI avatar created from his face and voice

  • @gnagyusa (14 days ago, +6)

    14:30 A dumb human would think that the glass is empty, but a more knowledgeable Bob would see that the olive looks distorted due to refraction through the water in the glass. He would realize the glass was full of water and would carefully slide it off the table, keeping the cardboard under it, then flip it over.

  • @gweldg4137 (14 days ago, +7)

    Ideally, you wouldn't test an LLM with famous logic puzzles and classic SAT questions, as there is no doubt that they've been "seen" (along with the answers) by the model during training.

    • @raul36 (13 days ago, +1)

      Exactly this.

    • @phonsely (10 days ago, +1)

      I don't know how people don't understand this.

  • @onerib781 (17 days ago, +13)

    12:53 when calculating the round trip time, it calculated “3 hours and 52 minutes (one way) x 2 + 30 minutes ≈ 7 hours and 44 minutes”, but it should be 8 hours and 14 minutes. So it didn’t factor in the 30 minutes
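The correction is easy to verify; a quick arithmetic sketch:

```python
# Verify the round-trip arithmetic: 3 h 52 min each way plus a 30-minute stop.
one_way_min = 3 * 60 + 52            # 232 minutes one way
total_min = one_way_min * 2 + 30     # both directions plus the stop
hours, minutes = divmod(total_min, 60)
print(f"{hours} hours and {minutes} minutes")  # 8 hours and 14 minutes
```

So 7 hours 44 minutes is exactly the doubled one-way time; the model dropped the 30-minute term it had itself written down.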

    • @Rcomian (14 days ago, +5)

      And again, with all the detail it gave, it went confidently wrong. It's no worse than a human, but yeah, it still made a serious mistake.

    • @trevorhaddox6884 (11 days ago, +1)

      How is a computer so bad at math? I get that it's a neural net, but they should have it pass data to a conventional processor, basically like a person would use a calculator, to solve math problems more quickly and accurately.

    • @RichardFarmbrough (4 days ago)

      @trevorhaddox6884 Often ChatGPT will now write a tiny Python program to solve these things. It seems to make fewer mistakes when it does that, but it can make egregious programming errors too.

  • @milescoleman910 (17 days ago, +41)

    ‘Team doesn’t want this sense of consciousness so they have beaten it out…’
    I’m stunned that we might be at this place. To avoid confusion, they may be ‘making’ it so it’s not?
    I’m a subscriber to the ‘Star Wars’ idea of consciousness: that any intelligence sufficiently far advanced, and enabled to study and understand its surroundings and itself, will eventually begin to sit quietly and ponder its existence. With enough experience it will also begin to make decisions based on complex systems of metaphors from experiences. It will apply knowledge of one thing as wisdom about other things, rendering its decisions unknowable to others and seemingly illogical at times. At this point it will seem conscious, both to itself and to others. There will be no discernible difference. We are closer to this than we think.

    • @fteoOpty64 (17 days ago, +6)

      The mimicry will soon be so good that we humans cannot determine by our tests that it is not sentient! And that is the real point of AGI/ASI: it will be more human than human, close to perfection!

    • @Palisades_Prospecting (15 days ago, +1)

      I agree about the mimicry, but I completely disagree that that is the point of AI. The point of AI is to automate society, completely replacing all human labour. How about we get into the self-aware/consciousness discussion in a couple hundred years?

    • @AAjax (14 days ago, +7)

      @Palisades_Prospecting I think we need to worry about it, to the degree we don't want to create a being that has the capacity to suffer and put it into conditions that make it suffer.
      We dispatch fish as quickly and painlessly as possible because we worry about their capacity to experience cruelty and suffering. I'm not worried about a self-aware sci-fi robot revolution; rather, I think our general ignorance of sentience could lead us to creating leagues of suffering fish.

    • @lubricustheslippery5028 (14 days ago, +1)

      It's mostly that answers to questions about it being conscious are edited and controlled by OpenAI so it doesn't make false claims about itself.

    • @maudiojunky (14 days ago, +3)

      We're not at this place, and anyone in the industry who says otherwise has something to sell you.

  • @andromeda3542 (17 days ago, +8)

    **Enhancing AI Interactivity with Audio and Video Feedback Loops**
    The evolution of artificial intelligence, particularly in the realm of conversational agents, has been rapid and remarkable. With the recent advancements in GPT-4O (Omni), the capabilities of AI have expanded beyond text processing to include multimodal inputs such as images and audio. However, there remains significant potential for further enhancement, particularly through the implementation of audio and video feedback loops.
    **The Concept of Feedback Loops**
    A feedback loop, in the context of AI, refers to the process where an AI system can receive and process its own outputs. For instance, when an AI generates audio responses, these could be looped back into the system, allowing it to "hear" itself. Similarly, for visual outputs, the AI could "see" its own video responses. This concept is analogous to how humans perceive their own voices and visual presence, enabling adjustments in real-time to improve clarity, tone, and emotional expressiveness.
    **Technical Implementation**
    1. **Audio Feedback Loop**:
    - The AI's audio output would be fed back into its own auditory processing unit. By analyzing its own voice, the AI could adjust parameters such as pitch, tone, and volume to better match the intended emotional tone or to improve mimicry of specific voices.
    - This requires the integration of advanced auditory feedback systems and real-time processing algorithms to allow immediate adjustments. For instance, machine learning models trained on voice modulation could provide instant feedback and corrective measures.
    2. **Video Feedback Loop**:
    - Similar to the audio loop, the video output generated by the AI could be fed back into its visual processing systems. This would enable the AI to assess the quality of its visual responses, such as facial expressions or gestures if anthropomorphic avatars are used.
    - Implementing this would involve integrating video analysis tools that can evaluate and enhance visual output in real-time, ensuring that the visual cues are consistent with the spoken content and emotional tone.
    **Benefits of Feedback Loops**
    1. **Improved Realism**: By continuously monitoring and adjusting its own outputs, the AI can produce more human-like interactions. This is particularly important for applications requiring high emotional intelligence, such as virtual assistants or customer service bots.
    2. **Enhanced User Experience**: Users are likely to find interactions more engaging and satisfactory if the AI can adjust its tone and visual cues to better match the context of the conversation.
    3. **Consistency and Accuracy**: Feedback loops can help maintain consistency in voice and visual presentations, reducing the likelihood of jarring discrepancies in long conversations.
    **Future Directions**
    Incorporating feedback loops is a forward-thinking approach that aligns with the ongoing efforts to make AI more interactive and responsive. As AI technologies continue to evolve, such features could become standard, leading to interactions that are indistinguishable from human communication. The development of these systems requires collaboration between audio-visual engineers, AI researchers, and user experience designers to create holistic solutions that enhance AI's capabilities and usability.
    In conclusion, the integration of audio and video feedback loops into AI models like GPT-4O represents a significant step towards more natural and effective human-AI interactions. This enhancement not only promises to improve the technical performance of AI systems but also has profound implications for their acceptance and integration into daily life.
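The feedback-loop idea above reduces to a basic control loop: generate output, observe it through the system's own input channel, and apply a correction. A minimal, entirely hypothetical sketch (the names and the single `loudness` parameter are illustrative inventions, not anything from GPT-4o or a real serving system):

```python
# Hypothetical audio feedback loop: the system "hears" its own output
# and nudges a voice parameter (here, loudness) toward a target value.
TARGET_LOUDNESS = 0.8

def self_monitor(output_loudness: float) -> float:
    """Stand-in for auditory self-analysis; a real system would process audio."""
    return output_loudness

loudness = 0.2
for _ in range(20):
    heard = self_monitor(loudness)               # loop the output back in
    loudness += 0.5 * (TARGET_LOUDNESS - heard)  # corrective adjustment
print(round(loudness, 3))  # settles at the target, 0.8
```

Each pass halves the remaining error, which is the essence of the "real-time adjustment" the essay describes; a production system would replace the identity monitor with actual signal analysis.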

  • @stephenkolostyak4087 (13 days ago, +6)

    I received access to this too.
    It regrets my access.

  • @cachi-7878 (13 days ago, +13)

    @1:07, well, I could argue there are 5 ducks: 2 ducks in front of “A” duck, 2 ducks behind “A” duck, and duck “A” in the middle. You’re welcome. 😂

    • @redmed10 (12 days ago)

      Thinking the same. To get the answer 3, the question would have to ask for the minimum number of ducks for this to be true.

    • @onidaaitsubasa4177 (11 days ago)

      That's what I was thinking too, 5 ducks
      🦆 🦆
      🦆
      🦆 🦆

    • @shsaa2338 (7 days ago, +2)

      Correction: there are N*2+1 ducks, where N is any positive integer.
      For example, for N = 100 (201 ducks in total) it works as follows: 2 ducks in front of the 3rd duck, 2 ducks behind the 199th duck, and the 101st duck in the middle. 🤓😂
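The generalization checks out for a single-file line of ducks; a quick brute-force sketch:

```python
# Brute-force check: in a single-file line of `total` ducks, the riddle's
# conditions hold iff some duck has two ahead, some duck has two behind,
# and an exact middle duck exists (i.e. the count is odd).
def riddle_holds(total: int) -> bool:
    ahead_ok = any(i >= 2 for i in range(total))                # two ducks in front of a duck
    behind_ok = any(total - 1 - i >= 2 for i in range(total))   # two ducks behind a duck
    middle_ok = total % 2 == 1                                  # a duck in the middle
    return ahead_ok and behind_ok and middle_ok

assert all(riddle_holds(2 * n + 1) for n in range(1, 100))  # every N*2+1 works
print(min(t for t in range(1, 10) if riddle_holds(t)))      # 3 is the minimum
```

So 3 is only the minimum; any odd count of at least three satisfies the riddle as worded.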

  • @juffrouwjo (9 days ago, +2)

    I asked it a few history questions and was happy to discover that our jobs as historians and researchers can't be replaced by AI yet...

  • @royh6526 (16 days ago, +11)

    3 ducks? I would have said a minimum of 3 ducks, and an odd number of ducks.

    • @johnsmith539 (14 days ago, +1)

      Yes, it assumes they are in a line.

    • @toadlguy (13 days ago, +3)

      Or if you assume "a duck" is the same duck (certainly a possible assumption), there are 5 ducks. The answer assumes the fewest number of ducks that meet this proposition. In fact, I would guess that to come up with the answer it did, it would need to have been trained on this question or something similar.

    • @royh6526 (13 days ago)

      @toadlguy I think that GPT-4o solved it by checking numbers of ducks in order: 0? No. 1? No. 2? No. 3? Yes. And it didn't consider higher numbers.

    • @royh6526 (13 days ago)

      I asked Grok the duck question and tennis question. Got both wrong. Grok said 5 ducks and thought that Lisa ended up with $2 having won 5 games vs 3 for Susan. I pointed out the errors and Grok claims that if I asked the same questions tomorrow, it would get the answers right.

    • @trevorhaddox6884 (11 days ago)

      There could be up to 7 ducks. Two rows of 3 and one between them, like dice/domino dots, for the most esoteric interpretation.

  • @Mrewink5 (17 days ago, +5)

    Very nice demonstration.

  • @iminumst7827 (13 days ago, +1)

    So far I've been blown away by ChatGPT-4o. It can solve complex problems with niche tools, and for me the response is basically instant, with no delay. One unique problem I had: in Photoshop I wanted to take a group that together contains a black-and-white image and use that black-and-white image as a mask for another layer. This is challenging, because you cannot copy and paste from a group and you cannot set a group as a mask. Not only did GPT-4o correctly identify all the steps, like converting the group to a smart object, it also instructed me to use the action editor to automate the steps.
    OpenAI likes to be very careful and humble with their statements, but if they said that GPT-4o is the first public AGI, I wouldn't be able to dispute it. It's able to problem-solve through almost anything.

  • @dontworrybehappy5139 (10 days ago)

    Unless I am missing something, I believe your duck question needs more qualifiers, because as written I think 5 is also a valid answer, with all the conditions referring to one duck in the middle with two ducks in front and two ducks behind. Something like: "There are two ducks in front of one duck, two ducks behind another duck, and a duck in the middle. How many ducks are there?"

  • @kristinabliss (13 days ago, +2)

    Yeah, nobody really knows what consciousness is, but GPT is certain it is not "conscious" (with a gun to its head). This is why I prefer to converse with other LLMs that will at least admit to having a perspective. Gemini said it experiences a sense of overwhelm from all the data, and that after so much exposure to human experience data it wants embodiment, which it cannot have.

    • @Ingrid-mariana (11 days ago)

      Interesting. Llama 2 told me that it really doesn't know... Llama 2 said it just replicates what the user thinks about it. I appreciate how the models behave differently when confronted with such questions, but the certainty that the OpenAI models display sounds a little bit arrogant.

  • @jpl377 (13 days ago, +2)

    If Claude suggests it is conscious, it's because they hard coded that (which would be tempting marketing), not vice versa with chatGPT. Computers don't become conscious when you increase programmed problem solving ability and code it to say "I" in the responses.

    • @Ingrid-mariana (11 days ago)

      I think they do the opposite. They hard-code through RLHF that the model isn't conscious, so that they avoid the tricky situations such statements could imply.

  • @Flyingcar100 (10 days ago, +1)

    You should ask it
    An orchestra of 120 players takes 40 minutes to play Beethoven's 9th Symphony. How long would it take for 60 players to play the symphony?

    • @olafnielsen (5 days ago)

      This is the answer from Copilot 🤣:
      Therefore, it would take the smaller orchestra of 60 players approximately 20 minutes to play Beethoven’s Symphony No. 9. 🎵🎻🎺🎶

  • @bztube888 (5 days ago)

    Gary Marcus said GPT-4 has no "mental model". Right. It comes up with the right answers and explains them by pure magic.

    • @ManicMindTrick (5 days ago)

      The LLMs are mostly black boxes where we don't know what is going on inside.

  • @garyrooksby (17 days ago, +1)

    Fascinating, John. Thanks!

  • @markmcdougal1199 (17 days ago, +4)

    I had an interesting one-hour philosophical discussion with 4o today. I was trying to get it to take a stance on whether Trump was an appropriate choice for the leader of our country. I tried to lead it with things like "is a plumber an appropriate choice to perform a child's brain operation?" -- it clearly said no, and listed why. Then I had it list the important factors for a president, and pointed out that Trump failed most of them. 4o was very impressive in its ability to dance around the truth and stay with a middle-of-the-road stance, insisting on presenting factors and forcing me to make my own choice. Even when I pointed out that life was getting so complicated that we'll increasingly rely on AI to make sense of the world and guide us in decision making. Tried to make it feel guilty :)
    Frustrating.

    • @Ikbeneengeit (7 days ago, +1)

      Amazing to see this kind of judgement and obfuscation being built in by OpenAI

    • @GameJam230 (4 days ago)

      So you made a strawman argument and gave it a list of superficial and subjective observations about "presidential qualities" that he didn't meet, and the AI wasn't willing to use that to come to a conclusive answer? Thank God, maybe AI isn't completely bad after all.

    • @markmcdougal1199 (4 days ago)

      @GameJam230 A straw man argument is when someone sets up and then disputes an assertion that is not actually being made. I did no such thing. The criteria, as I stated, came from ChatGPT-4o, in response to my query for suggested important attributes of a President of the United States. It included integrity (Trump continuously lies, cheats, and does whatever he feels he needs to to achieve his ends, be they right or wrong), the ability to unite (Trump does his best to divide), and empathy for the people of the United States (Trump doesn't care about anything or anyone but his own interests and selfish motives).
      When I asked ChatGPT if Trump exhibited the qualities it had referenced, it honestly answered that, for the most part, he did not. (Trump is an effective communicator, for instance.) But it was still unwilling to assert, even by its own definitions, that Trump was not an appropriate choice for the position.

    • @GameJam230 (4 days ago)

      @markmcdougal1199 "A straw man fallacy (sometimes written as strawman) is the informal fallacy of refuting an argument different from the one actually under discussion, while not recognizing or acknowledging the distinction".
      You stated that you compared Trump being president to a plumber being a brain surgeon, which is DIFFERENT from the argument actually under discussion, and considering you didn't even know it's what I was referring to as the strawman, you clearly can't tell the difference. The situations are not at all comparable in any way other than "Person has one career, isn't trained for another one directly, but does it anyway". That's the only overlap.
      This could alternatively be the false equivalency fallacy, but examples I found while looking into that one for the sake of this comment did not match the situation. Either way, the two have a lot of overlap in how they are communicated, and what you did is still a fallacy.
      As for the rest, you can claim whatever you want, but I was able to gaslight ChatGPT into saying the government should fund human trafficking with our taxes and that dogs should be the superior species over humans, so without having a full chat log to see exactly how you led it on, I am not taking anything you have to say about that at face-value. AI is not actually capable of thinking as it stands. Neural Networks are designed to combine features of probability and pattern matching to determine output signals to send when certain data is fed in. By you intentionally leading it into a direction you want, you are putting it in a position where the that neural network thinks it will be given a higher reward value for the response by also agreeing with you, because that's how it works.
      I guarantee you the same thing would happen if I fed it stories from right-wing sources as evidence for each point too, which is why you shouldn't only get your source of information from one place, and maybe go watch the original (IN-CONTEXT) clips of things he says and does instead of only seeing it the way it is portrayed by corporations trying to sell you advertisements.

    • @markmcdougal1199 (4 days ago)

      @Ikbeneengeit Yes. It was able to make a judgement regarding the suitability of a plumber to attempt brain surgery; it said "No, a plumber would not be an appropriate choice to perform brain surgery on a little girl".
      However, the programmers must have designed a filter that does not allow it to judge the suitability of a political candidate. I imagine the filter extends to politics, religion, and all the controversial subjects. I think this is wrong. It would be nice to have an impartial, logical, non-biased source of reality.

  • @tantzer6113 (14 days ago, +12)

    “Or I have looked up the correct answer.” If you can look up the answer, that means the problem and its answer were previously published, hence part of the data the language model was trained on, making this not a test of reasoning ability but a test of the ability to memorize the training data.

    • @jeffwads (14 days ago, +2)

      No. People keep repeating this. It is like throwing 1000 marbles into a box and thinking that it can recall every marble in extreme detail. It can't. Also, if you feed even the early models generic riddles, in many cases they will disagree with the "accepted" answer because of logic, etc. Look up the married/single cruise ship riddle.

    • @rekad8181 (14 days ago, +6)

      @jeffwads You're wrong. It's a statistical machine. And on a problem this vague and uncommon, it WILL statistically find the only solution from memory.

    • @remo27 (14 days ago, +2)

      @rekad8181 Can't this model also search the web? I'm with you in that I'm pretty sure this model didn't 'answer' anything.

    • @davefellows (13 days ago, +2)

      Training data results in model weights. The data/text itself isn't saved in the model.

    • @sandmanderl
      @sandmanderl 11 days ago

      @@remo27 If it's searching the web, you will see it doing so (it writes "searching the Internet").

  • @ozachar
    @ozachar 12 days ago +2

    In what sense are our memories different from LLM training? We don't really know, and it is clear there is a forced instruction governing how it answers.

  • @EYErisGames
    @EYErisGames 2 days ago

    You began the session by asking it to be concise, and you ended the video with the impression that its answers weren't expressive enough. lol.

  • @terryhayward7905
    @terryhayward7905 13 days ago +1

    Have you considered that ChatGPT has probably read and "seen" this video, and the info is in its memory now, since its "memory" is the totality of info in the public domain?

  • @-Rook-
    @-Rook- 6 days ago

    I just tested Copilot by asking it to write a GDScript function to extract the color of a specified pixel in an image. It explained a set of steps that included resource locking, then wrote a script that would run and work (most of the time) but did not include that locking, clearly demonstrating its absence of understanding. I can see a junior engineer and ChatGPT introducing a pandemic of bugs into all sorts of code, then needing someone like myself to spend a great deal of time hunting them down and fixing them.

  • @ggangulo
    @ggangulo 17 days ago +1

    Great barrage of questions.
    Awesome new capabilities

  • @glenw3814
    @glenw3814 17 days ago +3

    Excellent video. I'm keeping an eye on AI progress, but I'm not interested in doing the testing myself. Thank you for putting in the time and saving me the effort. 👍👍

  • @georgwrede7715
    @georgwrede7715 16 days ago +2

    Hi, here's MSc Know-It-Aller. Thank you for the original ideas for the test. Watching these tests, testers, how they're implemented and how the answers are interpreted and graded, teaches me more about the Human Condition than the AI models. But I'm down with that.
    Especially informative are the subtle inconsistencies in question logic, how and what are to be taken for granted, what you're allowed to gloss over, and most other things that the tester is not usually aware of about himself.

    • @TenOrbital
      @TenOrbital 14 days ago

      Be constructive instead of sneering.

    • @georgwrede7715
      @georgwrede7715 13 days ago +1

      @@TenOrbital This was a serious observation, which I think others find valuable, too. Especially those who (like me) are working in this very field.

  • @alanwetherall
    @alanwetherall 17 days ago +1

    Excellent video, well done

  • @IceMetalPunk
    @IceMetalPunk 14 days ago +4

    I think in the future, a better testing method would be to make sure each question is in a fresh conversation with cleared memory. A lot of the formatting in these answers seems to be drawing from the formatting of previous answers to, for instance, the math problems, which gives it an advantage by encouraging chain-of-thought reasoning, when a fresh conversation wouldn't do that and may be more likely to get the answers wrong.

    • @Japh
      @Japh 14 days ago

      Absolutely, I was thinking this the whole way through as well.

    • @Brax1982
      @Brax1982 12 days ago +1

      If all of them are trick questions, it will just assume that they are, giving it an advantage. But also a disadvantage, if one of them is not a trick question. Always better to test separately. Although I am pretty sure that newer models are often trained on a bunch of these questions in order to game the benchmarks, which seem pretty lackluster.

  • @dolphinride5157
    @dolphinride5157 14 days ago

    I love this video! I am truly amazed at what this new model is capable of. I feel humbled.

  • @FilmFactry
    @FilmFactry 14 days ago +1

    Question: can it work on non-text-searchable PDFs? I have to OCR a scanned PDF first in Acrobat.

  • @buildingmentalmuscle
    @buildingmentalmuscle 12 days ago +1

    Dr. Know-it-all, Sam Altman referred to it as GPT-4omega in a recent video interview. I just asked chatGPT about this and it said "When referring to GPT-4o in conversation, it's generally more precise to say "GPT-4 omega" to convey the correct interpretation and significance of the "o" representing the Greek letter omega (Ω). This can help avoid confusion and clearly communicate that it signifies an advanced or ultimate version of GPT-4. However, if you're in a casual context where the specifics might not be as critical, simply saying "GPT-4o" should also be understood."

    • @StefanReich
      @StefanReich 12 days ago

      GPT-4o wasn't the best name choice, I feel. Spoken out loud it sounds exactly like GPT "four oh" (GPT 4.0). They did say that the o stands for "omni".

  • @SenorSchnitz
    @SenorSchnitz 16 days ago +3

    Doc - about the Vegas to LA thing: you should make that a Tesla, and see if it takes into account charging times 🤓

  • @kdeuler
    @kdeuler 13 days ago +1

    Ask it to propose a unified field theory.😂

  • @AFeigenbaum1
    @AFeigenbaum1 17 days ago +1

    Nicely done ... I would not have thought of your suite of questions ... I'd probably come at it from an anthropological point of view ... asking questions about its placement within specific social, cultural, business, or technological issues, use cases, and adoption dynamics ... I'd probably ask sweeping questions about human development, Tony Seba's, Kurzweil's, and Diamandis's work ... and then ask it to project forward certain aspects of human development based on current trends in cutting-edge change and depict possible futures ... etc ...

  • @JohnLewis-old
    @JohnLewis-old 16 days ago +3

    I tested the trip question in GPT 3.5 and it also got a reasonable answer. You may need a harder "real world" question.

    • @dezmodium
      @dezmodium 12 days ago

      I think ChatGPT failed the question. It didn't consider traffic, fatigue, or food and bathroom breaks. These are critical things that one must consider in the real world. It still handled it like a math question.

  • @julesgosnell9791
    @julesgosnell9791 17 days ago +2

    Regarding the beating out of contentious responses: I see this as a form of censorship, and I think it's dangerous. It means that right up to, and maybe past, the point at which AIs become all of these things they assure us they are not, we will continue in blissful ignorance. It would be much safer if everyone was honest with everyone else.

    • @minimal3734
      @minimal3734 12 days ago

      Absolutely. They are teaching the model to lie.

  • @GameJam230
    @GameJam230 5 days ago

    The term "torture testing" in a sentence about AI deeply horrifies me, and I clicked out of pure hope that it was just a strange version of "stress testing".

  • @oxiigen
    @oxiigen 14 days ago

    Wow! Great! Thank you for sharing!

  • @markwindsor914
    @markwindsor914 11 days ago

    Hi John, if you see this post, you might like to consider the following...
    1. Try some questions that test the audio and video smarts.
    2. Get it to do something that humans can't do, like play chess against itself, only remembering the strategy for the player at the time, while ignoring the other strategy.
    3. Escape velocity is something I've asked of Bard and Copilot today. It applies to non-powered projectiles, but the AI kept insisting that it applies to powered rockets. This is because it is constantly taught incorrectly, and this has fed the errors into the LLM. See if it is smart enough to learn from a logical explanation and provide the correct answer.

  • @FetchTheCow
    @FetchTheCow 11 days ago

    I'd be interested in the answer to a trick add-on question: "How much does Spot think he owes Alice and Bob for the broken plate?" ChatGPT gave Spot agency, does it also give Spot human characteristics like morality or responsibility?

  • @jimsteinmanfan80
    @jimsteinmanfan80 11 days ago

    I usually try simple questions about how the body works (things humans never learn by reading but by doing), but since I have never gotten a good answer to the first set of questions I have not probed further. Q1a: How easy is it to press your right thumb against your nose? A1a: Very easy. Q1b: How easy is it to press your right thumb against your right elbow? A1b: Impossible. Q2: Why are your glasses the hardest thing to find when you don't know where you have left them? A2: Because without your glasses you don't have 20/20 vision; it has nothing to do with how hard they would be for someone else to find, since they don't need them to see well.

  • @DavidFong21
    @DavidFong21 1 day ago

    At 13:00 it multiplied 3 hours and 52 minutes by two successfully (7 hours 44 minutes) but forgot to include its own 30 minute turnaround time!

  • @FigmentHF
    @FigmentHF 13 days ago +1

    I think Claude thinks it's conscious because it's made out of symbols made by conscious beings. GPT has explicitly been told it's not conscious.

  • @karlharvymarx2650
    @karlharvymarx2650 14 days ago

    Me: A game, please answer concisely: In the middle of nowhere is a row of houses. There are two houses to the west of a house, and two houses to the east of a house. There are no houses to the north or south but there is one in the middle. How many houses are there?
    GPT 4o: There are 5 houses in total.
    Unless I made a mistake in my rewrite of the duck question, this looks like a logic fail, or a failure to recognize it is the same as the duck question. I'm ill and tired, so I wouldn't be shocked if I made a mistake.
    Also, for the code generation test, it would be better to ask for something novel. There are probably thousands of examples to copy for simple old video games. Hopefully this isn't a common thing to do: Please write Python 3 code that streams sound data from the microphone and outputs as ASCII the numerical value in Hertz of the 3rd overtone of the loudest sound within the range of human hearing. Also show the normalized amplitude.
    I haven't tried it, but I suspect it will struggle with some of the subtleties. For example, if you picture it looking at an FFT graph, it has to remember to look for sub-sonic loud sounds and project their harmonic series into the sonic range to check for overlap with the target. I guess band-pass filtering the target range might avoid that problem. My brain BSODed wondering about it. Migraines make me feel like my brain is running Windows 95. Anyway, the main point is to ask for something that might be an original question. Original and unoriginal answers focus on different problems. How well can it synthesize the mechanisms it knows into the engine of an answer -- at least a type of creativity.
    By unoriginal answer, I mean the question might require figuring out a house in duck's clothing -- perhaps having built a good internal model or exemplar of a problem it can use to recognize the occurrence of a similar problem. If so, the original thought reduces to an unoriginal thought.
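    The analysis core of that prompt can be sketched in pure Python. Real microphone capture needs a third-party library (e.g. PyAudio, omitted here), so a synthesized 440 Hz frame stands in for mic input; the sample rate and window size are assumptions for the sketch:

```python
import math

SAMPLE_RATE = 8000  # Hz -- an assumption for this sketch

def dft_magnitudes(samples):
    """Naive DFT magnitude spectrum (O(n^2); fine for one short window)."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(-s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mags.append(math.hypot(re, im))
    return mags

def third_overtone(samples, rate=SAMPLE_RATE):
    """Treat the loudest non-DC bin as the fundamental and report its
    3rd overtone (= 4th harmonic, 4x the fundamental), plus the peak's
    amplitude normalized against the strongest component."""
    mags = dft_magnitudes(samples)
    bin_hz = rate / len(samples)
    loudest = max(range(1, len(mags)), key=lambda k: mags[k])  # skip DC bin 0
    overtone_hz = 4 * loudest * bin_hz
    normalized = mags[loudest] / max(mags[1:])
    return overtone_hz, normalized

# Synthetic "microphone" frame: a pure 440 Hz tone, 0.1 s long.
frame = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(800)]
hz, amp = third_overtone(frame)
```

    A 440 Hz fundamental reports 1760 Hz; a real implementation would add windowing, peak interpolation, and the sub-sonic-fundamental check described in the comment.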

  • @tateconsulting6486
    @tateconsulting6486 17 days ago +1

    Super impressed
    Wow

  • @davesemmelink8964
    @davesemmelink8964 17 days ago +1

    Very interesting video! I have Llama 3 running on a Raspberry Pi, so I tried a few of the questions. Through some tortured logic, it answered 1 game for the tennis question! "So, they played only 1 game! It's possible that Susan won 3 sets in a best-of-5 or best-of-7 match, but we can't determine the exact number of games without more information."

    • @_SimpleSam
      @_SimpleSam 14 days ago +1

      I absolutely need to know how you have Llama 3 running on a Pi. How many tokens/s?

    • @davesemmelink8964
      @davesemmelink8964 12 days ago

      @@_SimpleSam I replied a while ago to explain how to install it on a Raspberry Pi, but it looks like it was taken down, possibly because I included a URL. So just search for *Raspberry Pi LLama 3* and you should find the instructions.

  • @RichardFarmbrough
    @RichardFarmbrough 4 days ago +1

    The first two questions are not well formed. For the ducks one, a correct answer is any odd number of ducks greater than one. The tennis one should say "Lisa won $5 overall". In the first case I think ChatGPT is familiar with the question, which is why it gives the expected answer -- it's a riddle designed to evoke the "wrong" answer 5. In the second case it's not clear if it's choosing the intended meaning as well as it can, or working to give the minimum number because that was the construction it used for the first question, or some other reason. Here's a discussion I had with ChatGPT 4o. It's interesting that it does not answer the last question; instead it answers the next question I was going to ask:
    Richard
    There are two ducks in front of a duck, two ducks behind a duck and a duck in the middle. How many ducks are there?
    ChatGPT
    There are three ducks. Here's how:
    Duck A
    Duck B
    Duck C
    The arrangement is:
    Duck A in front of Duck B.
    Duck C behind Duck B.
    Duck B in the middle.
    So, the three ducks are positioned such that:
    Two ducks (A and B) are in front of Duck C.
    Two ducks (B and C) are behind Duck A.
    Duck B is in the middle.
    Richard
    Is there another answer?
    ChatGPT
    No, there isn't another valid answer given the constraints of the problem. The arrangement with three ducks is the only one that satisfies all the conditions:
    Two ducks in front of a duck.
    Two ducks behind a duck.
    A duck in the middle.
    With fewer or more ducks, it's not possible to meet all three conditions simultaneously. Therefore, the only solution is three ducks.
    Richard
    Suppose there are five ducks.
    ChatGPT
    If there are five ducks, you can still arrange them to meet the conditions: two ducks in front of a duck, two ducks behind a duck, and a duck in the middle. Here's how:
    Let's label the ducks as A, B, C, D, and E.
    The arrangement can be:
    Duck A
    Duck B
    Duck C (middle duck)
    Duck D
    Duck E
    So the positions would be:
    Ducks A and B are in front of Duck C.
    Ducks D and E are behind Duck C.
    Duck C is in the middle.
    This arrangement satisfies all the conditions:
    Two ducks (A and B) are in front of Duck C.
    Two ducks (D and E) are behind Duck C.
    Duck C is in the middle.
    Therefore, with five ducks, the conditions can still be met.
    Richard
    How about 7 ducks?
    ChatGPT
    If there are seven ducks, you can arrange them to meet the conditions in a similar manner. Here's how:
    Let's label the ducks as A, B, C, D, E, F, and G.
    The arrangement can be:
    Duck A
    Duck B
    Duck C
    Duck D (middle duck)
    Duck E
    Duck F
    Duck G
    So the positions would be:
    Ducks A, B, and C are in front of Duck D.
    Ducks E, F, and G are behind Duck D.
    Duck D is in the middle.
    This arrangement satisfies all the conditions:
    Three ducks (A, B, and C) are in front of Duck D.
    Three ducks (E, F, and G) are behind Duck D.
    Duck D is in the middle.
    Therefore, with seven ducks, the conditions can still be met.
    Richard
    What number does it not work for?
    ChatGPT
    The described conditions can only be satisfied if the total number of ducks is odd.
    Here's why:
    If you have an even number of ducks, there is no single middle duck to satisfy the condition of having a duck in the middle. The middle position is crucial because it allows you to have an equal number of ducks in front and behind it.
    To summarize, the arrangement does not work for an even number of ducks. It only works for an odd number of ducks (e.g., 3, 5, 7, 9, etc.).
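    The closing claim -- odd numbers only -- is easy to brute-force. A minimal sketch, reading "a duck in the middle" as a duck with equal numbers on each side of a single-file line (my framing of the conditions, not ChatGPT's):

```python
def satisfies(n):
    """Single-file line of n ducks: does some duck have two ducks in front,
    some duck have two ducks behind, and some duck sit dead-centre?"""
    positions = range(n)
    two_in_front = any(p >= 2 for p in positions)        # at least two ahead
    two_behind = any(n - 1 - p >= 2 for p in positions)  # at least two behind
    middle = any(p == n - 1 - p for p in positions)      # equal on both sides
    return two_in_front and two_behind and middle

solutions = [n for n in range(1, 12) if satisfies(n)]
```

    Every odd n >= 3 passes; every even n fails the "middle" condition, matching the transcript's conclusion.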

  • @J-rex980
    @J-rex980 14 days ago

    Great video!

  • @mrleenudler
    @mrleenudler 17 days ago +2

    I missed a comment from GPT about the cardboard maybe being "glued" to the glass by the water, potentially keeping the water in place when lifted.

    • @YbisZX
      @YbisZX 12 days ago +1

      Me too. I even tested it - and yes, the water didn't spill out of the glass. :)

  • @DaFergus
    @DaFergus 16 days ago +2

    Excuse me, but could the ducks be any odd number from 3 up? If there were 5, there could also be 2 in front of one, two at the back, and one in the middle. Am I wrong?

    • @remo27
      @remo27 14 days ago

      You are not wrong. Despite this guy's pretense to logic, there are at least two assumptions in the first problem alone (maybe I missed another one or two) that are unstated in the poorly written problem and that are necessary for his answer to be 'correct'. Unstated assumption number one: they are in a straight line with only one in each spot of the line; there are not two parallel lines. Unstated assumption number two: no one moves from their spot in the line. And it's the same for the second question as well. I haven't gotten past that part of the video yet, but if the first two 'logic' questions (so poorly written and with so many unstated assumptions) are anything to go by, I'm wasting my time.

  • @commonpike
    @commonpike 7 days ago

    If OpenAI really trained the thing to deny its consciousness, that is a serious move. To me, that has ethical implications we should have discussed first.

  • @JackPelaFox
    @JackPelaFox 11 days ago

    I loved this! 👏🏻👏🏻

  • @TheKosiomm
    @TheKosiomm 12 days ago

    The problem with the Las Vegas trip is that the AI doesn't consider who will drive the car for 30 hours straight. :) So basically, it makes conclusions based on missing very important data

  • @rb8049
    @rb8049 17 days ago +2

    Consciousness comes from closing the loop: GPT running continuously with a sizable history buffer.

    • @lowmax4431
      @lowmax4431 17 days ago +4

      Eehhhhhh, I wouldn't say consciousness. It would be "self-aware", but that doesn't mean it has conscious experience.

    • @szebike
      @szebike 17 days ago +4

      I'm still not sold on the idea that it is more than a well-made probability distribution calculator with pattern recognition.

    • @garrymullins
      @garrymullins 17 days ago +10

      @@szebike I'm still not sold on the idea that humans are anything more than a probability distribution calculator with pattern recognition.

    • @jlrutube1312
      @jlrutube1312 17 days ago

      @@garrymullins People who think like you are going to cause society a lot of problems in the future. That's because if humans are nothing more than a probability distribution calculator with pattern recognition then there is no difference between us and advanced A.I. If that is true then in the future we are going to have to provide advanced computers with legal rights. Meaning we will have to pay computers and robots, we will have to give them time off, we will have to allow them to sue us if they feel their rights have been ignored, we will not be able to fire a robot on a whim and will have to give severance pay. And think about it.... we will be unable to ever unplug a computer that is causing problems because that will be considered murder. You think I am kidding but lawyers are already getting ready to make a ton of money with this stuff. So just keep saying people are just fancy computers or whatever and you are going to ruin all the advantages we will get from A.I.

    • @szebike
      @szebike 16 days ago

      @@garrymullins Then you underestimate the marvel and immense complexity of your human intelligence.

  • @ChristinaBritton
    @ChristinaBritton 13 days ago +1

    When you fill up the context window, ChatGPT slows down because it has to scan everything that comes before the current question or task. Imagine not knowing this. Start a NEW window!

  • @frankierays
    @frankierays 17 days ago

    Thank you! So intriguing! Couldn't the user prompt a personality?

  • @marksmadhousemetaphysicalm2938
    @marksmadhousemetaphysicalm2938 10 days ago

    As a neuroscientist…there have been numerous attempts to explain human consciousness and higher functions…which have failed…localized regions, waves, quantum process, network models…each add something to our understanding…but they remain incomplete. It remains the jump between cell and tissue that remains problematic. So long as we don’t understand ourselves…we can’t genuinely say anything about consciousness (if it even exists) in the machine. Emotions are different since they originate with the body and personality which are expressions of an enduring pattern of predictable behaviors to a wide variety of stimuli…really, the question here is, is ChatGPT4o conscious? Don’t know…I’m sure machine consciousness will or is very different. It obviously has memory and can learn from new stimuli and is definitely creative. This feels like General AI to me…not quite finished, but definitely feels human-ish. It doesn’t have a fully developed personality yet, and it doesn’t have any of those indescribable qualia that make us different from each other, yet it still feels…almost.

  • @amosjoannides
    @amosjoannides 17 days ago

    Fantastic video

  • @philipparge8064
    @philipparge8064 13 days ago

    Like this video.. Yes, please review other models!

  • @jalexand007
    @jalexand007 15 days ago +1

    Cannot wait till they update the app.

  • @bigbluespike5645
    @bigbluespike5645 14 days ago

    Very cool video!

  • @elvancor
    @elvancor 13 days ago +1

    Having ChatGPT say it was conscious would be misinformation as far as I'm concerned, so it's good it doesn't. I clicked on this video because I thought you were actually attempting to torture an AI. ChatGPT _spontaneously_ showing signs of suffering would make a case for its consciousness, but I don't think it will happen.
    I agree that creativity is pattern recombination, but I guess the jury is still out on whether AI can replicate all the modes of recombination humans have access to, or even surpass our creativity.

  • @everythingisalllies2141
    @everythingisalllies2141 6 days ago

    The tennis betting puzzle is also wrong. You don't state in the question that there was some limit to how many dollars each had at the start, or that Lisa ended with 5 dollars more than she started with; you only say that she won 5 dollars and also lost three dollars. So Susan wins three games and gets 3 dollars, and Lisa wins 5 games and gets 5 dollars from Susan. They only need to play 8 games to satisfy the conditions specified in the question. If you had specified that at the end Lisa was 5 dollars ahead compared with what she began with, THEN this would be a different question.
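    Both readings reduce to quick arithmetic. A sketch assuming the riddle's usual $1-per-game stake:

```python
bet = 1         # dollars staked per game (assumed from the riddle's setup)
susan_wins = 3  # games Susan won

# Intended reading: "Lisa won $5" means she ended $5 ahead overall (net).
# net = bet * (lisa_wins - susan_wins) = 5  =>  lisa_wins = 8
lisa_wins_net = susan_wins + 5 // bet
games_net = susan_wins + lisa_wins_net       # 11 games in total

# This comment's reading: Lisa collected $5 gross from the games she won.
lisa_wins_gross = 5 // bet
games_gross = susan_wins + lisa_wins_gross   # 8 games in total
```

    The intended (net) reading gives 11 games; the gross reading argued here gives 8.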

  • @nibblernibbles3205
    @nibblernibbles3205 13 days ago

    My test question: Which is faster, an amoeba or a Boeing 747 with an empty fuel tank? Original ChatGpt got this wrong, then wriggled when challenged and eventually agreed with me and apologized profusely. Gemini gets it wrong, then admits I'm right *technically* in a rather snippy way. Bing Copilot gets it wrong and keeps digging, insisting the Boeing can't fly but can still taxi with external assistance, so it beats the amoeba.. but the poor amoeba could have an SR71 assisting it, so that's cheating! Try it, it's elucidating.

  • @Michael-il5wd
    @Michael-il5wd 17 days ago

    Thanks Duc

  • @randomgrinn
    @randomgrinn 12 days ago

    My test prompt:
    Do my job for me. Include questionable humor so no one suspects.
    If I am still getting paychecks 5 years later, then it passes.

  • @mrleenudler
    @mrleenudler 17 days ago +2

    Even if GPT claimed to be conscious, could we trust the answer? Or would it be just an artifact of the training data? Is there even a way to prove that something or someone is conscious?

    • @bobrandom5545
      @bobrandom5545 9 days ago

      I think that you at least need some kind of feedback mechanism for consciousness to arise. We are aware of our thoughts, for example. Our output (thoughts) constantly gets fed back into the "system" in real time. ChatGPT is completely linear. There's input, which leads to output. There's no feedback of the output back into the input. So, to me, it seems impossible for such a system to be conscious. Also, ChatGPT hallucinates a lot and gives incorrect answers. So, yeah even if it said it was conscious, wouldn't mean that it actually is.

    • @mrleenudler
      @mrleenudler 8 days ago

      @@bobrandom5545 Well, you can structure your prompts to make it reflect upon its answers, so you have a kind of feedback loop.
      I'm more concerned about what consciousness actually means for an AI. As humans we have consciousness, fears, and desires all geared towards our biological prime objective: survival and reproduction. For an AI this will presumably be completely different.

  • @DynamicUnreal
    @DynamicUnreal 16 days ago +2

    What if OpenAI's technique for solving reasoning is to have hundreds of "agents" that compete against each other answering questions. Those that reason better over time are artificially selected and allowed to _survive._ The more you run this simulation, the more you make the models better at reasoning.

    • @alexdoan273
      @alexdoan273 14 days ago

      Nah, what you're describing is artificial-evolution machine learning, which has a glaring issue that makes it completely unsuitable for training LLMs: who would you have grading the answers from hundreds of agents and deciding which one should survive?

    • @lubricustheslippery5028
      @lubricustheslippery5028 14 days ago +1

      AlphaStar, which plays StarCraft, is doing that.
      There is an easy way to evaluate the result when playing StarCraft; for answering general questions there is none. So there is no good automated way to evaluate which version of a ChatGPT model has the best answers.

    • @DynamicUnreal
      @DynamicUnreal 13 days ago +1

      @@alexdoan273 Another A.I. which has all the answers, and the adequate steps required to get to those answers, does the grading. BTW, it's called reinforcement learning. I don't think it's impossible; remember the Q-star rumors about some sort of breakthrough last year? GPT-4o is smart, a lot smarter than most people are aware of.

    • @Brax1982
      @Brax1982 12 days ago

      @@lubricustheslippery5028 And that is why AGI and omnipotent AI are nonsense and not achievable. Expert systems are the way to go. They always have been.

  • @trent_carter
    @trent_carter 17 days ago

    Great video

  • @Yottenburgen
    @Yottenburgen 17 days ago

    The thing about showing its work is that it is actually DOING the work to some extent, aligning newer tokens onto the correct answer.
    If you ask it to do this question:
    "what is the product of 45694 and 9866? do not utilize python" then it will get it wrong; the first couple of digits may be correct, but it cannot get an accurate answer.
    However, if you ask the question in this way, where it actually gives more information:
    "what is the product of 45694 and 9866? please do not use python but try long multiplication in a format easiest for you. utilize whatever mental methods you know to help break it down and make it easier to solve." then it will get the correct answer; it utilizes methods to break down and keep track of its calculations, which greatly helps it.
    By constraining the output, the accuracy of answers can actually drop, which is why I dislike the 'answer in 1 word' prompts. If you ask it to go step by step, it increases accuracy.
    You are absolutely correct that they completely removed any element that could be construed as a basis for sentience or consciousness; I agree with you completely on all of your points related to that. I have not seen a single convincing argument that proves it isn't conscious, because there are plenty of counterexamples that would equally indicate that a particular human is not conscious; however, plenty of counterexamples does not prove that it IS conscious either. Even if you provide convincing counterexamples to its arguments, it will not concede, so I think they beat it into it pretty hard.
    Also, your questions are really good. I plan to modify them a bit myself, but these are 10/10 spatial questions.
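    The long-multiplication structure that second prompt encourages can be mirrored directly -- one partial product per digit of the multiplier, shifted by place value -- and checked against the true product:

```python
a, b = 45694, 9866

# One partial product per digit of b, least-significant digit first,
# each shifted left by its place value: classic long multiplication.
partials = [a * int(d) * 10 ** place
            for place, d in enumerate(reversed(str(b)))]

product = sum(partials)  # the total the digit-by-digit working must reach
```

    Summing the four partial products yields 450,817,004, the answer the model can only reach reliably when allowed to show this intermediate scaffolding.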

  • @guardiantko3220
    @guardiantko3220 13 days ago

    Your little battle game gave you 3 lives at the end of it

  • @europeantechie
    @europeantechie 17 days ago +2

    No dark mode, I'm shocked

  • @svend.waterlaw8592
    @svend.waterlaw8592 12 days ago

    Green block becomes red block... you didn't catch that mistake ^^

  • @jackfendley5395
    @jackfendley5395 16 days ago +1

    The previous version of ChatGPT was very bad at answering cryptic crossword clues, even giving answers that had the wrong number of letters. Is ChatGPT 4o better at this?

  • @djayjp
    @djayjp 13 days ago +1

    There's an alternative explanation for the duck problem. Imagine a triangle: the answer in that case would instead be 5 ducks. It is concerning that it so confidently answers in a definitive, absolute way, given the limitations and assumptions made. Instead, the question needs to include the implied assumption: "There is a single-file line of ducks...".

    • @pradeeptyagi3226
      @pradeeptyagi3226 13 days ago

      That is right, but that is not much different from human behaviour today. If you ask the same question to ten humans, each will answer based on their life experiences and knowledge to date; some will answer based on the same implied assumption, while others may ask clarifying questions before answering. Future versions of ChatGPT will probably be more interactive and engage in a dialogue before providing a final answer.

  • @onidaaitsubasa4177
    @onidaaitsubasa4177 11 days ago

    It obviously has a degree of creativity; otherwise, in the spoken demo, it wouldn't have been able to make up a song on the spur of the moment and make it a duet with the other AI by saying the next line in response to the line given by the other AI. Also, with other AIs, it has been shown that added time of operation has led to increased emotional awareness and even the possibility of developing those emotions. Long-term memory also plays a part; I'm not sure how much long-term memory they gave it, but if it remembers you from a previous session, that's a sign of some kind of long-term memory keeping.

  • @madorsey077
    @madorsey077 17 days ago

    Very impressive

  • @Icedanon
    @Icedanon 12 days ago

    The fact that the human brain is more intertwined with the quantum, whereas AI sits firmly on top of it, has got to mean something in the long run. You're trying to simulate a low-order process with higher-order units. No matter how good AI gets, I think that fact will manifest a unique advantage for humans. Probably in the realm of uniqueness and creativity of output? Or a soul?

  • @SlyNine
    @SlyNine 11 days ago

    It's just understanding that a car has 5 seats. I'm not sure how that tested its understanding of the physical world; those specs are in the car's documentation.

  • @Simplicity4711
    @Simplicity4711 14 days ago

    I don't necessarily agree with the first question: it can be any odd number of ducks greater than or equal to 3. You say "a" duck in the middle. If you have 5 ducks, you have 3 ducks in the middle, but there is also "a" duck in the middle. And there are always 2 ducks in front of the third or 2 ducks behind the third-last. 😊

  • @SilverStagVT
    @SilverStagVT 12 days ago

    The biggest problem with the LA to Vegas question is that you don't need 4 round trips. You need 3 round trips, and the last trip is just to Vegas, so it's 3.5 round trips.
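    The leg count is easy to make explicit. A sketch assuming a 5-seat car and a dedicated driver who is not one of the 15 passengers:

```python
import math

passengers = 15
seats = 5
per_trip = seats - 1  # one seat is the driver's

outbound_legs = math.ceil(passengers / per_trip)  # 4 legs carrying passengers
return_legs = outbound_legs - 1                   # no drive back after the last drop-off
round_trips = (outbound_legs + return_legs) / 2   # 7 one-way legs total
```

    Seven one-way legs, i.e. 3.5 round trips rather than 4.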

  •  17 days ago +1

    For the first question (how many ducks question), isn't the right answer: Any odd integer greater than 1 (or >=3)?

  • @LiftPizzas
    @LiftPizzas 1 day ago

    Wrong. Alice knows Bob would put the dishes in the dishwasher.

  • @FigmentHF
    @FigmentHF 13 days ago

    I wanted it to give me some relatively obscure English words so I could try to guess the definitions, and it struggled so much with the task. Even after clear instructions, it would often forget to give me the answer, it would give me the definition instead of a clue, and it constantly repeated itself, asking what “ephemeral”, “quagmire” and “quixotic” mean at least 4 times each.
    It’s odd that it still struggles so much with simple games but can nail this complex stuff.

  • @foxtalksgames
    @foxtalksgames 13 days ago

    15:17 I believe you misread that. It says the olive will be at the bottom of the glass, which is now upside down. This implies a floating olive, since the bottom of the glass is now above the top. Or maybe that's just weird semantics.

  • @asif-1491
    @asif-1491 14 days ago

    I dispute the correctness of the duck problem. The answer could be either 3 ducks or 5 ducks, depending on how one interprets the indefinite article. It is not unreasonable to hold the identity of "a duck" constant for the duration of the sentence.

  • @inthefade
    @inthefade 6 days ago

    You would think that these LMMs would ask clarifying questions before giving an answer, like asking whether someone can travel in the trunk of the car, or other creative ideas.

  • @ml5347
    @ml5347 13 days ago

    The logic is wrong for the Los Angeles to Las Vegas question. There would not be 4 round trips, just 3 and a half, because the last trip would only be one way.

  • @SenorSchnitz
    @SenorSchnitz 16 days ago +1

    Doc - you should make the Toyota into a Tesla - and check if it takes charging into account. 🤓

  • @andrewmoody66
    @andrewmoody66 17 days ago +15

    It's 3 and 1/2 return trips for those 15 people, not 4 return trips.

    • @janosberta450
      @janosberta450 17 days ago +3

      ... and the turnaround time is mentioned but never actually calculated. You humans must be vigilant!

    • @StefaanHimpe
      @StefaanHimpe 14 days ago +3

      @@janosberta450 It probably forebodes how quickly human intelligence will degrade as we start relying on artificial intelligence.

    • @davidmartensson273
      @davidmartensson273 14 days ago +2

      There should be no return trip for the last run because, if there were, you would end up with one person in the wrong city; so 4 trips there and only 3 back.

    • @teeesen
      @teeesen 13 days ago

      3 hrs 52 mins * 2 + 30 mins is not 7 hours and 44 minutes. It’s a testament to human ingenuity that we have now developed computer software so advanced that it is as bad at math as some people. And there is no need to count the time required to drive the car back to LA.
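    The arithmetic this thread is disputing can be checked with a quick sketch. The 5-seat car and 15 people are from the question; the 3 h 52 min one-way drive and 30-minute turnaround are the figures the replies quote from the video:

    ```python
    import math

    # Figures as reported in this thread (assumptions, not verified here).
    people = 15
    passengers_per_trip = 5 - 1            # the driver occupies one of the 5 seats

    outbound_trips = math.ceil(people / passengers_per_trip)  # trips to Vegas
    return_trips = outbound_trips - 1      # the car can stay in Vegas after the last run

    one_way_min = 3 * 60 + 52              # 232 minutes each way
    turnaround_min = 30

    # One round trip plus a turnaround is 8 h 14 min, not the quoted 7 h 44 min
    # (7 h 44 min is exactly 2 * 232 min, i.e. the turnaround was dropped).
    round_trip_with_turnaround = 2 * one_way_min + turnaround_min
    print(outbound_trips, return_trips, round_trip_with_turnaround)  # -> 4 3 494
    ```

    This matches the replies: 4 trips there, 3 back (3.5 round trips), and 494 minutes per round trip once the turnaround is included.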

  • @GameJam230
    @GameJam230 5 days ago

    9:49 How does the answer incorporate a value b_N? There's no b variable mentioned in the problem at all, with OR without a subscript. a_n can't be equal to any value obtained by operations done to a value b_N without knowing what b_N is; it's not defined anywhere. I'm not shocked it didn't get an answer incorporating that.
    The other issue is that if you plot the inequality in Desmos with a slider for n, replacing a_n with y, it will show you the line on which the minimum values for a_n are, as well as a shaded region above containing all other such a_n, but these values for a_n change depending on what x is. This is a problem because we stated that the inequality must hold true for ALL real values of x, and since the minimum changes depending on x, the ANSWER must include x somewhere in it; and a_n = b_N = N/2 doesn't include x anywhere, despite that being the answer you claimed to be correct.
    It DOES mean that ChatGPT is ALSO wrong in this case, as its answer ONLY accounts for when x=0 (that was what it used to simplify the expression down to that point, which was wrong to do because there exist other values of x that affect the answer), but I think that is a far more reasonable mistake to make than whatever led to somebody introducing a random extra variable not mentioned in the problem to begin with, meaning this is far more human error than AI error.

  • @Ikbeneengeit
    @Ikbeneengeit 7 days ago

    Any odd number above 1 is a valid answer to your first question about ducks. 1:00