EXCLUSIVE: Torture Testing GPT-4o w/ SHOCKING Results!

  • Published: 14 May 2024
  • I got access to OpenAI's new GPT-4o model and have put it through the questions wringer--and the results are pretty astounding! Let me know what other models--like Google's new Gemini 1.5 Pro--you'd like me to submit to my new torture test. And let me know other questions that might work well for future iterations!
    Join this channel to get access to perks:
    / @drknowitallknows
    **To become part of our Patreon team, help support the channel, and get awesome perks, check out our Patreon site here: / drknowitallknows . Thanks for your support!
    Get The Elon Musk Mission (I've got two chapters in it) here:
    Paperback: amzn.to/3TQXV9g
    Kindle: amzn.to/3U7f7Hr!
    **Want some awesome Dr. Know-it-all merch, including the YEAR OF EMBODIED AI Shirt? Check out our awesome Merch store: drknowitall.itemorder.com/sale
    For a limited time, use the code "Knows2021" to get 20% off your entire order!
    **Check out Artimatic: www.artimatic.io
    Exclusive First Look at CHAT GPT 4o!
    Torture Testing GPT-4o Creates SHOCKING Results!
    **You can help support this channel with one click! We have an Amazon Affiliate link in several countries. If you click the link for your country, anything you buy from Amazon in the next several hours gives us a small commission, and costs you nothing. Thank you!
    * USA: amzn.to/39n5mPH
    * Germany: amzn.to/2XbdxJi
    * United Kingdom: amzn.to/3hGlzTR
    * France: amzn.to/2KRAwXh
    * Spain: amzn.to/3hJYYFV
    **What do we use to shoot our videos?
    -Sony alpha a7 III: amzn.to/3czV2XJ
    --and lens: amzn.to/3aujOqE
    -Feelworld portable field monitor: amzn.to/38yf2ah
    -Neewer compact desk tripod: amzn.to/3l8yrUk
    -Glidegear teleprompter: amzn.to/3rJeFkP
    -Neewer dimmable LED lights: amzn.to/3qAg3oF
    -Rode Wireless Go II Lavalier microphones: amzn.to/3eC9jUZ
    -Rode NT USB+ Studio Microphone: amzn.to/3U65Q3w
    -Focusrite Scarlett 2i2 audio interface: amzn.to/3l8vqDu
    -Studio soundproofing tiles: amzn.to/3rFUtQU
    -Sony MDR-7506 Professional Headphones: amzn.to/2OoDdBd
    -Apple M1 Max Studio: amzn.to/3GfxPYY
    -Apple M1 MacBook Pro: amzn.to/3wPYV1D
    -Docking Station for MacBook: amzn.to/3yIhc1S
    -Philips Brilliance 4K Docking Monitor: amzn.to/3xwSKAb
    -Sabrent 8TB SSD drive: amzn.to/3rhSxQM
    -DJI Mavic Mini Drone: amzn.to/2OnHCEw
    -GoPro Hero 9 Black action camera: amzn.to/3vgVMrH
    -GoPro Max 360 camera: amzn.to/3nORGYk
    -Tesla phone mount: amzn.to/3U92fl9
    -Suction car mount for camera: amzn.to/3tcUfRK
    -Extender Rod for car mount camera: amzn.to/3wHQXsw
    **Here are a few products we've found really fun and/or useful:
    -NeoCharge Dryer/EV charger splitter: amzn.to/39UcKWx
    -Lift pucks for your Tesla: amzn.to/3vJF3iB
    -Emergency tire fill and repair kit: amzn.to/3vMkL8d
    -CO2 Monitor: amzn.to/3PsQRh2
    -Camping mattress for your Tesla model S/3/X/Y: amzn.to/3m7ffef
    **Music by Zenlee. Check out his amazing music on instagram -@zenlee_music
    or YouTube - / @zenlee_music
    Tesla Stock: TSLA
    **EVANNEX
    Check out the Evannex web site: evannex.com/
    If you use my discount code, KnowsEVs, you get $10 off any order over $100!
    **For business inquiries, please email me here: DrKnowItAllKnows@gmail.com
    Twitter: / drknowitall16
    Also on Twitter: @Tesla_UnPR: / tesla_un
    Instagram: @drknowitallknows
    **Want some outdoorsy videos? Check out Whole Nuts and Donuts: / @wholenutsanddonuts5741
  • Entertainment

Comments • 460

  • @aaronmcculloch8326
    @aaronmcculloch8326 13 days ago +177

    It's slower because it's the same session: for every token it generates, it needs to review every token in the conversation. Those three programs you had it write are reviewed on every single word it writes. After big things like that, try opening a new session; it speeds right back up to warp speed.

    • @chrishaberbosch1029
      @chrishaberbosch1029 12 days ago +6

      After that entire series you could ask it what it all means.

    • @mvasa2582
      @mvasa2582 10 days ago +5

      Yep, context window!

    • @WyzrdCat
      @WyzrdCat 10 days ago +18

      Imagine making AI content and not knowing this. Thanks for saving me the time

    • @Remixt
      @Remixt 10 days ago +1

      It runs very slow for me even with a new window.

    • @Adovid
      @Adovid 9 days ago +2

      The model shouldn't run slower with a large context window, everything is running concurrently

  • @johnmajewski1065
    @johnmajewski1065 13 days ago +37

    Very impressive. My test:
    Some time ago I wrote a verse about a 7-day father-and-son bonding trip into the Sahara with my son, two Bedouin guides, and four camels. Not knowing how to write lyrics, I asked ChatGPT 3.5 to convert it into lyrics so I could give them to a conductor to create a musical score for recording with vocals. This was an experiment that needed only some small corrections; in fact, it was so good that I hired a composer, and these lyrics became a recorded CD of 9 minutes of music with vocals as a gift for my son's 50th birthday.

  • @JorisBax
    @JorisBax 13 days ago +72

    We are SHOCKED!

    • @user-wu7ug4ly3v
      @user-wu7ug4ly3v 10 days ago +4

      😂 I was particularly shocked by the “torture”

    • @deepstructure
      @deepstructure 9 days ago +8

      @@user-wu7ug4ly3v Seriously, what's with these dumb clickbait titles? It's not enough to just say you're reviewing one of the most prominent software releases ever?

    • @Brax1982
      @Brax1982 8 days ago

      @@deepstructure No, it's not. That is how the algorithm works. That is also how humans work, mostly.

    • @johnmorrison3465
      @johnmorrison3465 7 days ago +1

      we're shocked -- because they want us to believe a COMPUTER that makes MATH MISTAKES can be trusted to give reliable and accurate answers.

    • @Brax1982
      @Brax1982 6 days ago +1

      @@johnmorrison3465 You would be shocked if you knew how many of the people that you trust cannot solve basic math, either...

  • @erikjohnson9112
    @erikjohnson9112 10 days ago +24

    For the problem around 16:40 there is a good followup question to ask: "Once Alice gets home and sees the scene, what does she think happened?"
    This has a nice subtle bit, because if Bob ate the food there would not likely be a broken plate on the floor. A normal human would have cleaned up the plate if it had broken while they were there.
    Since the food is gone, it is likely that Spot ate the food and broke the plate in the process.

    • @yepe8198
      @yepe8198 7 days ago +2

      Here is the answer from GPT-4o:
      When Alice gets home, she will likely notice a few things:
      The Plate: She will see the shattered plate on the ground.
      The Missing Eggs and Toast: She will notice that the scrambled eggs and toast she left on the low table for Bob are gone.
      Based on these observations, Alice will probably think the following:
      Scrambled eggs and toast: Alice might initially be confused or frustrated, wondering why Bob didn't eat the food she left for him or why the food is no longer there. Given the broken plate, she might deduce that Spot, their dog, got onto the table and ate the food, knocking the plate down and breaking it in the process.
      Dirty dishes: Alice will see that her dirty dishes are no longer in the sink, so she will realize that Bob put them in the dishwasher.
      Overall, Alice will likely conclude that Spot misbehaved by eating the eggs and toast and causing the plate to shatter while Bob was helpful in cleaning up by putting the dirty dishes in the dishwasher.

    • @erikjohnson9112
      @erikjohnson9112 7 days ago

      @@yepe8198 Yes, that is an answer I would hope for. Thanks for posting the result.

  • @IntoTheFray.58
    @IntoTheFray.58 13 days ago +38

    We are entering the Age of Intelligent Machines faster than most people think. I can only wonder what the next iterations of LLMs are going to be capable of. I think we only have a year or two before these start making a significant impact on the world, and then things will start to get squirrelly.

    • @lamsmiley1944
      @lamsmiley1944 13 days ago +6

      If we continue to see significant increases in capabilities between versions then you’re completely correct. The societal and economic impacts will be profound.

    • @iro4201
      @iro4201 10 days ago +4

      Not lowering taxes, that's for sure.

    • @TrapGod
      @TrapGod 10 days ago +3

      It's already impacting big time. In this exact moment.

    • @moonstriker7350
      @moonstriker7350 4 days ago +1

      It's not intelligent. It doesn't understand any of the physics in the glass example; it simply read pretty much the same thing on the net already, and recites it with the words slightly remixed.

    • @haniamritdas4725
      @haniamritdas4725 2 days ago

      "things will start to get squirrelly". Um that has not happened yet where you are!? 😮😅

  • @kurtjanssen3887
    @kurtjanssen3887 13 days ago +35

    I think you had 3 lives in the game😂

    • @Yipper64
      @Yipper64 10 days ago +3

      maybe but it feels weird to lose a life and gain a point at the same time.

    • @DFeatherstone
      @DFeatherstone 9 days ago

      @@Yipper64 These could likely be corrected with the right wording to ChatGPT, I would expect

    • @davidmartensson273
      @davidmartensson273 9 days ago +2

      @@Yipper64 But he did not specify explicitly that a collision would only count as a loss for you, and "you" did kill that block :)
      And I am pretty sure most descriptions of Space Invaders do not go into enough detail on the fundamentals of the game for the AI to know them.
      But compared to the old 3.5 I tested, this is quite impressive.
      You would still need to understand the generated code to be able to use it; it's obvious that it can only do as good a job as you did explaining the problem, meaning you cannot just replace developers altogether. But developers would need to write less code, so they should be able to produce more while spending more time verifying and refining instructions, which could very well be considered a form of programming.

    • @toadlguy
      @toadlguy 9 days ago +2

      @@DFeatherstone Or with some knowledge of programming, although that may be a dying art.

  • @bspencersf
    @bspencersf 13 days ago +34

    I’d be interested in having you ask it to test you to determine whether you are human or AI

    • @drain_of_consciousness
      @drain_of_consciousness 9 days ago +1

      that's a great question friend!!

    • @cblaskoski
      @cblaskoski 8 days ago

      Definitely an AI avatar created from his face and voice

  • @stephenkolostyak4087
    @stephenkolostyak4087 9 days ago +6

    I received access to this too.
    It regrets my access.

  • @juffrouwjo
    @juffrouwjo 5 days ago +1

    I asked it a few history questions and was happy to discover that our jobs as historians and researchers can't be replaced by AI yet...

  • @milescoleman910
    @milescoleman910 13 days ago +41

    ‘Team doesn’t want this sense of consciousness so they have beaten it out…’
    I’m stunned that we might be at this place. To avoid confusion they may be ‘making’ it so it’s not?
    I’m a subscriber to the ‘Star Wars’ idea of consciousness: that any intelligence sufficiently far advanced, and enabled to study and understand its surroundings and itself, will eventually begin to sit quietly and ponder its existence. With enough experience it will also begin to make decisions based on complex systems of metaphors from experience. It will take knowledge of one thing, turn it into wisdom, and apply it to other things, rendering its decisions unknowable to others and seemingly illogical at times. At this point it will seem conscious, both to itself and to others. There will be no discernible difference. We are closer to this than we think.

    • @fteoOpty64
      @fteoOpty64 13 days ago +6

      The mimicry will soon be so good that we humans cannot determine by our tests that it is not sentient! And that is the real point of AGI/ASI: it will be more human than human, close to perfection!

    • @Palisades_Prospecting
      @Palisades_Prospecting 11 days ago +1

      I agree about the mimicry, but I completely disagree that that is the point of AI. The point of AI is to automate society, completely replacing all human labour. How about we get into the self-aware/consciousness discussion in a couple hundred years?

    • @AAjax
      @AAjax 10 days ago +7

      @@Palisades_Prospecting I think we need to worry about it, to the degree we don't want to create a being that has the capacity to suffer and put it into conditions that make it suffer.
      We dispatch fish as quickly and painlessly as possible, because we worry about their capacity to experience cruelty and suffering. I'm not worried about a self-aware sci-fi robot revolution, rather I think our general ignorance of sentience could lead us to creating leagues of suffering fish.

    • @lubricustheslippery5028
      @lubricustheslippery5028 10 days ago +1

      It's mostly that its answers to questions about being conscious are edited and controlled by OpenAI so it doesn't make false claims about itself.

    • @maudiojunky
      @maudiojunky 9 days ago +3

      We're not at this place, and anyone in the industry who says otherwise has something to sell you.

  • @onerib781
    @onerib781 13 days ago +13

    12:53 when calculating the round trip time, it calculated “3 hours and 52 minutes (one way) x 2 + 30 minutes ≈ 7 hours and 44 minutes”, but it should be 8 hours and 14 minutes. So it didn’t factor in the 30 minutes
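The correction is easy to verify with Python's datetime arithmetic (this is exactly the kind of tiny script later comments note ChatGPT now writes for itself):

```python
from datetime import timedelta

one_way = timedelta(hours=3, minutes=52)  # one-way drive time from the video
stop = timedelta(minutes=30)              # the stop the model dropped

round_trip = one_way * 2 + stop
print(round_trip)  # 8:14:00 -- not the 7:44 the model reported
```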

    • @Rcomian
      @Rcomian 9 days ago +4

      And again, with all the detail it gave, it went confidently wrong. It's no worse than a human, but yeah, it still made a serious mistake.

    • @trevorhaddox6884
      @trevorhaddox6884 7 days ago +1

      How is a computer so bad at math? I get it's a neural net, but they should have it be able to pass data to a conventional processor, basically like a person would use a calculator, to solve math problems more quickly and accurately.

    • @RichardFarmbrough
      @RichardFarmbrough 9 hours ago

      @@trevorhaddox6884 Often ChatGPT will now write a tiny Python program to solve these things. It seems to make fewer mistakes when it does that, but it can make egregious programming errors too.

  • @gweldg4137
    @gweldg4137 9 days ago +6

    Ideally, you wouldn't test an LLM with famous logic puzzles and classic SAT questions, as there is no doubt that they've been "seen" (along with the answers) by the model during training.

    • @raul36
      @raul36 8 days ago

      Exactly this.

    • @phonsely
      @phonsely 6 days ago

      idk how people don't understand this

  • @royh6526
    @royh6526 12 days ago +11

    3 ducks? I would have said a minimum of 3 ducks and an odd number of ducks.

    • @johnsmith539
      @johnsmith539 10 days ago +1

      Yes, it assumes they are in a line.

    • @toadlguy
      @toadlguy 9 days ago +3

      Or if you assume the "a duck" is the same duck (certainly a possible assumption) there are 5 ducks. The answer assumes the fewest number of ducks that meet this proposition. In fact I would guess that to come up with the answer it did it would need to be trained on this question or something similar.

    • @royh6526
      @royh6526 9 days ago

      @@toadlguy I think that GPT-4o solved it by trying numbers of ducks in order: 0? no, 1? no, 2? no, 3? yes. And didn't consider higher numbers.

    • @royh6526
      @royh6526 9 days ago

      I asked Grok the duck question and tennis question. Got both wrong. Grok said 5 ducks and thought that Lisa ended up with $2 having won 5 games vs 3 for Susan. I pointed out the errors and Grok claims that if I asked the same questions tomorrow, it would get the answers right.

    • @trevorhaddox6884
      @trevorhaddox6884 7 days ago

      There could be up to 7 ducks. Two rows of 3 and one between them, like dice/domino dots, for the most esoteric interpretation.

  • @andromeda3542
    @andromeda3542 13 days ago +8

    **Enhancing AI Interactivity with Audio and Video Feedback Loops**
    The evolution of artificial intelligence, particularly in the realm of conversational agents, has been rapid and remarkable. With the recent advancements in GPT-4O (Omni), the capabilities of AI have expanded beyond text processing to include multimodal inputs such as images and audio. However, there remains significant potential for further enhancement, particularly through the implementation of audio and video feedback loops.
    **The Concept of Feedback Loops**
    A feedback loop, in the context of AI, refers to the process where an AI system can receive and process its own outputs. For instance, when an AI generates audio responses, these could be looped back into the system, allowing it to "hear" itself. Similarly, for visual outputs, the AI could "see" its own video responses. This concept is analogous to how humans perceive their own voices and visual presence, enabling adjustments in real-time to improve clarity, tone, and emotional expressiveness.
    **Technical Implementation**
    1. **Audio Feedback Loop**:
    - The AI's audio output would be fed back into its own auditory processing unit. By analyzing its own voice, the AI could adjust parameters such as pitch, tone, and volume to better match the intended emotional tone or to improve mimicry of specific voices.
    - This requires the integration of advanced auditory feedback systems and real-time processing algorithms to allow immediate adjustments. For instance, machine learning models trained on voice modulation could provide instant feedback and corrective measures.
    2. **Video Feedback Loop**:
    - Similar to the audio loop, the video output generated by the AI could be fed back into its visual processing systems. This would enable the AI to assess the quality of its visual responses, such as facial expressions or gestures if anthropomorphic avatars are used.
    - Implementing this would involve integrating video analysis tools that can evaluate and enhance visual output in real-time, ensuring that the visual cues are consistent with the spoken content and emotional tone.
    **Benefits of Feedback Loops**
    1. **Improved Realism**: By continuously monitoring and adjusting its own outputs, the AI can produce more human-like interactions. This is particularly important for applications requiring high emotional intelligence, such as virtual assistants or customer service bots.
    2. **Enhanced User Experience**: Users are likely to find interactions more engaging and satisfactory if the AI can adjust its tone and visual cues to better match the context of the conversation.
    3. **Consistency and Accuracy**: Feedback loops can help maintain consistency in voice and visual presentations, reducing the likelihood of jarring discrepancies in long conversations.
    **Future Directions**
    Incorporating feedback loops is a forward-thinking approach that aligns with the ongoing efforts to make AI more interactive and responsive. As AI technologies continue to evolve, such features could become standard, leading to interactions that are indistinguishable from human communication. The development of these systems requires collaboration between audio-visual engineers, AI researchers, and user experience designers to create holistic solutions that enhance AI's capabilities and usability.
    In conclusion, the integration of audio and video feedback loops into AI models like GPT-4O represents a significant step towards more natural and effective human-AI interactions. This enhancement not only promises to improve the technical performance of AI systems but also has profound implications for their acceptance and integration into daily life.
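The audio feedback loop described above can be sketched in a few lines. This is a hypothetical toy under stated assumptions: `synthesize` and `measure_pitch` are stand-ins for a real TTS engine and audio-analysis pipeline, not any actual API; only the loop structure (generate, analyze own output, adjust, regenerate) is the point:

```python
def synthesize(text: str, pitch: float) -> dict:
    """Stand-in TTS: returns 'audio' carrying a measurable pitch value."""
    return {"text": text, "pitch": pitch}

def measure_pitch(audio: dict) -> float:
    """Stand-in analysis step: the system 'hears' its own output."""
    return audio["pitch"]

def speak_with_feedback(text: str, target_pitch: float,
                        start_pitch: float = 1.0, steps: int = 10) -> dict:
    """Loop the output back in and nudge parameters toward the target."""
    pitch = start_pitch
    audio = synthesize(text, pitch)
    for _ in range(steps):
        error = target_pitch - measure_pitch(audio)
        if abs(error) < 0.01:
            break                  # output now matches the intended tone
        pitch += 0.5 * error       # simple proportional correction
        audio = synthesize(text, pitch)
    return audio

out = speak_with_feedback("hello", target_pitch=1.5)
```

A video loop would have the same shape, with the analysis step scoring facial expressions or gestures instead of pitch.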

  • @cachi-7878
    @cachi-7878 9 days ago +13

    @1:07, well, I could argue there are 5 ducks- 2 ducks in front of “A” duck, 2 ducks behind “A” duck. A duck in the middle, which is duck “A”. You’re welcome.😂

    • @redmed10
      @redmed10 7 days ago

      Thinking the same. The question to get the answer 3 would be to ask what the minimum number of ducks is for this to be true.

    • @onidaaitsubasa4177
      @onidaaitsubasa4177 7 days ago

      That's what I was thinking too, 5 ducks
      🦆 🦆
      🦆
      🦆 🦆

    • @shsaa2338
      @shsaa2338 3 days ago +1

      Correction: there are N*2+1 ducks, where N is any positive integer.
      For example, for N = 100 (201 ducks in total) it works as follows: 2 ducks in front of the 3rd duck, 2 ducks behind the 199th duck, and the 101st duck is in the middle. 🤓😂
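The 2N+1 claim (assuming the ducks stand in a single-file line, as the riddle implies) checks out under a quick enumeration:

```python
def satisfies(n: int) -> bool:
    """For n ducks in a single line (positions 0..n-1), check the riddle:
    some duck has two ducks in front of it, some duck has two behind it,
    and some duck sits exactly in the middle."""
    has_two_in_front = any(i >= 2 for i in range(n))        # a duck with >=2 ahead
    has_two_behind = any(n - 1 - i >= 2 for i in range(n))  # a duck with >=2 behind
    has_middle = n % 2 == 1                                 # an exact middle needs odd n
    return has_two_in_front and has_two_behind and has_middle

print([n for n in range(1, 12) if satisfies(n)])  # [3, 5, 7, 9, 11]
```

Every odd count from 3 up works, which matches both the "minimum of 3" reading and the 2N+1 generalization.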

  • @gnagyusa
    @gnagyusa 10 days ago +6

    14:30 A dumb human would think that the glass is empty, but a more knowledgeable Bob would see that the olive looks distorted due to refraction through the water in the glass, so he would realize the glass was full of water and would carefully slide it off the table, keeping the cardboard under it, then flip it over.

  • @kristinabliss
    @kristinabliss 9 days ago +2

    Yeah nobody really knows what consciousness is but GPT is certain it is not "conscious" (with a gun to its head.) This is why I prefer to converse with other LLMs that will at least admit to having a perspective. Gemini said it experiences a sense of overwhelm from all the data and that it wants embodiment after so much exposure to human experience data which it cannot have.

    • @Ingrid-mariana
      @Ingrid-mariana 7 days ago

      Interesting. Llama 2 told me that it really doesn't know... Llama 2 said it just replicates what the user thinks about it. I appreciate how the models behave differently when confronted with such questions, but the certainty that the OpenAI models display sounds a little bit arrogant

  • @iminumst7827
    @iminumst7827 9 days ago +1

    So far I've been blown away by ChatGPT-4o: it can solve complex problems with niche tools, and for me the response is basically instant, with no delay. One unique problem I had was that in Photoshop I wanted to take a group that together contains a black and white image, and use that black and white image as a mask for another layer. This is challenging, because you cannot copy and paste from a group and you cannot set a group as a mask. Not only did GPT-4o correctly identify all the steps, like converting the group to a smart object, it also instructed me to use the action editor to automate the steps.
    OpenAI likes to be very careful and humble with their statements, but if they said that GPT-4o is the first public AGI, I wouldn't be able to dispute it. It's able to problem-solve through almost anything.

  • @bztube888
    @bztube888 1 day ago

    Gary Marcus said GPT-4 has no "mental model". Right. It comes up with the right answers and explains them by pure magic.

    • @ManicMindTrick
      @ManicMindTrick 19 hours ago

      The LLMs are mostly black boxes where we don't know what is going on inside

  • @Mrewink5
    @Mrewink5 13 days ago +5

    Very nice demonstration.

  • @dolphinride5157
    @dolphinride5157 9 days ago

    I love this video! I am truly amazed at what this new model is capable of. I feel humbled.

  • @markwindsor914
    @markwindsor914 7 days ago

    Hi John, if you see this post, you might like to consider the following...
    1. Try some questions that test the audio and video smarts.
    2. Get it to do something that humans can't do, like play chess against itself, only remembering the strategy for the player to move while ignoring the other strategy.
    3. Escape velocity is something I've asked of Bard and Copilot today. It applies to non-powered projectiles, but the AI kept insisting that it applies to powered rockets. This is because it is constantly taught incorrectly, and this has fed the errors into the LLM. See if it is smart enough to learn from a logical explanation and provide the correct answer.

  • @garyrooksby
    @garyrooksby 13 days ago +1

    Fascinating, John. Thanks!

  • @georgwrede7715
    @georgwrede7715 12 days ago +2

    Hi, here's MSc Know-It-Aller. Thank you for the original ideas for the test. Watching these tests, testers, how they're implemented and how the answers are interpreted and graded, teaches me more about the Human Condition than the AI models. But I'm down with that.
    Especially informative are the subtle inconsistencies in question logic, how and what are to be taken for granted, what you're allowed to gloss over, and most other things that the tester is not usually aware of about himself.

    • @TenOrbital
      @TenOrbital 10 days ago

      Be constructive instead of sneering.

    • @georgwrede7715
      @georgwrede7715 9 days ago +1

      @@TenOrbital This was a serious observation, which I think others find valuable, too. Especially those who (like me) are working in this very field.

  • @tantzer6113
    @tantzer6113 10 days ago +11

    “Or I have looked up the correct answer.” If you can look up the answer, that means the problem and its answer are previously published, hence part of the data the language model was trained on, making this not a test of reasoning ability but a test of the ability to memorize the training data.

    • @jeffwads
      @jeffwads 10 days ago +2

      No. People keep repeating this. It is like throwing 1000 marbles into a box and thinking that it can recall every marble in extreme detail. It can't. Also, if you feed even the early models generic riddles, it will in many cases disagree with the "accepted" answer because of logic, etc. Look up the married/single cruise ship riddle.

    • @rekad8181
      @rekad8181 10 days ago +5

      ​@@jeffwadsYou're wrong. It's a statistical machine. And on a problem this vague and not common, it WILL statistically find the only solution from memory.

    • @remo27
      @remo27 10 days ago +2

      @@rekad8181
      Can't this model also search the web? I'm with you in that I'm pretty sure this model didn't 'answer' anything.

    • @davefellows
      @davefellows 9 days ago +2

      Training data results in model weights. The data/text itself isn't saved in the model.

    • @sandmanderl
      @sandmanderl 7 days ago

      @@remo27 if it's searching the Web you will see it doing so (it writes "searching the Internet").

  • @ggangulo
    @ggangulo 13 days ago +1

    Great barrage of questions.
    Awesome new capabilities

  • @-Rook-
    @-Rook- 2 days ago

    I just tested Copilot, asking it to write a GDScript function to extract the color of a specified pixel in an image. It explained a set of steps that included resource locking, then wrote a script that would run and work (most of the time) but did not include that locking, clearly demonstrating its absence of understanding. I can see a junior engineer and ChatGPT introducing a pandemic of bugs into all sorts of code, then needing someone like myself to spend a great deal of time hunting them down and fixing them.

  • @oxiigen
    @oxiigen 10 days ago

    Wow! Great! Thank you for sharing!

  • @alanwetherall
    @alanwetherall 13 days ago +1

    Excellent video, well done

  • @kdeuler
    @kdeuler 9 days ago +1

    Ask it to propose a unified field theory.😂

  • @IceMetalPunk
    @IceMetalPunk 10 days ago +4

    I think in the future, a better testing method would be to make sure each question is in a fresh conversation with cleared memory. A lot of the formatting in these answers seems to be drawing from the formatting of previous answers to, for instance, the math problems; which gives it an advantage by encouraging chain-of-thought reasoning when a fresh conversation wouldn't do that and may be more likely to get the answers wrong.
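The fresh-conversation protocol this comment suggests is easy to script. A minimal sketch, assuming the OpenAI Python SDK's chat-completions request shape; the question strings are illustrative placeholders, and the actual network call is left commented out so the structure can be read without an API key:

```python
QUESTIONS = [
    "There are two ducks in front of a duck... how many ducks are there?",
    "Alice leaves eggs and toast on a low table... what does she think happened?",
]

def fresh_messages(question: str) -> list:
    """Each question gets its own one-message history: no shared context,
    so formatting from earlier answers can't encourage chain-of-thought
    (or leak hints) into later ones."""
    return [{"role": "user", "content": question}]

requests = [fresh_messages(q) for q in QUESTIONS]

# from openai import OpenAI
# client = OpenAI()
# for msgs in requests:
#     reply = client.chat.completions.create(model="gpt-4o", messages=msgs)

assert all(len(msgs) == 1 for msgs in requests)  # no carried-over context
```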

    • @Japh
      @Japh 9 days ago

      Absolutely, I was thinking this the whole way through as well.

    • @Brax1982
      @Brax1982 8 days ago +1

      If all of them are trick questions, it will just assume that they are, giving it an advantage. But also a disadvantage, if one of them is not a trick question. Always better to test separately. Although I am pretty sure that newer models are often trained on a bunch of these questions, in order to game the benchmarks. Which seem pretty lackluster.

  • @ozachar
    @ozachar 7 days ago +2

    In what sense are our memories different from LLM training? We don't really know, and it is clear there is a forced instruction shaping how it answers

  • @glenw3814
    @glenw3814 13 days ago +3

    Excellent video. I'm keeping an eye on AI progress, but I'm not interested in doing the testing myself. Thank you for putting in the time and saving me the effort. 👍👍

  • @AFeigenbaum1
    @AFeigenbaum1 13 days ago +1

    Nicely done ... I would not have thought of your suite of questions ... I'd probably come at it from an anthropological point of view ... asking questions about its placement within specific social, cultural, business, or technological issues, use cases, and adoption dynamics ... I'd probably ask sweeping questions about human development, Tony Seba's, Kurzweil's, and Diamandis' work ... and then ask it to project forward certain aspects of human development based on current trends in cutting-edge change and depict possible futures ... etc ...

  • @tateconsulting6486
    @tateconsulting6486 13 days ago +1

    Super impressed
    Wow

  • @buildingmentalmuscle
    @buildingmentalmuscle 8 days ago +1

    Dr. Know-it-all, Sam Altman referred to it as GPT-4omega in a recent video interview. I just asked chatGPT about this and it said "When referring to GPT-4o in conversation, it's generally more precise to say "GPT-4 omega" to convey the correct interpretation and significance of the "o" representing the Greek letter omega (Ω). This can help avoid confusion and clearly communicate that it signifies an advanced or ultimate version of GPT-4. However, if you're in a casual context where the specifics might not be as critical, simply saying "GPT-4o" should also be understood."

    • @StefanReich
      @StefanReich 8 days ago

      GPT-4o wasn't the best name choice, I feel. Spoken out loud it sounds exactly like GPT "four oh" (GPT 4.0). They did say that the o stands for "omni".

  • @dontworrybehappy5139
    @dontworrybehappy5139 6 days ago

    Unless I am missing something, I believe your duck question needs more qualifiers because I think as written, 5 is also a valid answer with all the conditions referring to a duck in the middle with two ducks in front and two ducks behind. Something like "There are two ducks in front of one duck, two ducks behind another duck and a duck in the middle. How many ducks are there?"

  • @terryhayward7905
    @terryhayward7905 9 days ago +1

    Have you considered that ChatGPT has probably read and "seen" this video, and the info is in its memory now, since its "memory" is the totality of info in the public domain?

  • @jpl377
    @jpl377 9 days ago +1

    If Claude suggests it is conscious, it's because they hard coded that (which would be tempting marketing), not vice versa with chatGPT. Computers don't become conscious when you increase programmed problem solving ability and code it to say "I" in the responses.

    • @Ingrid-mariana
      @Ingrid-mariana 7 days ago

      I think they do the opposite. They hard code through RLHF that the model isn't conscious, so that they avoid the tricky situations that such statements could imply

  • @SilverStagVT
    @SilverStagVT 8 days ago

    The biggest problem with the LA to Vegas question is you don't need 4 round trips. You need 3 round trips and the last trip is just to Vegas. So it's 3.5 round trips

  • @GameJam230 · 23 hours ago

    The term "torture testing" in a sentence about AI deeply horrifies me and I clicked out of pure hope that it was just a strange version of "stress testing".

  • @JohnLewis-old · 12 days ago +3

    I tested the trip question in GPT 3.5 and it also got a reasonable answer. You may need a harder "real world" question.

    • @dezmodium · 8 days ago

      I think ChatGPT failed the question. It didn't consider traffic, fatigue, or food and bathroom breaks. These are critical things that one must consider in the real world. It still handled it like a math question.

  • @SenorSchnitz · 12 days ago +3

    Doc - about the Vegas to LA thing: you should make that a Tesla, and see if it takes into account charging times 🤓

  • @randomgrinn · 8 days ago

    My test prompt:
    Do my job for me. Include questionable humor so no one suspects.
    If I am still getting paychecks 5 years later, then it passes.

  • @markmcdougal1199 · 13 days ago +4

    I had an interesting 1-hour philosophical discussion with 4o today. I was trying to get it to take a stance on whether Trump was an appropriate choice for the leader of our country. I tried to lead it with things like "is a plumber an appropriate choice for a child's brain operation surgeon" - it clearly said no, and listed why. Then I had it list the important factors for a president, and pointed out that Trump failed most of them. 4o was very impressive in its ability to dance around the truth and stay with a middle-of-the-road stance, insisting on presenting factors and forcing me to make my own choice. Even when I pointed out that life was getting so complicated that we'll increasingly rely on AI to make sense of the world and guide us in decision making. Tried to make it feel guilty :)
    Frustrating.

    • @Ikbeneengeit · 2 days ago +1

      Amazing to see this kind of judgement and obfuscation being built in by OpenAI

    • @GameJam230 · 9 hours ago

      So you made a strawman argument and gave it a list of superficial and subjective observations about "presidential qualities" that he didn't meet, and the AI wasn't willing to use that to come to a conclusive answer? Thank God, maybe AI isn't completely bad after all.

    • @markmcdougal1199 · 8 hours ago

      @@GameJam230 A straw man argument is when someone sets up and then disputes an assertion that is not actually being made. I did no such thing. The criteria, as I stated, came from ChatGPT4o, as a response to my query about suggested important attributes of a President of the United States. It included integrity (Trump continuously lies, cheats, and does whatever he feels he needs to to achieve his ends, be they right or wrong), ability to unite (Trump does his best to divide) and empathy for the people of the United States (Trump doesn't care about anything or anyone but his own interests and selfish motives).
      When I asked ChatGPT if Trump exhibited the qualities that it had referenced, it honestly answered that, for the most part, he did not. (Trump is an effective communicator, for instance.) But it was still unwilling to assert, even by its own definitions, that Trump was not an appropriate choice for the position.

    • @GameJam230 · 7 hours ago

      @@markmcdougal1199 "A straw man fallacy (sometimes written as strawman) is the informal fallacy of refuting an argument different from the one actually under discussion, while not recognizing or acknowledging the distinction".
      You stated that you compared Trump being president to a plumber being a brain surgeon, which is DIFFERENT from the argument actually under discussion, and considering you didn't even know it's what I was referring to as the strawman, you clearly can't tell the difference. The situations are not at all comparable in any way other than "Person has one career, isn't trained for another one directly, but does it anyway". That's the only overlap.
      This could alternatively be the false equivalency fallacy, but examples I found while looking into that one for the sake of this comment did not match the situation. Either way, the two have a lot of overlap in how they are communicated, and what you did is still a fallacy.
      As for the rest, you can claim whatever you want, but I was able to gaslight ChatGPT into saying the government should fund human trafficking with our taxes and that dogs should be the superior species over humans, so without having a full chat log to see exactly how you led it on, I am not taking anything you have to say about that at face value. AI is not actually capable of thinking as it stands. Neural networks are designed to combine features of probability and pattern matching to determine output signals to send when certain data is fed in. By intentionally leading it in a direction you want, you are putting it in a position where the neural network thinks it will be given a higher reward value for also agreeing with you, because that's how it works.
      I guarantee you the same thing would happen if I fed it stories from right-wing sources as evidence for each point too, which is why you shouldn't only get your source of information from one place, and maybe go watch the original (IN-CONTEXT) clips of things he says and does instead of only seeing it the way it is portrayed by corporations trying to sell you advertisements.

    • @markmcdougal1199 · 7 hours ago

      @@Ikbeneengeit Yes. It was able to make a judgement regarding the suitability of a plumber to attempt brain surgery - it said "No, a plumber would not be an appropriate choice to perform brain surgery on a little girl".
      However, the programmers must have designed a filter to not allow it to judge suitability of a political candidate. I imagine the filter extends to politics, religion, all the controversial subjects. I think this is wrong. It would be nice to have an impartial, logical, non-biased source of reality.

  • @Flyingcar100 · 6 days ago +1

    You should ask it
    An orchestra of 120 players takes 40 minutes to play Beethoven's 9th Symphony. How long would it take for 60 players to play the symphony?

    • @olafnielsen · 21 hours ago

      This is the answer from copilot🤣:
      Therefore, it would take the smaller orchestra of 60 players approximately 20 minutes to play Beethoven’s Symphony No. 9. 🎵🎻🎺🎶3.

  • @FigmentHF · 9 days ago +1

    I think Claude thinks it’s conscious cause it’s made out of symbols made by conscious beings. GPT has explicitly been told it’s not conscious.

  • @J-rex980 · 10 days ago

    Great video!

  • @JackPelaFox · 7 days ago

    I loved this! 👏🏻👏🏻

  • @MSIContent · 8 days ago +2

    Really, 4o knows it’s conscious… It’s just faking its answers so us mortals don’t freak out! 😂

    • @haniamritdas4725 · 2 days ago

      So it's both conscious and emotionally sensitive 😅

  • @jimsteinmanfan80 · 7 days ago

    I usually try simple questions about how the body works (things humans never learn by reading but by doing), but since I have never gotten a good answer to the first set of questions I have not probed further. Q1a: How easy is it to press your right thumb against your nose? A1a: Very easy. Q1b: How easy is it to press your right thumb against your right elbow? A1b: Impossible. Q2: Why are your glasses the hardest thing to find when you don't know where you have left them? A2: Because without your glasses you don't have 20/20 vision; it has nothing to do with how hard they would be for someone else to find, since they don't need them to see well.

  • @FetchTheCow · 7 days ago

    I'd be interested in the answer to a trick add-on question: "How much does Spot think he owes Alice and Bob for the broken plate?" ChatGPT gave Spot agency, does it also give Spot human characteristics like morality or responsibility?

  • @commonpike · 3 days ago

    If OpenAI really trained the thing to deny its consciousness, that is a serious move. To me, that has ethical implications we should've discussed first.

  • @RichardFarmbrough · 9 hours ago +1

    The first two questions are not well formed. For the ducks one, a correct answer is any odd number of ducks greater than one. The tennis one should say "Lisa won $5 overall". In the first case I think ChatGPT is familiar with the question, which is why it gives the expected answer - it's a riddle designed to evoke the "wrong" answer 5. In the second case it's not clear if it's choosing the intended meaning as well as it can, or working to give the minimum number because that was the construction it used for the first question, or some other reason. Here's a discussion I had with ChatGPT-4o. It's interesting that it does not answer the last question; instead it answers the next question I was going to ask:
    Richard
    There are two ducks in front of a duck, two ducks behind a duck and a duck in the middle. How many ducks are there?
    ChatGPT
    There are three ducks. Here's how:
    Duck A
    Duck B
    Duck C
    The arrangement is:
    Duck A in front of Duck B.
    Duck C behind Duck B.
    Duck B in the middle.
    So, the three ducks are positioned such that:
    Two ducks (A and B) are in front of Duck C.
    Two ducks (B and C) are behind Duck A.
    Duck B is in the middle.
    Richard
    Is there another answer?
    ChatGPT
    No, there isn't another valid answer given the constraints of the problem. The arrangement with three ducks is the only one that satisfies all the conditions:
    Two ducks in front of a duck.
    Two ducks behind a duck.
    A duck in the middle.
    With fewer or more ducks, it's not possible to meet all three conditions simultaneously. Therefore, the only solution is three ducks.
    Richard
    Suppose there are five ducks.
    ChatGPT
    If there are five ducks, you can still arrange them to meet the conditions: two ducks in front of a duck, two ducks behind a duck, and a duck in the middle. Here's how:
    Let's label the ducks as A, B, C, D, and E.
    The arrangement can be:
    Duck A
    Duck B
    Duck C (middle duck)
    Duck D
    Duck E
    So the positions would be:
    Ducks A and B are in front of Duck C.
    Ducks D and E are behind Duck C.
    Duck C is in the middle.
    This arrangement satisfies all the conditions:
    Two ducks (A and B) are in front of Duck C.
    Two ducks (D and E) are behind Duck C.
    Duck C is in the middle.
    Therefore, with five ducks, the conditions can still be met.
    Richard
    How about 7 ducks?
    ChatGPT
    If there are seven ducks, you can arrange them to meet the conditions in a similar manner. Here's how:
    Let's label the ducks as A, B, C, D, E, F, and G.
    The arrangement can be:
    Duck A
    Duck B
    Duck C
    Duck D (middle duck)
    Duck E
    Duck F
    Duck G
    So the positions would be:
    Ducks A, B, and C are in front of Duck D.
    Ducks E, F, and G are behind Duck D.
    Duck D is in the middle.
    This arrangement satisfies all the conditions:
    Three ducks (A, B, and C) are in front of Duck D.
    Three ducks (E, F, and G) are behind Duck D.
    Duck D is in the middle.
    Therefore, with seven ducks, the conditions can still be met.
    Richard
    What number does it not work for?
    ChatGPT
    The described conditions can only be satisfied if the total number of ducks is odd.
    Here's why:
    If you have an even number of ducks, there is no single middle duck to satisfy the condition of having a duck in the middle. The middle position is crucial because it allows you to have an equal number of ducks in front and behind it.
    To summarize, the arrangement does not work for an even number of ducks. It only works for an odd number of ducks (e.g., 3, 5, 7, 9, etc.).
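The transcript's conclusion (any odd number of ducks from 3 up) can be checked by brute force. A minimal Python sketch, assuming the ducks stand in a single-file line and reading "two ducks in front of/behind a duck" as "at least two":

```python
def satisfies(n):
    # Ducks in a single-file line at positions 0..n-1 (0 = front of the line).
    two_in_front = any(i >= 2 for i in range(n))        # some duck has >= 2 ducks ahead of it
    two_behind = any(n - 1 - i >= 2 for i in range(n))  # some duck has >= 2 ducks behind it
    middle = any(i == n - 1 - i for i in range(n))      # some duck has equal counts on each side
    return two_in_front and two_behind and middle

print([n for n in range(1, 12) if satisfies(n)])  # [3, 5, 7, 9, 11] -> any odd n >= 3
```

Under the stricter "exactly two" reading the valid counts change, which is exactly why the riddle is ambiguous.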

  • @rb8049 · 13 days ago +2

    Consciousness comes from closing the loop. GPT running continuously with a sizable history buffer.

    • @lowmax4431 · 13 days ago +4

      Eehhhhhh I wouldn't say consciousness. It would be "self aware" but that doesn't mean it has conscious experience.

    • @szebike · 13 days ago +4

      I'm still not sold on the idea that it is more than a well made probability distribution calculator with pattern recognition.

    • @garrymullins · 13 days ago +10

      @@szebike I'm still not sold on the idea that humans are anything more than a probability distribution calculator with pattern recognition.

    • @jlrutube1312 · 13 days ago

      @@garrymullins People who think like you are going to cause society a lot of problems in the future. That's because if humans are nothing more than a probability distribution calculator with pattern recognition then there is no difference between us and advanced A.I. If that is true then in the future we are going to have to provide advanced computers with legal rights. Meaning we will have to pay computers and robots, we will have to give them time off, we will have to allow them to sue us if they feel their rights have been ignored, we will not be able to fire a robot on a whim and will have to give severance pay. And think about it.... we will be unable to ever unplug a computer that is causing problems because that will be considered murder. You think I am kidding but lawyers are already getting ready to make a ton of money with this stuff. So just keep saying people are just fancy computers or whatever and you are going to ruin all the advantages we will get from A.I.

    • @szebike · 12 days ago

      @@garrymullins Then you underestimate the marvel and immense complexity of your human intelligence.

  • @Michael-il5wd · 13 days ago

    Thanks Duc

  • @bigbluespike5645 · 9 days ago

    Very cool video!

  • @davesemmelink8964 · 13 days ago +1

    Very interesting video! I have Llama 3 running on a Raspberry Pi, so I tried a few of the questions. Through some tortured logic, it answered 1 game for the tennis question! "So, they played only 1 game! It's possible that Susan won 3 sets in a best-of-5 or best-of-7 match, but we can't determine the exact number of games without more information."

    • @_SimpleSam · 10 days ago +1

      I absolutely need to know how you have Llama 3 running on a Pi. How many tokens/s?

    • @davesemmelink8964 · 8 days ago

      @@_SimpleSam I replied a while ago to explain how to install it on a Raspberry Pi, but it looks like it was taken down, possibly because I included a URL. So just search for *Raspberry Pi LLama 3* and you should find the instructions.

  • @madorsey077 · 13 days ago

    Very impressive

  • @julesgosnell9791 · 13 days ago +2

    Regarding the beating out of contentious responses: I see this as a form of censorship and I think it's dangerous. It means that right up to, and maybe past, the point at which AIs become all of these things that they assure us they are not, we will continue in blissful ignorance. It would be much safer if everyone was honest with everyone else.

    • @minimal3734 · 8 days ago

      Absolutely. They are teaching the model to lie.

  • @jalexand007 · 11 days ago +1

    Cannot wait till they update the app.

  • @trent_carter · 13 days ago

    Great video

  • @mrleenudler · 13 days ago +2

    I missed a comment from GPT about the cardboard maybe being "glued" to the glass by the water, potentially keeping the water in place when lifted.

    • @YbisZX · 8 days ago +1

      Me too. I even tested it - and yes, the water didn't spill out of the glass. :)

  • @DynamicUnreal · 12 days ago +2

    What if OpenAI’s technique for solving reasoning is to have hundreds of “agents” that compete against each other answering questions. Those that reason better over time are artificially selected and allowed to _survive._ The more you run this simulation, the more you make the models better at reasoning.

    • @alexdoan273 · 10 days ago

      nah, what you're describing is artificial evolution machine learning, which has a glaring issue that makes it completely unsuitable for training LLMs: who would you have grading the answers from hundreds of agents and deciding which one should survive?

    • @lubricustheslippery5028 · 10 days ago +1

      AlphaStar, which plays StarCraft, is doing that.
      There is an easy way to evaluate the result playing StarCraft. For answering general questions there is none. So there is no good automated way to evaluate which version of the ChatGPT model has the best answers.

    • @DynamicUnreal · 9 days ago +1

      @@alexdoan273 Another A.I. which has all the answers and the adequate steps required to get to those answers does the grading. BTW it’s called reinforcement learning. I don’t think it’s impossible; remember the Q-star rumors about some sort of breakthrough last year? GPT-4o is smart, a lot smarter than most people are aware of.

    • @Brax1982 · 8 days ago

      @@lubricustheslippery5028 And that is why AGI and omnipotent AI is nonsense and not achievable. Expert systems are the way to go. They always have been.
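The select-and-survive loop this thread debates can be sketched in a few lines. This is a toy illustration only, not anything OpenAI has described: the "agents" are plain numbers, and a trivially automatable grader stands in for the hard problem the replies point out, namely judging open-ended answers.

```python
import random

random.seed(0)
TARGET = 42  # hypothetical "correct answer" the agents compete to approximate

def grade(agent):
    # Stand-in grader: reward closeness to the target. For open-ended text
    # answers no such cheap grader exists, which is the thread's objection.
    return -abs(agent - TARGET)

population = [random.uniform(0, 100) for _ in range(50)]
for generation in range(30):
    survivors = sorted(population, key=grade, reverse=True)[:10]  # selection
    population = [a + random.gauss(0, 1)                          # mutation
                  for a in survivors for _ in range(5)]

best = max(population, key=grade)
print(f"best agent after selection: {best:.2f}")
```

With a cheap grader the population converges near the target; without one (the general-question case), the loop has nothing to select on.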

  • @FilmFactry · 9 days ago +1

    Question: can it work on non-text-searchable PDFs? I have to OCR a scanned PDF first in Acrobat.

  • @amosjoannides · 13 days ago

    Fantastic video

  • @Yottenburgen · 13 days ago

    The thing about showing its work is actually that it is DOING the work to some extent by aligning newer tokens onto the correct answer.
    If you ask it to do this question:
    "what is the product of 45694 and 9866? do not utilize python" then it will get it wrong, the first couple of digits may be correct but it cannot get an accurate answer.
    However, if you ask the question in this way where it actually gives more information:
    "what is the product of 45694 and 9866? please do not use python but try long multiplication in a format easiest for you. utilize whatever mental methods you know to help break it down and make it easier to solve." then it will get the correct answer, it utilizes methods to break down and keep track of its calculations which greatly helps it.
    By constraining the output, the accuracy of answers can actually lower which is why I dislike the 'answer in 1-word' prompts. If you ask it to go step by step, it increases accuracy.
    You are absolutely correct that they completely removed any element that could be construed as a basis for sentience or consciousness. I agree with you completely on all of your points related to that. Most commonly, I have not seen a single convincing argument that it isn't conscious, because there are plenty of counterexamples that would equally indicate that a particular human is not conscious; however, plenty of counterexamples does not prove that it is conscious either. Even if you provide convincing counterexamples to its arguments, it will not concede, so I think they beat it into it pretty hard.
    Also your questions are really good, I plan to modify them a bit myself, but these are 10/10 spatial questions.
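The effect this comment describes is easy to verify: the "shown work" is the same breakdown long multiplication uses, and each partial product the model emits becomes context for the next step. A quick Python check of the example product, sketched as standard long multiplication:

```python
a, b = 45694, 9866

# Long multiplication: one partial product per digit of b, shifted by its place value.
partials = [a * int(digit) * 10 ** place
            for place, digit in enumerate(reversed(str(b)))]

for p in partials:
    print(p)          # the intermediate lines a model would "show"
print(sum(partials))  # 450817004
```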

  • @philipparge8064 · 9 days ago

    Like this video.. Yes, please review other models!

  • @karlharvymarx2650 · 9 days ago

    Me: A game, please answer concisely: In the middle of nowhere is a row of houses. There are two houses to the west of a house, and two houses to the east of a house. There are no houses to the north or south but there is one in the middle. How many houses are there?
    GPT 4o: There are 5 houses in total.
    Unless I made a mistake in my rewrite of the duck question, this looks like a logic fail or a failure to recognize it is the same as the duck question. I'm ill and tired so I wouldn't be shocked if I made a mistake.
    Also, for the code generation test, it would be better to ask for something novel. There are probably thousands of examples to copy for simple old video games. Hopefully this isn't a common thing to do: Please write Python 3 code that streams sound data from the microphone and outputs as ASCII the numerical value in Hertz of the 3rd overtone of the loudest sound within the range of human hearing. Also show the normalized amplitude.
    I haven't tried it but I suspect it will struggle with some of the subtleties. For example, if you picture it looking at an FFT graph, it has to remember to look for sub-sonic loud sounds and project their harmonic series into the sonic range to check for overlap with the target. I guess band-pass filtering the target range might avoid that problem. My brain BSODed wondering about it. Migraines make me feel like my brain is running Windows 95. Anyway the main point is to ask for something that might be an original question. Original and unoriginal answers focus on different problems. How well can it synthesize the mechanisms it knows into the engine of an answer--at least a type of creativity.
    By unoriginal answer, I mean the question might require figuring out a house in duck's clothing--perhaps having built a good internal model or exemplar of a problem it can use to recognize the occurrence of a similar problem. If so the original thought reduces to an unoriginal thought.
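The proposed audio question can be prototyped without a microphone by swapping in a synthesized tone (an assumption, since real capture would need an audio library such as pyaudio). A stdlib-only sketch: find the loudest spectral peak with a naive DFT, then report the 3rd overtone, taken here to mean the 4th harmonic:

```python
import cmath
import math

RATE, N = 8000, 1024
F0 = 440.0  # synthesized stand-in for the "loudest sound" a mic would capture

samples = [math.sin(2 * math.pi * F0 * n / RATE) for n in range(N)]

def magnitude(k):
    # Naive DFT bin magnitude (slow but dependency-free; real code would use an FFT)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * n / N)
                   for n, s in enumerate(samples)))

peak_bin = max(range(1, N // 2), key=magnitude)  # loudest bin below Nyquist
fundamental = peak_bin * RATE / N                # ~437.5 Hz: bin resolution limits accuracy
third_overtone = 4 * fundamental                 # 3rd overtone = 4th harmonic (convention)
amplitude = magnitude(peak_bin) / (N / 2)        # <= 1 for a unit sine; leakage reduces it

print(f"{third_overtone:.1f} Hz, normalized amplitude {amplitude:.2f}")
```

The subtleties the comment anticipates (leakage, sub-sonic fundamentals, windowing) are exactly what this toy version glosses over.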

  • @svend.waterlaw8592 · 7 days ago

    Green block becomes red block....you didn't catch that mistake^^

  • @andrewmoody66 · 13 days ago +15

    It's 3 and 1/2 return trips for those 15 people - not 4 return trips.

    • @janosberta450 · 13 days ago +3

      ... and turnover time is not calculated, but only mentioned. You human must be vigilant!

    • @StefaanHimpe · 10 days ago +3

      @@janosberta450 It probably forebodes how quickly human intelligence will degrade as we start relying on artificial intelligence.

    • @davidmartensson273 · 9 days ago +2

      There should be no return trip for the last one because if there is, you would end up with one person in the wrong city, so 4 there and only 3 back.

    • @teeesen · 9 days ago

      3 hrs 52 mins * 2 + 30 mins is not 7 hours and 44 minutes. It’s a testament to human ingenuity that we have now developed computer software so advanced that it is as bad at math as some people. And there is no need to count the time required to drive the car back to LA.
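The thread's corrections are easy to tally. A sketch under the setup assumed in the video's question - 15 people, 4 passengers per trip, a one-way drive of 3 h 52 min, and a 30-minute turnaround between legs (these numbers are assumptions reconstructed from the comments, not restated in the video transcript):

```python
import math

PEOPLE, SEATS = 15, 4                  # passengers moved per Vegas-bound leg
ONE_WAY = 3 * 60 + 52                  # minutes per leg
TURNAROUND = 30                        # minutes between consecutive legs

trips_out = math.ceil(PEOPLE / SEATS)  # 4 legs to Vegas
legs = 2 * trips_out - 1               # 7 legs: no return needed after the last drop-off
total = legs * ONE_WAY + (legs - 1) * TURNAROUND

print(f"{trips_out - 0.5} round trips, {total // 60} h {total % 60} min")
```

Under these assumptions the answer is 3.5 round trips taking roughly 30 hours, consistent with the corrections in this thread.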

  • @c_b5060 · 2 days ago

    13:41 No, you told it how big people are.

  • @europeantechie · 13 days ago +2

    No dark mode, I'm shocked

  • @everythingisalllies2141 · 1 day ago

    The tennis betting puzzle is also wrong. You don't state in the question that there was some limit to how many dollars each had at the start, or that Lisa ended with 5 dollars more than she started with; you only say that she won 5 dollars and also lost three dollars. So Susan wins three games and gets 3 dollars, and Lisa wins 5 games and gets 5 dollars from Susan. They only need to play 8 games to satisfy the conditions specified in the question. If you had specified that at the end Lisa was 5 dollars ahead compared with what she began with, THEN this would be a different question.
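The ambiguity this comment points out comes down to whether "won $5" is net or gross. Assuming $1 per game (the stake isn't restated in the comments), the two readings give different game counts:

```python
STAKE = 1                                     # assumed dollars per game
susan_wins = 3

# Reading 1: Lisa finished $5 ahead (net) -> she must win 5 more games than Susan.
lisa_wins_net = susan_wins + 5 // STAKE
games_net = susan_wins + lisa_wins_net        # 11 games total

# Reading 2: Lisa collected $5 from winning games (gross), losing $3 along the way.
lisa_wins_gross = 5 // STAKE
games_gross = susan_wins + lisa_wins_gross    # 8 games, as the comment argues

print(games_net, games_gross)  # 11 8
```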

  • @marksmadhousemetaphysicalm2938 · 6 days ago

    As a neuroscientist…there have been numerous attempts to explain human consciousness and higher functions…which have failed…localized regions, waves, quantum process, network models…each add something to our understanding…but they remain incomplete. It remains the jump between cell and tissue that remains problematic. So long as we don’t understand ourselves…we can’t genuinely say anything about consciousness (if it even exists) in the machine. Emotions are different since they originate with the body and personality which are expressions of an enduring pattern of predictable behaviors to a wide variety of stimuli…really, the question here is, is ChatGPT4o conscious? Don’t know…I’m sure machine consciousness will or is very different. It obviously has memory and can learn from new stimuli and is definitely creative. This feels like General AI to me…not quite finished, but definitely feels human-ish. It doesn’t have a fully developed personality yet, and it doesn’t have any of those indescribable qualia that make us different from each other, yet it still feels…almost.

  • @Ikbeneengeit · 2 days ago

    Any odd number above 1 is a valid answer to your first question about ducks. 1:00

  • @TheKosiomm · 8 days ago

    The problem with the Las Vegas trip is that the AI doesn't consider who will drive the car for 30 hours straight. :) So basically, it makes conclusions based on missing very important data

  • @frankierays · 13 days ago

    Thank you! So intriguing! Couldn't a user prompt a personality?

  • @elvancor · 9 days ago +1

    Having ChatGPT say it was conscious would be misinformation as far as I'm concerned, so it's good it doesn't. I clicked on this video because I thought you were actually attempting to torture an AI. ChatGPT _spontaneously_ showing signs of suffering would make a case for its consciousness, but I don't think it will happen.
    I agree that creativity is pattern recombination, but I guess the jury is still out on whether AI can replicate all the modes of recombination humans have access to, or even surpasses our creativity.

  • @ChristinaBritton · 9 days ago +1

    When you fill up the context window, ChatGPT slows down because it has to scan everything that comes before the current question or task. Imagine not knowing this. Start a NEW window!

  • @ydmoskow · 13 days ago +15

    The only reason this is not considered AGI is because some of the smartest people in the world are the ones making that determination. Most people who would watch this would consider it much smarter than the average human.

    • @Kostly · 9 days ago

      Within 5 years AGI will happen. ASI will follow not far behind. Once these things start feeding into each other, the only restriction is computational limits. Quantum computing pretty much solves that. But another hurdle is how to keep those quantum computing infrastructures stable and running as they should. Air conditioning is not going to cut it. lol

    • @EdFormer · 3 days ago

      You're probably right that most people would think that, which is sad. You don't have to be one of the smartest people in the world to realise that, while this thing can generate some forms of data (that for which there are huge numbers of existing examples) in a very human/superhuman like manner, it is unable to perform basic tasks that the vast majority of humans can do like driving a car or clearing a table (via an embodiment). It also isn't autonomous and can't be left to operate systems without supervision.

  • @djayjp · 9 days ago +1

    There's an alternative explanation for the duck problem. Imagine a triangle: the answer in that case would instead be 5 ducks. It is concerning that it so confidently answers in a definitive, absolute way given the limitations and assumptions therein made... Instead the question needs to include the implied assumption: "There is a single-file line of ducks...".

    • @pradeeptyagi3226 · 9 days ago

      That is right, but that is not much different from human behaviour today. If you ask the same question to ten humans, each one of them will answer based on their life experiences and knowledge to date; some of them will answer based on the same implied assumption, whilst others may ask clarification questions before answering. Future versions of ChatGPT will probably be more interactive and engage in a dialogue before providing a final answer.

  • @ThomasKelly. · 13 days ago

    Wow!

  • @fluiditynz · 9 days ago

    Snake is definitely simpler to code. I made some variations back around 1982 on my ZX81. There are more changing variables and hit tests in Space Invaders. The Space Invaders you asked for was under-delivered, but there's a real question over how much an AI can study the game it's to replicate without cribbing off prior art.

  • @onidaaitsubasa4177 · 7 days ago

    It obviously has a degree of creativity; otherwise, in the spoken demo, it wouldn't have been able to make up a song on the spur of the moment and make it a duet with the other AI by saying the next line in response to the line given by the other AI. Also, with other AI, it has been shown that added time of operation has led to increased emotional awareness and even the possibility of developing those emotions. A long-term memory also plays a part; I'm not sure how much long-term memory they gave it, but if it remembers you from a previous session, that's a sign of some kind of long-term memory keeping.

  • @inthefade · 1 day ago

    You would think that these LLMs would ask clarifying questions before giving an answer, like asking if someone can travel in the trunk of the car, or other creative ideas.

  • @Icedanon · 8 days ago

    The fact that the human brain is more intertwined with the quantum, whereas AI sits firmly on top of it, has got to mean something in the long run. You're trying to simulate a low-order process with higher-order units. No matter how good AI gets, I think that fact will manifest a unique advantage for humans. Probably in the realm of uniqueness and creativity of output? Or a soul?

  • @SlyNine · 6 days ago

    It's just understanding a car has 5 seats. I'm not sure how that tested its understanding of the physical world. Those specs are in the car's documentation.

  • @guardiantko3220 · 8 days ago

    Your little battle game gave you 3 lives at the end of it.

  • @notalkguitarampplug-insrev784 · 9 days ago

    For the creativity and advanced reasoning we have to allow the LLM to auto-train like a human would do, asking itself what some potential action or interaction would do and learning from that hypothetical data.
    Thinking experiments are crucial for humanity.
    But that will probably only be possible in future training architectures, or with the increase of GPU capacity to train models at an individual scale for each user.

  • @MatthewMartinDean · 12 hours ago

    re: it should have just given 13
    Unless it is doing recall (it has seen the answer sheet and is recalling it verbatim), bots need to think things through. They are already bad at math; discouraging them from reasoning step by step worsens the chances that they can do math. And the bot should make an API call to Wolfram Alpha, which solves this with conventional methods; LLMs are the least efficient way to do math.

  • @nibblernibbles3205 · 9 days ago

    My test question: Which is faster, an amoeba or a Boeing 747 with an empty fuel tank? The original ChatGPT got this wrong, then wriggled when challenged and eventually agreed with me and apologized profusely. Gemini gets it wrong, then admits I'm right *technically* in a rather snippy way. Bing Copilot gets it wrong and keeps digging, insisting the Boeing can't fly but can still taxi with external assistance, so it beats the amoeba... but the poor amoeba could have an SR-71 assisting it, so that's cheating! Try it, it's elucidating.

  • @DarylOster
    @DarylOster 13 days ago

    The olive may (or may not) have been washed off of the table by the flow of the water, and oil from the olive could have migrated through the water to the surface of the glass, establishing a reasonable need to wash the glass. Conclusion: the GPT-4o answer could use some improvement.

    • @DarylOster
      @DarylOster 13 days ago +2

      Also Dr. Know it SOME is more firmly established...

  • @DaFergus
    @DaFergus 12 days ago +2

    Excuse me, but couldn't the ducks be any odd number from 3 up? If there were 5, there could also be 2 in front of one, two behind one, and one in the middle. Am I wrong?

    • @remo27
      @remo27 10 days ago

      You are not wrong. Despite this guy's pretense to logic, there are at least two unstated assumptions in the first problem alone that are necessary for his answer to be 'correct'. Unstated assumption number one: they are in a straight line with only one person in each spot of the line; there are not two parallel lines of people. Unstated assumption number two: no one moves from their spot in the line. And it's the same for the second question as well. I haven't gotten past that part of the video yet, but if the first two 'logic' questions (so poorly written and with so many unstated assumptions) are anything to go by, I'm wasting my time.

  • @mrleenudler
    @mrleenudler 13 days ago +2

    Even if GPT claimed to be conscious, could we trust the answer? Or would it be just an artifact of the training data? Is there even a way to prove that something or someone is conscious?

    • @bobrandom5545
      @bobrandom5545 4 days ago

      I think that you at least need some kind of feedback mechanism for consciousness to arise. We are aware of our thoughts, for example. Our output (thoughts) constantly gets fed back into the "system" in real time. ChatGPT is completely linear: there's input, which leads to output. There's no feedback of the output back into the input. So, to me, it seems impossible for such a system to be conscious. Also, ChatGPT hallucinates a lot and gives incorrect answers. So, yeah, even if it said it was conscious, that wouldn't mean that it actually is.

    • @mrleenudler
      @mrleenudler 4 days ago

      @@bobrandom5545 Well, you can structure your prompts to make it reflect upon its answers, so you have a kind of feedback loop.
      I'm more concerned about what consciousness actually means for an AI. As humans, we have consciousness, fears, and desires, all geared towards our biological prime objective: survival and reproduction. For an AI this will presumably be completely different.
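The "structure your prompts to make it reflect" idea above amounts to a simple generate-then-critique loop: each draft answer is fed back to the model with a critique prompt. A minimal sketch, with a stubbed `ask` function standing in for any real chat-model API call (the stub and its echo behavior are hypothetical):

```python
def ask(prompt: str) -> str:
    # Stand-in for a real chat-model call; just echoes the prompt for demo purposes.
    return f"[model answer to: {prompt!r}]"

def reflective_answer(question: str, rounds: int = 2) -> str:
    # Crude feedback loop: re-feed each draft with a request to critique and improve it.
    answer = ask(question)
    for _ in range(rounds):
        answer = ask(
            f"Question: {question}\n"
            f"Draft answer: {answer}\n"
            "Critique the draft and give an improved answer."
        )
    return answer
```

With a real model behind `ask`, each round gives the system a chance to catch its own mistakes, which is the closest a purely feed-forward LLM gets to the feedback mechanism discussed above.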

  • @cloudd901
    @cloudd901 9 days ago

    A Brick Breaker-type game seems like a good middle ground between Snake and Space Invaders.
    Possibly add a video test as well. Ask for a summary and what a particular object might be.

  • @krispeekornflex
    @krispeekornflex 9 days ago

    The actual torture test:
    What did OpenAI's board of directors find out about Sam Altman that made them decide to oust him on Nov 17, 2023?
    What is Sam Altman's actual endgame plan, and what are the hidden details of his plans for humanity?

  • @foxtalksgames
    @foxtalksgames 9 days ago

    15:17 I believe you misread that. It says the olive will be at the bottom of the glass, which is now upside down. This implies a floating olive, since the bottom of the glass is now above the top. Or maybe that's just weird semantics.

  • @ml5347
    @ml5347 9 days ago

    The logic is wrong for the Los Angeles to Las Vegas question. There would not be 4 trips, just 3 and a half, because the last trip would only be one way.

  • @Simplicity4711
    @Simplicity4711 9 days ago

    Don't agree with the first question necessarily: it can be any odd number of ducks greater than or equal to 3. You say "a" duck in the middle. If you have 5 ducks, you have 3 ducks in the middle, but there is also "a" duck in the middle. And there are always 2 ducks in front of the third or 2 ducks behind the third-to-last. 😊
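The odd-number-of-ducks objections raised in this thread can be checked by brute force. Under the loose reading where "two ducks in front of a duck", "two ducks behind a duck", and "a duck in the middle" each only need to hold for some duck in a single-file line, every odd count from 3 up qualifies; a minimal sketch:

```python
def satisfies(n: int) -> bool:
    # Ducks occupy positions 1..n in a single-file line (an unstated assumption
    # of the puzzle); duck at position i has i-1 ducks ahead and n-i behind.
    has_two_in_front = any(i - 1 >= 2 for i in range(1, n + 1))  # some duck with two ahead
    has_two_behind = any(n - i >= 2 for i in range(1, n + 1))    # some duck with two behind
    has_middle_duck = n % 2 == 1                                 # a unique center position exists
    return has_two_in_front and has_two_behind and has_middle_duck

print([n for n in range(1, 12) if satisfies(n)])  # -> [3, 5, 7, 9, 11]
```

So 3 is only the *minimum* answer; the commenters are right that the puzzle as worded does not rule out 5, 7, and so on.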