ChatGPT O1 Preliminary test comparison with previous model test videos

  • Published: 24 Sep 2024

Comments • 189

  • @InternetOfBugs
    @InternetOfBugs  4 days ago +7

    iob.fyi/codecrafters will let you sign up to try CodeCrafters challenges yourself, if you're interested in seeing whether you're smarter than an AI.

  • @Cephandrius016
    @Cephandrius016 3 days ago +124

    I'm sure the hype around this model has nothing to do with OpenAI trying to raise $100 billion right now.

    • @cbnewham5633
      @cbnewham5633 3 days ago +8

      Only a fool would join those dots... 😏

    • @realdevdiv
      @realdevdiv 3 days ago +1

      Duhhh

    • @RomeTWguy
      @RomeTWguy 3 days ago

      Phd lvl bruh

    • @TheReferrer72
      @TheReferrer72 3 days ago +1

      It's not hype, as the channel owner now has to admit these LLMs are improving fast.
      Tech usually moves much slower than this.

    • @cbnewham5633
      @cbnewham5633 3 days ago +4

      @@TheReferrer72 do you think claiming "PhD level" intelligence is not hype? It's clearly not at that level, despite what OpenAI may claim.

  • @artscollab
    @artscollab 3 days ago +52

    I appreciate the grounded opinions shared on this channel, particularly as someone who has been building applications for almost 30 years using traditional patterns while also adopting new techniques. Whew, where does the time go?

    • @ApexFunplayer
      @ApexFunplayer 3 days ago +1

      I'm in the same boat. I started as a kid and I'm still building applications regularly. To me, o1-preview is highly useful for certain things, while for others it tends to produce substandard or completely unnecessary code. Just like standard AI, it often rewrites things you don't want or includes things that shouldn't be there.
      It's okay with Python; the other languages I've tried haven't produced much.

  • @trappedcat3615
    @trappedcat3615 3 days ago +32

    Much respect for adding chapter marks on a 5 minute video. You are amazing!

  • @hindsightcapital
    @hindsightcapital 3 days ago +7

    I love this guy, man. An amazing counterbalance to otherwise overwhelming narratives, and exactly the right person to deliver this information.

  • @danielraftery4550
    @danielraftery4550 3 days ago +18

    Signed on as a member. Big fan of your stuff. Only about 4 years of experience programming professionally, but I've been losing my mind at all the AI code bots. I am way faster without them; if that ever changed I'd be happy to use them, so it's nice to follow your progress in testing different models. Getting a more experienced perspective is appreciated too.

  • @tlz124
    @tlz124 3 days ago +7

    I ask ChatGPT to do something and, every time, it starts doing things I don't want it to, and I lose my mind figuring out how to ask the right thing to make it do what I want.

  • @MynamedidntFitDonkey
    @MynamedidntFitDonkey 3 days ago +29

    I like how this channel has turned from bashing AI coding into an AI coding benchmark channel.

    • @Tverse3
      @Tverse3 3 days ago +5

      Soon he'll be promoting the use of AI.

    • @2639theboss
      @2639theboss 3 days ago +6

      I mean, they're the same thing right now. Any basic benchmarking reads as "bashing" simply because of the sheer hype these companies are pushing.

    • @johnsandro7735
      @johnsandro7735 3 days ago +11

      @@Tverse3 The bashing was for the unnecessary hype all of these were getting. But if they prove actually useful to even very experienced programmers, then why not use them? A tool's a tool; if it fails to help, discard it and move on.

  • @goldsucc6068
    @goldsucc6068 3 days ago +12

    I tested this model on a real enterprise task (I even tried breaking the task into small, easy steps and removed some steps that require domain understanding) and it failed. But then I found a real use for it: sample generation. It created some test sample SOAP requests for a provided WSDL, and the structure was correct, so it saved me some time. It's better to extract samples from the actual system, of course, but due to the nature of the project that was nearly impossible until another team finished their job.

    • @RomeTWguy
      @RomeTWguy 3 days ago +1

      You can achieve the same results with Sonnet 3.5 for a fraction of the cost

    • @goldsucc6068
      @goldsucc6068 3 days ago

      @@RomeTWguy What do you mean? It cost me nothing; I have a subscription. It saved me time because constructing SOAP XML by hand takes time, and it did it in 50 seconds with the supplied data.

    • @TheReferrer72
      @TheReferrer72 3 days ago

      @@goldsucc6068 I don't understand; you could get GPT-3, before ChatGPT even, to do that.
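
For context, the kind of SOAP request scaffolding discussed in this thread can be sketched by hand in a few lines of Python using only the standard library. The service namespace, operation name, and parameters below are hypothetical placeholders, not taken from any real WSDL:

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (standard); the service namespace and
# the "GetQuote" operation are made-up stand-ins for a real WSDL's contents.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SVC_NS = "http://example.com/stockquote"

def build_soap_request(operation, params):
    """Build a minimal SOAP request envelope for the given operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(op, f"{{{SVC_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")

xml = build_soap_request("GetQuote", {"Symbol": "ACME"})
print(xml)
```

In practice a tool that actually reads the WSDL (or an LLM, as described above) saves the tedium of matching element names and namespaces by hand.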

  • @rezNezami
    @rezNezami 3 days ago +50

    The part about being as good as a Ph.D. at writing a piece of software: I think they're likely correct. Just pay attention: Ph.D.s, even in CS, are not known for writing proper software applications!! Haha

    • @Tverse3
      @Tverse3 3 days ago +3

      I think software engineers are overestimating the value they provide. We could now have an AI model trained for every programming language, which would do better than most junior and mid-level devs. This profession is doomed, and as for senior engineers, they are not special; with AI, the mid-level engineers will become seniors quickly.

    • @quantum5768
      @quantum5768 3 days ago +27

      @@Tverse3 Citation needed. AI models need training data, and there are tons of problems in industry where training data is sparse, if it exists at all. The tests Internet of Bugs has been doing show that AI can't even handle a simple language like Python for relatively straightforward tasks. Why do you expect AI models to beat junior devs at things like C in embedded software, or C# in a more application-oriented field, if they can't handle a batteries-included language like Python?

    • @Easternromanfan
      @Easternromanfan 3 days ago +18

      @@Tverse3 Useless hype comment. These LLMs' capabilities are vastly overestimated.

    • @realurilordjonhnsoni7342
      @realurilordjonhnsoni7342 3 days ago +12

      @@Tverse3 Jesus Christ, being so confidently wrong and shallow is a skill in itself.

    • @carultch
      @carultch 3 days ago

      @@realurilordjonhnsoni7342 Now that we have LLMs, it's trivially easy to be confidently wrong.

  • @DrMwenya
    @DrMwenya 3 days ago +2

    Thank you for your honesty. I'm not a tech person, and even I knew there was too much hype but only very small changes for the average user.

  • @szebike
    @szebike 3 days ago +9

    It's a bit strange to me that they charge the user for the tokens used for reasoning, yet they don't show the reasoning in detail. It's like they can add any number of extra tokens to your bill without your being able to check it.

  • @cbnewham5633
    @cbnewham5633 3 days ago +20

    Yes, you are correct. As I show on my channel, it cannot do simple geometry. It's a lot better than other models, but everyone is swallowing OpenAI's hype about "PhD level" and the cherry-picked examples some YouTubers have published for likes. It's nowhere near PhD level in the maths sphere; more like high school. You are about the only other person I've seen on YouTube so far who is pointing out real deficiencies. Subscribed.

    • @drdca8263
      @drdca8263 3 days ago +2

      Didn't the famous mathematician Terry Tao say that it seemed to him like it was getting close to a mediocre grad student, or something like that?

    • @cbnewham5633
      @cbnewham5633 3 days ago +1

      @@drdca8263 It's very good at some things, especially language (which is, after all, the main part of the architecture). Until they can integrate mathematics into it flawlessly and remove hallucinations, these models will remain stuck at the "passable" level averaged across all types of queries. Maths is really important, because without it you don't get the rest of the sciences, certainly not for doing anything serious.

    • @Nnm26
      @Nnm26 2 days ago

      I gave it questions from the 2023 Putnam and it solved 80% of them. I don't know why you'd even ask geometry questions of a model that can't see, but do try it with problems from other math domains.

    • @Happyduderawr
      @Happyduderawr 2 days ago +3

      It's like a failing topology student in math. High school is a bit harsh. I ask it measure theory questions and it trips up. I reckon it might be a "C's get degrees" graduate :)

    • @cbnewham5633
      @cbnewham5633 2 days ago

      @@Happyduderawr OK, a bit harsh perhaps, but in certain things it is a real brainiac while in others it's brain-impaired. On average, it's muddling along with a checkered academic history, but it won't be sweeping up the academic prizes any time soon. 😄

  • @Horizon-hj3yc
    @Horizon-hj3yc 3 days ago +14

    Yep... overhyped... one hype train arriving after the other... that's why I lost interest in AI... too much hype and not enough progress, still stuck on that old language-model architecture with all its flaws.

    • @arnavprakash7991
      @arnavprakash7991 3 days ago +3

      Transformers were invented in 2017, so at most we have seen 8 years of work on these types of neural networks, mostly niche work.
      We did not see industry-wide effort until ChatGPT (GPT-3.5), released November 30, 2022. It has not even been 2 years.
      All of these developments have been maxing out what transformers can do, so even without further breakthroughs in architecture this is enough to change society; it already has started to.
      And that's not even mentioning diffusion models.

  • @namensklauer
    @namensklauer 3 days ago +12

    Honestly, I'm looking forward to 1-2 years in the future, when I expect AI models at this level will be open source and no longer locked behind a paid service.

    • @genericgorilla
      @genericgorilla 1 day ago

      That's probably the most unlikely scenario

  • @SuperMarioTomma95
    @SuperMarioTomma95 3 days ago +7

    It seems like they really have to hype these models to make people worry about job displacement and AI taking over most fields; otherwise, they wouldn’t be able to justify the billions invested in training and development needed to keep the improvements advancing at a competitive pace. Either this hype will turn into a self-fulfilling prophecy in the short to medium term, or the industry will hit a plateau as diminishing returns set in, leading to the AI bubble bursting. Ultimately, we’ll be left with advanced tools that, while highly capable, remain far from the true AGI people envision.

    • @arnavprakash7991
      @arnavprakash7991 3 days ago +1

      @@SuperMarioTomma95 ChatGPT is now the 13th most visited website globally, right behind Amazon. It's the only site besides Google, Amazon, and Yahoo in the top 13 that is not social media.
      So clearly hundreds of millions of people find it useful enough.

  • @roycohen.
    @roycohen. 3 days ago +18

    It felt like a huge nothingburger. I really feel that we've hit the wall with LLMs, not to mention that these models cannot inherently reason, regardless of how much Altman wants his investors to think they can.

    • @arnavprakash7991
      @arnavprakash7991 3 days ago +5

      Altman is not the only one working on this. This is not some handcrafted product made by OpenAI.
      It's the discovery that when you give transformers compute and data, they display emergent abilities.
      So anyone with compute and data can make these, and as we are seeing, they are.

    • @lyznav9439
      @lyznav9439 3 days ago

      @@arnavprakash7991 emergent stupidity

    • @RomeTWguy
      @RomeTWguy 3 days ago

      @@arnavprakash7991 Anyone can throw more compute at inference time, but they must also have fine-tuned it on CoT datasets to simulate reasoning.

    • @arnavprakash7991
      @arnavprakash7991 3 days ago

      @@RomeTWguy Yeah, but now we have multiple solid LLMs (Claude, Llama, Gemini... I guess).
      What OpenAI did is replicable; the Llama 3 papers prove it.
      The next generation of LLMs now has 3 avenues for enhancement:
      Training (the model learning from data)
      Hardware
      Inference (the model's thinking/processing of inputs)

    • @Nnm26
      @Nnm26 2 days ago

      @@arnavprakash7991 Yeah, hundreds of billions of dollars of compute.

  • @leversofpower
    @leversofpower 3 days ago +1

    I was looking forward to your commentary. Thanks.

  • @Gredias
    @Gredias 3 days ago +8

    From Open AI's own article:
    "These results do not imply that o1 is more capable than a PhD in all respects - only that the model is more proficient in solving some problems that a PhD would be expected to solve."

    • @Cephandrius016
      @Cephandrius016 3 days ago +14

      Translates to:
      We hyper-trained the model on a subset of questions and now it can solve those problems “better” than some Ph.D.

    • @califresh0807
      @califresh0807 3 days ago

      @@Cephandrius016 People like you and him think you're smart saying blatantly obvious shit like:
      "iTs jUSt sToCHaStiC graDiEnT dEsCeNT!!! hOw cOUlD it eVeR (bullshit false equivalence here)"
      As if the researchers working on this shit, day in and day out, don't know everything you do and more.
      These models have gotten consistently better given more compute; that is just a fact.
      You can only argue the degree.
      But I bet fucking anything that in 2-3 years' time, this guy will be doing this video with an increasingly complex problem that he continues to downplay as "I classify this as easy to very easy."
      People like you and this guy think you're clever pointing out obvious fucking flaws while completely overlooking the broader direction, and none of you (this guy in the video especially) could ever fucking hope to build anything remotely as useful as these models.

    • @oldchris5258
      @oldchris5258 3 days ago

      What an extremely scientific way for them to measure their product's capabilities.

    • @jshowao
      @jshowao 3 days ago +3

      Yet a PhD student is expected to solve or make progress on an original problem through original research, so I'm not sure what that statement even means.
      I mean, ask the AI to solve one of the Millennium Problems, or quantum gravity. I doubt it could.

  • @ladsbois7302
    @ladsbois7302 3 days ago +1

    Thank you for your brief thoughts.

  • @Happyduderawr
    @Happyduderawr 2 days ago +2

    But I'm a PhD and I'm shit at writing code.

  • @bfranceschin1
    @bfranceschin1 3 days ago

    Thanks for the update! o1-preview is available in the Supermaven VS Code extension.

  • @RandyRanderson404
    @RandyRanderson404 3 days ago

    I was looking forward to your assessment of o1.

  • @하하호호-h3u
    @하하호호-h3u 3 days ago +1

    Limitations of o1-preview:
    1. The limitations of being a language model are still evident. Its perception of the physical world is very poor, making it difficult to use for tasks requiring spatial awareness.
    2. While it actively uses the Chain-of-Thought technique, which significantly improves accuracy on tasks where there is a clear logical answer, this simultaneously makes its thinking process rigid. As a result, it performs worse than GPT-4 in areas involving subjective nuance and no clear answers, such as writing. In contrast, traditional language models like GPT-4 may have a higher rate of hallucinations, but this also makes them more adept at generating plausible responses, which ultimately aids tasks like creative writing.
    Therefore, o1 is not a one-size-fits-all solution, and it seems necessary to first determine whether to use the Chain-of-Thought technique or the traditional language-model approach based on the given task before proceeding. Furthermore, o1 is merely another language model, not a fundamental leap forward; it's simply a specialization of the existing method. Due to the inherent limitations of language models, which learn about the world through language, achieving AGI (Artificial General Intelligence) is still a distant goal.

  • @EleroyGreen
    @EleroyGreen 3 days ago +1

    I like how your facial expression in the video thumbnail provides the tl;dr on this 😀

  • @Nnm26
    @Nnm26 2 days ago +1

    LLM capabilities are a bit weird: you can't base its intelligence on a few questions and declare it better or worse than a human. It's subpar compared to a PhD student in a lot of domains, but in others it's nothing short of superhuman. It'd be great if you could check out Kyle Kabasares's channel for the different tests he conducted on o1; there he actually uses PhD-level questions and it blew all of them out of the water.

  • @Michael-yu9ix
    @Michael-yu9ix 3 days ago +2

    The camera angle... if he's not moving his hands, it looks like a recording of a locked-in patient.

  • @DoubleOhSilver
    @DoubleOhSilver 3 days ago +7

    AI definitely isn't taking my job anytime soon, but it has been helping me a lot at work lately. If I already know what I need to do, I can tell AI to write it for me. Then I just fix it up a bit, rename stuff, clean it up, etc. But it has probably saved me a couple of hours at work this week.

    • @DoubleOhSilver
      @DoubleOhSilver 3 days ago +6

      Anyway my pay hasn't gone up so I'm just taking those extra hours I gain off from work.

    • @JP-ek3mc
      @JP-ek3mc 3 days ago

      @@DoubleOhSilver This is the way

    • @jshowao
      @jshowao 3 days ago +2

      I've seen AI produce written prose, and it produces a lot of repetitive slop.
      Good to know you're happy with sentences starting with the same words over and over again.
      From what I've seen, you'd have to rewrite the whole thing.

  • @theaugur1373
    @theaugur1373 3 days ago +1

    o1 is available in Cursor, but it's not included with the monthly fee. You have to pay separately.

  • @riser9644
    @riser9644 3 days ago

    The fact that he's saying it's better: now that's some progress.

  • @samuelyao2637
    @samuelyao2637 3 days ago

    Thank you so much!!

  • @wonseoklee80
    @wonseoklee80 12 hours ago

    Of course, a human PhD with GPT o1 + Google beats vanilla GPT o1.

  • @mitchlindgren
    @mitchlindgren 3 days ago +1

    To be fair, I’ve seen some PhDs who write pretty bad code 😂

    • @marcovoetberg6618
      @marcovoetberg6618 2 days ago

      I don't even know what it means to program "at a PhD level." I'm not saying there are no PhDs who are good programmers, but there is nothing about having a PhD that makes someone a good programmer.

  • @cuentadeyoutube5903
    @cuentadeyoutube5903 3 days ago +2

    4:07 o1 is already integrated in Cursor. But it is expensive.

    • @artscollab
      @artscollab 3 days ago +1

      Interesting. Fixed monthly price or per-token cost? I like Cursor so far.

  • @_RobertOnline_
    @_RobertOnline_ 23 hours ago

    Keeping it real 👍

  • @HelloCorbra
    @HelloCorbra 2 days ago

    It's already integrated into Cursor and other AI IDEs, if I'm not mistaken. You could have a look at it for the next video. Looking forward to it; good stuff as usual.

  • @CherryBlossomStorm
    @CherryBlossomStorm 1 day ago

    OK, but every PhD I've worked with has been garbage at writing code.

  • @ancwhor
    @ancwhor 3 days ago +14

    10x the compute for a 0.2% improvement, imo.

    • @generichuman_
      @generichuman_ 3 days ago +7

      If you think we got a 0.2% improvement, then your opinion clearly isn't worth that much.

    • @ancwhor
      @ancwhor 3 days ago +11

      @@generichuman_ If you think it's more, then your opinion clearly isn't worth that much.

    • @justafreak15able
      @justafreak15able 3 days ago

      @@generichuman_ What was the most complex system you've each worked on without AI? Then compare whose opinion is worth more, lol.

    • @ancwhor
      @ancwhor 3 days ago

      @@justafreak15able An Express API backend linked to Python for an algo that manages distribution. Vue frontend. Self-taught. In prod.

    • @JohnDoe-jp4em
      @JohnDoe-jp4em 3 days ago

      @@ancwhor This channel's viewers are like the inverse of an AI tech bro sometimes: instead of insane hype, it's constant insane lowballing. Did you even listen to the video?
      If someone with a track record of being skeptical of AI admits that it's significantly better at coding tasks and beats all other AIs he's tested, it's clearly a lot more than 0.2%.

  • @dabbieyt-xv9jd
    @dabbieyt-xv9jd 2 days ago

    I don't understand why you went straight to making the video on o1 and skipped Project Strawberry?

  • @EnigmaCodeCrusher
    @EnigmaCodeCrusher 2 days ago

    Thanks

  • @karanmungra5630
    @karanmungra5630 1 day ago

    Doing God's work

  • @altffyra2365
    @altffyra2365 3 days ago +2

    Try to make it give you "hello world" in the Bend language. I gave 4o eight attempts, then I gave it the code, and it actually got that wrong as well.

    • @estefencosta1835
      @estefencosta1835 3 days ago +1

      That's actually one thing I'm unclear about in these videos: none of these models will give you the same response every time, so if you just run it once it doesn't really tell you much. I'd rather understand not just that it fails, but how badly it fails each time if you run it 10 or 100 times. Is it 10% sort of OK, 40% pretty bad, and 50% terrible?
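
The run-it-many-times idea above is essentially how code benchmarks already report results. A minimal sketch of the standard unbiased pass@k estimator (n total runs, c of them correct), as used in HumanEval-style evaluations:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k sampled attempts is correct),
    given n total attempts of which c were correct."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: success is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 100 runs, 10 of them correct: a single try vs. best-of-10
print(pass_at_k(100, 10, 1))   # 0.1
print(pass_at_k(100, 10, 10))  # ~0.67
```

A finer-grained version of the commenter's suggestion would also bucket the failures by severity rather than treating correctness as binary.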

  • @vintagewander
    @vintagewander 2 days ago +1

    Isn't the entire AI thing running on hype fuel?

  • @Titere05
    @Titere05 23 hours ago

    We're 1 year into 2 months away from AI taking over your job

  • @doesthingswithcomputers
    @doesthingswithcomputers 3 days ago +1

    I've worked with PhDs; using that comparison is a really bad idea…

  • @spencerjames9417
    @spencerjames9417 2 days ago

    Altman is a bit evil, considering how many lives he's willing to throw under the bus for his toy that doesn't do anywhere near what he claims.

    • @eye776
      @eye776 1 day ago

      It's all about the money, money, money.

  • @leojack1225
    @leojack1225 2 days ago

    I am a Math PhD and I cannot write any software.

  • @krasensspenevpenev3167
    @krasensspenevpenev3167 2 days ago

    I'm now doing a two-year master's degree in software architecture, because my bachelor's degree is in something else. With these new models coming out, I'm quite stressed about whether I'll have a future as a programmer. Do you think it is worth studying something that is not directly related to artificial intelligence? For example, there was a specialty in "software technologies with artificial intelligence"; that is what I mean by directly related to AI.

  • @VoodooD0g
    @VoodooD0g 2 days ago

    It was integrated into Cursor on day 1...

  • @tear728
    @tear728 3 days ago +9

    These things will not be autonomous. At best they will be a "living" stackoverflow.

    • @personzorz
      @personzorz 3 days ago +9

      And what happens to the real stack overflow that they are parasitic upon to function?

    • @tear728
      @tear728 3 days ago

      @@personzorz it remains as relevant as ever

    • @Elintasokas
      @Elintasokas 3 days ago +1

      Alright, see you in a couple of years.

    • @arnavprakash7991
      @arnavprakash7991 3 days ago

      @@tear728 Do you actually use any of these LLMs? Have you used the most recent models?
      Or are you just making statements to make yourself feel better?

    • @drdca8263
      @drdca8263 3 days ago

      What do you mean by "autonomous"? Do you mean "takes actions to earn enough money to pay for its continued server costs," or just "takes actions as if to accomplish some kind of goal"?
      If the latter: people have already kind of set up harnesses that do this.

  • @young9534
    @young9534 3 days ago +8

    This is still o1-preview, not o1. If the benchmark results they released aren't lying, then o1 should be a nice jump in capabilities. I look forward to seeing your test videos when o1 is released.

    • @personzorz
      @personzorz 3 days ago +8

      Several previous benchmarks have been lies.

    • @young9534
      @young9534 3 days ago

      @@personzorz are you talking about the o1 results they released?

    • @Easternromanfan
      @Easternromanfan 3 days ago +5

      @@young9534 He might be referring to the GPT-4 benchmarks, where they said it could pass the bar exam in the 95th percentile but used a very faulty way to measure it. IOB mentioned it previously. They do the same thing here when they say it is a gold medalist in the Math Olympiad with "adjusted time restrictions." They just don't mention that if it had taken it by the actual rules, it would've failed the first question.

    • @young9534
      @young9534 3 days ago +4

      @@Easternromanfan Yeah, that makes sense. This is why I look forward to seeing this channel run tests on o1 when it gets released. I trust him more than OpenAI.

    • @RomeTWguy
      @RomeTWguy 3 days ago

      The actual model isn't far off from this, based on the benchmarks.

  • @martinsherry
    @martinsherry 3 days ago +3

    I'm not really familiar with o1 at all at this stage, but is it possible to create a prompt that gets o1 to ask you questions to get the details it needs to understand your requirements better (i.e., to simulate the task of gathering requirements)?

    • @artscollab
      @artscollab 3 days ago +1

      It's well worth a try as a supplement to human effort, in my opinion. The new o1 model is not yet available for OpenAI Assistants; however, the 4o model does well enough for now.

  • @absta1995
    @absta1995 3 days ago +3

    My prediction: even when AGI is achieved, this channel will call it overhyped

  • @LouStoriale
    @LouStoriale 3 days ago

    I've been working with AI for content creation and research for a few months now, and while there are still some flaws, the improvement has been significant. It's gone from 30-40% accurate to 60-80%, and even though I still need to edit most of the output, it’s saving me a ton of time. In just the last 5 days, it’s cut down weeks of work! If it keeps progressing like this, it’ll be incredibly useful by the end of 2025.

  • @defnlife1683
    @defnlife1683 3 days ago

    Doesn't really understand the reqs? Sounds like it can substitute for a scrum manager or a boss, not a dev.

  • @DeniSaputta
    @DeniSaputta 2 days ago +1

    2:44 Sam Altman: what is easy and difficult for a human is different for an AI.

  • @saxtant
    @saxtant 3 days ago

    An HTTP server in a prompt? Use Express or FastAPI, or even better... Go.
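
For reference, the minimal "HTTP server" task being joked about here doesn't even need Express or FastAPI; Python's standard library alone covers it. A sketch (the handler and response text are illustrative, not from the video's actual challenge):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading, urllib.request

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET with a plain-text "hello world"
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    result = resp.read().decode()
server.shutdown()
print(result)  # hello world
```

Frameworks earn their keep once routing, middleware, and serialization pile up; for a single endpoint, the stdlib version is hard to beat.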

  • @NukelimerCodes
    @NukelimerCodes 3 days ago

    Is it worth going back to uni for a CS degree (Open Source Society University)?

    • @InternetOfBugs
      @InternetOfBugs  3 days ago

      It depends on what you're trying to accomplish. I did a discussion about university degrees on a podcast here: ruclips.net/video/f9bO9aTXog0/видео.html (although I have never heard of Open Source Society University, so I don't know anything about it).

  • @karlwest437
    @karlwest437 3 days ago

    If models hallucinate, then surely telling them to think step by step just gives them more opportunities to hallucinate?

    • @drdca8263
      @drdca8263 3 days ago

      Logic gates implemented in silicon sometimes have errors. It is possible to use a larger collection of logic gates to make something that can detect and correct those errors.
      (For classical computing, the errors are, AIUI, more likely to occur in memory than during computation, so most hardware error correction is for data being stored, but the same applies to a lesser extent to errors that happen as part of the computation. As a side note: for quantum computers, errors during the computation steps are a bigger issue and need more attention than in classical computing.)
      It is true that more steps means more opportunities for errors, but that doesn't necessarily imply that each step, on net, increases the probability of an error.

    • @estefencosta1835
      @estefencosta1835 3 days ago +1

      It does, if "hallucinations" are just a fancy marketing term for the fuckups that happen when the model doesn't pick the most likely result from a dataset in an attempt to mimic creativity, and the program, based on computational linguistics, doesn't know when it needs to not do that.

    • @drdca8263
      @drdca8263 3 days ago

      @@estefencosta1835 Hm, that seems like a bit of a run-on sentence, but I suppose it would be hypocritical of me to complain too much about that…
      Yes, "hallucinations" is just a term people use for errors. Maybe slightly more specific?
      Your specific explanation for these errors seems a bit unclear to me.

    • @estefencosta1835
      @estefencosta1835 3 days ago +1

      @@drdca8263 If GenAI just chose the most likely response from a set of data, then every time you queried it, it would give back the same response. The reason ChatGPT gives the illusion of an intelligent response is what can be conceptualized as probabilistic responses, which is why it builds its responses bit by bit. If it's given too much latitude, its answers quickly lose coherency; if given too little, it doesn't present anything that seems novel and loses any potential capacity to solve tasks. But there isn't a sweet spot where it won't sometimes give you things that are either nonsensical or just flat-out wrong.
      "Hallucination" is a term we use for disturbances in sensory experience in humans, but it was co-opted for GenAI back when they were trying to con people into believing these algorithms are sentient (see the "Sparks of AGI" paper). Using the term "hallucination" implies that GenAI is simply misrepresenting something. This is not accurate. GenAI simply runs an algorithm and tries to parse meaningful strings from its training data, with computational linguistics as the backbone of how it decides what is or isn't meaningful.
      The more complex the task, the less accurate or interesting it gets. It's like giving an algorithm a set of Legos and asking it to build something new. But since the algorithm only knows what previous sets of Legos looked like (most of which isn't even relevant to what it's being asked to build), the best it can do is mash together bits of other sets based on probability (though not always the most probable pieces, otherwise it would be wrong the same way every time). It can't actually build its own new Lego set from the ground up, and it has no way of verifying whether the Lego set it ends up building is even correct or satisfies the query. This is why programs like ChatGPT can be confidently incorrect.
      What I'm less familiar with, but what it sounds like they're trying to do, is use non-GenAI methods to verify whether a piece of code is actually viable, in order to correct it before it spits out the code. Even if this were perfected, which I very much doubt, it gets you no closer to code that actually does what you want; it simply eliminates some of the more obvious and elementary errors.
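
The probabilistic, bit-by-bit generation described above is, mechanically, temperature sampling over next-token scores. A toy sketch (the three-word vocabulary and its scores are made up for illustration, not any real model's output):

```python
import math, random

def sample_next_token(logits, temperature, rng):
    """Sample one token from a softmax over logits, scaled by temperature.

    Low temperature: nearly always the top token (deterministic, repetitive).
    High temperature: flatter distribution (more 'creative', more errors).
    """
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    weights = {tok: math.exp(v - m) for tok, v in scaled.items()}
    r = rng.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # floating-point edge case: fall back to the last token

# Made-up next-token scores after a prompt like "The sky is"
logits = {"blue": 3.0, "clear": 1.5, "falling": 0.2}
rng = random.Random(0)
print([sample_next_token(logits, 0.2, rng) for _ in range(5)])  # mostly "blue"
print([sample_next_token(logits, 2.0, rng) for _ in range(5)])  # more varied
```

The "too much latitude / too little latitude" trade-off in the comment is exactly the temperature knob: there is no setting that yields both variety and guaranteed correctness.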

    • @karlwest437
      @karlwest437 3 days ago

      @@drdca8263 My point is, error correction doesn't work if the error-correction system itself hallucinates.
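
The trade-off this thread is debating (more steps mean more chances to err, but redundancy can remove errors faster than extra steps add them) can be put in numbers. The 5% per-step error rate below is an arbitrary made-up figure, and the model assumes independent errors, which real chain-of-thought steps are not:

```python
from math import comb

def chain_error(p_step, n_steps):
    """Probability that at least one of n independent steps errs."""
    return 1 - (1 - p_step) ** n_steps

def majority_error(p_step, k):
    """Error rate of a majority vote over k independent attempts (k odd):
    the vote is wrong only if more than half the attempts are wrong."""
    return sum(comb(k, i) * p_step**i * (1 - p_step)**(k - i)
               for i in range(k // 2 + 1, k + 1))

p = 0.05                   # assumed 5% chance a single step is wrong
print(chain_error(p, 10))  # ~0.40: ten chained steps fail often
print(chain_error(majority_error(p, 5), 10))  # far lower with 5-way voting per step
```

This is the sense in which the logic-gate analogy holds; the counterpoint above also holds, since majority voting only helps while errors are independent, and a checker that shares the model's blind spots buys little.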

  • @theneedytechie2468
    @theneedytechie2468 3 days ago

    There is no pleasing this guy😂.

    • @Protocultor
      @Protocultor 1 day ago

      If they're selling you something, and it doesn't achieve what they say it achieves, then no one should be pleased.

  • @andreas_tech
    @andreas_tech 3 days ago +1

    Are you the future version of David Shapiro?

    • @cbnewham5633
      @cbnewham5633 3 days ago +3

      Please no. David is all over the place - the new Bindu Reddy.

  • @4l3dx
    @4l3dx 3 days ago +5

    It's impossible that you tested the o1; we only have access to the o1-preview version

    • @Easternromanfan
      @Easternromanfan 3 days ago +3

      That's what he means

    • @jshowao
      @jshowao 3 days ago +4

      As if o1-preview and o1 are significantly different. Come on. Gmail was in beta for like a thousand years; when they finally released it, it was the same damn product.

  • @rickandelon9374
    @rickandelon9374 3 days ago

    Agentic, self-improving, and self-aware AI is going to change the economy, not these fancy demo products.

  • @darylallen2485
    @darylallen2485 3 days ago +1

    I knew videos of this nature would come. I think it's valid to point out when AI fails. However, when are people going to acknowledge what seems obvious to me but gets lost in these "I asked AI to do X and it failed" videos?
    Is the new AI model better than the previous version? Was the previous version better than the one before that? And before that, was the version released better than what came before? If yes, what is hype about this AI thing? Every version has been better for years now. Billions of dollars are being invested to continue the obvious trend that a five-year-old could point out. Yet we still have the "this is all hype and BS, AI is completely fake" crowd. I don't get how anyone could still believe this AI trend is hype.

    • @InternetOfBugs
      @InternetOfBugs  3 days ago +5

      I tried to point out in this video that O1 is better than all the other models I've tried on the tests I've been using (and I'll be coming up with more tests).
      There is definitely a trend of it getting better, but it seems (to me, and to other people) that the rate at which it's getting better is slowing, and that this cycle of AI improvement is going to be worth far less than what has already been invested in it. But I could be wrong - we'll see.

    • @darylallen2485
      @darylallen2485 3 days ago

      @@InternetOfBugs I agree that LLMs are a terrible product in the sense that the cost of inputs is significantly higher than the value of the output.

    • @reboundmultimedia
      @reboundmultimedia 2 days ago

      @@InternetOfBugs o1-preview is significantly worse in many areas than the final o1. The leap in math ELO scores that o1 achieved is not at all consistent with "slowing down."

    • @diadetediotedio6918
      @diadetediotedio6918 2 days ago

      @@reboundmultimedia
      The problem is that your measure is a bunch of specific tests; we measure the value of intelligence by what it brings to the world, not by blindly taking tests and scoring them.

    • @darylallen2485
      @darylallen2485 1 day ago

      @@diadetediotedio6918 I'm curious what industry you work in. I work as a datacenter infrastructure engineer. I got to this role through my degree and multiple certifications (i.e., lots of blind tests and measurement). Preparing for the tests taught me skills that the labor market values.
      Please elaborate on which industry doesn't discriminate based on ability as determined by testing. Would you see a doctor who flunked out of medical school but had a real passion for helping people?

  • @austinclay427
    @austinclay427 3 days ago

    I use both Claude and ChatGPT on a daily basis, but they have serious limitations and constantly make mistakes that I need to point out.
    That said, they're very useful and can often point me to new technologies and solutions.
    Overall, though, I'd say the tech has been relatively the same since ChatGPT 3.0.

  • @alonzoperez2470
    @alonzoperez2470 3 days ago +1

    It will replace programmers eventually 😌

  • @GordonFreeman-xd8rw
    @GordonFreeman-xd8rw 3 days ago

    I like watching this channel because it's like watching the wise man or the witch doctor of a cannibal tribe when they first encountered an airplane... b-b-but why would the Sun-God Bogosun give pale face such magic? I'm willing to bet that the goalposts will shift by EO2025 to "but it can't give you a YouTube clone from a single prompt, still meh"

  • @jshowao
    @jshowao 3 days ago

    God these AIs just suck

  • @Tverse3
    @Tverse3 3 days ago +5

    I love programmers freaking out after every new GPT release; looks like they will face the same fate as artists. 😮

    • @larsfaye292
      @larsfaye292 3 days ago

      only the shit ones

    • @arnavprakash7991
      @arnavprakash7991 3 days ago

      So programming languages are human-readable formats for interacting with a computer.
      Do people think AI = computer stuff only?
      Any language-based/knowledge-based job can be automated, as LLMs excel at this.
      All white-collar work is at risk.

    • @cbnewham5633
      @cbnewham5633 3 days ago +4

      AI is a tool. Sensible artists will incorporate it. The rest are Luddites with barely a grasp of how AI works. Programmers too will use this as a tool - but software engineering requires far more than writing a bunch of code.

    • @darkspace5762
      @darkspace5762 3 days ago +1

      Why do you love it? What job do you do?

    • @jshowao
      @jshowao 3 days ago

      @@cbnewham5633 Exactly. Most people who think AI will "replace everything" have never actually done the things they claim AI will replace. If they actually tried, they'd realize real quick that LLMs leave a lot to be desired.
      I've only ever used it as a tool to supplement my work, and only after I've double-checked the code it generates.

  • @slindenau
    @slindenau 20 hours ago

    1:18 You asked for an input number between 1^64... and in the response/code it assumed 2^64. "Small" difference ;).
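    For scale, the mix-up the comment points out is not small at all: 1^64 is just 1, while 2^64 is the number of distinct values a 64-bit unsigned integer can hold. In Python:

```python
# 1 raised to any power stays 1; 2^64 is the size of the
# 64-bit unsigned integer range.
print(1 ** 64)  # 1
print(2 ** 64)  # 18446744073709551616
```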

  • @Rh22-c9l
    @Rh22-c9l 3 days ago

    Just wait for ChatGPT 5 - that's the big thing moving forward, that's the inflexion point

    • @drdca8263
      @drdca8263 3 days ago

      Is “inflection” vs “inflexion” a dialect/regional-spelling-differences thing, or just a “you personally spell it differently” thing?
      It reminds me of how some old letters about math were written

    • @Rh22-c9l
      @Rh22-c9l 3 days ago

      @@drdca8263 made a mistake too lazy to fix it to be honest
