I appreciate the grounded opinions shared on this channel, particularly as someone who has been building applications for almost 30 years using traditional patterns while also adopting new techniques. Whew, where does the time go?
I'm in the same boat here. I started as a kid and I'm still building them regularly. To me, o1-preview is highly useful for certain things, while for others it tends to produce only substandard or completely unnecessary code. Just like standard AI, it often rewrites things you don't want or includes things that shouldn't be there. It's okay with Python; the other languages I've tried haven't yielded much.
Signed on as a member. Big fan of your stuff - only about 4 years of experience programming professionally, but I've been losing my mind at all the AI code bots. I am way faster without it - if that ever changed I'd be happy to use them, so it's nice to follow your progress in testing different models. Getting a more experienced perspective is appreciated too.
I ask ChatGPT to do something and every time it starts doing things I don't want it to, and I lose my mind trying to figure out how to phrase the request so it does what I want.
@@Tverse3 The bashing was for the unnecessary hype all of these were getting. But if they prove actually useful, even to very experienced programmers, then why not use them? A tool's a tool: if it fails to help, discard it and move on.
I tested this model on a real enterprise task (I even tried breaking the task into small, easy steps and removed the steps that require domain understanding) and it failed. But then I found a real use for it: sample generation. It created some sample SOAP test requests for a provided WSDL, and the structure was correct, so it saved me some time. It would be better to extract samples from the actual system, of course, but due to the nature of the project that was nearly impossible until the other team finished their work.
@@RomeTWguy What do you mean? It cost me nothing; I have a subscription. It saved me time because constructing SOAP XML by hand takes time, and it did it in 50 seconds with the supplied data.
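The kind of hand-written SOAP boilerplate being saved here can be sketched roughly as follows. The service namespace, operation, and field names are hypothetical, made up for illustration; a real request would mirror whatever the WSDL defines.

```python
# Sketch: building a sample SOAP request envelope with the standard library.
# "GetOrder", its namespace, and "OrderId" are hypothetical placeholders.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SVC_NS = "http://example.com/orders"  # hypothetical service namespace

def build_sample_request(order_id: str) -> str:
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}GetOrder")
    ET.SubElement(op, f"{{{SVC_NS}}}OrderId").text = order_id
    return ET.tostring(envelope, encoding="unicode")

print(build_sample_request("12345"))
```

Even this trivial envelope takes a few minutes to write and check by hand; a model that emits a structurally correct one from the WSDL in under a minute is a real, if narrow, time saver.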
As for the part about being as good as a Ph.D. at writing software: I think they're likely correct. Just pay attention, though: Ph.D.s, even in CS, are not known for writing proper software applications!! haha
I think software engineers are overestimating the value they provide. We could now have an AI model trained specifically for every programming language, which would do better than most junior and mid-level devs... this profession is doomed. And senior engineers are not special; with AI, mid-level engineers will become seniors quickly.
@@Tverse3 Citation needed. AI models need training data, and there are tons of problems in industry where training data is sparse, if it exists at all. The tests Internet of Bugs has been doing show that AI can't even handle a simple language like Python for relatively straightforward tasks. Why do you expect AI models to beat junior devs at things like C in embedded software, or C# in a more application-oriented field, if they can't handle a batteries-included language like Python?
It's a bit strange to me that they charge the user for the tokens the reasoning takes, yet they don't show the reasoning in detail. It's like they can add any amount of extra tokens to your bill without your being able to check it.
Yes, you are correct. As I show on my channel, it cannot do simple geometry. It's a lot better than other models, but everyone is swallowing OpenAI's hype about "PhD level" and the cherry-picked examples some YouTubers have published for likes. It's nowhere near PhD level in the maths sphere; more like high school. You are about the only other person I've seen on YouTube so far who is pointing out real deficiencies. Subscribed.
Didn’t the famous mathematician Terry Tao say that it seemed to him like it was getting close to like, a mediocre grad student, or something like that?
@@drdca8263 it's very good at some things - especially when it comes to language (which is, after all, the main part of the architecture). Until they can integrate mathematics into it flawlessly and remove hallucinations then these models will remain stuck at the "passable" level averaged across all types of queries. Maths is really important because without that you don't get the rest of the sciences - certainly not for doing anything serious.
I passed it questions from the 2023 Putnam and it solved 80% of them. I don't know why you'd even ask geometry questions of a model that can't see, but do try it with problems from other math domains.
It's like a failing topology student in math, so high school is a bit harsh. I ask it measure theory questions and it trips up. I reckon it might be a "C's get degrees" graduate :)
@@Happyduderawr ok, a bit harsh perhaps, but in certain things it is a real brainiac while in others brain-impaired. If we take the average, it is muddling along with a checkered academic history, but won't be sweeping up the academic prizes any time soon. 😄
Yep... overhyped... one hype train arriving after the other... that's why I lost interest in AI.... too much hype and not enough progress, still stuck in that old language model with all its flaws.
So transformers were invented in 2017. At most, we have seen 8 years of work on these types of neural networks, mostly niche work. We did not see industry-wide efforts until ChatGPT 3.5, released November 30, 2022; it has not even been 2 years. All of these developments have been maxing out what transformers can do, so even without further breakthroughs in architecture, this is enough to change society; it already has started to. And this is not even mentioning diffusion models.
Honestly, I'm looking forward to 1-2 years in the future, when I expect AI models at this level will be open source and no longer locked behind some paid service.
It seems like they really have to hype these models to make people worry about job displacement and AI taking over most fields; otherwise, they wouldn’t be able to justify the billions invested in training and development needed to keep the improvements advancing at a competitive pace. Either this hype will turn into a self-fulfilling prophecy in the short to medium term, or the industry will hit a plateau as diminishing returns set in, leading to the AI bubble bursting. Ultimately, we’ll be left with advanced tools that, while highly capable, remain far from the true AGI people envision.
@@SuperMarioTomma95 chatgpt is now the 13th most visited website globally. Right behind amazon. It’s the only site besides google, amazon, and yahoo in the top 13 that is not social media. So clearly hundreds of millions of people find it useful enough.
It felt like a huge nothingburger. I really feel that we've hit the wall of LLMs, not to mention that these models cannot inherently reason, regardless of how much Altman wants his investors to think they can.
Altman is not the only one working on this. This is not some handcrafted product made by OpenAI. It's a discovery that when you give transformers compute and data, they display emergent abilities. So anyone with compute and data can make these, and as we are seeing, they are.
@@RomeTWguy Yeah, but now we have multiple solid LLMs (Claude, Llama, Gemini... I guess). What OpenAI did is replicable; the Llama 3 papers prove it. The next generation of LLMs now has 3 avenues for enhancement: training (the model learning from data), hardware, and inference (the model's thinking/processing of inputs).
From OpenAI's own article: "These results do not imply that o1 is more capable than a PhD in all respects - only that the model is more proficient in solving some problems that a PhD would be expected to solve."
@@Cephandrius016 People like you and him think you're smart saying blatantly obvious shit like: "iTs jUSt sToCHaStiC graDiEnT dEsCeNT!!! hOw cOUlD it eVeR (bullshit false equivalence here)". As if the researchers working on this shit day in and day out don't know everything you do and more. These models have gotten consistently better given more compute; that is just a fact. You can only argue the degree. But I bet fucking anything that in 2-3 years' time, this guy will be doing this video with an increasingly complex problem that he continues to downplay as "I classify this as easy to very easy". People like you and this guy think you're clever pointing out obvious fucking flaws whilst completely overlooking the broader direction, and none of you (this guy in the video especially) could ever fucking hope to build anything remotely as useful as these models.
Yet a PhD student is expected to solve, or make progress on, an original problem through original research, so I'm not sure what that statement even means. I mean, ask the AI to solve one of the Millennium Problems or quantum gravity. I doubt it could.
Limitations of o1-preview:
1. The limitations of being a language model are still evident. Its perception of the physical world is very poor, making it difficult to use for tasks requiring spatial awareness.
2. While it actively uses the Chain of Thought technique, which significantly improves accuracy on tasks with a clear logical answer, this simultaneously makes its thinking process rigid. As a result, it performs worse than GPT-4 in areas involving subjective nuance and no clear answers, such as writing. In contrast, traditional language models like GPT-4 may hallucinate more often, but that also makes them more adept at generating plausible responses, which ultimately helps in tasks like creative writing.
Therefore, o1 is not a one-size-fits-all solution, and it seems necessary to first decide, based on the given task, whether to use the Chain of Thought technique or the traditional language model approach before proceeding. Furthermore, o1 is merely another language model, not a fundamental leap forward; it's simply a specialization of the existing method. Due to the inherent limitations of language models, which learn about the world through language, AGI (Artificial General Intelligence) is still a distant goal.
LLM capabilities are a bit weird: you can't judge a model's intelligence from a few questions and declare it better or worse than a human. It's subpar to a PhD student in a lot of domains, but in others it's nothing short of superhuman. It'd be great if you could check out Kyle Kabasares' channel for the different tests he conducted on o1; he actually uses PhD-level questions, and it blew all of them out of the water.
AI definitely isn't taking my job anytime soon, but it has been helping me a lot at work lately. If I already know what I need to do, I can tell AI to write it for me. Then I just fix it up a bit, rename stuff, clean it up, etc. But it has probably saved me a couple of hours at work this week.
I've seen AI produce written prose and it produces a lot of repetitive slop. Good to know you are happy with sentences starting with the same words over and over again. From what I've seen, you'd have to rewrite the whole thing.
I don't even know what it means to program at a PhD level. I'm not saying there are no PhDs who are good programmers, but there is nothing about having a PhD that makes someone a good programmer.
It's already integrated into Cursor and other AI IDEs, if I'm not mistaken. You should have a look at it for the next video. Looking forward to it; good stuff as usual.
@@ancwhor This channel's viewers are like the inverse of an AI tech bro sometimes: instead of insane hype, it's constant insane lowballing. Did you even listen to the video? If someone with a track record of being skeptical of AI admits that it's significantly better at coding tasks and beats all the other AIs he's tested, it's clearly a lot more than 0.2%.
That's actually one thing I'm unclear about in these videos: none of these models will give you the same response every time, so just running a test once doesn't really tell you much. I'd rather understand not just that it fails, but how badly it fails each time if you run it 10 or 100 times. Is it 10% sort of okay, 40% pretty bad, and 50% terrible?
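The repeated-run evaluation being asked for here could be sketched like this. `ask_model` and `grade` are hypothetical stand-ins (here simulated with random choice) for a real, nondeterministic model API call and a real grading step.

```python
# Sketch: run the same prompt N times, grade each attempt, and report the
# distribution of outcomes instead of a single pass/fail verdict.
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    # Stand-in: a real implementation would call an LLM API, which is
    # nondeterministic, so repeated calls can produce different answers.
    return random.choice(["correct", "partially_correct", "wrong"])

def grade(answer: str) -> str:
    return answer  # stand-in for a real grading function

def failure_profile(prompt: str, runs: int = 100) -> dict:
    counts = Counter(grade(ask_model(prompt)) for _ in range(runs))
    return {outcome: n / runs for outcome, n in counts.items()}

profile = failure_profile("Implement an HTTP server from scratch", runs=100)
print(profile)  # e.g. fraction correct / partially correct / wrong
```

Reporting a distribution like this ("10% sort of okay, 40% pretty bad, 50% terrible") is far more informative than a single sampled run.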
I'm now doing a two-year master's degree in software architecture, because my bachelor's degree is in something else. With these new models coming out, I'm quite stressed about whether I'll have a future as a programmer. Do you think it is worth studying something that is not directly related to artificial intelligence? For example, there was a specialty in software technologies with artificial intelligence; that's what I mean by directly related to AI.
What do you mean by “autonomous”? Do you mean like, “takes actions to earn enough money to pay for its continued server costs”, or do you just mean, “takes actions as if to accomplish some kind of goal”? If the latter: people have already kinda set up harnesses that do this?
This is still o1 preview, not o1. If the benchmark results they released aren't lying, then o1 should be a nice jump in capabilities. I look forward to seeing your test videos when o1 is released
@young9534 He might be referring to the GPT-4 benchmarks, where they said it could pass the bar at the 95th percentile but used a very faulty way to measure it; IOB mentioned it previously. They do the same thing here when they say it is a gold medalist in the Math Olympiad with "adjusted time restrictions". They just don't mention that if it had taken the test under the actual rules, it would've failed the first question.
@@Easternromanfan yeah that makes sense. This is why I look forward to seeing this channel run tests on o1 when it gets released. I trust him more than OpenAI
I'm not really familiar with o1 at all at this stage, but is it possible to create a prompt that gets o1 to ask you questions to get the details it needs to understand your requirements better (i.e., to simulate the task of gathering requirements)?
It’s well worth a try as a supplement to human effort, in my opinion. The new o1 model is not available yet for OpenAI assistants, however the 4o model does well enough for now.
I've been working with AI for content creation and research for a few months now, and while there are still some flaws, the improvement has been significant. It's gone from 30-40% accurate to 60-80%, and even though I still need to edit most of the output, it’s saving me a ton of time. In just the last 5 days, it’s cut down weeks of work! If it keeps progressing like this, it’ll be incredibly useful by the end of 2025.
It depends on what you're trying to accomplish. I did a discussion about university degrees on a podcast here: ruclips.net/video/f9bO9aTXog0/видео.html (Although I have never heard of Open Source Society University - so I don't know anything about it)
Logic gates implemented in silicon sometimes have errors. It is possible to use a larger collection of logic gates implemented in silicon in order to make something which can detect and correct these errors. (For classical computing, the errors are, AIUI, more likely to occur in memory than during the computation, and so most of the hardware error correction is for correcting errors in data which is being stored, but the same thing applies to a lesser extent for errors that happen as part of the computation. As a side note: for quantum computers, the errors happening during the computation steps is a bigger issue and needs more attention than it needs in classical computing.) It is true that more steps does mean more opportunities for errors, but that doesn’t necessarily imply that each step on net increases the probability of an error.
It does if hallucinations are just a fancy marketing term for fuckups that happen when not picking the most likely result from a dataset in an attempt to mimic creativity and the program based on computational linguistics doesn't know when it needs to not do that.
@@estefencosta1835 Hm, seems like a bit of a run-on sentence, but I suppose it would be hypocritical of me to complain too much about that… Yes, “hallucinations” is just a term people use for errors. Maybe slightly more specific? Your specific explanation for these errors, seems a bit unclear to me?
@@drdca8263 If GenAI just chose the most likely response from a set of data, then every time you queried it, it would give back the same response. The reason ChatGPT gives the illusion of an intelligent response is what can be conceptualized as probabilistic responses, which is why it builds its responses bit by bit. If it's given too much latitude, its answers quickly lose coherency; if given too little, it doesn't present anything that seems novel and loses any potential capacity to solve tasks. But there isn't a sweet spot where it won't sometimes give you things that are either nonsensical or just flat-out wrong.
"Hallucination" is a term we use for disturbances in sensory experience in humans, but it was co-opted for GenAI back when they were trying to con people into believing these algorithms are sentient (see the "Sparks of AGI" paper). Using the term hallucination implies that GenAI is simply misrepresenting something. This is not accurate. GenAI simply runs on an algorithm and tries to parse meaningful strings from its training data, with computational linguistics as the backbone of how it decides what is or isn't meaningful. The more complex the task, the less accurate or interesting it gets.
It's like giving an algorithm a set of Legos and asking it to build something new. Since the algorithm only knows what previous sets of Legos looked like (most of which isn't even relevant to what it's being asked to build), the best it can do is mash together bits of other sets based on probability, but not always the most probable pieces; otherwise it would get it wrong the same way every time. It can't actually build its own new Lego set from the ground up, and it has no way of verifying whether the Lego set it ends up building is even correct or satisfies the query. This is why programs like ChatGPT can be confidently incorrect.
What I'm less familiar with, but what it sounds like they're trying to do, is use non-GenAI methods to verify whether a piece of code is actually viable, in order to correct it before it spits out the code. Even if this were perfected, which I very much doubt, it gets you no closer to writing code that actually does what you want it to do; it simply eliminates some of the more obvious and elementary errors.
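The "probabilistic responses" idea in the exchange above can be sketched as simple temperature-based token sampling. The token scores here are made up for illustration; real models work over huge vocabularies, but the mechanism is the same: at temperature 0 the output is deterministic, and above 0 repeated queries can differ.

```python
# Sketch: greedy vs. temperature sampling over a made-up next-token
# distribution, showing why repeated LLM queries need not agree.
import math
import random

def sample_token(logits: dict, temperature: float) -> str:
    if temperature == 0:
        # Greedy decoding: always the single most likely token.
        return max(logits, key=logits.get)
    # Softmax with temperature, then draw one token proportionally.
    weights = {t: math.exp(score / temperature) for t, score in logits.items()}
    r = random.random() * sum(weights.values())
    for token, w in weights.items():
        r -= w
        if r <= 0:
            return token
    return token  # floating-point fallback: last token

scores = {"cat": 2.0, "dog": 1.5, "fish": 0.5}  # hypothetical logits
assert sample_token(scores, 0) == "cat"   # deterministic every time
print(sample_token(scores, 1.0))          # varies from run to run
```

Higher temperature spreads probability mass across more tokens (more "latitude", less coherence); lower temperature concentrates it (more repetitive, less novel), matching the trade-off described in the comment.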
As if o1 preview and o1 are significantly different. Come on. Gmail was in beta for like a thousand years, when they released it finally, it was the same damn product.
I knew videos of this nature would come, and I think it's valid to point out when AI fails. However, when are people going to acknowledge what seems obvious to me but gets lost in these "I asked AI to do X and it failed" videos? Is the new AI model better than the previous version? Was the previous version better than the one before that? And before that, was the version released better than what came before? If yes, what is hype about this AI thing? Every version has been better for years now. Billions of dollars are being invested to continue an obvious trend that a 5-year-old could point out. Yet we still have the "this is all hype and BS, AI is completely fake" crowd. I don't get how anyone could still believe this AI trend is hype.
I tried to point out in this video that o1 is better than all the other models I've tried on the tests I've been using (and I'll be coming up with more tests). There is definitely a trend of it getting better, but it seems (to me, and to other people) that the rate at which it's getting better is slowing, and that the value of this cycle of AI improvement is going to be worth far less than what has already been invested in it. But I could be wrong; we'll see.
@@InternetOfBugs o1-preview is significantly worse in many areas than the final o1. The leap in math ELO scores o1 was able to achieve is not at all consistent with "slowing down."
@@reboundmultimedia The problem is that your measure is a bunch of specific tests. We measure the value of intelligence by the things it can bring to the world, not by blindly taking tests and scoring them.
@@diadetediotedio6918 I'm curious what industry you work in. I work as a datacenter infrastructure engineer. I got to this role through my degree and multiple certifications (i.e., lots of blind tests and measurements). Preparing for the tests taught me skills that the labor market values. Please elaborate on what industry we have that doesn't discriminate based on ability as determined by testing. Would you see a doctor who flunked out of medical school but had a real passion for helping people?
I use both Claude and ChatGPT on a daily basis, but they have serious limitations and constantly make mistakes that I need to point out. Aside from that, they're very useful and can often point me to new technologies and solutions. That said, I'd say the tech has stayed relatively the same since ChatGPT 3.0.
I like watching this channel because it's like watching the wise man or the witch doctor of a cannibal tribe when they first encountered an airplane... b-b-but why would the Sun-God Bogosun give pale face such magic? I'm willing to bet that the goalposts will shift by EO2025 to "but it can't give you youtube clone from a single prompt, still meh"
So programming languages are human-readable formats for interacting with a computer. Do people think AI = computer stuff only? Any language-based/knowledge-based job can be automated, as LLMs excel at this. All white-collar work is at risk.
AI is a tool. Sensible artists will incorporate it. The rest are Luddites with barely a grasp of how AI works. Programmers too will use this as a tool - but software engineering requires far more than writing a bunch of code.
@@cbnewham5633 Exactly. Most people who think AI will "replace everything" have never actually done the things they claim AI will replace. If they actually tried to do them, they'd realize real quick that LLMs leave a lot to be desired. I've only ever used it as a tool to supplement my work, and only after I've double-checked the code it generates.
Is “inflection” vs “inflexion” a dialect/regional-spelling-differences thing, or just a “you personally spell it differently” thing? It reminds me of how some old letters about math were written
iob.fyi/codecrafters will let you sign up to try CodeCrafters challenges yourself, if you're interested in seeing whether you're smarter than an AI.
I'm sure the hype of this model has nothing to do with OpenAI trying to fundraise $100 Billion right now
Only a fool would join those dots... 😏
Duhhh
Phd lvl bruh
It's not hype, as the channel owner now has to admit these LLMs are improving fast.
Tech usually moves much slower than this.
@@TheReferrer72 do you think claiming "PhD level" intelligence is not hype? It's clearly not at that level, despite what OpenAI may claim.
Much respect for adding chapter marks on a 5 minute video. You are amazing!
I love this guy man, amazing counterbalance to otherwise overwhelming narratives. And exactly the right person to deliver this information
I like how this channel has turned from bashing AI coding into an AI coding benchmark channel.
Soon he will be promoting the use of AI.
I mean, they're the same thing right now. Any basic benchmarking is "bashing" simply because of the sheer hype these companies are pushing.
You can achieve the same results with Sonnet 3.5 for a fraction of the cost
@@RomeTWguy what do you mean? It cost me nothing, I have subscription. It saved me time because constructing soap xml by hand takes time and it did it in 50 seconds with supplied data
@@goldsucc6068 I don't understand; you could get GPT-3 (you know, before ChatGPT) to do that.
@@Tverse3 Useless hype comment. These LLMs' capabilities are vastly overestimated.
@@Tverse3 Jesus Christ, being so confidently wrong and shallow is a skill in itself.
@@realurilordjonhnsoni7342 Now that we have LLMs, it's trivially easy to be confidently wrong.
Thank you for your honesty. I'm not a tech person, and I knew there was too much hype but very little change for the average user.
That's probably the most unlikely scenario
@@arnavprakash7991 emergent stupidity
@@arnavprakash7991 Anyone can use more compute at inference time, but they must have also fine-tuned it on CoT datasets to simulate reasoning.
@@arnavprakash7991 Yeah, hundreds of billions of dollars of compute.
I was looking forward to your commentary. Thanks.
OpenAI's "more proficient in solving some problems that a PhD would be expected to solve" line translates to: we hyper-trained the model on a subset of questions, and now it can solve those problems "better" than some Ph.D.s.
@@Cephandrius016 People like you and him think you're smart saying blatantly obvious shit like:
“iTs jUSt sToCHaStiC graDiEnT dEsCeNT!!! hOw cOUlD it eVeR (bullshit false equivalence here)”
As if the researchers working on this shit, day in and day out don’t know everything you do and more.
These models have gotten consistently better given more compute, that is just a fact.
You can only argue the degree.
But I bet fucking anything that in 2-3 years' time, this guy will be doing this video with an increasingly complex problem that he continues to downplay as "I classify this as easy to very easy".
People like you and this guy think you're clever pointing out obvious fucking flaws whilst completely overlooking the broader direction, and none of you… this guy in the video especially… could ever fucking hope to build anything remotely as useful as these models.
What an extremely scientific way for them to measure their product's capabilities.
Yet a PhD student is expected to solve or make progress on an original problem through original research, so I'm not sure what that statement even means.
I mean ask the AI to solve one of the millennium problems or quantum gravity. I doubt it could.
Thank you for your brief thoughts.
But I'm a PhD and I'm shit at writing code.
Thanks for the update! The o1-preview is available in the Supermaven VS Code extension
I was looking forward to your assessment of o1.
Limitations of o1-preview:
1. The limitations of being a language model are still evident. Its perception of the physical world is very poor, making it difficult to utilize for tasks requiring spatial awareness.
2. While it actively uses the Chain of Thoughts technique, which significantly improves accuracy on tasks where there is a clear logical answer, this simultaneously makes its thinking process rigid. As a result, it performs worse than GPT-4 in areas where subjective nuances and no clear answers are involved, such as writing. In contrast, traditional language models like GPT-4 may have a higher occurrence of hallucinations, but this also makes them more adept at generating plausible responses, which ultimately aids in tasks like creative writing.
Therefore, o1 is not a one-size-fits-all solution, and it seems necessary to first determine whether to use the Chain of Thoughts technique or the traditional language model approach based on the given task before proceeding with the process. Furthermore, o1 is merely another language model and not a fundamental leap forward; it's simply a specialization of the existing method. Due to the inherent limitations of language models, which learn about the world through language, achieving AGI (Artificial General Intelligence) is still a distant goal.
I like how your facial expression on the video thumbnail provides the tl;dr on this 😀
LLM capabilities are a bit weird; you can't base its intelligence on a few questions and declare it's better/worse than a human. It's subpar to a PhD student in a lot of domains, but in others it's nothing short of superhuman. It'd be great if you could check out Kyle Kabasares' channel on different tests he conducted on o1; there he actually uses PhD-level questions and it blew all of them out of the water.
The camera angle... if he's not moving his hands, it looks like a recording from a locked-in patient.
AI definitely isn't taking my job anytime soon, but it has been helping me a lot at work lately. If I already know what I need to do, I can tell AI to write it for me. Then I just fix it up a bit, rename stuff, clean it up, etc. But it has probably saved me a couple of hours at work this week.
Anyway my pay hasn't gone up so I'm just taking those extra hours I gain off from work.
@@DoubleOhSilver This is the way
I've seen AI produce written prose and it produces a lot of repetitive slop.
Good to know you are happy with sentences starting with the same words over and over again.
From what I've seen, you'd have to rewrite the whole thing.
o1 is available in Cursor, but it's not included with the monthly fee. You have to pay separately.
Fact that he's saying it's better, now that's some progress
Thank you so much!!
Of course, a human PhD with GPT o1 + Google beats vanilla GPT o1.
To be fair, I’ve seen some PhDs who write pretty bad code 😂
I don't even know what it means to program at a PhD level. I'm not saying there are no PhDs that are good programmers, but there is nothing about being a PhD that makes someone a good programmer.
4:07 o1 is already integrated in Cursor. But it is expensive
Interesting. Fixed monthly price or per token cost? I like Cursor so far.
Keeping it real 👍
It's already integrated into Cursor and other AI IDEs, if I'm not mistaken. You could have a look at it for the next video. Looking forward to it, good stuff as usual
ok but every PhD I've worked with has been garbage at writing code.
10x the compute for 0.2% improvement imo
if you think we got 0.2% improvement, then your opinion clearly isn't worth that much
@@generichuman_ if you think it's more then your opinion clearly isn't worth that much
@@generichuman_ What was the most complex system you both worked on without AIs, then compare whose opinion is worth more lol.
@@justafreak15able Express API backend linked to Python for an algo to manage distribution. Vue frontend. Self-taught. In prod.
@@ancwhor This channel's viewers are like the inverse of an AI techbro sometimes. Instead of insane hype it's constant insane lowballing. Did you even listen to the video?
If someone with a track-record of being skeptical of AI admits that it's significantly better at coding tasks and beats all other AIs he's tested, it's clearly a lot more than 0.2%.
I don't understand why you directly made the video on o1 and skipped the project strawberry?
Thanks
Doing God's work
try to make it give you "hello world" in the Bend language; I gave 4o 8 attempts, then I gave it the code and it actually got that wrong as well
That's actually one thing I'm unclear about in these videos, none of these models are going to give you the same response every time. If you just run it once it doesn't really tell you much. I'd rather understand not just that it fails, but how badly it fails each time if you run it 10 or 100 times. Is it 10% sort of ok, 40% pretty bad and 50% terrible?
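The repeated-run measurement described above can be sketched as a small harness. Everything here is illustrative: `fake_model` is a stand-in for a real, nondeterministic LLM call, and `grade` stands in for whatever pass/fail rubric you apply to each response.

```python
import random
from collections import Counter

def grade(response: str) -> str:
    """Toy grader bucketing a response as 'ok', 'pretty bad', or 'terrible'.
    A real harness would run the generated code against actual tests."""
    if "def solve" in response and "return" in response:
        return "ok"
    if "def solve" in response:
        return "pretty bad"
    return "terrible"

def fake_model(prompt: str, rng: random.Random) -> str:
    """Stand-in for a sampled (temperature > 0) LLM call: the same prompt
    can yield different responses on different runs."""
    return rng.choice([
        "def solve(x):\n    return x * 2",  # complete attempt
        "def solve(x):\n    pass",          # incomplete attempt
        "Sorry, I can't help with that.",   # refusal / off-task reply
    ])

def failure_profile(prompt: str, runs: int = 100, seed: int = 0) -> dict:
    """Run the prompt many times and report the fraction in each bucket."""
    rng = random.Random(seed)
    counts = Counter(grade(fake_model(prompt, rng)) for _ in range(runs))
    return {b: counts[b] / runs for b in ("ok", "pretty bad", "terrible")}

profile = failure_profile("Write solve(x) that doubles x", runs=100)
print(profile)
```

The point of the sketch is exactly the commenter's: a single run tells you almost nothing, while the distribution over 100 runs tells you how badly and how often a model fails.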
isn't the entire AI thing is running on hype fuel?
We're 1 year into 2 months away from AI taking over your job
I’ve worked with phds, using that comparison is a really bad idea…
Altman is a bit evil considering how much he’s willing to throw lives under the bus for his toy that doesn’t do near what he claims
It's all about the money, money, money.
I am a Math PhD and I can not write any software.
I'm now doing a two-year master's degree in software architecture, because my bachelor's degree is in something else. With these new models coming out, I'm quite stressed about whether I'll have a future as a programmer. Do you think it's worth studying something that is not directly related to artificial intelligence? For example, there was a specialty in software technologies with artificial intelligence; that's what I mean by directly related to artificial intelligence.
It was integrated into cursor on day 1....
These things will not be autonomous. At best they will be a "living" stackoverflow.
And what happens to the real stack overflow that they are parasitic upon to function?
@@personzorz it remains as relevant as ever
Alright, see you in a couple of years.
@@tear728 do you actually use any of these LLMs? Have you used the most recent models?
Or are you just making statements to make yourself feel better
What do you mean by “autonomous”? Do you mean like, “takes actions to earn enough money to pay for its continued server costs”, or do you just mean, “takes actions as if to accomplish some kind of goal”?
If the latter: people have already kinda set up harnesses that do this?
This is still o1 preview, not o1. If the benchmark results they released aren't lying, then o1 should be a nice jump in capabilities. I look forward to seeing your test videos when o1 is released
Several previous benchmarks have been lies.
@@personzorz are you talking about the o1 results they released?
@young9534 He might be referring to the ChatGPT-4 benchmarks, where they said it could pass the bar in the 95th percentile but used a very faulty way to measure it. IOB mentioned it previously. They also do the same thing here when they say it is a gold medalist in the Math Olympiad with "adjusted time restrictions". They just don't mention that if it took it by the actual rules, it would've failed the first question
@@Easternromanfan yeah that makes sense. This is why I look forward to seeing this channel run tests on o1 when it gets released. I trust him more than OpenAI
The actual model isn't far off from this based on the benchmarks
I'm not really familiar with o1 at all at this stage. But is it possible to create a prompt to get o1 to ask you questions to get the details it needs to understand your requirements better? (i.e., to simulate the task of gathering reqs better)
It’s well worth a try as a supplement to human effort, in my opinion. The new o1 model is not available yet for OpenAI assistants, however the 4o model does well enough for now.
My prediction: even when AGI is achieved, this channel will call it overhyped
I've been working with AI for content creation and research for a few months now, and while there are still some flaws, the improvement has been significant. It's gone from 30-40% accurate to 60-80%, and even though I still need to edit most of the output, it’s saving me a ton of time. In just the last 5 days, it’s cut down weeks of work! If it keeps progressing like this, it’ll be incredibly useful by the end of 2025.
What kind of research
Doesn't really understand the reqs? Sounds like it can substitute for a scrum manager or boss. Not a dev.
2:44 Sam Altman: what is easy and what is difficult for a human is different for an AI.
An HTTP server in a prompt? Use Express or FastAPI, or even better... Go.
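For scale, the "HTTP server from a prompt" task is a handful of lines in any mainstream stack. Here is a minimal sketch using only Python's standard library (stdlib rather than the Express/FastAPI/Go options named above, purely so the example is self-contained); the handler class and greeting are made up for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply to every GET with a plain-text greeting.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"hello world\n")

    def log_message(self, *args):
        pass  # silence per-request logging

# Bind to an ephemeral port and serve on a background thread.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Exercise the server once, then shut it down.
body = urllib.request.urlopen(f"http://127.0.0.1:{port}/").read()
print(body.decode())
server.shutdown()
```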
Is it worth it to go back to uni for a CSC degree (Open Source Society University)?
It depends on what you're trying to accomplish. I did a discussion about university degrees on a podcast here: ruclips.net/video/f9bO9aTXog0/видео.html (Although I have never heard of Open Source Society University - so I don't know anything about it)
If models hallucinate, then surely telling them to think step by step just gives them more opportunities to hallucinate?
Logic gates implemented in silicon sometimes have errors. It is possible to use a larger collection of logic gates implemented in silicon in order to make something which can detect and correct these errors.
(For classical computing, the errors are, AIUI, more likely to occur in memory than during the computation, and so most of the hardware error correction is for correcting errors in data which is being stored, but the same thing applies to a lesser extent for errors that happen as part of the computation. As a side note: for quantum computers, the errors happening during the computation steps is a bigger issue and needs more attention than it needs in classical computing.)
It is true that more steps does mean more opportunities for errors, but that doesn’t necessarily imply that each step on net increases the probability of an error.
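The redundancy argument above can be sketched as a toy majority vote: run an unreliable step several times and keep the most common answer. All names here are illustrative, and the "computation" (adding 1) is deliberately trivial so only the error-correction mechanism matters.

```python
import random
from collections import Counter

def noisy_step(x: int, error_rate: float, rng: random.Random) -> int:
    """A computation step (here: x + 1) that sometimes returns garbage."""
    if rng.random() < error_rate:
        return x + rng.randint(2, 10)  # a wrong answer
    return x + 1

def voted_step(x: int, error_rate: float, rng: random.Random,
               votes: int = 5) -> int:
    """Run the same step several times and keep the majority answer.
    Wrong answers rarely agree with each other, so redundancy pushes the
    effective error rate well below the per-run rate."""
    results = Counter(noisy_step(x, error_rate, rng) for _ in range(votes))
    return results.most_common(1)[0][0]

rng = random.Random(42)
trials = 1000
raw_errors = sum(noisy_step(0, 0.2, rng) != 1 for _ in range(trials))
voted_errors = sum(voted_step(0, 0.2, rng) != 1 for _ in range(trials))
print(raw_errors, voted_errors)
```

This mirrors the logic-gate analogy: each individual run stays just as fallible, but combining runs detects and outvotes most errors, which is why "more steps" does not automatically mean "more net errors".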
It does if hallucinations are just a fancy marketing term for fuckups that happen when not picking the most likely result from a dataset in an attempt to mimic creativity and the program based on computational linguistics doesn't know when it needs to not do that.
@@estefencosta1835 Hm, seems like a bit of a run-on sentence, but I suppose it would be hypocritical of me to complain too much about that…
Yes, “hallucinations” is just a term people use for errors. Maybe slightly more specific?
Your specific explanation for these errors seems a bit unclear to me?
@@drdca8263 If GenAI just chose the most likely response from a set of data, then every time you queried it, it would give back the same response. The reason ChatGPT gives the illusion of an intelligent response is what can be conceptualized as probabilistic responses, which is why it builds its responses bit by bit. If it's given too much latitude, its answers quickly lose coherency. If given too little, it doesn't present anything that seems novel and loses any potential capacity to solve tasks. But there isn't a sweet spot where it won't sometimes give you things that are either nonsensical or just flat out wrong.
Hallucination is a term we use for disturbances in sensory experience in humans, but it was co-opted for GenAI back when they were trying to con people into believing these algorithms are sentient (see the Sparks of Life paper). By using the term hallucination, it's implying that GenAI is simply misrepresenting something. This is not accurate. GenAI simply runs on an algorithm and tries to parse meaningful strings from its training data, with computational linguistics as the backbone of how it decides what is or isn't meaningful.
The more complex the task, the less accurate or interesting it gets. It's like giving an algorithm a set of Legos and asking it to build something new. But since the algorithm only knows what previous sets of Legos looked like (most of which isn't even relevant to what it's being asked to build), the best it can do is try to mash together bits of other sets based on probability, but not always the most probable pieces, otherwise it would do it wrong the same way every time. But it can't actually build its own new Lego set from the ground up, and it also has no way of verifying whether the Lego set it ends up building is even correct or satisfies the query. This is why programs like ChatGPT can be confidently incorrect.
What I'm less familiar with but it sounds like they're trying to do is use non-GenAI methods to verify if a piece of code is actually viable in order to try and correct it before it spits out the code. Even if this were perfected which I very much doubt, it gets you no closer to writing code that actually does what you want it to do, it simply eliminates some of the more obvious and elementary errors.
@@drdca8263 my point is, error correction doesn't work, if the error correction system itself hallucinates
There is no pleasing this guy😂.
If they're selling you something, and it doesn't achieve what they say it achieves, then no one should be pleased.
Are you the future version of David Shapiro?
Please no. David is all over the place - the new Bindu Reddy.
It's impossible that you tested the o1; we only have access to the o1-preview version
That's what he means
As if o1 preview and o1 are significantly different. Come on. Gmail was in beta for like a thousand years, when they released it finally, it was the same damn product.
Agentic, self improving and self aware AI is going to change the economy, not these fancy demo products.
bullish on SSI inc and Ilya
I knew videos of this nature would come. I think it's valid to point out when AI fails. However, when are people going to acknowledge what seems obvious to me, but seems to get lost in these "I asked AI to do X and it failed" videos?
Is the new AI model better than the previous version? Was the previous version better than the one before that? And before that, was the version released better than what came before? If yes, what is hype about this AI thing? Every version has been better for years now. Billions of dollars are being invested to figure out how to continue the obvious trend that a 5-year-old could point out. Yet we still have the "this is all hype and bs, AI is completely fake" crowd. I don't get how anyone could still believe this AI trend is hype.
I tried to point out in this video that O1 is better than all the other models I've tried on the tests I've been using (and I'll be coming up with more tests).
There is definitely a trend of it getting better, but it seems (to me, and to other people) that the rate at which it's getting better is slowing, and that the value of this cycle of AI improvement is going to be worth far less than what has already been invested in it. But I could be wrong - we'll see.
@@InternetOfBugs I agree that LLMs are a terrible product in the sense that the cost of inputs are significantly higher than the value of output.
@@InternetOfBugs o1-preview is significantly worse in many areas than the final o1. The leap in math ELO scores o1 was able to achieve is not at all consistent with "slowing down."
@@reboundmultimedia
The problem is that your measure is a bunch of specific tests, we measure the value of intelligence by the things it can bring to the world, not by blindly taking tests and measuring them.
@@diadetediotedio6918 I'm curious what industry you work in. I work as a datacenter infrastructure engineer. I got to this role through my degree and multiple certifications (e.g. lots of blind tests and measuring them). Preparing for the tests taught me skills that the labor market values.
Please elaborate on what industry we have that doesn't discriminate based on ability, as determined by testing. Would you see a doctor who flunked out of medical school, but had a real passion for helping people?
I use both claude and chatgpt on a daily basis but it has serious limitations and constantly makes mistakes that I need to point out.
Aside from that it's very useful and can often point me to new technologies and solutions.
Aside from that I'd say the tech has been relatively the same since chatgpt 3.0.
It will replace programmers eventually 😌
I like watching this channel because it's like watching the wise man or the witch doctor of a cannibal tribe when they first encountered an airplane... b-b-but why would the Sun-God Bogosun give pale face such magic? I'm willing to bet that the goalposts will shift by EO2025 to "but it can't give you youtube clone from a single prompt, still meh"
God these AIs just suck
I love programmers freaking out after every new gpt release, looks like they will face the same fate as artists. 😮
only the shit ones
So programming languages are human readable formats to interact with a computer
Do people think AI = computer stuff only?
Any language based/knowledge based job can be automated as LLMs excel at this
All white collar work is at risk
AI is a tool. Sensible artists will incorporate it. The rest are Luddites with barely a grasp of how AI works. Programmers too will use this as a tool - but software engineering requires far more than writing a bunch of code.
Why do you love it? What job do you do?
@@cbnewham5633 Exactly, most people who think AI will "replace everything" have never actually done the things they claim AI will replace. Because if they actually tried to do that, they'd realize, real quick, that LLMs leave a lot to be desired.
I've only ever used it as a tool to supplement my work, and this is only after I've double-checked the code it generates.
1:18 You asked for an input number between 1^64... and in the response/code it assumed 2^64. "Small" difference ;).
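Worth spelling out, since the two expressions look alike (plain Python arithmetic, nothing assumed beyond the comment itself):

```python
# A base of 1 collapses: 1 raised to any power is still 1.
print(1**64)      # 1

# 2**64 is the number of distinct values a 64-bit unsigned integer can hold.
print(2**64)      # 18446744073709551616
print(2**64 - 1)  # 18446744073709551615, the max unsigned 64-bit value
```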
Just wait for ChatGPT 5, that's the big thing moving forward, that's the inflexion point
Is “inflection” vs “inflexion” a dialect/regional-spelling-differences thing, or just a “you personally spell it differently” thing?
It reminds me of how some old letters about math were written
@@drdca8263 made a mistake too lazy to fix it to be honest