I see a lot of people calling cope on prices. Here is the last 20 years of projection of cost per flop. doubles every 2.5 years. This isn't great. Algorithms is a must, but even then it's pretty hard to improve performance 1000x with algos. Either way, I do think my premise is correct. Hard skills will be a great advantage over the next decade. Beyond that, who the heck knows www.lesswrong.com/posts/c6KFvQcZggQKZzxr9/trends-in-gpu-price-performance
Even after that, i prefer being of those that will have learned the technical skills. i fail to see how it can ever become totally useless. In a AI tech driven world it's the most important professional edge you can get.
From the article you linked to: "for models of GPU typically used in ML research, we find a faster rate of improvement (FLOP/s per $ doubles every 2.07 years)." So in roughly ten years you're looking at the cost being 1/16th (x0.0625) of what it is now. And that would just be for running the *current* O3 model... imagine running price-performance optimized coding models 10 years more advanced than what we have now, at x0.0625 the cost of current compute.
Also misses the fact that o4 will likely be significantly better. Probably not as big of a jump, but there's low hanging fruit left to be picked in the test time compute architecture. That begs the question, what about o4 run on 10% of the compute? Likely it'd have similar performance to o3, but there's a 10x reduction in cost.
We achieve AGI ~130 million times a year. We've been at that level since the 1980s, though it's trending downward. Places like India and Africa have been pumping them out faster than anywhere in the west.
@@jonduke4748 Even so, the question is, is the massive investment in nuclear power worth it for LLM use alone? It will already cost a significant amount to just keep up with grid demands for electric vehicles and replacing power generation for existing demand with cleaner sources.
@genericdeveloper3966 it would not be so massive if the regulation was not quite overly as obscene as it is; additionally it wouldn't be so bad if calmer heads and brighter minds had prevailed over the last 50 years and actually invested much more in nuclear infrastructure and technology. It's honestly and literally the only way forward at this point, considering the ever increasing demand for power and the need to reduce price per KWH. Once built nuclear is much cheaper than fossil fuel. It's also the only and I mean ONLY serious solution to climate change aside from mass depopulation.
I don't know how this will affect jobs, but as a teacher of computer science at university I do know for sure that knowledge is still going to be very valuable. Maybe even more than before. If you are a student never think for one second that what you are learning is going to be useless: the process of learning is the real value.
The more you learn, the easier it becomes to learn. You start to see patterns and make analogies. ”This new thing is kinda like good old A combined with our friend B plus a little new twist - alright!”
@@no-nukez that's a difficult one. Full disclosure, I'm from Italy so the whole job scenario may be very different. I'm hearing about layoffs in the USA and a big frenzy. Probably CEOs wanna jump on the whole "cheaper labour" thing... What i'm thinking is that being competitive its still going to matter, and in order to be competitive *hopefully* knowledge is going to matter.
Love this video. Worrying about AI literally keeps me up at night. I can't thank you enough for making this. It doesn't just give me comfort, it gives me a direction. Thank you. 🙂
Worrying about AI is also keeping me up at night (I literally woke up at 3AM this morning in a panic attack thinking “the robots are coming!”) I’ve found that writing my thoughts down has been helping with the anxiety a lot. There is a lot to be concerned about for the future, but there’s also huge opportunities for both individuals and society. I think there will be growing pains and not everyone will be better off in the short term, but on a longer time horizon AI will probably be the best thing to ever happen to humanity.
@@WilsonSilva90 Well, they need at least a few technical experts when there are notorious bugs that AI might take a while to figure out. I see it increases productivity of developers, not replacing them in anyways
If you talk to professional artists, the image generation tooling is still completely useless in profesisonal image creation workflows. More time is spent fixing errors than it would take to generate the image manually.
@@benjaminblack91 Right. Image generation is great if you want to generate a picture of "Keanu Reeves but he's 650 lbs." for a RUclips thumbnail, but it's pretty much useless if you want to create something more specific. It's still impressive technology, but nowhere near flawless like every AI-bro would have you believe.
at 4:40 you mention that o3 can solve a toy task that it's never seen before, but the ARC-AGI benchmark includes a training set to familiarize models with the format of the problems. I don't think any specific problem types are repeated, but they aren't going in completely cold to the format either (o3 was specifically fine tuned on that training set for this benchmark)
That's kinda the whole problem. Training the model on a format and structure of the test defeats the entire point. It means the results likely aren't generalizable and may be strongly tied to the narrow ARC testing environment. If that's what's happening with o3, then if you take it out of that environment and into a real-world situation, it wouldn't do any better than probably o1 on the same task. It's something that has been observed in humans too. When people are trained to perform better on cognitive tests, their scores improve dramatically, even on questions they haven't seen before. But when you give them a different test with a different structure, then they go on to do about the same as pre-training. Training them on the test didn't make them more intelligent in general. It just made them better at taking that specific test. The 'G' in AGI is really what matters here.
I would suggest you listen to the guy from ARC describe the results. You cannot ‘train’ a model for these tests. You can make them aware of the format, and the same applies to humans, but each test is novel and the model has not seen it. Same as a human. That the model can beat the human on novel tests is what impresses with this iteration.
@theb190experience9 that's what my comment said if you reread it carefully, though a human would not need a training set nearly as large as the one offered to understand the goal - at best, one or two examples
9:01 THIS. Speaking as someone who literally did a PhD in machine learning, all this "we achieved AGI" stuff sounds like _pure marketing bs_ to me. People have been predicting that the great golden tech utopia future is only 5 years away for like 60+ years, it's all just be about building hype for tech and investment.
I also love how every AI bro loves to use that graph (1:28) as some sort of proof that AGI has been achieved when its literally showing that even with exponentially growing cost, test accuracy barely increases. According to the graph, the score difference between o3 low and o3 high was just 12%, meanwhile the cost difference per task was literally thousands of dollars.
@vasiapetrovi4773 when did I say something like that? I think transformers were a remarkable achievement, I think LLMs were a remarkable achievement, I think ChatGPT was a remarkable achievement, I think o4 was very impressive. I'm fully capable of valuing all these things as being the innovations they are _while still_ being skeptical about claims of AGI or a complete technological revolution. These opinions aren't contradictory _in the slightest..._
The goalposts have already been moved for using the term Artificial Intelligence, how long before every machine learning tool is called AGI and they have to come up with a THIRD term to mean the same thing the other two used to mean.
AI has never really referred to actual intelligence to be honest. I mean, people were calling what the computer enemy does in video games "AI" for decades even when they had incredibly simple algorithms and nobody ever considered it to be a confusing term to use back then.
That second one is the big one. Until it can solve problems it has absolutely no training or knowledge of AI will just be an assistant. Which, while useful, it is not going to take your jobs. Though how far we are from AI being able to learn independently without being trained on something is anyone's guess, could be tomorrow could be 100 years from now.
@@hungrymusicwolfhumans can't do that either. I'll ask you to do something you have no training or knowledge on and watch you just magically figure it out..
@@hungrymusicwolf I’m guessing closer to that 100 year mark. Idk, it just seems like it’s plateaued to me. I think they just needed something to fool investors. Open AI is absolutely HEMORRHAGING money and they’re not really putting anything truly valuable forward.
It's designed to just be familiar with the format of the problem. Think of it as training a model to understand JSON so that you can pass it a problem in JSON format. For example, if you passed math problems formatted in JSON, pretraining the model on JSON doesn't directly help with understanding the steps to solve the math problem. That is what they say, at least, my explanation isn't an endorsement or me making the claim it actually works that way. I'm just the messenger of this info. @@hungrymusicwolf
Honestly, AI has made me appreciate humans more than ever before, and despite all the advancements in technology and all the energy consumed, humans are still at a mysterious level of ability compared to AI
Well, we've had millions of years of headstart to get to where we are now, AI has come so far in just decades. Machines are evolving much faster than we are, and may surpass us one day.
@@20Twenty-3 None of our engineering at a nano-scale holds a candle to life itself. The human body is an unfathomably well designed and interconnected nano-micro-macro machine working in harmony at all scales for a common purpose.
Really appreciated this sobering take on O3/GPT-4. The title had me worried this would be another AI hype video, but you nailed the practical limitations that most AI evangelists conveniently ignore - especially those astronomical costs and resource requirements. The fact that even a "tuned" version only hits those percentages while costing $20 per task (or 172x more for high compute!) really puts things in perspective. Your point about the widening gap between deep technical understanding versus surface-level AI dependence is spot-on. It's refreshing to hear someone cut through the investor-driven AGI hype and remind us that solid technical skills and understanding fundamentals are more valuable than ever. Keep these reality checks coming! 👨💻
I started refactoring the entire front end with ramda exclusively a year ago. The code is beautiful, everyone can read and understand what it does at a high level, but no one can change it.They couldn't fire me if they wanted to. Check mate boardroom suits.
Damn. This hits deep. I was half asleep last night and let it go wild. Looked at the code with fresh eyes this morning and I have classes in separate files and nested classes in the main file. It decided it didn't like my project structure so it made a subfolder and created new files without deleting the old files.
Average Joe devs will spent too much time prompting something that would produce code that "just works dont touch it". They will be Prompters instead of Actual Programmers. When the shit get real, Prompters will call Actual Programmers to clean up their mess
@@awesomesauce804 I'd love to pretend otherwise but so many of my personal projects have been managed like this for over 30 years 😂. When you do anything like this professionally though you refer to different "generations" of solution and call the process "prototyping". 👍😎
The future of AGI is already depicted by the minions in Despicable Me. They kind of get the job done but you don't understand them and you're always wondering what could go wrong.
exactly bro, thats why all of our top board members who were with the company from the very beginning left to start their own thing. because they humbly did not want to accept any credit for helping achieve AGI. it is true bro AGI is here
It feels similar to those really expensive and high precision robotic arms, where yes technically it could do the job of factory workers, but it's way too expensive and that's not even what it's made for, it's made to either be a demonstration piece or for some highly specific niche task.
This is the kind of stuff I am here for. We got the pros, the cons, some valid insight from Prime's perspective, and he didn't spent time speculating on things that are purely hype based. Great video!
Quote from ARC-AGI that people somehow missed. “Furthermore, early data points suggest that the upcoming ARC-AGI-2 benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute (while a smart human would still be able to score over 95% with no training)” They also stated that their pov - AGI can be achieved when there will be no way to create dataset of simple tasks that AI fails to carry out while human easily do. So all this “AGI achieved” is the biggest bullshit
No but it's using a different training paradigm, which itself can be iterated on, which may get us closer. You luddites change the goalposts all the time, and AI keeps breaking through them within a year.
They trained on 75% of the training dataset. It's literally called "training dataset". If their success is purely contributed to that, as some people make it out to be, then ARC should've already fallen in the past few years to dozens of companies who did train on the training dataset and still couldn't get above 20% accuracy. Yes you should continue to learn hard skills regardless of AGI because there is literally no downsides to having more knowledge, but you should also be able to look at a trend line and realize where it is headed.
Also I recommend people to look up the ARC tasks that o3 failed at. You will see that most of the fail cases are very reasonable attempts, very very reasonable attempts.
Ignoring the trend might be okay if you are a netflix engineer, but for most folks working on much simpler problems I think we all should be aware that the bar is rising very quickly
I wonder if AGI will turn out like fusion. Fusion is possible today, it just costs more energy to keep a reaction going than you get out of it. It might take 100 years or more of innovation for it to be useful. Maybe AGI is possible today, or soon, but how long until it costs less than the annual salaries of a full staff combined per problem solved?
We don't need to worry about o3, we need to worry about o4-mini. Every time there has been an increase in performance, it was followed by a budget friendly advance that doesn't match the bleeding edge, but matches the performance of the previous generation.
I absolutely don't believe this. Looking up how much oil you'd need to produce 1kwh of electricity, it's about 0,0672 gallons. Assuming an average car tank is 12 gallons, that means you would need to spend almost 900 kWh (12galons / 0.0672 gallons/kWh) of electricity. It said that for the low efficiency variant it took 13 minutes to run, which means that you'd have to be using 4166 kW of electricity for the entire 13 minutes. An H100 uses 700W of electricity, that would mean that for 1 task you would be using 6000 H100 running at full tilt the entire time, which is completely absurd. Edit: I rewatched that part and to be more precise, it says 684 kg of CO2 equivalent emissions. Looking at the US average emissions per kWh of electricity (0.37 kg of CO2 per kWh), that would mean you would actually have to spend 1800 kWh of electricity, doubling the math. Either way you look at it it's completely absurd if you're purely looking at the cost of answering questions and not the training.
Honestly nowadays they are generating code better than 90% of the "Senior Developers", but programmers have a big ego and will always try to insult these tools.
I mean that probably does reflect the state of the ai at the time he was talking about it. theprimeagen 2026: "IA is competitive" theprimeagen 2027: "IA is super-intelligent" theprimeagen 2028: "IA is nice and I totally support our new AI overlords"
The fact that there is no data point for o3 untuned is the biggest stink to me - especially since all of the other points are not showing that they are tuned. Tuning the model to such a task is like giving someone all of the questions and answers before having them take an IQ test - I don't think that the test results are all that impressive after that
Also the concept of "tuning" to a test that is supposed to measure the ability to generalize is itself entirely farcical. I can't believe anyone takes this result seriously.
@@almightysaplingIs that literally just so the graph “looks better” to people who aren’t gonna think about what the text actually says?? Knowing what I’m looking at the graph is just so silly lol
We have fine-tuned AI that can beat any human at chess or Go, why is anyone surprised that tuning an AI at a specific kind of task allows it to do well?
@@joeysung311 To me it all points to one thing - AI companies are struggling. The "AGI test" that measures whether the AI is an AGI that measures it in percents, but 100% does not necessarily mean AGI? A comparison between tuned new AI to old untuned to manufacture the sense of huge progress? People see AI improving from 30% on the AGI test to 70% and think "oh, we just need to do one more small upgrade to 100% and the AGI will be already there" when the reality is not that simple. OpenAI achieved precisely what they wanted to - they made so many people scream about the incoming AGI because they see the layers of misleading stats
You gotta experience it to know if it any good. All these things are potentially accelerators, eg write boilerplate for you, find bugs, write tests for you. At the moment the llms make so many mistakes that they help little for production code, but ok for experiments, one off automation tasks where the stakes are not that high. For production code you have to do so much fixing and adjusting and rewriting it is hardly worth using them at all.
This. Literally this 100000 times over. Nobody seems to talk about this for some reason. It's gonna be a long time before AI is writing code that is good enough to go directly to production
AI developer's seem to take Goodhart's law of “When a measure becomes a target, it ceases to be a good measure” as a challenge, instead as a caution. When we see a benchmark we always think we have found a way to measure AGI, only to find the model creators pass them by rules lawyering!
Man, I really like you channel! Old dog here - you are a really fresh air on developer RUclips! Down to earth, skeptic and kinda all over the place geek from the 90s!
Here's a hot take: The reason general public can have access to this A.I models, is that they're dumb and "useless". The moment it becomes intelligent and useful, only big corporations and governments will have access. So, as long as you can purchase a subscription to it, is total garbage.
The AIs will remain fundamentally stupid and we will run out of things to prove that they are in fact stupid. Douglas Adams "I'll have to think about that [6 million years later] I don't think you're going to like it" style.
@@NihongoWakannai Pardon the strong take, but my point is, GPT like A.Is haven't brought any advancements/improvements to the field of programming. All i see is shittier code comming mainly from JR DEVS, and what pisses me of the most, they are clueless about how their code came to be. I want to retire some day. How TF would i do that if new generations of programmers only know how to copy and paste?
@@PaulanerStudios A good indicator of A.I's becoming more intelligent, is when we stop talking about using it for something so trivial and mundane like programming.
@@ThalisUmobi The usefulness of ChatGPT is in being able to ask it questions that you don't have the proper keywords to google effectively. I don't use ChatGPT for writing code unless it's like AHK or batch scripts.
We still need skilled people to understand and deploy the AI output. AIs can’t be held accountable for bad business and technical decisions. Prime’s take at the end is so so good. Totally spot on. The people that understand how things work will be the best equipped to wield AI. And I think approaches like Cursor are going to be the winning paradigm - using AI as an integrated tool and not an outright replacement - but again best wielded by skilled engineers that understand how things work already and can use these tools to automate busy work, craft the best prompts, and verify and deploy outputs the fastest.
So you're saying lower rung jobs will be cut even more and to even get a single job in CS you will need at least 10 years experience? Wow so optimistic.
@@NihongoWakannai the main point is that learning computer science and software engineering and developing your hard skills is not a pointless endeavor. In some senses, it might become even more important.
AI can be liable for their actions, someone just needs to create AI insurance. It sounds insane but we do live in linsane times afterall, it might work. Also, insurance scams on AI agents pulled of by rouge AI agents in few years? 😂 Such Cyberpunk 2077 vibes.
Man those RUclips videos with “AGI Achieved!!” in the titles are so cringey. Unfortunately that level of sensationalism works on RUclips if you want views
Honestly the early models really help me get started when I code, that alone helps speed me up. I don’t expect or want it to do everything. Just write me some boilerplate or UI with decent accuracy and I’ll happily pay $20/month
We might be experiencing peak AI hype. The findings and the state of the art may be advancing, but the energy differentials are multiples orders of magnitude in the wrong direction.
@@archardor3392 In the world of finance and money, this might be called the irrational exuberance phase before a bust. The valuations don't make sense given the profit and losses, we're left wondering if "they" know something we don't know. The AI market must be differentiated from the AI technology. There's clearly a market for some forms of AI technology, but the market doesn't properly reflect the state of the technology.
The rate of technology change is no excuse to not learn coding. Even if you don’t have a job in coding like 10-20 years from now, you won’t get a job in playing word games but how much time do we invest in that as well? Coding is an excellent exercise for the brain in addition to teaching us new ways to think and reason about the world around us.
Just the other day I had a conversation with my boss who said he's willing to "invest" in "AI": "if it can help us with even 50% of the tasks". I had to then explain that I've been using this thing myself for two years already and that it's closer to 10% helpful when it's actually helpful, and entirely detrimental when it's wrong, which is most of the time.
It depends a lot what you do. Some coding tasks are just insane, like when I ported SQL code and models from C# to Python in like a few minutes. Probably saved a day there. Or the other day I needed a simple function that generates an SVG from a list of points, with a few options, and it just did it in a few seconds, ready to paste into the code.
My boss has unironically mentioned the idea of using cursor and I’m like “well… I won’t say nothing I’ll just laugh when it becomes a catastrophe”. I’ll be ready to clean up the mess though. 🤷♂️
I can't believe you just ignored >at what task< in your argument. That's the whole point for AI and brains alike. How good are you at doing what. For it to be useful to you, you use it on the things you know it's good at and not otherwise.
lol. what a fresh, great video. thanks. non tech founder here looking to learn to code. been funding a team of juniors and sketchy seniors for 3 years til I ran out of money and patience. now using cursor, bolt, lovable, etc. all of those just can't get to the bone as I need. that's why I know the only way out is to finally learn to code. thank you for your video. subscribed. will be looking forward to new ones.
The Arc public data set has been available since 2019. This means that data will have been trained on every frontier model including Gpt4 and o1. And this is understood by the ARC foundation, they made it public afterall. Also, look at the frontier math benchmark by EpochAI. These are novel questions that are tough for fields medalists and o3 scored 25%, previous SOTA was 2%!
@@ThePrimeTimeagen Holy crap, you actually responded! really like your videos! didn't realize that schools were using the AI label that long ago, I just mistakenly assumed that you had given up and started referring to machine learning as AI like many others at this point. I'll still be referring to radial basis functions as being part of machine learning (this will get easier and easier as they become more forgotten with time)!
As a beginner self-teaching developer, I find LLMs are a fast way to learn about tools or libraries that fit a specific use case I describe. Also, when I struggle to understand the terminology or concepts in documentations I use LLM’s to get things simplified. Still, many times if I get stuck on a problem long enough the temptation to ask LLM to provide a solution is great. And when I do ask I feel dirty afterwards.
I remember o1 initially taking between 15s to 45s per prompt. Now that it's released. It's better and instant in most cases. Most of that apparently was tuning to make sure simple questions weren't overthought. I'm curious to see how much they can get the cost down by the time it's actually released.
I love "Based Primeagen" videos, you give really good insights. The issue is trying to bring reasonable arguments to a crowd of people who just "want to believe", most of whom have never used ML at any depth (I am amazed by how many people give authoritative opinions on AI and AGI with 0 technical background). I will say this : my team has been using similar models for 1.5 year. I can easily manufacture a set of examples that will blow you mind and make you think my team is redundant in 6 months. I can equally easily cherry pick example that will make AI look stupid and will make you doubt we should even continue. My take ? It's a tool, it's good a certain things, let's use it for that. Oh, and apply a modicum of critical thinking : of course the snake oil seller will tell you their snake oil is the best, that you need snake oil and [insert here compelling reasons]. But that is still someone profiting from selling snake oil, and that requires to sift through what they say critically.
Man i have dug into cursor this previous and with their composer agent i am astonished how much i can do in so little time. I think the future of programming is here in cursor like solutions, where you as a programmer will act as a project manager and cursor agent is like your 10x developer team
i have llm agent in my neovim, for starter, i just need to tell him what i want to create with the features and tell him to save it to the file name, then it will generate code and save it in the file with filename as i want, then it will run the code. then if there is error, i just tell him a keyword how to solve this @terminal, the @terminal will include all the code in the current file + all the error message in the terminal output, then he will edit the file 😅 continously until there is no error, if there is still error then i just review and change little bit then viola the prototype is done
@@my_online_logsthat's not a good way to use ai for programming. You won't know anything about the code. It's better to use something like cursor or copilot chat
It’ll be AGI for SWE when it can self verify using arbitrary tools and computer use. I had Claude 3.5 add a download button to a complex page. It gets it in the first go. That was pretty impressive. So kudos. But I still needed to QA the feature. I had to rebuild the app, open a browser, navigate to the right place in the app, create the history in order to generate non-trivial download content, look for the download button, make sure it’s in the right place, make sure that the styling is legible , test the hovering operation, press the download button to see if it responds at all, know where to look and what to look for to see if it is downloading, find the downloaded file , open it, inspect the contents and make sure that they match what’s on the screen and formatted in the way that was requested in the prompt. We’re getting there but I’m still having to do a lot. I want it to do all this before it presents its solution to me. That’s what the G in AGI means to me.
I use gippity daily and its more useful as a rubber duck than anything. Sometimes its a good reminder of something basic you didn't want to google. But its wrong so often that theres no way you trust most of it
I used to work in localization. They started to introduce machine translation, then they started to lay off language specialists. Now you see bad translations everywhere but low quality is not an issue for most businesses
yeah, i dont really understand these sentiments at all. Yeah, AI is very shitty at developing. Do they have to be good at it for retarded higher ups to start replacing humans with it? Not really. Already happening with marketing and graphic design, see all the clearly ai generated billboards. Looks like shit, do they care though? Not really. And people are out of their job, just like that. Same thing will happen with programming, maybe already have, with how the junior market has been for years now
You know what I find funny? It's that companies look forward to see AI do things they want done and create something by interpreting the user........which programmers do today. Also, companies live in a imaginary world when they think buying a AI tool in a few years will be cheaper than hiring experienced programmers. Humans are IRL AI agent equivalents, but companies aren't able to treat humans humanely or give them time to educate themselves so they need insanely expensive AI to sink their ships instead. Watch a single company who understand the value to knowledge swoop up insane amounts of talent and just outrun the market at light speed.
I don't see even the companies being happy about AI. If AI is capable of writing a good working program, and maintaining it reliably, the value of an IT product will be the same as the cost of running AI. The whole IT sector will be a no money place dominated by a couple of AI companies, or the companies which run apps that need a large sums of money to run their services. Everything you can now produce digitally will be worthless in the future, if a good AI is shipped
Maybe I'm misunderstanding something, but as far as I understand, the term AGI implies the ability to surpass human knowledge. To achieve this, it must be capable of continuously learning on its own, as this is the only way for AGI to exceed human capabilities - a goal that is certainly still a long way off. This will probably only become possible once we can operate LNNs on neuromorphic chips at an industrial scale.
Great take and I completely agree with your assessment and assumptions. Somewhere in the near future the pool of those with the technical depth to understand beyond the AI suggestion will dwindle to a critical point at which time humanity would loose the abilities we take for granted right now.
Also, IIRC, there is another ARC AGI test which o3 was put up against and it got like 30%, which wasn't much of an improvement over o1. Really does seem like marketing hype, kind of like what google did with the willow quantum processor
This just proves how fucking ruthlessly efficient biology has become. I guess a few billion years of guessing beats a few thousand years of technological advancement.
@@bobcousins4810 But think about it: pre-train for muscle coordination, vision, audio, smell and then finally a few years to fully develop the pre-frontal cortex
I agree with the sentiment, but we've only really begun AI work 50 years ago. Relatively speaking with the time span of humanity as a species, AI is far out pacing us. That said, I fundamentally think brute forcing with distributive semantics is a fools errand.
90% of my time fixing a bug is figureing out in which branches it must be fixed, filling possibly 5 pull requests, restarting flaky tests on ci for the 5 branches, requesting a review for 5 pr, nagging my boss until he finally presses the approve button, realize that on 2 of the 5 branches someome has merged first and now I have to rebase thosr commits (not allowed to merge...) so rinse and repeat. In effect the loc written are like usually 2-3 times. The Ai would save me no time doing most of my tasks since they are not even coding things.
@@timothyjjcrow I would be happy If I could do that, as my job would shift to prompting the AI to do things, leaving me more time to enjoy drinking tea. Half of my job is doing office politics anyways. When you are in a project of this size there is so many layers of management that you have to play bullshit bingo with... Imagine PM #1 comes to you and asks you to do a feature, that feature would be inconvinient for PM #2 you know this due to context you have, the ai would have just implemented it. So what would I do? thats right make a meeting and have PM 1 and 2 fight each other, or nudge PM 1 into a direction thats likely to be acceptable for PM 2. I highly doubt that the AI would in a reasonable amount of time have the social skills necesarry to perform these tasks. Coding is really a very small part of my job. Its one I like, I just hate the codebase I have to work with because its older than me and several million LOC's. Also I dont make "App's". I make highly complex vendor neutral microscope software that also happens to be able to interface with a pelthora of other systems via very arcane api's. (not the http kind of api) This is so niche that even if you wrote down a full requirement sheet for the software, which is nearly impossible in my opinion because we dont even fully understand what our software does, then I doubt that current and next 3 year AI's would be able to comprehend what it has to do. In the industry make this software for failure/bugs lead to reasonable terrible consequences so the confidence would be low for an iterative AI approch. Will it happen in 20 years? maybe, will it happen in the next 10 years? very unlikely.
Analog computing or computing in memory are potential solutions to the energy inefficiency problem, boolean just takes way too many precise steps to get at a conclusion.
Analog computing is effective for differential equations and not much else, as it tends to lack precision and the hardware isn't very scalable. There's a reason why this approach wasn't chosen. What does 'computing in memory' even mean? Boolean logic requires too many precise steps to reach a conclusion? What?
Thermodynamic compute is much better and coming quite soon. Works with the same standardized CMOS silicon architecture we've used for decades. 10,000x on first release would be childsplay.
@@steve_jabz it's nuclear fusion reactor 2.0. I will accept it only when I see it really working. I believe the current computers are good enough for AI, it's just the matter of getting proper foundation for the model
@@iz5808 Not really comparable. Fusion is hard. Thermodynamic processors are super easy and we can start mass producing them right away, we just didn't have a reason to have intentionally noisy circuits until it became obvious that scaling modern techniques like diffusion and attention was extremely effective. In 99.9% of cases, the last thing we want is noise in a circuit.
@@steve_jabz super easy and yet not a product that exists. We must have different definitions of "super easy" where yours is "something extremely hard and cost prohibitive to release and full of bugs"
12:11 He just explained pretty much tech. Just a small, really small fraction of tech is really useful, the more you put thought on it the more makes sense.
I think you're missing the bigger picture here. The o3 mini model offers performance equal to or better than the o1 model at a fraction of the cost. Sure, the full o3 model is expensive, but if you consider how much better the o10 could be in just two years, it’s only a matter of time before it surpasses nearly all coders at a fraction of the cost ... coders need to plan for a world where human coding is not needed anymore !
i know people saying things like this but there is no official o3-mini comparisons. If o3-mini was that good and better than o1 they would have shown in. they didn't. i think this is just a hand wavey thing they are saying until they can figure it out
If there's a known test to prove you are AGI, then you can probably train for it. Not trying to diminish the progress which OpenAI has been able to achieve. I like viewing LLMs as 'general trained models' which you can condition to a 'predictable' function via prompting to prove out the viability of training a custom ML model. They are fantastic for throwing stuff at the wall and seeing what sticks!
i think you missed the point. O1 & O3 will be mainly used to produce a LOT of very high quality synthetic data in order to train next models MUCH MORE efficiently (it's proven that a small LLM can works as good as a big one depending on the quality of the training data)
The thing is, its not worth the compute right now. That doesnt mean in a year, with compute costs lowering, and changes/improvements to the model, will result in speed ups/cost reductions etc. What i dont get, is how programmers cant grasp, that we are only scratching the surface of transformers, let alone any sort of future neural network architectures. Nearly EVERY DAY, there is new papers/studies showing aspects/discoveries about transformer models.
@@celex7038 Sora took an hour, to create a min video, a year ago. We now have OPEN SOURCE video transformer models, that are 1000x faster. Its wild how everyone wants to talk about AI, yet they know literally fuck all about it. Compute costs lower, because models are reworked,retrained and optimized to use insane amounts of less compute. But i get it, Its 2024, we have people using junk like Javascript to host backends, i get that you youngsters wouldnt know what optimization was if it tried to suck your cock.
I'm coding a game with chatgpt. Paying $200/month for the pro subscription. And what you said couldn't be more true. I don't know much about coding, only the basics, but as the code gets bigger with more files, it starts to hallucinate. Difficulties solving simple issues. Giving code with bugs. I started coding with AI from the beginning, the improvements are crazy and still going. AI is an amazing tools, but it's far from being to the point of replacing other developers, that's for sure. As for the future, I have no clue.
2:17 “WE DID IT GUYS WE MADE AN AI AS SMART AS PEOPLE” “okay could it solve this picture puzzle an actual 1st grader would consider trivial?” “No” 😂 every single fucking time
It's far smarter than most people in many areas, but lacking in another areas. We just have to improve those less developed areas. Not that complicated
Love it. Preaching to the choir. I've started taking college math classes online to a comp sci degree largely because of you. If that doesn't work out there is electrical engineering, which I do want to do even if comp sci does work out.
Oh we are so cooked bro, I got like 200 years of full stack software engineering web 3.0 development experience and I just got laid off from my job, because CEO will use AI on legacy code base. I can literally code in binary and this wasn’t good enough. Cooked.
I feel like they'll want you back when it doesn't work then you come back for a premium
Месяц назад+5
In about a month when they ask you if you'd like your old job back you should be ready with your terms as a very highly paid Winston Wolf type character. Also wear a suit at least for the first two weeks.
I see a lot of people calling cope on prices. Here is the last 20 years of projection of cost per flop. doubles every 2.5 years. This isn't great. Algorithms is a must, but even then it's pretty hard to improve performance 1000x with algos.
Either way, I do think my premise is correct. Hard skills will be a great advantage over the next decade. Beyond that, who the heck knows
www.lesswrong.com/posts/c6KFvQcZggQKZzxr9/trends-in-gpu-price-performance
WE LOVE THE APE GUY GUIDING US TOWARDS KNOWLEDGE & REALITY. WE LOVE THE APE GUY. THE NAME IS ❤
Even after that, i prefer being of those that will have learned the technical skills. i fail to see how it can ever become totally useless. In a AI tech driven world it's the most important professional edge you can get.
I hear what you're saying, and this could be my imagination, but the tone and approach seemed like you were downplaying your subconscious fear.
From the article you linked to: "for models of GPU typically used in ML research, we find a faster rate of improvement (FLOP/s per $ doubles every 2.07 years)." So in roughly ten years you're looking at the cost being 1/16th (x0.0625) of what it is now. And that would just be for running the *current* O3 model... imagine running price-performance optimized coding models 10 years more advanced than what we have now, at x0.0625 the cost of current compute.
Also misses the fact that o4 will likely be significantly better. Probably not as big of a jump, but there's low hanging fruit left to be picked in the test time compute architecture. That begs the question, what about o4 run on 10% of the compute? Likely it'd have similar performance to o3, but there's a 10x reduction in cost.
we achieve AGI 5 times every week
When we finally do, the AGI will have been trained on so many false claims that it won't believe its own existence
We achieve AGI ~130 million times a year. We've been at that level since the 1980s, though it's trending downward.
Places like India and Africa have been pumping them out faster than anywhere in the west.
@@kevin.malone that's just GI tho not AGI
@@kevin.malone bold of you to assume most humans are intelligent
And everytime the cost keeps skyrocketing
The day that AI companies fire all their employees would be when AGI has been achieved
Yeah
They might be the last employees to go. Before an AGI can self improve.
@@Alx-h9g Come on. Stop confusing reality with SciFi movies. You might as well believe in magic with this thinking.
No, that would be only when AGI robots would be allowed to be as legal entities as emploees.
Employee can sign contract, AI can't.
I thought AGI implied self improvement @@Alx-h9g
They are running nuclear power plant so that AI can spit out 'bananums'
Apes stronk together 🦍
Bananums
Apes together stronk!!!
bananums
Ban anus
Just some napkin math-684kg of emissions would mean about 1,800 kWh, or one task consumes as much electricity as a single home does in 2 months
_1.8 megawatts! Great Scott!_
Sounds like it's very sustainable
@@czerwonyniebieski Nuclear power is actually extremely sustainable and produces the lowest emissions for KWH
@@jonduke4748 Even so, the question is, is the massive investment in nuclear power worth it for LLM use alone? It will already cost a significant amount to just keep up with grid demands for electric vehicles and replacing power generation for existing demand with cleaner sources.
@genericdeveloper3966 it would not be so massive if the regulation was not quite overly as obscene as it is; additionally it wouldn't be so bad if calmer heads and brighter minds had prevailed over the last 50 years and actually invested much more in nuclear infrastructure and technology. It's honestly and literally the only way forward at this point, considering the ever increasing demand for power and the need to reduce price per KWH. Once built nuclear is much cheaper than fossil fuel. It's also the only and I mean ONLY serious solution to climate change aside from mass depopulation.
I don't know how this will affect jobs, but as a teacher of computer science at university I do know for sure that knowledge is still going to be very valuable. Maybe even more than before. If you are a student never think for one second that what you are learning is going to be useless: the process of learning is the real value.
Okay, but is the value of the “process of learning” going to pay my bills and reduce my student debt?
The more you learn, the easier it becomes to learn. You start to see patterns and make analogies. ”This new thing is kinda like good old A combined with our friend B plus a little new twist - alright!”
Ai will inevitably cause bugs that someone will need to fix thank God I've been learning assembly security research will never die
@@no-nukez Exactly! You get hired to solve problems that didn’t even exist when you graduated. That takes a mindset of constant learning.
@@no-nukez that's a difficult one. Full disclosure, I'm from Italy so the whole job scenario may be very different. I'm hearing about layoffs in the USA and a big frenzy. Probably CEOs wanna jump on the whole "cheaper labour" thing... What i'm thinking is that being competitive its still going to matter, and in order to be competitive *hopefully* knowledge is going to matter.
Love this video. Worrying about AI literally keeps me up at night. I can't thank you enough for making this. It doesn't just give me comfort, it gives me a direction. Thank you. 🙂
Worrying about AI is also keeping me up at night (I literally woke up at 3AM this morning in a panic attack thinking “the robots are coming!”) I’ve found that writing my thoughts down has been helping with the anxiety a lot. There is a lot to be concerned about for the future, but there’s also huge opportunities for both individuals and society. I think there will be growing pains and not everyone will be better off in the short term, but on a longer time horizon AI will probably be the best thing to ever happen to humanity.
Glad you were able to have some peace
W bro. AI will never replace humans.
@@TamilvananB-z4l It surely will. The question is "when?". I hoped we had 50 years, but we may just have 5 to 10.
@@WilsonSilva90 Well, they need at least a few technical experts when there are notorious bugs that AI might take a while to figure out. I see it increases productivity of developers, not replacing them in anyways
You know it's serious when it's just the green screen
Turns out generating images is much easier than writing code above a junior level.
Yes, I mean the models literally just learn what information it thinks that we want to hear. Not what is "correct"
If you talk to professional artists, the image generation tooling is still completely useless in profesisonal image creation workflows. More time is spent fixing errors than it would take to generate the image manually.
@@benjaminblack91 Right. Image generation is great if you want to generate a picture of "Keanu Reeves but he's 650 lbs." for a RUclips thumbnail, but it's pretty much useless if you want to create something more specific. It's still impressive technology, but nowhere near flawless like every AI-bro would have you believe.
@@asdqwe4427humans don't use correct either
@@benjaminblack91 you’re precious… it will be almost perfect by 2025
> WE HAVE AGI
okay lets run it!
> CANT TOO EXPENSIVE
....
Not for long
In a recent experiment of humans vs ants, ants did better in some better scenarios. Does that mean Ants achieved AGI?
Ants are pretty smart as a hivemind. Nature is the best creator of intelligence.
That experiment prohibited humans from communicating with each other, making the experiment inaccurate.
@@NihongoWakannai best? It takes it millions of years. At a computational standpoint, it’s not that great. Humans are outpacing it.
Ant Generated Intelligence
No, it means that humans aren't that smart and therefore we should be ready to hand control over to the AI overlords sooner than later.
at 4:40 you mention that o3 can solve a toy task that it's never seen before, but the ARC-AGI benchmark includes a training set to familiarize models with the format of the problems. I don't think any specific problem types are repeated, but they aren't going in completely cold to the format either (o3 was specifically fine tuned on that training set for this benchmark)
That's kinda the whole problem. Training the model on a format and structure of the test defeats the entire point. It means the results likely aren't generalizable and may be strongly tied to the narrow ARC testing environment. If that's what's happening with o3, then if you take it out of that environment and into a real-world situation, it wouldn't do any better than probably o1 on the same task.
It's something that has been observed in humans too. When people are trained to perform better on cognitive tests, their scores improve dramatically, even on questions they haven't seen before. But when you give them a different test with a different structure, then they go on to do about the same as pre-training. Training them on the test didn't make them more intelligent in general. It just made them better at taking that specific test. The 'G' in AGI is really what matters here.
even for the codeforces score, it is very likely that they hired people to solve common problems and trained on this data
It defeats the point because some sort of self existent AGI without a training set is impossible
I would suggest you listen to the guy from ARC describe the results. You cannot ‘train’ a model for these tests. You can make them aware of the format, and the same applies to humans, but each test is novel and the model has not seen it. Same as a human. That the model can beat the human on novel tests is what impresses with this iteration.
@theb190experience9 that's what my comment said if you reread it carefully, though a human would not need a training set nearly as large as the one offered to understand the goal - at best, one or two examples
9:01 THIS. Speaking as someone who literally did a PhD in machine learning, all this "we achieved AGI" stuff sounds like _pure marketing bs_ to me. People have been predicting that the great golden tech utopia future is only 5 years away for like 60+ years, it's all just be about building hype for tech and investment.
Sam Altman is a marketer and grifter, no doubt he's very clever, but he's not AI Jesus or anything.
I also love how every AI bro loves to use that graph (1:28) as some sort of proof that AGI has been achieved when its literally showing that even with exponentially growing cost, test accuracy barely increases. According to the graph, the score difference between o3 low and o3 high was just 12%, meanwhile the cost difference per task was literally thousands of dollars.
@Imperial_Squid what is your prediction of AGI ? When will we have it you think ?
so you are just going to ignore pure and steady growth of our technological capabilities?
@vasiapetrovi4773 when did I say something like that? I think transformers were a remarkable achievement, I think LLMs were a remarkable achievement, I think ChatGPT was a remarkable achievement, I think o4 was very impressive. I'm fully capable of valuing all these things as being the innovations they are _while still_ being skeptical about claims of AGI or a complete technological revolution. These opinions aren't contradictory _in the slightest..._
The goalposts have already been moved for using the term Artificial Intelligence, how long before every machine learning tool is called AGI and they have to come up with a THIRD term to mean the same thing the other two used to mean.
It's never been a well defined term anyways.
AI has never really referred to actual intelligence to be honest. I mean, people were calling what the computer enemy does in video games "AI" for decades even when they had incredibly simple algorithms and nobody ever considered it to be a confusing term to use back then.
we just started using the term AI for any simple algorithm out there.
it's a catchall term for stuff we can't solve and still find mysterious. AI is the god of the gaps.
this is a phenomenon called the maxim of extravagance
I have zero trust for AGI benchmarks. AI Kool aid drinkers set their own benchmarks, super trustworthy 😂
Counter point: I can prompt a snake game
@@DeepTitanic
Indeed, the future is certainly bleak for SSEs (Senior Snake Engineers)
@@DeepTitanic My 6 year old can write snake. You are not good at life.
@@elujinpk You're old enough to have a 6 year old kid and can't recognize an obvious joke
@@DeepTitanicAll that compute power for Snake? Make GTA6 and then we can talk about it
they just have to make it 8000 times more efficient and then make it able to complete tasks it's not been pre-trained on, that's all
That second one is the big one. Until it can solve problems it has absolutely no training or knowledge of AI will just be an assistant. Which, while useful, it is not going to take your jobs. Though how far we are from AI being able to learn independently without being trained on something is anyone's guess, could be tomorrow could be 100 years from now.
@@hungrymusicwolfhumans can't do that either. I'll ask you to do something you have no training or knowledge on and watch you just magically figure it out..
@@hungrymusicwolf I’m guessing closer to that 100 year mark. Idk, it just seems like it’s plateaued to me. I think they just needed something to fool investors. Open AI is absolutely HEMORRHAGING money and they’re not really putting anything truly valuable forward.
It's designed to just be familiar with the format of the problem. Think of it as training a model to understand JSON so that you can pass it a problem in JSON format. For example, if you passed math problems formatted in JSON, pretraining the model on JSON doesn't directly help with understanding the steps to solve the math problem. That is what they say, at least, my explanation isn't an endorsement or me making the claim it actually works that way. I'm just the messenger of this info. @@hungrymusicwolf
I haven't read much about o3 but I thought the big breakthrough was that it was solving problems without being pre-trained.
Honestly, AI has made me appreciate humans more than ever before, and despite all the advancements in technology and all the energy consumed, humans are still at a mysterious level of ability compared to AI
Well, we've had millions of years of headstart to get to where we are now, AI has come so far in just decades. Machines are evolving much faster than we are, and may surpass us one day.
Oh god we consumed energy. 😂
@@20Twenty-3 machines run nuclear plants to get to these results, humans spends 100 watts per day.
@@20Twenty-3 Lol
@@20Twenty-3 None of our engineering at a nano-scale holds a candle to life itself. The human body is an unfathomably well designed and interconnected nano-micro-macro machine working in harmony at all scales for a common purpose.
Really appreciated this sobering take on O3/GPT-4. The title had me worried this would be another AI hype video, but you nailed the practical limitations that most AI evangelists conveniently ignore - especially those astronomical costs and resource requirements. The fact that even a "tuned" version only hits those percentages while costing $20 per task (or 172x more for high compute!) really puts things in perspective. Your point about the widening gap between deep technical understanding versus surface-level AI dependence is spot-on. It's refreshing to hear someone cut through the investor-driven AGI hype and remind us that solid technical skills and understanding fundamentals are more valuable than ever. Keep these reality checks coming! 👨💻
I started refactoring the entire front end with ramda exclusively a year ago. The code is beautiful, everyone can read and understand what it does at a high level, but no one can change it.They couldn't fire me if they wanted to. Check mate boardroom suits.
I hate you
Why no one can change it?
Uncle bob was wrong. Couple everything and depend on in your code lol
as i know ramda, if its js library, it is slow, my team had removed it and seen performance gains
"I see rewrites, and I see rewrites within rewrites." (of freshly hallucinated legacy code)
Damn. This hits deep. I was half asleep last night and let it go wild. Looked at the code with fresh eyes this morning and I have classes in separate files and nested classes in the main file. It decided it didn't like my project structure so it made a subfolder and created new files without deleting the old files.
Average Joe devs will spent too much time prompting something that would produce code that "just works dont touch it".
They will be Prompters instead of Actual Programmers. When the shit get real, Prompters will call Actual Programmers to clean up their mess
Wake me up when you see the blonde wearing a red dress
@@awesomesauce804 I'd love to pretend otherwise but so many of my personal projects have been managed like this for over 30 years 😂. When you do anything like this professionally though you refer to different "generations" of solution and call the process "prototyping". 👍😎
Like in dune, "feints within feints within feints"
The future of AGI is already depicted by the minions in Despicable Me. They kind of get the job done but you don't understand them and you're always wondering what could go wrong.
my noodles are more cooked, which i'm eating while watching those AGI tweets
@prime do you use tablet or Wacom? Or do you draw with vim motions too?
Internal AGI achieved trust me bro
Trust him bro
I'm not convinced humans achieved gi
Wait 8 months. Big things are coming out August 2025
If the internet said it then it must be true
exactly bro, thats why all of our top board members who were with the company from the very beginning left to start their own thing. because they humbly did not want to accept any credit for helping achieve AGI. it is true bro AGI is here
It feels similar to those really expensive and high precision robotic arms, where yes technically it could do the job of factory workers, but it's way too expensive and that's not even what it's made for, it's made to either be a demonstration piece or for some highly specific niche task.
It is made as a proof that test time compute works and it seems like it did the job.
This is the kind of stuff I am here for. We got the pros, the cons, some valid insight from Prime's perspective, and he didn't spent time speculating on things that are purely hype based. Great video!
Quote from ARC-AGI that people somehow missed.
“Furthermore, early data points suggest that the upcoming ARC-AGI-2 benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute (while a smart human would still be able to score over 95% with no training)”
They also stated that their pov - AGI can be achieved when there will be no way to create dataset of simple tasks that AI fails to carry out while human easily do.
So all this “AGI achieved” is the biggest bullshit
This is actually the best comment
And the meaning of "simple tasks" will keep shifting.
@@ivanjelenic5627 Any source on that? The tasks for ARC-AGI seems very simple.
@@ivanjelenic5627 If 95% of non insane adult humans can do it, then its simple.
No but it's using a different training paradigm, which itself can be iterated on, which may get us closer.
You luddites change the goalposts all the time, and AI keeps breaking through them within a year.
They trained on 75% of the training dataset. It's literally called "training dataset". If their success is purely contributed to that, as some people make it out to be, then ARC should've already fallen in the past few years to dozens of companies who did train on the training dataset and still couldn't get above 20% accuracy.
Yes you should continue to learn hard skills regardless of AGI because there is literally no downsides to having more knowledge, but you should also be able to look at a trend line and realize where it is headed.
Also I recommend people to look up the ARC tasks that o3 failed at. You will see that most of the fail cases are very reasonable attempts, very very reasonable attempts.
Ignoring the trend might be okay if you are a netflix engineer, but for most folks working on much simpler problems I think we all should be aware that the bar is rising very quickly
Your "AI won't replace you" videos are like some kind of a psychotherapy for me now. Guilty pleasure.
I wonder if AGI will turn out like fusion. Fusion is possible today, it just costs more energy to keep a reaction going than you get out of it. It might take 100 years or more of innovation for it to be useful. Maybe AGI is possible today, or soon, but how long until it costs less than the annual salaries of a full staff combined per problem solved?
They’re shooting past AGI for ASI, if it’s possible, you only need to build it once.
Underrated skill critical thinking. And not freaking out over this.
We are quickly heading towards a John Henry (he was a steel driving man) moment. We are competing on cost with a machine. What a time to be alive!
It'll make 2x engineers into 8x engineers, 10x engineers into 80x engineers and 100x engineers into 102x engineers
And X engineers are still just former Twitter engineers
It might make the 100x ones into 99x ones, give it hard enough stuff and it starts hallucinating confidently.
1o3x engineers
Lowkey im hoping we get to a point where making a game by yourself becomes that way so that small passionate teams can succeed
Funny thing is that the world will funnily find a way to live with this
We don't need to worry about o3, we need to worry about o4-mini.
Every time there has been an increase in performance, it was followed by a budget friendly advance that doesn't match the bleeding edge, but matches the performance of the previous generation.
o3-mini performs better than o1 full
@@erkinalp exactly, that’s why he’s saying o4-mini gets this kind of intelligence widely accessible
7:20 - It's so much worse than that. That's five full *tanks* of gasoline - not "just" five gallons.
I absolutely don't believe this. Looking up how much oil you'd need to produce 1kwh of electricity, it's about 0,0672 gallons. Assuming an average car tank is 12 gallons, that means you would need to spend almost 900 kWh (12galons / 0.0672 gallons/kWh) of electricity. It said that for the low efficiency variant it took 13 minutes to run, which means that you'd have to be using 4166 kW of electricity for the entire 13 minutes. An H100 uses 700W of electricity, that would mean that for 1 task you would be using 6000 H100 running at full tilt the entire time, which is completely absurd.
Edit: I rewatched that part and to be more precise, it says 684 kg of CO2 equivalent emissions. Looking at the US average emissions per kWh of electricity (0.37 kg of CO2 per kWh), that would mean you would actually have to spend 1800 kWh of electricity, doubling the math. Either way you look at it it's completely absurd if you're purely looking at the cost of answering questions and not the training.
@@redex68yeah it’s complete bs
it generates wrong code, FASTER
Lol...no buddy
lol … yeah buddy
Honestly nowadays they are generating code better than 90% of the "Senior Developers", but programmers have a big ego and will always try to insult these tools.
@@umapessoa6051 It is not the code that matters but the actual logic and experience of programmers.
You mean STRONK code, right?
theprimeagen 2024: "IA is dogshit"
theprimeagen 2025: "IA is expensive"
Correction: IA is expensive dogshit
I mean that probably does reflect the state of the ai at the time he was talking about it.
theprimeagen 2026: "IA is competitive"
theprimeagen 2027: "IA is super-intelligent"
theprimeagen 2028: "IA is nice and I totally support our new AI overlords"
@@petrkinkal1509 2028:*noises of tumbleweed on nuclear wasteland*
The fact that there is no data point for o3 untuned is the biggest stink to me - especially since all of the other points are not showing that they are tuned. Tuning the model to such a task is like giving someone all of the questions and answers before having them take an IQ test - I don't think that the test results are all that impressive after that
Also the concept of "tuning" to a test that is supposed to measure the ability to generalize is itself entirely farcical. I can't believe anyone takes this result seriously.
@@almightysapling Devil's in the details. I suspect a lot of people are not looking at the details, just the hype messages.
@@almightysaplingIs that literally just so the graph “looks better” to people who aren’t gonna think about what the text actually says?? Knowing what I’m looking at the graph is just so silly lol
We have fine-tuned AI that can beat any human at chess or Go, why is anyone surprised that tuning an AI at a specific kind of task allows it to do well?
@@joeysung311 To me it all points to one thing - AI companies are struggling.
The "AGI test" that measures whether the AI is an AGI that measures it in percents, but 100% does not necessarily mean AGI?
A comparison between tuned new AI to old untuned to manufacture the sense of huge progress?
People see AI improving from 30% on the AGI test to 70% and think "oh, we just need to do one more small upgrade to 100% and the AGI will be already there" when the reality is not that simple.
OpenAI achieved precisely what they wanted to - they made so many people scream about the incoming AGI because they see the layers of misleading stats
You gotta experience it to know if it any good. All these things are potentially accelerators, eg write boilerplate for you, find bugs, write tests for you. At the moment the llms make so many mistakes that they help little for production code, but ok for experiments, one off automation tasks where the stakes are not that high. For production code you have to do so much fixing and adjusting and rewriting it is hardly worth using them at all.
This. Literally this 100000 times over. Nobody seems to talk about this for some reason. It's gonna be a long time before AI is writing code that is good enough to go directly to production
If anything, the deep learning project in university taught me not to trust any benchmark for AI
Yeah only the arena
Exactly. We had public leaderboards for our assignments and what ended up happening is that most students were optimizing for the private test sets.
Goodhart's Law - “When a measure becomes a target, it ceases to be a good measure"
AI developer's seem to take Goodhart's law of “When a measure becomes a target, it ceases to be a good measure” as a challenge, instead as a caution. When we see a benchmark we always think we have found a way to measure AGI, only to find the model creators pass them by rules lawyering!
They are trying to draw blood from a stone at this point.
Man, I really like you channel! Old dog here - you are a really fresh air on developer RUclips! Down to earth, skeptic and kinda all over the place geek from the 90s!
Here's a hot take: The reason general public can have access to this A.I models, is that they're dumb and "useless". The moment it becomes intelligent and useful, only big corporations and governments will have access. So, as long as you can purchase a subscription to it, is total garbage.
The AIs will remain fundamentally stupid and we will run out of things to prove that they are in fact stupid. Douglas Adams "I'll have to think about that [6 million years later] I don't think you're going to like it" style.
It's not garbage, chatGPT is useful just not for replacing entire jobs.
@@NihongoWakannai Pardon the strong take, but my point is, GPT like A.Is haven't brought any advancements/improvements to the field of programming. All i see is shittier code comming mainly from JR DEVS, and what pisses me of the most, they are clueless about how their code came to be. I want to retire some day. How TF would i do that if new generations of programmers only know how to copy and paste?
@@PaulanerStudios A good indicator of A.I's becoming more intelligent, is when we stop talking about using it for something so trivial and mundane like programming.
@@ThalisUmobi The usefulness of ChatGPT is in being able to ask it questions that you don't have the proper keywords to google effectively. I don't use ChatGPT for writing code unless it's like AHK or batch scripts.
We still need skilled people to understand and deploy the AI output. AIs can’t be held accountable for bad business and technical decisions. Prime’s take at the end is so so good. Totally spot on. The people that understand how things work will be the best equipped to wield AI. And I think approaches like Cursor are going to be the winning paradigm - using AI as an integrated tool and not an outright replacement - but again best wielded by skilled engineers that understand how things work already and can use these tools to automate busy work, craft the best prompts, and verify and deploy outputs the fastest.
The issue is that for any non-trivial task it is just easier to write out the code than to create a prompt that is precise enough.
So you're saying lower rung jobs will be cut even more and to even get a single job in CS you will need at least 10 years experience?
Wow so optimistic.
@@NihongoWakannai the main point is that learning computer science and software engineering and developing your hard skills is not a pointless endeavor. In some senses, it might become even more important.
AI can be liable for their actions, someone just needs to create AI insurance. It sounds insane but we do live in linsane times afterall, it might work.
Also, insurance scams on AI agents pulled of by rouge AI agents in few years? 😂 Such Cyberpunk 2077 vibes.
@@BHBalast Nobody in their right mind is creating AI insurance. The risk is simply too high.
it financially benefits these companies to create such hype about their product.
Man those RUclips videos with “AGI Achieved!!” in the titles are so cringey. Unfortunately that level of sensationalism works on RUclips if you want views
I started hiding those channels. They fooled me like 3 times, that was enough.
True!
Honestly the early models really help me get started when I code, that alone helps speed me up. I don’t expect or want it to do everything. Just write me some boilerplate or UI with decent accuracy and I’ll happily pay $20/month
It would be interesting to have o3 vs a team of software engineers. The input can be incremental requirements for a web app for example.
We might be experiencing peak AI hype. The findings and the state of the art may be advancing, but the energy differentials are multiples orders of magnitude in the wrong direction.
But this is the sentiment every time OpenAI releases a model. We are always experiencing the peak.
@@archardor3392 In the world of finance and money, this might be called the irrational exuberance phase before a bust. The valuations don't make sense given the profit and losses, we're left wondering if "they" know something we don't know. The AI market must be differentiated from the AI technology. There's clearly a market for some forms of AI technology, but the market doesn't properly reflect the state of the technology.
This was a great take, and refreshing one. Even youtubers and influencers that I trusted went the click bait and scare route.
The rate of technology change is no excuse to not learn coding. Even if you don’t have a job in coding like 10-20 years from now, you won’t get a job in playing word games but how much time do we invest in that as well? Coding is an excellent exercise for the brain in addition to teaching us new ways to think and reason about the world around us.
Just the other day I had a conversation with my boss who said he's willing to "invest" in "AI": "if it can help us with even 50% of the tasks". I had to then explain that I've been using this thing myself for two years already and that it's closer to 10% helpful when it's actually helpful, and entirely detrimental when it's wrong, which is most of the time.
Your boss sounds like boss from Dilbert
It depends a lot what you do. Some coding tasks are just insane, like when I ported SQL code and models from C# to Python in like a few minutes. Probably saved a day there. Or the other day I needed a simple function that generates an SVG from a list of points, with a few options, and it just did it in a few seconds, ready to paste into the code.
My boss has unironically mentioned the idea of using cursor and I’m like “well… I won’t say nothing I’ll just laugh when it becomes a catastrophe”. I’ll be ready to clean up the mess though. 🤷♂️
I can't believe you just ignored >at what task< in your argument. That's the whole point for AI and brains alike. How good are you at doing what. For it to be useful to you, you use it on the things you know it's good at and not otherwise.
@@neithanm if you have a tool at hand that seems easy and good, you will begin to use it for everything and quickly find out why you can't.
after any ai update, tech influencers need to convince devs to not kill themselves
Most people here are privileged af, "AI Can't do my job" is such a privileged answer fr, I want AI to take my job
@@GTASANANDREASJOURNALS You wanna be homeless?
@@DrDeaddddd i want everyone to be homeless then we figure out how to communism with the help of ai
lol. what a fresh, great video. thanks. non tech founder here looking to learn to code. been funding a team of juniors and sketchy seniors for 3 years til I ran out of money and patience. now using cursor, bolt, lovable, etc. all of those just can't get to the bone as I need. that's why I know the only way out is to finally learn to code. thank you for your video. subscribed. will be looking forward to new ones.
The Arc public data set has been available since 2019. This means that data will have been trained on every frontier model including Gpt4 and o1. And this is understood by the ARC foundation, they made it public afterall. Also, look at the frontier math benchmark by EpochAI. These are novel questions that are tough for fields medalists and o3 scored 25%, previous SOTA was 2%!
You know that Primeagen has accepted AI as a thing when he calls his master's degree one in AI rather than machine learning
When I was getting my master's degree it was just called a masters, an artificial intelligence, not machine learning, this was 12 years ago
@@ThePrimeTimeagen Holy crap, you actually responded! really like your videos! didn't realize that schools were using the AI label that long ago, I just mistakenly assumed that you had given up and started referring to machine learning as AI like many others at this point. I'll still be referring to radial basis functions as being part of machine learning (this will get easier and easier as they become more forgotten with time)!
As a beginner self-teaching developer, I find LLMs are a fast way to learn about tools or libraries that fit a specific use case I describe. Also, when I struggle to understand the terminology or concepts in documentations I use LLM’s to get things simplified. Still, many times if I get stuck on a problem long enough the temptation to ask LLM to provide a solution is great. And when I do ask I feel dirty afterwards.
permanent code reviewer!! After 26 years that is pretty much my job now!
One of the best videos I've ever seen about this topic! Thanks from Brazil!
AGI is almost as expensive as it was to use dial-up internet to play Everquest.
Considering what we could do 10 years after that. People should be more concerned.
I remember o1 initially taking between 15s to 45s per prompt. Now that it's released. It's better and instant in most cases. Most of that apparently was tuning to make sure simple questions weren't overthought. I'm curious to see how much they can get the cost down by the time it's actually released.
I love "Based Primeagen" videos, you give really good insights. The issue is trying to bring reasonable arguments to a crowd of people who just "want to believe", most of whom have never used ML at any depth (I am amazed by how many people give authoritative opinions on AI and AGI with 0 technical background). I will say this : my team has been using similar models for 1.5 year. I can easily manufacture a set of examples that will blow you mind and make you think my team is redundant in 6 months. I can equally easily cherry pick example that will make AI look stupid and will make you doubt we should even continue. My take ? It's a tool, it's good a certain things, let's use it for that. Oh, and apply a modicum of critical thinking : of course the snake oil seller will tell you their snake oil is the best, that you need snake oil and [insert here compelling reasons]. But that is still someone profiting from selling snake oil, and that requires to sift through what they say critically.
Man i have dug into cursor this previous and with their composer agent i am astonished how much i can do in so little time. I think the future of programming is here in cursor like solutions, where you as a programmer will act as a project manager and cursor agent is like your 10x developer team
i have llm agent in my neovim, for starter, i just need to tell him what i want to create with the features and tell him to save it to the file name, then it will generate code and save it in the file with filename as i want, then it will run the code. then if there is error, i just tell him a keyword how to solve this @terminal, the @terminal will include all the code in the current file + all the error message in the terminal output, then he will edit the file 😅 continously until there is no error, if there is still error then i just review and change little bit then viola the prototype is done
@@my_online_logsthat's not a good way to use ai for programming. You won't know anything about the code. It's better to use something like cursor or copilot chat
It’ll be AGI for SWE when it can self verify using arbitrary tools and computer use.
I had Claude 3.5 add a download button to a complex page. It gets it in the first go. That was pretty impressive. So kudos. But I still needed to QA the feature. I had to rebuild the app, open a browser, navigate to the right place in the app, create the history in order to generate non-trivial download content, look for the download button, make sure it’s in the right place, make sure that the styling is legible , test the hovering operation, press the download button to see if it responds at all, know where to look and what to look for to see if it is downloading, find the downloaded file , open it, inspect the contents and make sure that they match what’s on the screen and formatted in the way that was requested in the prompt.
We’re getting there but I’m still having to do a lot. I want it to do all this before it presents its solution to me.
That’s what the G in AGI means to me.
seems like you're babysitting an AGI 🤣
@@kietphamquang9357 Yeah because non-AGI couldn't write a basic download button at all. lmaoooo. get real you fuckin AI koolaid drinker
I use gippity daily and its more useful as a rubber duck than anything. Sometimes its a good reminder of something basic you didn't want to google. But its wrong so often that theres no way you trust most of it
You don't need to trust it. A lot of times it's easier to ask ChatGPT and then check its facts than it is to do research or read docs manually.
That’s laughably wrong
I used to work in localization. They started to introduce machine translation, then they started to lay off language specialists. Now you see bad translations everywhere but low quality is not an issue for most businesses
yeah, i dont really understand these sentiments at all. Yeah, AI is very shitty at developing. Do they have to be good at it for retarded higher ups to start replacing humans with it? Not really. Already happening with marketing and graphic design, see all the clearly ai generated billboards. Looks like shit, do they care though? Not really. And people are out of their job, just like that. Same thing will happen with programming, maybe already have, with how the junior market has been for years now
Good point, it's perfectly acceptable to have low quality for some tasks. Most buildings have sloppy insides because the aesthetics don't matter
You know what I find funny? It's that companies look forward to see AI do things they want done and create something by interpreting the user........which programmers do today.
Also, companies live in a imaginary world when they think buying a AI tool in a few years will be cheaper than hiring experienced programmers.
Humans are IRL AI agent equivalents, but companies aren't able to treat humans humanely or give them time to educate themselves so they need insanely expensive AI to sink their ships instead.
Watch a single company who understand the value to knowledge swoop up insane amounts of talent and just outrun the market at light speed.
I don't see even the companies being happy about AI. If AI is capable of writing a good working program, and maintaining it reliably, the value of an IT product will be the same as the cost of running AI. The whole IT sector will be a no money place dominated by a couple of AI companies, or the companies which run apps that need a large sums of money to run their services. Everything you can now produce digitally will be worthless in the future, if a good AI is shipped
@@iz5808 Interesting point to think about
@@iz5808This assumes perfect efficiency, which is unrealistic. In the same way that a retailer with the lowest prices doesn't get 100% of sales
We've built a Mr Meeseeks box. Just don't give it to a Jerry and we'll be fine
Teach me golf !
Maybe I'm misunderstanding something, but as far as I understand, the term AGI implies the ability to surpass human knowledge. To achieve this, it must be capable of continuously learning on its own, as this is the only way for AGI to exceed human capabilities - a goal that is certainly still a long way off. This will probably only become possible once we can operate LNNs on neuromorphic chips at an industrial scale.
Great take and I completely agree with your assessment and assumptions.
Somewhere in the near future the pool of those with the technical depth to understand beyond the AI suggestion will dwindle to a critical point at which time humanity would loose the abilities we take for granted right now.
5:55 bruh that 62 can itself be an AGI test
i woudnt pass
😂😂😂
There is a good scene on this in Stanislaw Lem - Fiasco - between an old spaceman and super AI computer
Also, IIRC, there is another ARC AGI test which o3 was put up against and it got like 30%, which wasn't much of an improvement over o1. Really does seem like marketing hype, kind of like what google did with the willow quantum processor
Thanks for being always the most honest in the business! Love you work!
This just proves how fucking ruthlessly efficient biology has become. I guess a few billion years of guessing beats a few thousand years of technological advancement.
I think this is where we’ll end up. An ai that can program cobol at great expense, as everyone else has died.
Thinking the same thing. Brains may be limited in lots of ways, but they do pretty well on 20 Watts. Pre-training can take years though.
@@bobcousins4810 But think about it: pre-train for muscle coordination, vision, audio, smell and then finally a few years to fully develop the pre-frontal cortex
I agree with the sentiment, but we've only really begun AI work 50 years ago. Relatively speaking with the time span of humanity as a species, AI is far out pacing us.
That said, I fundamentally think brute forcing with distributive semantics is a fools errand.
@@bobcousins4810 how long does it take to train every single human though
7:30 not 5 gallons. 5 full tanks. thats 50 - 100 gallons
We worried about the wrong cooking
Came to point this out, too.
One should be shorting of all over valued AI equities. Which is all of them.
Why don’t you understand compute cost comes down. Gpt 4 model is 100X cheaper than it was when it was launched
Doesn't this happen with every new model?
One day it will be right
90% of my time fixing a bug is figureing out in which branches it must be fixed, filling possibly 5 pull requests, restarting flaky tests on ci for the 5 branches, requesting a review for 5 pr, nagging my boss until he finally presses the approve button, realize that on 2 of the 5 branches someome has merged first and now I have to rebase thosr commits (not allowed to merge...) so rinse and repeat. In effect the loc written are like usually 2-3 times. The Ai would save me no time doing most of my tasks since they are not even coding things.
what happens when they scrap your whole codebase and ask the ai to make the app from scratch?
@@timothyjjcrow I would be happy If I could do that, as my job would shift to prompting the AI to do things, leaving me more time to enjoy drinking tea.
Half of my job is doing office politics anyways. When you are in a project of this size there is so many layers of management that you have to play bullshit bingo with... Imagine PM #1 comes to you and asks you to do a feature, that feature would be inconvinient for PM #2 you know this due to context you have, the ai would have just implemented it. So what would I do? thats right make a meeting and have PM 1 and 2 fight each other, or nudge PM 1 into a direction thats likely to be acceptable for PM 2. I highly doubt that the AI would in a reasonable amount of time have the social skills necesarry to perform these tasks. Coding is really a very small part of my job. Its one I like, I just hate the codebase I have to work with because its older than me and several million LOC's.
Also I dont make "App's". I make highly complex vendor neutral microscope software that also happens to be able to interface with a pelthora of other systems via very arcane api's. (not the http kind of api) This is so niche that even if you wrote down a full requirement sheet for the software, which is nearly impossible in my opinion because we dont even fully understand what our software does, then I doubt that current and next 3 year AI's would be able to comprehend what it has to do. In the industry make this software for failure/bugs lead to reasonable terrible consequences so the confidence would be low for an iterative AI approch. Will it happen in 20 years? maybe, will it happen in the next 10 years? very unlikely.
Analog computing or computing in memory are potential solutions to the energy inefficiency problem, boolean just takes way too many precise steps to get at a conclusion.
Analog computing is effective for differential equations and not much else, as it tends to lack precision and the hardware isn't very scalable. There's a reason why this approach wasn't chosen. What does 'computing in memory' even mean? Boolean logic requires too many precise steps to reach a conclusion? What?
Thermodynamic compute is much better and coming quite soon. Works with the same standardized CMOS silicon architecture we've used for decades. 10,000x on first release would be childsplay.
@@steve_jabz it's nuclear fusion reactor 2.0. I will accept it only when I see it really working. I believe the current computers are good enough for AI, it's just the matter of getting proper foundation for the model
@@iz5808 Not really comparable. Fusion is hard. Thermodynamic processors are super easy and we can start mass producing them right away, we just didn't have a reason to have intentionally noisy circuits until it became obvious that scaling modern techniques like diffusion and attention was extremely effective. In 99.9% of cases, the last thing we want is noise in a circuit.
@@steve_jabz super easy and yet not a product that exists.
We must have different definitions of "super easy" where yours is "something extremely hard and cost prohibitive to release and full of bugs"
What do you think of spiking neural nets and other approaches near to natural neurons to lower energy consumptions and maybe to speed up few things?
I achieved AGI Internally (my brain)
BUT HOW DO YOU DO ON THE BENCHMARKS
12:11 He just explained pretty much tech. Just a small, really small fraction of tech is really useful, the more you put thought on it the more makes sense.
Let's first see if this internet thing takes off.
Ok, puzzles, but can it refactor this 15 years old legacy codebase with crazy 1127 lines long for-loops, without producing bugs?
I think you're missing the bigger picture here. The o3 mini model offers performance equal to or better than the o1 model at a fraction of the cost. Sure, the full o3 model is expensive, but if you consider how much better the o10 could be in just two years, it’s only a matter of time before it surpasses nearly all coders at a fraction of the cost ... coders need to plan for a world where human coding is not needed anymore !
i know people saying things like this but there is no official o3-mini comparisons. If o3-mini was that good and better than o1 they would have shown in. they didn't. i think this is just a hand wavey thing they are saying until they can figure it out
If there's a known test to prove you are AGI, then you can probably train for it. Not trying to diminish the progress which OpenAI has been able to achieve. I like viewing LLMs as 'general trained models' which you can condition to a 'predictable' function via prompting to prove out the viability of training a custom ML model. They are fantastic for throwing stuff at the wall and seeing what sticks!
The CO2 impact of this is actually insane.
yeah that amount of CO2 is fucking craaaaaaazy
Nah, they're building nukes to power these.
i think you missed the point.
O1 & O3 will be mainly used to produce a LOT of very high quality synthetic data in order to train next models MUCH MORE efficiently (it's proven that a small LLM can works as good as a big one depending on the quality of the training data)
The thing is, its not worth the compute right now. That doesnt mean in a year, with compute costs lowering, and changes/improvements to the model, will result in speed ups/cost reductions etc.
What i dont get, is how programmers cant grasp, that we are only scratching the surface of transformers, let alone any sort of future neural network architectures. Nearly EVERY DAY, there is new papers/studies showing aspects/discoveries about transformer models.
Compute costs aren't lowering. Computers can't be much cheaper becouse of physics laws
@@celex7038 Sora took an hour, to create a min video, a year ago.
We now have OPEN SOURCE video transformer models, that are 1000x faster.
Its wild how everyone wants to talk about AI, yet they know literally fuck all about it.
Compute costs lower, because models are reworked,retrained and optimized to use insane amounts of less compute.
But i get it, Its 2024, we have people using junk like Javascript to host backends, i get that you youngsters wouldnt know what optimization was if it tried to suck your cock.
The thimble of water comparison was spot on, every CIO needs to see this
Thank god for making me so environment efficient
Regarding qualifying cost at 3:30:
The pricing structure resonates with AWS and and the exorbitant bill some companies run up with many AWS products.
But can it push to master?
I'm coding a game with chatgpt. Paying $200/month for the pro subscription. And what you said couldn't be more true. I don't know much about coding, only the basics, but as the code gets bigger with more files, it starts to hallucinate. Difficulties solving simple issues. Giving code with bugs.
I started coding with AI from the beginning, the improvements are crazy and still going. AI is an amazing tools, but it's far from being to the point of replacing other developers, that's for sure. As for the future, I have no clue.
2:17 “WE DID IT GUYS WE MADE AN AI AS SMART AS PEOPLE” “okay could it solve this picture puzzle an actual 1st grader would consider trivial?” “No” 😂 every single fucking time
It's far smarter than most people in many areas, but lacking in another areas. We just have to improve those less developed areas. Not that complicated
@@alex-rs6ts congratulations on completely, COMPLETELY, miss the point of generalized intelligence.
@ exactly, MOST people, just not the people who would actually be hired for those positions 😭
it will beat you at coding tho so what is your point?
@ Lol no it won’t. I’ll be better at programming than any AI agent for at least the next 50 years.
love the way you wrapped this. thank you sir
The fact that they trained their model on AGI stuff seems extremely unfair. That’s like telling a robot it’s a human before the Turing test
Love it. Preaching to the choir. I've started taking college math classes online to a comp sci degree largely because of you. If that doesn't work out there is electrical engineering, which I do want to do even if comp sci does work out.
Oh we are so cooked bro, I got like 200 years of full stack software engineering web 3.0 development experience and I just got laid off from my job, because CEO will use AI on legacy code base. I can literally code in binary and this wasn’t good enough. Cooked.
Try specializing in trinary
You need to master coding in qbit and you should be safe until quantum AGI is achieved
I feel like they'll want you back when it doesn't work then you come back for a premium
In about a month when they ask you if you'd like your old job back you should be ready with your terms as a very highly paid Winston Wolf type character. Also wear a suit at least for the first two weeks.
They will be missing you. I highly suggest sending them an email making it clear that you are willing to be rehired.